Recording Audio

If you want to record the incoming audio for a conversation, you can fork the audio stream and send one copy to a file. Recordings are useful for quality control and for building new test cases from unusual situations.

The block below updates the websocket endpoint from the quickstart to save the audio from each chat to a separate file.

 1    @app.websocket("/ws/audio")
 2    async def audio_websocket_endpoint(websocket: WebSocket, id: str):
 3        stream = fastapi_websocket_bytes_source(websocket)
 4        stream, audio_stream = fork_step(stream)
 5        stream = google_speech_v1_step(
 6            stream,
 7            speech_async_client,
 8            audio_format=AudioFormat.WEBM_OPUS,
 9        )
10        stream = log_step(stream, "Recognized speech")
11        stream = map_step(stream, lambda x: {"query": x})
12        stream = langchain_step(stream, chain, on_completion="")
13        stream = recover_exception_step(
14            stream,
15            Exception,
16            lambda x: "Google blocked the response.  Ending conversation.",
17        )
18        stream = google_text_to_speech_step(
19            stream, text_to_speech_async_client, audio_format=AudioFormat.MP3
20        )
21        stream = map_step(stream, lambda x: x.audio)
22        done = fastapi_websocket_bytes_sink(stream, websocket)
23
24        audio_done = binary_file_sink(audio_stream, f"call_logs/{id}.webm")
25
26        await asyncio.gather(done, audio_done)
  • In line 4, we fork the audio stream before it goes into the recognizer. This will send copies of the data to both the recognizer and our file.

  • In line 24, we send the forked audio to a file, with a name based on the id passed from the client.

  • In line 26, we await both sinks together with asyncio.gather. It’s always best to await all sinks in a data flow at once.
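
To make the fork-and-gather pattern concrete, here is a minimal sketch that uses only the Python standard library. It is not how the library implements fork_step or binary_file_sink; the names fork, file_sink, collect_sink, fake_audio, and main are hypothetical stand-ins chosen for illustration, and the unbounded queues are a simplifying assumption.

```python
import asyncio
import os
import tempfile
from typing import AsyncIterator, List, Tuple

_DONE = object()  # sentinel that marks the end of a branch


def fork(source: AsyncIterator[bytes]) -> Tuple[AsyncIterator[bytes], AsyncIterator[bytes]]:
    """Hypothetical stand-in for a fork step: copy every chunk to two branches.

    Must be called from inside a running event loop.
    """
    left: asyncio.Queue = asyncio.Queue()
    right: asyncio.Queue = asyncio.Queue()

    async def pump() -> None:
        # Read the source once and hand each chunk to both branches.
        try:
            async for chunk in source:
                for queue in (left, right):
                    await queue.put(chunk)
        finally:
            for queue in (left, right):
                await queue.put(_DONE)

    task = asyncio.create_task(pump())

    async def branch(queue: asyncio.Queue) -> AsyncIterator[bytes]:
        while True:
            item = await queue.get()
            if item is _DONE:
                await task  # re-raise any error from the source
                return
            yield item

    return branch(left), branch(right)


async def file_sink(stream: AsyncIterator[bytes], path: str) -> None:
    """Write every chunk to a file (blocking writes, kept simple for the sketch)."""
    with open(path, "wb") as f:
        async for chunk in stream:
            f.write(chunk)


async def collect_sink(stream: AsyncIterator[bytes]) -> List[bytes]:
    """Stand-in for the rest of the pipeline: just collect the chunks."""
    return [chunk async for chunk in stream]


async def fake_audio() -> AsyncIterator[bytes]:
    """Stand-in for the websocket source."""
    for chunk in (b"abc", b"def", b"ghi"):
        await asyncio.sleep(0)
        yield chunk


async def main(path: str) -> List[bytes]:
    stream, audio_stream = fork(fake_audio())
    # Await both sinks together so each branch is consumed.
    chunks, _ = await asyncio.gather(
        collect_sink(stream), file_sink(audio_stream, path)
    )
    return chunks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(asyncio.run(main(os.path.join(tmp, "call.bin"))))
```

In this sketch, awaiting only collect_sink would leave the file_sink coroutine unscheduled, so the recording would never be written. Gathering both sinks avoids that.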