audio#

class voice_stream.audio.AudioFormat(value)#

Bases: str, Enum

A class representing an audio format as an Enum.

MP3 = 'mp3'#

MP3 audio format.

OGG_OPUS = 'ogg_opus'#

Ogg container with Opus audio.

WAV_MULAW_8KHZ = 'wav_mulaw_8khz'#

WAV Telephone audio - mu-law encoded 8kHz.

WEBM_OPUS = 'webm_opus'#

WebM container with Opus audio.
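Because the enum also derives from str, members can be looked up from their string values and compared directly with plain strings. The sketch below illustrates this with a local stand-in class that copies the members listed above; it is an assumption made so the example runs without the library installed, not the library's own source.

```python
from enum import Enum


class AudioFormat(str, Enum):
    """Local stand-in mirroring the documented members (not the library class)."""

    MP3 = "mp3"
    OGG_OPUS = "ogg_opus"
    WAV_MULAW_8KHZ = "wav_mulaw_8khz"
    WEBM_OPUS = "webm_opus"


# Look up a member from its string value, e.g. one read from a config file.
fmt = AudioFormat("ogg_opus")

# The str mixin makes members compare equal to plain strings.
is_same = fmt == "ogg_opus"
```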

class voice_stream.audio.AudioFormatError#

Bases: Exception

Indicates a problem related to audio formats.

voice_stream.audio.audio_rate_limit_step(async_iter: AsyncIterator[bytes], audio_format: AudioFormat | Awaitable[AudioFormat], buffer_seconds: float)#

Data flow step that rate-limits the audio data coming in.

This step takes in audio data and produces the same audio data, with delays introduced so that the downstream iterator receives at most buffer_seconds of audio at a time. Because the audio is never delivered far ahead of playback, the stream can be stopped promptly in response to an interruption or other event.

Parameters:
  • async_iter (AsyncIterator[bytes]) – An asynchronous iterator returning bytes of audio data.

  • audio_format (AudioFormat | Awaitable[AudioFormat]) – The format of the audio data. Can be an Awaitable if the format isn’t known when the step is created.

  • buffer_seconds (float) – The maximum amount of audio, in seconds, to pass to the downstream iterator at once.

Returns:

The same audio bytes that came in, but rate-limited so that the downstream consumer only gets buffer_seconds worth of audio.

Return type:

AsyncIterator[bytes]

Raises:

AudioFormatError – If the audio format is not supported.

Notes

  • This function will break up long chunks of data in a format-specific way to perform the rate-limiting.
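The idea can be sketched for the simplest case, header-less 8 kHz mu-law audio, where one byte is one sample and 8000 bytes are one second. The helper name rate_limit_mulaw and its injectable sleep parameter are hypothetical; this is a minimal illustration of the technique under those assumptions, not the library's implementation, and it does not cover the other formats.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

MULAW_BYTES_PER_SECOND = 8000  # 8 kHz sample rate, one byte per sample


async def rate_limit_mulaw(
    async_iter: AsyncIterator[bytes],
    buffer_seconds: float,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> AsyncIterator[bytes]:
    """Hypothetical helper: keep the consumer at most buffer_seconds ahead."""
    loop = asyncio.get_running_loop()
    max_piece = max(1, int(buffer_seconds * MULAW_BYTES_PER_SECOND))
    start = loop.time()
    sent_seconds = 0.0
    async for chunk in async_iter:
        # Break long chunks into pieces of at most buffer_seconds of audio.
        for offset in range(0, len(chunk), max_piece):
            piece = chunk[offset : offset + max_piece]
            piece_seconds = len(piece) / MULAW_BYTES_PER_SECOND
            # Audio already sent that real-time playback has not reached yet.
            ahead = sent_seconds - (loop.time() - start)
            # Wait until this piece fits inside the allowed buffer.
            delay = ahead + piece_seconds - buffer_seconds
            if delay > 0:
                await sleep(delay)
            yield piece
            sent_seconds += piece_seconds
```

Passing the sleep function as a parameter is only a convenience for testing the pacing logic without real waiting.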

voice_stream.audio.get_audio_length(audio_format: AudioFormat, audio: bytes)#

Get the length of audio in seconds given the audio format and audio data.

Parameters:
  • audio_format (AudioFormat) – The format of audio data.

  • audio (bytes) – The audio data.

Returns:

The length of the audio in seconds.

Return type:

float

Raises:

AudioFormatError – If the audio format is not supported.
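For the telephone format the arithmetic is simple enough to show directly. The sketch below assumes mono, header-less mu-law data at 8 kHz, so each byte is one sample; the helper name mulaw_8khz_length is hypothetical, and compressed formats such as MP3 or Opus need real parsing that is not shown here.

```python
MULAW_SAMPLE_RATE_HZ = 8000  # telephone audio
BYTES_PER_SAMPLE = 1  # mu-law stores one byte per sample


def mulaw_8khz_length(audio: bytes) -> float:
    """Hypothetical helper: duration in seconds of header-less mono mu-law audio."""
    return len(audio) / (MULAW_SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


seconds = mulaw_8khz_length(bytes(4000))  # 4000 samples at 8 kHz
```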

async voice_stream.audio.ogg_concatenator_step(async_iter: AsyncIterator[bytes]) → AsyncIterator[bytes]#

Data flow step that concatenates multiple OGG streams into one.

With files in OGG format, you can’t concatenate two different streams simply by concatenating the bytes. This step performs the necessary operations to concatenate streams. It assumes data is coming in chunked into full OGG pages (the way ogg_page_separator_step() outputs it).

Parameters:

async_iter – OGG pages as bytes objects

Returns:

OGG pages, updated so that they form a single consistent stream.

Return type:

AsyncIterator[bytes]

Notes

  • This could be done more efficiently by using numpy to modify the buffer in place.
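One reason plain byte concatenation is awkward is that every Ogg page header carries a stream serial number, a page sequence number, and a checksum over the whole page, so changing any header field means recomputing the checksum. The sketch below shows only that piece: patching the serial and sequence numbers and refreshing the CRC. The helper names are hypothetical, and this is an assumption about what such a step plausibly involves, not a description of the library's code. A complete concatenator would also have to deal with stream header packets, begin/end-of-stream flags, and granule positions.

```python
import struct


def ogg_crc(data: bytes) -> int:
    """Ogg page checksum: CRC-32, polynomial 0x04C11DB7, initial value 0, unreflected."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def renumber_page(page: bytes, serial: int, sequence: int) -> bytes:
    """Hypothetical helper: give one complete page a new serial and sequence number."""
    patched = bytearray(page)
    # Bytes 14-17 hold the stream serial number, bytes 18-21 the page sequence.
    struct.pack_into("<II", patched, 14, serial, sequence)
    # The checksum (bytes 22-25) is computed with its own field set to zero.
    struct.pack_into("<I", patched, 22, 0)
    struct.pack_into("<I", patched, 22, ogg_crc(bytes(patched)))
    return bytes(patched)
```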

async voice_stream.audio.ogg_page_separator_step(async_iter: AsyncIterator[bytes]) → AsyncIterator[bytes]#

Data flow step that splits incoming OGG data into distinct pages.

Takes in a stream of bytes from an OGG media file and outputs bytes objects, ensuring that each output is a complete page. For each incoming chunk, the step checks whether the last page in the chunk is complete and emits it only if it is.

Parameters:

async_iter – Bytes from an OGG media file.

Returns:

Bytes objects, each representing a full OGG page.

Return type:

AsyncIterator[bytes]

Notes

  • If the data passed in contains a partial page at the end, that page data will be buffered until the next input.
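The buffering behaviour can be sketched from the Ogg page layout: each page starts with the capture pattern OggS, has a 27-byte fixed header whose last byte is the number of segments, and is followed by a segment table whose entries sum to the body length. The helper name split_ogg_pages is hypothetical, and the sketch assumes the input starts on a page boundary; it illustrates the technique rather than reproducing the library's implementation.

```python
from typing import AsyncIterator

OGG_HEADER_SIZE = 27  # fixed part of every page header


async def split_ogg_pages(async_iter: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
    """Hypothetical helper: re-chunk an Ogg byte stream on page boundaries."""
    buffer = b""
    async for chunk in async_iter:
        buffer += chunk
        while len(buffer) >= OGG_HEADER_SIZE:
            if buffer[:4] != b"OggS":
                raise ValueError("stream is not aligned on an Ogg page")
            segment_count = buffer[26]
            header_size = OGG_HEADER_SIZE + segment_count
            if len(buffer) < header_size:
                break  # segment table not fully received yet
            # Each segment-table byte is the length of one body segment.
            body_size = sum(buffer[OGG_HEADER_SIZE:header_size])
            page_size = header_size + body_size
            if len(buffer) < page_size:
                break  # partial page: keep it buffered until more data arrives
            yield buffer[:page_size]
            buffer = buffer[page_size:]
```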

voice_stream.audio.remove_wav_header(wav_bytes: bytes) → bytes#

Removes the WAV header from a WAV file, regardless of the format.

Parameters:

wav_bytes – The beginning of a WAV file, including the header.

Returns:

The audio bytes from the file, without the header.

Return type:

bytes
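A format-independent way to do this is to walk the RIFF chunks until the data chunk is found, since the header length varies with the fmt chunk size and any optional chunks. The sketch below takes that approach; the helper name strip_wav_header is hypothetical and the code is an illustration under that assumption, not the library's implementation.

```python
import struct


def strip_wav_header(wav_bytes: bytes) -> bytes:
    """Hypothetical helper: return the bytes that follow the 'data' chunk header."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    offset = 12  # first chunk starts after 'RIFF', the RIFF size, and 'WAVE'
    while offset + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[offset : offset + 4]
        (chunk_size,) = struct.unpack_from("<I", wav_bytes, offset + 4)
        if chunk_id == b"data":
            return wav_bytes[offset + 8 :]
        # Chunks are word-aligned: an odd size is followed by one pad byte.
        offset += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no 'data' chunk found")
```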

async voice_stream.audio.wav_mulaw_file_sink(async_iter: AsyncIterator[bytes], filename: str | Awaitable[str]) → None#

Data flow sink that writes telephone audio (8 kHz mu-law encoded audio) to a WAV file.

Parameters:
  • async_iter (AsyncIterator[bytes]) – A stream containing the audio data to write.

  • filename (str | Awaitable[str]) – Name of the audio file. Can be an Awaitable[str] if the filename isn’t known at creation time (for example, if it is generated based on data in the stream).

Notes

  • Assumes only audio data is passed. A WAV header will be placed before the audio to properly format the file.
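For reference, a minimal header for this kind of file can be built with struct. The sketch below assumes mono audio and uses WAVE format tag 7 (mu-law) with the 16-byte form of the fmt chunk; the helper name mulaw_wav_header is hypothetical, and the exact header the sink writes may differ. Stricter writers also add an extension-size field and a fact chunk for non-PCM formats.

```python
import struct

WAVE_FORMAT_MULAW = 7  # registered WAVE format tag for mu-law


def mulaw_wav_header(num_audio_bytes: int, sample_rate: int = 8000) -> bytes:
    """Hypothetical helper: minimal mono mu-law WAV header for a known payload size."""
    channels, bits_per_sample = 1, 8
    block_align = channels * bits_per_sample // 8
    fmt_chunk = b"fmt " + struct.pack(
        "<IHHIIHH",
        16,  # size of the fmt payload that follows
        WAVE_FORMAT_MULAW,
        channels,
        sample_rate,
        sample_rate * block_align,  # average bytes per second
        block_align,
        bits_per_sample,
    )
    data_header = b"data" + struct.pack("<I", num_audio_bytes)
    # The RIFF size counts everything after the first 8 bytes of the file.
    riff_size = 4 + len(fmt_chunk) + len(data_header) + num_audio_bytes
    return b"RIFF" + struct.pack("<I", riff_size) + b"WAVE" + fmt_chunk + data_header
```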

async voice_stream.audio.wav_mulaw_file_source(filename: str, chunk_size: int = 4096) → AsyncIterator[bytes]#

Data flow source that reads audio bytes from a WAV file.

Parameters:
  • filename (str) – Name of the audio file.

  • chunk_size (int) – Number of bytes to read at one time from the file. Passing 0 indicates the whole file should be read at once.

Returns:

A stream of audio bytes.

Return type:

AsyncIterator[bytes]

Notes

  • The WAV header will be removed, so the data being passed will only include audio samples.
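The chunk_size convention, including the special meaning of 0, can be sketched as a small async generator. The helper name read_file_chunks is hypothetical; it only illustrates the chunking and, unlike the source described above, does not strip the WAV header. It also uses ordinary blocking file reads, which is an assumption made to keep the example short.

```python
import asyncio
from typing import AsyncIterator


async def read_file_chunks(filename: str, chunk_size: int = 4096) -> AsyncIterator[bytes]:
    """Hypothetical helper: yield a file's bytes in pieces of chunk_size."""
    with open(filename, "rb") as f:
        while True:
            # read(-1) returns the rest of the file, so 0 means "all at once".
            data = f.read(chunk_size if chunk_size > 0 else -1)
            if not data:
                break
            yield data
            await asyncio.sleep(0)  # give other tasks a chance to run
```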