Conversing via Local Microphone and Speaker using Realtime API

This content originally appeared on DEV Community and was authored by M Sea Bass

Several code samples for the Realtime API offered by OpenAI and Azure are available online. However, the only Python sample is the one on Azure's GitHub, and it assumes an audio file as input.

I therefore modified that Python code to accept real-time audio input from the local microphone. The modified version is available on GitHub. Since the code is simple and concise, it should be easy to integrate into other projects.

The code is based on low_level_sample.py, and a detailed explanation of that sample is available in this article.

About the Modifications

This article explains how to modify a Python application that processes audio so that it accepts input from the local microphone and plays the audio data returned by the Realtime API through the local speaker. The implementation mainly uses the pyaudio library.

The modifications consist of the following two points:

Capturing audio input from the local microphone.
Outputting the audio data returned by the Realtime API through the local speaker.
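
Because capture and playback have to happen at the same time, the two pieces are presumably run as concurrent coroutines. The following is a minimal sketch of that arrangement, not the repository's actual code: FakeClient is a hypothetical stand-in that simply echoes each sent chunk back, and the list of byte strings stands in for microphone input.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for the realtime client: echoes sent chunks back."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def send(self, chunk) -> None:
        await self._queue.put(chunk)

    async def recv(self):
        return await self._queue.get()


async def send_audio(client: FakeClient, chunks: list) -> None:
    # In the real code, each chunk would be read from the microphone stream.
    for chunk in chunks:
        await client.send(chunk)
    await client.send(None)  # sentinel: no more audio


async def receive_messages(client: FakeClient) -> list:
    # In the real code, each chunk would be written to the speaker stream.
    played = []
    while (chunk := await client.recv()) is not None:
        played.append(chunk)
    return played


async def run_conversation(chunks: list) -> list:
    client = FakeClient()
    # Run the sender and the receiver side by side on one event loop.
    _, played = await asyncio.gather(
        send_audio(client, chunks),
        receive_messages(client),
    )
    return played


def demo() -> list:
    return asyncio.run(run_conversation([b"\x00\x01", b"\x02\x03"]))
```

Calling demo() returns the chunks in the order they were sent, which shows that the receiver makes progress while the sender is still running.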

1. Implementing Audio Input from the Local Microphone

Below is the code that captures audio input from the local microphone using pyaudio and sends the data to the Realtime API in real time.



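import asyncio
import base64

import pyaudio

# RTLowLevelClient, InputAudioBufferAppendMessage, and the upper-case constants
# are defined elsewhere in the full script.


async def send_audio(client: RTLowLevelClient):
    p = pyaudio.PyAudio()
    # Look up the system's default input device (microphone).
    default_input_index = p.get_default_input_device_info()['index']
    stream = p.open(
        format=STREAM_FORMAT,
        channels=INPUT_CHANNELS,
        rate=INPUT_SAMPLE_RATE,
        input=True,
        output=False,
        frames_per_buffer=INPUT_CHUNK_SIZE,
        input_device_index=default_input_index,
        start=False,
    )
    stream.start_stream()

    print("Start sending audio")
    try:
        while not client.closed:
            # stream.read() blocks until a full chunk is captured, so run it in a
            # worker thread to keep the event loop free for receiving messages.
            audio_data = await asyncio.to_thread(
                stream.read, INPUT_CHUNK_SIZE, exception_on_overflow=False
            )
            base64_audio = base64.b64encode(audio_data).decode("utf-8")
            await client.send(InputAudioBufferAppendMessage(audio=base64_audio))
    finally:
        # Release the microphone when the client closes or an error occurs.
        stream.stop_stream()
        stream.close()
        p.terminate()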

This code captures audio data from the local default microphone using pyaudio, encodes it in Base64, and sends it to the Realtime API.

Key Points:

pyaudio.PyAudio() provides access to the system's audio devices.
get_default_input_device_info() retrieves the default input device.
stream.read() captures real-time audio data to send to the API.
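
To make the encoding step concrete, here is a small sketch that needs no audio hardware. It assumes 16-bit mono PCM at 24 kHz and assumes that InputAudioBufferAppendMessage corresponds to the Realtime API's input_audio_buffer.append event; make_tone_chunk and build_append_event are hypothetical helper names, and the synthetic tone stands in for microphone data.

```python
import base64
import json
import math
import struct

SAMPLE_RATE = 24_000  # assumed sample rate in Hz
SAMPLE_WIDTH = 2      # assumed bytes per sample (16-bit PCM)


def make_tone_chunk(frames: int, freq: float = 440.0) -> bytes:
    """Return `frames` samples of a sine tone as little-endian 16-bit PCM."""
    samples = [
        int(0.3 * 32767 * math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
        for n in range(frames)
    ]
    return struct.pack(f"<{frames}h", *samples)


def build_append_event(pcm: bytes) -> str:
    """Base64-encode raw PCM bytes and wrap them in a JSON event."""
    return json.dumps(
        {
            "type": "input_audio_buffer.append",  # assumed event name
            "audio": base64.b64encode(pcm).decode("utf-8"),
        }
    )


chunk = make_tone_chunk(1200)        # 50 ms of audio at 24 kHz
event = build_append_event(chunk)
decoded = base64.b64decode(json.loads(event)["audio"])
```

Decoding the audio field gives back exactly the bytes that were captured, which is the property the receiving side relies on.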

2. Implementing Audio Output from the Realtime API to Speakers

Next is the code for outputting the audio data returned by the Realtime API through the local speakers.



import asyncio
import base64
import time

import pyaudio


async def receive_messages(client: RTLowLevelClient):
    p = pyaudio.PyAudio()
    # Look up the system's default output device (speaker).
    default_output_index = p.get_default_output_device_info()['index']
    stream = p.open(
        format=STREAM_FORMAT,
        channels=OUTPUT_CHANNELS,
        rate=OUTPUT_SAMPLE_RATE,
        input=False,
        output=True,
        output_device_index=default_output_index,
        start=False,
    )
    stream.start_stream()

    print("Start receiving messages")
    while True:
        ...
            case "response.audio.delta":
                print("Response Audio Delta Message")
                print(f"  Response Id: {message.response_id}")
                print(f"  Item Id: {message.item_id}")
                print(f"  Audio Data Length: {len(message.delta)}")
                audio_data = base64.b64decode(message.delta)
                print(f"  Audio Binary Data Length: {len(audio_data)}")
                # Playback time in seconds: bytes / (rate * bytes per sample * channels).
                audio_duration = len(audio_data) / OUTPUT_SAMPLE_RATE / OUTPUT_SAMPLE_WIDTH / OUTPUT_CHANNELS
                print(f"  Audio Duration: {audio_duration}")
                start_time = time.time()
                # Write to the speaker in fixed-size slices; stream.write() blocks,
                # so run it in a worker thread to keep the event loop responsive.
                for i in range(0, len(audio_data), OUTPUT_CHUNK_SIZE):
                    await asyncio.to_thread(stream.write, audio_data[i:i+OUTPUT_CHUNK_SIZE])
                # Wait out the rest of the playback time, less a 50 ms margin.
                await asyncio.sleep(max(0, audio_duration - (time.time() - start_time) - 0.05))

This code decodes the Base64-encoded audio data received from the Realtime API and outputs it to the speakers using pyaudio.

Key Points:

get_default_output_device_info() retrieves the default output device (speakers).
stream.write() outputs the decoded audio data to the speakers in real time.
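The playback duration is computed from the length of the received audio data (bytes / sample rate / sample width / channels), and the code waits for the remainder of that duration, less a 0.05-second margin, before handling the next message, which keeps audio delay to a minimum.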
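
The timing arithmetic can be checked in isolation. The sketch below assumes 16-bit mono PCM at 24 kHz as default values; pcm_duration_seconds, split_chunks, and remaining_wait are hypothetical helper names that mirror the three steps in the handler (compute the duration, slice the data, wait for the remainder).

```python
def pcm_duration_seconds(
    num_bytes: int,
    sample_rate: int = 24_000,  # assumed Hz
    sample_width: int = 2,      # assumed bytes per sample
    channels: int = 1,          # assumed mono
) -> float:
    """Playback time of raw PCM data: bytes / (rate * width * channels)."""
    return num_bytes / (sample_rate * sample_width * channels)


def split_chunks(data: bytes, chunk_size: int) -> list:
    """Slice the data into pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def remaining_wait(duration: float, elapsed: float, margin: float = 0.05) -> float:
    """Time still to wait after writing, never negative."""
    return max(0.0, duration - elapsed - margin)


one_second = pcm_duration_seconds(48_000)   # 48,000 bytes at 48,000 bytes/s
pieces = split_chunks(b"abcdefg", 3)        # last piece may be shorter
wait = remaining_wait(duration=0.1, elapsed=0.5)  # writing took longer than playback
```

With these defaults, one second of audio is 48,000 bytes, and the wait is clamped to zero whenever writing has already taken longer than the audio lasts.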

Thank you for reading to the end. If you have any questions or feedback about the code, feel free to reach out!


M Sea Bass | Sciencx (2024-10-04T02:06:33+00:00) Conversing via Local Microphone and Speaker using Realtime API. Retrieved from https://www.scien.cx/2024/10/04/conversing-via-local-microphone-and-speaker-using-realtime-api/
