
Voice Chat with PDFs

This is a LlamaIndex project using Next.js: an example based on openai/openai-realtime-console, extended with a simple RAG (retrieval-augmented generation) system built with LlamaIndexTS.
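As a rough sketch of how the pieces can fit together (an assumption about the wiring, not this repository's exact code), the realtime client can expose document retrieval to the model as a tool. The tool name, persist directory, and handler below are hypothetical:

```typescript
import { RealtimeClient } from '@openai/realtime-api-beta';
import {
  MetadataMode,
  storageContextFromDefaults,
  VectorStoreIndex,
} from 'llamaindex';

const client = new RealtimeClient({ apiKey: process.env.OPENAI_API_KEY });

// Load the index persisted by `npm run generate` (persistDir is an assumption).
const storageContext = await storageContextFromDefaults({ persistDir: './cache' });
const index = await VectorStoreIndex.init({ storageContext });

// Register a retrieval tool the model can call mid-conversation
// (tool name and schema are hypothetical).
client.addTool(
  {
    name: 'query_documents',
    description: 'Look up relevant passages from the indexed PDFs.',
    parameters: {
      type: 'object',
      properties: { query: { type: 'string', description: 'The search query.' } },
      required: ['query'],
    },
  },
  async ({ query }: { query: string }) => {
    const retriever = index.asRetriever();
    const nodes = await retriever.retrieve({ query });
    return nodes.map((n) => n.node.getContent(MetadataMode.NONE)).join('\n---\n');
  }
);

await client.connect();
```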

Prerequisites

The project requires an OpenAI API key (a user key or a project key) with access to the Realtime API. Set it as the OPENAI_API_KEY environment variable, either in a .env file or in your shell.
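For reference, a minimal .env file in the project root looks like this (the key value is a placeholder):

OPENAI_API_KEY=sk-...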

Getting Started

First, install the dependencies:

npm install

Second, generate embeddings for the documents in the ./data directory:

npm run generate

The example PDF is about physical letter standards; you can replace it with your own documents.
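Under the hood, the generate step looks roughly like the following (a minimal sketch, assuming LlamaIndexTS's SimpleDirectoryReader and a local persist directory; the ./cache path is an assumption):

```typescript
import {
  SimpleDirectoryReader,
  storageContextFromDefaults,
  VectorStoreIndex,
} from 'llamaindex';

// Read every document in ./data (the example PDF lives there).
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: './data',
});

// Embed the documents and persist the vector index to disk
// so the app can load it at query time (persistDir is an assumption).
const storageContext = await storageContextFromDefaults({ persistDir: './cache' });
await VectorStoreIndex.fromDocuments(documents, { storageContext });
```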

Third, run the development server:

npm run dev

Open http://localhost:3000 with your browser to see the result.

Using the console

On startup, you'll be prompted to enter the API key again (a known issue that still needs to be fixed).

To start a session, you'll need to connect, which requires microphone access. You can then choose between manual (push-to-talk) and VAD (voice activity detection) conversation modes, and switch between them at any time.

You can interrupt the model at any time in either mode.
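For context, this mirrors how the upstream realtime console drives the @openai/realtime-api-beta client: the two modes map onto the session's turn_detection setting. A hedged sketch, not this repository's exact code:

```typescript
import { RealtimeClient } from '@openai/realtime-api-beta';

const client = new RealtimeClient({ apiKey: process.env.OPENAI_API_KEY });
await client.connect();

// VAD mode: the server detects when you stop speaking and responds on its own.
client.updateSession({ turn_detection: { type: 'server_vad' } });

// Push-to-talk: turn server-side detection off; audio is then committed
// manually, e.g. with client.appendInputAudio(...) followed by
// client.createResponse() when the talk button is released.
client.updateSession({ turn_detection: null });

// Interrupting playback cancels the in-flight response; the upstream console
// does this with client.cancelResponse(trackId, sampleOffset) using the
// current playback offset.
```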

Learn More

To learn more about LlamaIndex, check out the LlamaIndexTS GitHub repository. Your feedback and contributions are welcome!