Watch the demo here.
Uno is a voice-controlled AI assistant that combines an Ollama-served language model with Whisper speech recognition to provide an interactive conversational experience. Users speak to the assistant in natural language and receive responses as both text and speech.
- Voice-controlled interaction with the AI assistant
- Real-time speech recognition using the Whisper model
- Text-to-speech output using the pyttsx3 engine
- Integration with the Ollama language model for generating responses
- Customizable configuration through a YAML file
- Push-to-talk functionality for capturing speech input
- Visual indicators for recording status and messages
- Logging for debugging and monitoring purposes
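At a high level, these features come down to one pipeline: microphone audio is transcribed by Whisper, the text is sent to Ollama for a reply, and the reply is spoken with pyttsx3. The snippet below is a minimal, hypothetical sketch of that pipeline, not Uno's actual code; the model name (`llama`), file name (`input.wav`), endpoint URL, and directory layout are assumptions.

```python
# Hypothetical end-to-end sketch of Uno's pipeline (illustrative only).
import whisper    # provided by the openai-whisper package
import pyttsx3
import requests

# 1. Speech -> text: transcribe a recorded clip with the base.en Whisper model
#    (assumes the model file lives in the ./whisper directory, as in the setup below).
stt = whisper.load_model("base.en", download_root="whisper")
prompt = stt.transcribe("input.wav")["text"]

# 2. Text -> reply: query the local Ollama HTTP API.
#    "llama" stands in for whichever model was pulled with `ollama pull`.
reply = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama", "prompt": prompt, "stream": False},
    timeout=120,
).json()["response"]

# 3. Reply -> speech: speak the response with pyttsx3.
tts = pyttsx3.init()
tts.say(reply)
tts.runAndWait()
```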
- Python 3.7 or higher
- PyTorch
- Whisper
- pyttsx3
- PyAudio
- Pygame
- requests
- PyYAML
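Uno's own `requirements.txt` is the source of truth; as a rough guide, the requirements above correspond to these PyPI package names (versions omitted, names assumed):

```text
torch
openai-whisper
pyttsx3
PyAudio
pygame
requests
PyYAML
```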
- Install Ollama.
- Download the llama model using the `ollama pull llama` command.
- Download the `base.en` OpenAI Whisper model here.
- Clone and `cd` into the repository: `git clone https://github.com/your-username/uno.git`
- Place the downloaded `base.en` Whisper model into a `/whisper` directory in the repository's root folder.
- Install the required dependencies with `pip install -r requirements.txt` (or `pip3 install -r requirements.txt`).
- Configure `uno.yaml` in the project directory. Adjust settings such as API endpoints, model paths, and initial prompts as needed (a hypothetical example is shown after this list).
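The shipped `uno.yaml` defines the real keys; the snippet below is only a hypothetical illustration of the kinds of settings mentioned above (API endpoint, model path, initial prompt), and every key name and value in it is an assumption.

```yaml
# Hypothetical uno.yaml; key names and values are illustrative only.
ollama:
  url: http://localhost:11434/api/generate   # local Ollama API endpoint
  model: llama                               # model pulled with `ollama pull`
whisper:
  model_path: whisper/base.en.pt             # downloaded Whisper model file
assistant:
  initial_prompt: "You are Uno, a helpful voice assistant."
```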
- Run the Uno application with `python main.py` (or `python3 main.py`).
- The application window will open, and Uno will initialize the necessary components.
- Press and hold the `Space` key to start recording your speech input (see the key-handling sketch after this list).
- Speak your query or command while holding the `Space` key.
- Release the `Space` key when you finish speaking.
- Uno will process your speech input, send it to the Ollama API to generate a response, and then convert the response to speech output.
- The response will be displayed in the application window and played back as speech.
- To stop the speech playback, press the `S` key.
- To exit the application, press the `Esc` key.
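As a rough illustration of the controls above, a Pygame event loop along these lines handles the key bindings. This is a hypothetical sketch, not Uno's actual `main.py`; the recording, transcription, and playback steps are only indicated by comments.

```python
# Hypothetical key-handling loop illustrating Uno's controls (not the real main.py).
import pygame

pygame.init()
screen = pygame.display.set_mode((480, 240))   # a window is needed to receive key events
pygame.display.set_caption("Uno (sketch)")

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.KEYDOWN:
            if event.key == pygame.K_SPACE:
                pass  # Space pressed: start capturing microphone audio (e.g. via PyAudio)
            elif event.key == pygame.K_s:
                pass  # S pressed: stop any ongoing speech playback
            elif event.key == pygame.K_ESCAPE:
                running = False  # Esc pressed: quit the application
        elif event.type == pygame.KEYUP and event.key == pygame.K_SPACE:
            pass  # Space released: stop capture, transcribe, query Ollama, speak the reply

pygame.quit()
```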
Contributions to Uno are welcome! If you find any bugs, have feature requests, or want to contribute improvements, please open an issue or submit a pull request on the GitHub repository.
- Ollama - Local runtime serving the language model that generates responses
- Whisper - Speech recognition model
- pyttsx3 - Text-to-speech conversion library
- PyAudio - Audio input/output library
- Pygame - Multimedia library used for the application window and keyboard input
Uno was heavily inspired by and incorporates code from the following GitHub repositories:
- ollama-voice-mac by Andy Peatling - A voice-controlled AI assistant for macOS using Ollama and Whisper.
I would like to express my gratitude to the authors of these repositories for their valuable contributions and inspiration.
For any questions or inquiries, please contact [email protected].