juanlucasumali/uno

uno

Watch the demo here.

Uno: Voice-Controlled AI Assistant

Uno is a voice-controlled AI assistant that combines a language model served by Ollama with Whisper speech recognition to provide an interactive conversational experience. Users speak to the assistant in natural language and receive responses as both text and speech.

Features

  • Voice-controlled interaction with the AI assistant
  • Real-time speech recognition using the Whisper model
  • Text-to-speech output using the pyttsx3 engine
  • Integration with Ollama, which runs the language model that generates responses
  • Customizable configuration through a YAML file
  • Push-to-talk functionality for capturing speech input
  • Visual indicators for recording status and messages
  • Logging for debugging and monitoring purposes
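
As a rough illustration of the Ollama integration listed above, the sketch below sends a transcribed prompt to a locally running Ollama server and returns the generated text. It assumes Ollama's default local endpoint (http://localhost:11434/api/generate) and the model name from the installation steps; the helper names build_payload and ask_ollama are hypothetical, and the endpoint and model Uno actually uses come from uno.yaml and may differ.

```python
# Assumption: Ollama's default local endpoint. Uno reads its own value from uno.yaml.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="llama"):
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(prompt, model="llama", url=OLLAMA_URL, timeout=60):
    """Send the prompt to Ollama and return the generated text."""
    import requests  # imported lazily so build_payload works without it

    reply = requests.post(url, json=build_payload(prompt, model), timeout=timeout)
    reply.raise_for_status()
    # With "stream": False, Ollama returns one JSON object whose
    # "response" field holds the full generated text.
    return reply.json()["response"]
```

With the Ollama server running, ask_ollama("Hello") would return the model's reply as a string, which could then be passed to the text-to-speech engine.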

Requirements

  • Python 3.7 or higher
  • PyTorch
  • Whisper
  • pyttsx3
  • PyAudio
  • Pygame
  • requests
  • PyYAML
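
Before running Uno, it can help to confirm that the interpreter and the packages above are available. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the list uses each package's usual import name, which can differ from the name shown above (for example, PyYAML is imported as yaml).

```python
import importlib.util
import sys

# Usual import names for the packages listed in Requirements.
REQUIRED_MODULES = ["torch", "whisper", "pyttsx3", "pyaudio", "pygame", "requests", "yaml"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """Check the interpreter against the stated minimum of Python 3.7."""
    return tuple(version_info[:2]) >= (3, 7)
```

Calling missing_modules(REQUIRED_MODULES) returns the names that still need to be installed; an empty list means every listed package was found.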

Installation

  1. Install Ollama.

  2. Download the llama model using the ollama pull llama command.

  3. Download the base.en OpenAI Whisper Model here.

  4. Clone and cd into the repository:

    git clone https://github.com/juanlucasumali/uno.git
    cd uno

  5. Place the downloaded base.en Whisper model into a whisper directory in the repository's root folder.

  6. Install the required dependencies:

    pip install -r requirements.txt

    Or, if you use pip3:

    pip3 install -r requirements.txt

  7. Configure uno.yaml in the project directory, adjusting settings such as API endpoints, model paths, and initial prompts.
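
To show how a YAML configuration of this kind could be consumed, the sketch below loads a file with PyYAML and fills in defaults for any missing keys. The key names, default values, and function names are all hypothetical and are not taken from the project's actual uno.yaml; check the file in the repository for the real settings.

```python
# Hypothetical defaults: these key names and values are illustrative only
# and are not taken from the project's actual uno.yaml.
DEFAULTS = {
    "ollama_url": "http://localhost:11434/api/generate",
    "whisper_model_path": "whisper/base.en.pt",
    "initial_prompt": "You are a helpful assistant.",
}


def merge_config(defaults, overrides):
    """Return a copy of defaults updated with non-None values from overrides."""
    merged = dict(defaults)
    merged.update({k: v for k, v in (overrides or {}).items() if v is not None})
    return merged


def load_config(path="uno.yaml"):
    """Read the YAML file and fill in defaults for any missing keys."""
    import yaml  # PyYAML, imported lazily so merge_config has no dependencies

    with open(path, encoding="utf-8") as handle:
        loaded = yaml.safe_load(handle)  # returns None for an empty file
    return merge_config(DEFAULTS, loaded)
```

Keeping the merge step separate from file reading means an empty or partial configuration file still yields a complete set of settings.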

Usage

  1. Run the Uno application:

    python main.py

    Or:

    python3 main.py

  2. The application window opens, and Uno initializes the necessary components.

  3. Press and hold the Space key to start recording your speech input.

  4. Speak your query or command while holding the Space key.

  5. Release the Space key when you finish speaking.

  6. Uno transcribes your speech, sends the text to the Ollama API to generate a response, and converts the response to speech.

  7. The response will be displayed on the application window and played back as speech.

  8. To stop the speech playback, press the S key.

  9. To exit the application, press the Esc key.
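
The key controls described in the steps above can be modeled as a small state machine. This is a sketch of one possible design, not the project's actual event loop: the class, function, and action names are hypothetical, and key events are represented as plain strings so the logic can be read and tested without a window.

```python
from dataclasses import dataclass


@dataclass
class AssistantState:
    recording: bool = False  # True while Space is held down
    speaking: bool = False   # True while a response is being played back
    running: bool = True     # False once Esc has been pressed


def handle_key(state, kind, key):
    """Apply one key event and return the action the app should take.

    kind is "down" or "up"; key is "space", "s", or "escape".
    Returns None when the event requires no action.
    """
    if kind == "down" and key == "space" and not state.recording:
        state.recording = True
        return "start_recording"
    if kind == "up" and key == "space" and state.recording:
        state.recording = False
        return "transcribe_and_respond"
    if kind == "down" and key == "s" and state.speaking:
        state.speaking = False
        return "stop_playback"
    if kind == "down" and key == "escape":
        state.running = False
        return "quit"
    return None
```

In a Pygame event loop, pygame.KEYDOWN and pygame.KEYUP events whose event.key equals pygame.K_SPACE, pygame.K_s, or pygame.K_ESCAPE would be translated into these strings before calling handle_key.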

Contributing

Contributions to Uno are welcome! If you find any bugs, have feature requests, or want to contribute improvements, please open an issue or submit a pull request on the GitHub repository.

Acknowledgements

  • Ollama - Runs the language model that generates responses
  • Whisper - Speech recognition model
  • pyttsx3 - Text-to-speech conversion library
  • PyAudio - Audio input/output library
  • Pygame - Library for creating graphical user interfaces

Credits

Uno was heavily inspired by and incorporates code from the following GitHub repositories:

I would like to express my gratitude to the authors of these repositories for their valuable contributions and inspiration.

Contact

For any questions or inquiries, please contact [email protected].


About

A personalized AI.
