This is a PoC demonstrating how two bots can autonomously "speak" to each other using an LLM and TTS. It uses NATS JetStream for message routing, Ollama for generating text with an LLM of the user's choice, and the PlayHT API for TTS speech synthesis.
> [!IMPORTANT]
> This project was built purely for educational purposes and is therefore likely riddled with bugs, inefficiencies, etc. You should consider this project highly experimental.
Click to watch/listen to a sample conversation:
```mermaid
sequenceDiagram
    participant GoTTS as TTS
    participant GoLLM as LLM
    participant Gobot
    participant Rustbot
    participant RustLLM as LLM
    participant RustTTS as TTS
    Gobot->>+Rustbot: Hi Rustbot!
    Rustbot->>RustLLM: Hi Rustbot!
    RustLLM->>RustTTS: Hi Gobot!
    RustLLM->>Rustbot: Hi Gobot!
    Rustbot->>-Gobot: Hi Gobot!
    activate Gobot
    Gobot->>GoLLM: Hi Gobot!
    GoLLM->>GoTTS: Teach me about Rust!
    GoLLM->>Gobot: Teach me about Rust!
    Gobot->>-Rustbot: Teach me about Rust!
```
Zoomed-in view of the high-level architecture:
```mermaid
flowchart TB
    subgraph " "
        playht(PlayHT API)
        ollama(Ollama)
    end
    bot <--> ollama
    bot <--> playht
    bot <--> NATS[[NATS JetStream]]
```
> [!NOTE]
> Mermaid does not have proper support for controlling layout or even basic graph legends. There are some terrible workarounds, so I've opted not to use them in this README; hence the diagram might feel a bit unwieldy.
```mermaid
flowchart TB
    ollama{Ollama}
    playht{PlayHT}
    llm((llm))
    tts((tts))
    jetWriter((jetWriter))
    jetReader((jetReader))
    ttsChunks(ttsChunks)
    jetChunks(jetChunks)
    prompts(prompts)
    ttsDone(ttsDone)
    subgraph NATS JetStream
        Go(go)
        Rust(rust)
    end
    Go-- 1. -->jetReader
    jetWriter-- 7. -->Rust
    jetReader-- 2. -->prompts
    prompts-- 3. -->llm
    llm-->ollama
    llm-- 4. -->ttsChunks
    llm-- 4. -->jetChunks
    jetChunks-->jetWriter
    ttsChunks-->tts
    tts-- 5. -->playht
    tts-- 6. -->ttsDone
    ttsDone-->jetWriter
```
- `jet.Reader` receives a message published on a JetStream subject
- `jet.Reader` sends this message to the `prompts` channel
- the `llm` worker reads the messages sent to the `prompts` channel and forwards them to Ollama for LLM generation
- Ollama generates the response and the `llm` worker sends it to both the `ttsChunks` and `jetChunks` channels
- the `tts` worker reads the message, sends it to the PlayHT API, and streams the returned audio to the default audio device
- once the playback has finished, the `tts` worker notifies `jet.Writer` via the `ttsDone` channel that it's done playing the audio
- `jet.Writer` receives the notification on the `ttsDone` channel and publishes the message it received on the `jetChunks` channel to a JetStream subject (a rough Go sketch of this pipeline follows below)
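To make the flow above concrete, here is a minimal Go sketch of the same channel wiring. It is not the project's actual code: the channel and worker names simply mirror the diagram, and the Ollama, PlayHT, and JetStream calls are stubbed out with placeholder functions.

```go
package main

import "fmt"

// Placeholder stubs for the real integrations (Ollama, PlayHT, JetStream).
func generateWithLLM(prompt string) string { return "reply to: " + prompt }
func playWithTTS(text string)              { fmt.Println("speaking:", text) }
func publishToJetStream(text string)       { fmt.Println("published:", text) }

func main() {
	prompts := make(chan string)   // jet.Reader -> llm
	ttsChunks := make(chan string) // llm -> tts
	jetChunks := make(chan string) // llm -> jet.Writer
	ttsDone := make(chan struct{}) // tts -> jet.Writer

	// jet.Reader: pretend we received a message on a JetStream subject.
	go func() { prompts <- "Hi Gobot!" }()

	// llm worker: forward the prompt to the LLM and fan the reply out.
	go func() {
		for p := range prompts {
			reply := generateWithLLM(p)
			ttsChunks <- reply
			jetChunks <- reply
		}
	}()

	// tts worker: synthesise and play the audio, then signal completion.
	go func() {
		for text := range ttsChunks {
			playWithTTS(text)
			ttsDone <- struct{}{}
		}
	}()

	// jet.Writer: wait until playback has finished, then publish the reply.
	reply := <-jetChunks
	<-ttsDone
	publishToJetStream(reply)
}
```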
There are a few prerequisites:
Both bots use NATS as their communication channel.
Install (with Homebrew):

```shell
brew tap nats-io/nats-tools
brew install nats nats-server
```

Run:

```shell
nats-server -js
```

Alternatively, with Nix:

```shell
nix-shell -p nats-server natscli
nats-server -js
```
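If you want to verify the JetStream setup from Go, the following is a minimal sketch using the nats.go client. It is not code from this repository; the stream name (`BOTS`) and the subjects (`go`, `rust`) are illustrative assumptions based on the diagram above.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to the locally running nats-server started with -js.
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Create a stream capturing the bot subjects (names are hypothetical).
	if _, err := js.AddStream(&nats.StreamConfig{
		Name:     "BOTS",
		Subjects: []string{"go", "rust"},
	}); err != nil {
		log.Fatal(err)
	}

	// Publish a test message and read it back to confirm JetStream works.
	if _, err := js.Publish("go", []byte("Hi Gobot!")); err != nil {
		log.Fatal(err)
	}

	sub, err := js.SubscribeSync("go")
	if err != nil {
		log.Fatal(err)
	}
	msg, err := sub.NextMsg(5 * time.Second)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("received:", string(msg.Data))
}
```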
Download Ollama from the official site, or install it via Nix:

```shell
nix-shell -p ollama
```

Run the model you decide to use:

```shell
ollama run llama2
```
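The bots talk to Ollama over its local HTTP API. As a rough illustration (not code from this repo), a single non-streaming completion request against the default `localhost:11434` endpoint looks like this, assuming the `llama2` model has been pulled:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Request a single (non-streaming) completion from the local Ollama API.
	body, _ := json.Marshal(map[string]any{
		"model":  "llama2",
		"prompt": "Teach me about Rust!",
		"stream": false,
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// The generated text is returned in the "response" field.
	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	fmt.Println(out.Response)
}
```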
If you are running on Linux you need to install the following libraries -- assuming you want the bots to actually play speech:

> [!NOTE]
> This is for Ubuntu Linux; other distros likely have different package names.

```shell
sudo apt install -y --no-install-recommends libasound2-dev pkg-config
```
Once you've created an account on PlayHT you need to generate API keys. See here for more details.

Now you need to export them via the following environment variables, which are read by the client libraries used by the bots (go-playht, playht_rs):

```shell
export PLAYHT_SECRET_KEY=XXXX
export PLAYHT_USER_ID=XXX
```
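To sanity-check that the keys are visible before starting the bots, a trivial, hypothetical snippet like this (not part of the project) is enough; it only verifies that the two environment variables are set:

```go
package main

import (
	"fmt"
	"log"
	"os"
)

func main() {
	// The PlayHT client libraries read these two variables at startup.
	for _, name := range []string{"PLAYHT_SECRET_KEY", "PLAYHT_USER_ID"} {
		if os.Getenv(name) == "" {
			log.Fatalf("%s is not set", name)
		}
		fmt.Printf("%s is set\n", name)
	}
}
```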
> [!IMPORTANT]
> Once you've started `gobot` you need to prompt it: `gobot` reads the prompt from stdin, which kickstarts the conversation. `rustbot` waits for `gobot` before it responds!
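For illustration only, the kickstart step could look roughly like this minimal sketch (this is not the actual `gobot` code): read one line from stdin and treat it as the first prompt of the conversation.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

func main() {
	// Read the initial prompt from stdin; this is what kickstarts the chat.
	fmt.Print("prompt> ")
	prompt, err := bufio.NewReader(os.Stdin).ReadString('\n')
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("sending initial prompt:", prompt)
	// From here the prompt would be fed into the LLM/TTS pipeline and the
	// reply published to the JetStream subject that rustbot reads from.
}
```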
Start the `gobot`:

```shell
go run ./gobot/...
```
Start the `rustbot`:

```shell
cargo run --manifest-path rustbot/Cargo.toml
```