Rime LiveKit Agents – Technical Overview

Project Summary

This project is a Python-based, real-time conversational AI agent system built on top of LiveKit and Rime.ai. It enables hyper-realistic, character-driven voice agents that can join LiveKit audio rooms, respond to users in natural language, and speak with expressive, customizable voices. The system leverages advanced TTS (text-to-speech) models, OpenAI LLMs (large language models), and a modular plugin architecture for extensibility.

Folder Structure

rime-livekit-agents/
│
├── .env                  # Environment variables (API keys, URLs)
├── agent_configs.py      # Voice/personality configs and prompt engineering
├── rime_agent.py         # Main agent logic and entrypoint
├── requirements.txt      # Python dependencies
├── text_utils.py         # Custom sentence tokenizer for TTS
├── TECHNICAL_OVERVIEW.md # This technical documentation
├── README.md             # Basic project info
├── KMS/                  # (Optional) Key Management Service or logs
│   └── logs/
└── __pycache__/          # Python bytecode cache

Key Components

1. `rime_agent.py`

Main entry point for the agent.
Handles LiveKit room connection, session management, plugin integration, and event loop.
Integrates TTS, LLM, STT, noise cancellation, and turn detection.

2. `agent_configs.py`

Defines agent personalities, TTS settings, and prompt engineering.
Example: "celeste" persona with a clingy, playful, flirty university girlfriend style.
Each persona can have unique TTS speed, model, and prompt.

3. `text_utils.py`

Implements custom sentence tokenization for advanced TTS models (e.g., Arcana).
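As a rough illustration of what sentence tokenization for TTS involves, the sketch below splits text at sentence-ending punctuation and merges very short fragments into the preceding chunk. The function name `split_sentences` and the `min_len` threshold are hypothetical; the actual logic in text_utils.py is not shown in this document and may differ.

```python
import re

# Hypothetical helper; the real text_utils.py may tokenize differently.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str, min_len: int = 20) -> list[str]:
    """Split text at sentence-ending punctuation and merge short fragments."""
    parts = [p.strip() for p in _SENTENCE_END.split(text) if p.strip()]
    merged: list[str] = []
    for part in parts:
        # If the previous chunk is still shorter than min_len, extend it
        # instead of starting a new one, so TTS never gets a tiny input.
        if merged and len(merged[-1]) < min_len:
            merged[-1] = f"{merged[-1]} {part}"
        else:
            merged.append(part)
    return merged
```

Merging short fragments is one possible design choice: very short inputs can sound clipped when synthesized one at a time, so a tokenizer for TTS may prefer slightly longer chunks.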

Core Technologies

LiveKit: Real-time audio/video infrastructure for scalable, multi-user rooms.
Rime.ai: Hyper-realistic TTS models ("arcana", "mistv2").
OpenAI: LLMs (e.g., GPT-4o-mini) for generating conversational responses.
Python 3.11+: All orchestration and logic.
LiveKit Plugins: For noise cancellation, turn detection, and TTS enhancements.

Setup & Installation

1. Clone the Repository

git clone https://github.com/uw-datasci/AI-GF.git

2. Create and Activate a Virtual Environment

Windows:

python -m venv .venv
.venv\Scripts\activate

macOS/Linux:

python3 -m venv .venv
source .venv/bin/activate

3. Install Python Dependencies

pip install -r requirements.txt

4. Download Model Files (Hugging Face / Léa)

python rime_agent.py download-files

This downloads the Hugging Face models used when an agent sets provider: "huggingface" for TTS or LLM (e.g. Léa). Models are cached locally so the agent can run TTS and LLM in-process via the transformers library. STT runs on Silero (local) or OpenAI (cloud); there is no Hugging Face STT option. Requires transformers and torch (see requirements.txt).

Environment Variables & API Keys

Create a .env file in the project root with the following keys:

LIVEKIT_URL=wss://<your-livekit-server>.livekit.cloud
LIVEKIT_API_KEY=<your-livekit-api-key>
LIVEKIT_API_SECRET=<your-livekit-api-secret>

OPENAI_API_KEY=<your-openai-api-key>
RIME_API_KEY=<your-rime-api-key>
ELEVEN_API_KEY=<your-elevenlabs-api-key>
SMALLEST_API_KEY=<your-smallest-ai-api-key>

# Optional: Tavus avatar integration
TAVUS_API_KEY=<your-tavus-api-key>
TAVUS_REPLICA_ID=<your-tavus-replica-id>

Required API Keys:

LiveKit: For connecting to your LiveKit Cloud or self-hosted server.
OpenAI: For LLM responses (ensure your key has quota).
Rime.ai: For TTS (Arcana, Mistv2, etc.).
ElevenLabs (optional): For ElevenLabs TTS voices.
Smallest AI (optional): For Waves TTS and Pulse STT. Get a key at console.smallest.ai.
Tavus (optional): For avatar video integration.

Note:

Do not surround values with quotes unless the value contains spaces.
If you see quota errors, check your OpenAI or Rime.ai usage and billing.
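A quick pre-flight check can catch missing keys before the agent starts. The sketch below assumes the values from .env have already been loaded into the process environment; `missing_keys` is a hypothetical helper and is not part of the project.

```python
import os
from typing import Mapping

# Keys this document lists as required; optional providers are left out.
REQUIRED_KEYS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "OPENAI_API_KEY",
    "RIME_API_KEY",
)


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required keys that are unset or blank (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling `missing_keys()` with no arguments inspects the current environment; passing a dict makes the check easy to test.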

Running the Agent

1. Console Mode (Debugging)

Run the agent in console mode for local testing:

python rime_agent.py console

2. LiveKit Mode (Production/Demo)

Connects the agent to a LiveKit room:

python rime_agent.py dev

Ensure all required environment variables are set in .env.
The agent will join the specified LiveKit room and respond to participants.

3. Stopping the Agent

Press Ctrl + C in the terminal to stop the agent at any time.

Customization & Prompt Engineering

Prompt format (agent_template JSON)

The system prompt can be a plain string or an object with type and content:

String: "personality_prompt": "You are Katerina..."
URL: "personality_prompt": { "type": "URL", "content": "https://example.com/prompt.txt" }
File path: "personality_prompt": { "type": "File Path", "content": "prompts/katerina.txt" } (relative to project root)

Use content or Content; type is one of: String, URL, File Path.
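The three prompt forms could be resolved along the following lines. This is a minimal sketch: `resolve_prompt` is a hypothetical name, defaulting a missing type to String is an assumption, and the project's own loader may handle errors and encodings differently.

```python
from pathlib import Path
from typing import Union
from urllib.request import urlopen


def resolve_prompt(value: Union[str, dict], project_root: Path = Path(".")) -> str:
    """Return the prompt text for a personality_prompt value (hypothetical helper)."""
    if isinstance(value, str):
        return value
    # Accept both "content" and "Content", as the format allows.
    content = value.get("content", value.get("Content"))
    if content is None:
        raise ValueError("personality_prompt object needs a content field")
    # Assumption: a missing type is treated as String.
    kind = str(value.get("type", "String")).strip().lower()
    if kind == "string":
        return content
    if kind == "url":
        with urlopen(content, timeout=10) as resp:
            return resp.read().decode("utf-8")
    if kind == "file path":
        # File paths are relative to the project root.
        return (project_root / content).read_text(encoding="utf-8")
    raise ValueError(f"unknown prompt type: {value.get('type')!r}")
```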

TTS and STT in agent JSON

tts has top-level provider, model, url, and a nested voice_options object for provider-specific options. stt uses the same top-level shape.

"tts": {
  "provider": "elevenlabs",
  "model": "eleven_multilingual_v2",
  "url": null,
  "voice_options": { "voice_id": "...", "optimize_streaming_latency": 3 }
},
"stt": { "provider": "openai", "model": "gpt-4o-mini-transcribe", "url": null }

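provider / model / url: the provider name, the model identifier, and an optional base URL (null uses the provider's default endpoint; set it to point at a local OpenAI-compatible server).
tts.voice_options, by provider: ElevenLabs: voice_id, model_id, optimize_streaming_latency; Kokoro: voice, speed, base_url; Rime: speaker, speed_alpha, reduce_latency, max_tokens; Smallest AI: voice_id, speed, sample_rate.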
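One way to read the shared tts/stt shape in Python is a small dataclass. `SpeechConfig` and `parse_speech_block` are hypothetical names for illustration; the project's own parsing code is not shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SpeechConfig:
    """Top-level fields shared by the tts and stt blocks."""
    provider: str
    model: Optional[str] = None
    url: Optional[str] = None
    voice_options: dict[str, Any] = field(default_factory=dict)


def parse_speech_block(block: dict[str, Any]) -> SpeechConfig:
    """Build a SpeechConfig from a tts or stt block (hypothetical helper)."""
    if "provider" not in block:
        raise ValueError("speech block needs a provider")
    return SpeechConfig(
        provider=block["provider"],
        model=block.get("model"),
        url=block.get("url"),
        # voice_options is described for tts; stt blocks get an empty dict.
        voice_options=dict(block.get("voice_options") or {}),
    )
```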

Local / embedded models (no API): For Silero, TTS and STT run locally inside the agent process (torch.hub). For Hugging Face, TTS and LLM can run in-process (transformers). STT is Silero (local) or OpenAI (cloud); Hugging Face STT was removed.

Chrystèle (Silero TTS/STT, local LLM): "tts": { "provider": "silero", "voice_options": { "language": "en", "speaker": "lj_16khz" } }, "stt": { "provider": "silero", "language": "en" }, "vad": { "provider": "silero", "model": "silero_vad" }. TTS and STT use snakers4/silero-models (torch.hub) in-process. LLM can be LM Studio (OpenAI-compatible URL) or another local server.
Léa (Hugging Face TTS/LLM, local): "tts" and "llm" use "provider": "huggingface" with Hugging Face Hub model IDs (see plugins/hf_tts.py, plugins/hf_llm.py). STT uses OpenAI (or set "stt": { "provider": "silero" } for local). Run python rime_agent.py download-files once to cache HF models. No API for TTS/LLM—models run locally in the agent.
Smallest AI (cloud): "tts": { "provider": "smallestai", "model": "lightning", "voice_options": { "voice_id": "emily", "speed": 1.0 } }, "stt": { "provider": "smallestai" }. TTS uses Waves API; STT uses Pulse API. Requires SMALLEST_API_KEY in .env. Get a key at console.smallest.ai.

Alternative (OpenAI-compatible servers): You can use local servers (e.g. Ollama, Whisper API, Kokoro) and "provider": "openai" with "url": "http://localhost:..." in the agent JSON. Those are still local but run in a separate process; Silero and Hugging Face run embedded in the agent with no separate server.

Provider Comparison Table

The table below shows which services each provider offers:

| Provider | TTS | STT | VOD (Voice on Demand / Live Voice) | Type | API Key Env Var |
| --- | --- | --- | --- | --- | --- |
| Rime | ✅ Arcana, Mistv2 | ❌ | ✅ LiveKit real-time | Cloud | `RIME_API_KEY` |
| ElevenLabs | ✅ v2, v3, Turbo | ❌ | ✅ LiveKit real-time | Cloud | `ELEVEN_API_KEY` |
| OpenAI | ✅ (via compatible API) | ✅ Whisper, gpt-4o-mini-transcribe | ✅ LiveKit real-time | Cloud | `OPENAI_API_KEY` |
| Smallest AI | ✅ Waves (lightning, lightning-large) | ✅ Pulse (32+ languages) | ✅ LiveKit real-time | Cloud | `SMALLEST_API_KEY` |
| Kokoro | ✅ OpenAI-compatible local server | ❌ | ✅ LiveKit real-time | Local (server) | None (self-hosted) |
| Silero | ✅ silero_tts (torch.hub) | ✅ silero_stt (torch.hub) | ✅ LiveKit real-time | Local (in-process) | None |
| Hugging Face | ✅ SpeechT5 and others | ❌ (removed; use Silero or OpenAI) | ✅ LiveKit real-time | Local (in-process) | `HF_TOKEN` (optional) |
| Whisper (local server) | ❌ | ✅ via OpenAI-compatible API | ✅ LiveKit real-time | Local (server) | None (self-hosted) |
| DeepSeek | ❌ | ❌ | ❌ (LLM only) | Cloud | `DEEPSEEK_API_KEY` |
| Google | ❌ | ❌ | ❌ (LLM only) | Cloud | `GOOGLE_API_KEY` |
| Anthropic | ❌ | ❌ | ❌ (LLM only) | Cloud | `ANTHROPIC_API_KEY` |

VOD = Voice-on-Demand / live voice interaction via LiveKit. All TTS providers support real-time voice since audio is streamed through LiveKit rooms. Providers marked ❌ for VOD are LLM-only and do not provide speech services.
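The env var column of the comparison table can be turned into a lookup that reports which keys a given agent JSON needs. This is a sketch under assumptions: the lowercase ids "rime", "deepseek", "google", and "anthropic" are guesses at the provider strings, and skipping blocks with a custom url assumes a self-hosted server needs no key.

```python
# Env var per provider, copied from the comparison table. Providers that need
# no key (Kokoro, Silero, local Whisper) and the optional HF_TOKEN are omitted.
PROVIDER_ENV_VARS = {
    "rime": "RIME_API_KEY",
    "elevenlabs": "ELEVEN_API_KEY",
    "openai": "OPENAI_API_KEY",
    "smallestai": "SMALLEST_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "google": "GOOGLE_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def required_env_vars(agent: dict) -> set[str]:
    """Collect the env vars implied by an agent's tts, stt, and llm providers."""
    needed: set[str] = set()
    for section in ("tts", "stt", "llm"):
        block = agent.get(section) or {}
        # Assumption: a custom url means a self-hosted server with no key.
        if block.get("url"):
            continue
        provider = str(block.get("provider", "")).lower()
        if provider in PROVIDER_ENV_VARS:
            needed.add(PROVIDER_ENV_VARS[provider])
    return needed
```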

VAD in agent JSON

vad configures Voice Activity Detection (when the user is speaking). It supports provider and model; optionally onnx_file_path for a custom ONNX file when using Silero.

"vad": { "provider": "silero", "model": "silero_vad" }

provider: "silero" (default) or "huggingface". Silero is used for all agents today; when provider is "huggingface", the config is in place for a future HF VAD plugin.
model: Identifier for the VAD model. For Silero, use "silero_vad"—this is the bundled ONNX model (silero_vad.onnx) from livekit-plugins-silero (snakers4/silero-vad). If omitted, the same default is used.
onnx_file_path (optional): Path to a custom Silero VAD ONNX file; if set, this file is loaded instead of the bundled model.

Chrystèle uses "vad": { "provider": "silero", "model": "silero_vad" }. Léa can use "provider": "huggingface" for future HF VAD alignment.
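The VAD defaults described above could be applied as follows. `resolve_vad` is a hypothetical helper. Leaving the model unset for "huggingface" is an assumption, since the document only defines a default for Silero.

```python
from typing import Optional


def resolve_vad(cfg: Optional[dict]) -> dict:
    """Fill in the documented VAD defaults (hypothetical helper)."""
    cfg = cfg or {}
    provider = cfg.get("provider", "silero")
    if provider not in ("silero", "huggingface"):
        raise ValueError(f"unsupported VAD provider: {provider!r}")
    resolved = {"provider": provider, "model": cfg.get("model")}
    if provider == "silero":
        # The bundled ONNX model is the default when model is omitted.
        resolved["model"] = resolved["model"] or "silero_vad"
        # A custom ONNX file replaces the bundled model only when it is set.
        if cfg.get("onnx_file_path"):
            resolved["onnx_file_path"] = cfg["onnx_file_path"]
    return resolved
```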

Expressive TTS tags (LiveKit / Rime)

To make agents sound livelier, use expressive tags in personality_prompt and intro_phrase. The full tag list is injected into the prompt passed to the model based on tts.provider: Rime/Silero/Kokoro use angle brackets (<laugh>, <sigh>, <whis>...</whis>); ElevenLabs v3 uses square brackets ([laughs], [sighs], [whispers]). See docs/LIVEKIT_TTS_TAGS.md for the full list and usage.
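Selecting the tag style by provider might look like the sketch below. Only the example tags from the paragraph above are included. `tag_hint` is a hypothetical name, and detecting ElevenLabs v3 by looking for "v3" in the model string is an assumption about how models are identified.

```python
# Example tags only; the full list lives in docs/LIVEKIT_TTS_TAGS.md.
ANGLE_TAGS = ["<laugh>", "<sigh>", "<whis>...</whis>"]
SQUARE_TAGS = ["[laughs]", "[sighs]", "[whispers]"]

ANGLE_PROVIDERS = {"rime", "silero", "kokoro"}


def tag_hint(provider: str, model: str = "") -> str:
    """Return a prompt suffix listing the tags for a TTS provider (hypothetical)."""
    provider = provider.lower()
    if provider in ANGLE_PROVIDERS:
        tags = ANGLE_TAGS
    elif provider == "elevenlabs" and "v3" in model.lower():
        tags = SQUARE_TAGS
    else:
        return ""  # no expressive tags documented for this provider
    return "You may use these expressive tags: " + ", ".join(tags)
```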

Edit agent_configs.py to:
- Add new personas (copy the "celeste" config and modify).
- Change TTS speed ("speed_alpha"), model, or speaker.
- Update the llm_prompt for different conversational styles.
- Adjust intro_phrase for custom greetings.

Example:

"celeste": {
    "tts_options": {
        "model": "arcana",
        "speaker": "celeste",
        "speed_alpha": 1.0,  # 1.0 = normal speed
        ...
    },
    "llm_prompt": "...",
    "intro_phrase": "hey cutie... <laugh> I was just thinking about you. what are you up to?",
}

Lower "speed_alpha" if TTS is too fast for avatar sync.
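Copying a persona and changing a few TTS options can be done with a deep copy, so the original entry stays untouched. In this sketch, `AGENT_CONFIGS` is a stand-in for the dictionary in agent_configs.py (its real name and the elided fields are not shown in this document), and `derive_persona` and "celeste_alt" are hypothetical.

```python
import copy

# Stand-in for the dictionary in agent_configs.py; elided fields stay elided.
AGENT_CONFIGS = {
    "celeste": {
        "tts_options": {"model": "arcana", "speaker": "celeste", "speed_alpha": 1.0},
        "llm_prompt": "...",
        "intro_phrase": "hey cutie... <laugh> I was just thinking about you. what are you up to?",
    }
}


def derive_persona(base: str, **tts_overrides) -> dict:
    """Deep-copy a persona and override selected TTS options."""
    persona = copy.deepcopy(AGENT_CONFIGS[base])
    persona["tts_options"].update(tts_overrides)
    return persona


# Register a variant of "celeste" with a different speed setting.
AGENT_CONFIGS["celeste_alt"] = derive_persona("celeste", speed_alpha=0.9)
```

A deep copy matters here because tts_options is a nested dictionary; a shallow copy would share it, and the override would leak back into "celeste".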

Technical Notes

Dependencies:
- Uses a forked version of livekit-plugins-rime for improved Arcana support (see requirements.txt).
- All audio processing, TTS, and LLM calls are asynchronous for low latency.
Plugins:
- Noise cancellation (livekit-plugins-noise-cancellation)
- Turn detection (livekit-plugins-turn-detector)
Extensibility:
- Add new plugins, voices, or logic by extending the agent/session classes.
Microphone Selection:
- By default, uses the system default input device.
- To change, modify the code to set the desired device index using sounddevice or relevant library.
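For microphone selection, a sketch using sounddevice is shown below. It assumes the agent's local audio input honors sounddevice's default device, which this document does not confirm. `pick_input_device` and `use_input_device` are hypothetical helpers.

```python
from typing import Optional, Sequence


def pick_input_device(devices: Sequence[dict], name_part: str) -> Optional[int]:
    """Return the index of the first input-capable device whose name matches."""
    for index, dev in enumerate(devices):
        has_input = dev.get("max_input_channels", 0) > 0
        if has_input and name_part.lower() in dev["name"].lower():
            return index
    return None


def use_input_device(name_part: str) -> None:
    """Set the sounddevice default input device; needs the sounddevice package."""
    import sounddevice as sd  # imported lazily so the picker works without it

    index = pick_input_device(list(sd.query_devices()), name_part)
    if index is None:
        raise LookupError(f"no input device matching {name_part!r}")
    # default.device is an (input, output) pair; keep the current output.
    sd.default.device = (index, sd.default.device[1])
```

Calling `use_input_device("usb")` before the agent opens its audio stream would select the first input device whose name contains "usb".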

Demo/Deployment Tips

For Demos:
- Highlight real-time, character-driven voice interaction.
- Show expressive TTS and persona switching.
- Demonstrate easy customization via agent_configs.py.
- Explain integration with LiveKit for scalable audio rooms.
For Production:
- Deploy on a cloud VM or service (e.g., Render, AWS, Azure).
- Use secure storage for API keys.
- Monitor usage and quotas for OpenAI and Rime.ai.
- Optionally, connect a web or mobile frontend via LiveKit APIs.

Troubleshooting

Quota Errors:
- If you see insufficient_quota or 429 errors, check your OpenAI or Rime.ai account usage and billing.
Audio Sync Issues:
- If TTS audio is faster than the avatar, lower "speed_alpha" in agent_configs.py.
Missing Dependencies:
- Re-run pip install -r requirements.txt in your activated virtual environment.
Microphone Issues:
- Ensure your preferred input device is set as default, or modify the code to select a specific device.

References

This document provides a comprehensive technical overview and setup guide for the Rime LiveKit Agents project. For further details, see the codebase and README.
