Voice AI Customer Service

An experimental project exploring how voice models combined with fast, advanced LLMs can replace traditional customer service. The goal is to create AI agents that understand context better and answer customer questions more accurately than human representatives.

Features

- LLM Understanding: Google Gemini 3 Flash (`gemini-3-flash-preview`) processes customer queries with context awareness
- Text-to-Speech: OpenAI `gpt-4o-mini-tts` or ElevenLabs converts responses to natural speech
- Customer Service Mode: the `--gemini-query` flag pipes input through the LLM before voice synthesis
- Multi-format Input: Supports Markdown, TXT, PDF, and DOCX files

Setup

Install dependencies:
```
python3 -m pip install -r requirements.txt
```
On Linux you may need python3-tk for the GUI picker.
Copy and configure environment:
```
cp .env.local.example .env.local
```
Edit .env.local with your API keys:
- GEMINI_API_KEY - For LLM understanding
- OPENAI_API_KEY - For OpenAI TTS
- ELEVENLABS_API - For ElevenLabs TTS
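
For readers curious how a `.env.local` file of this shape can be consumed, here is a minimal sketch. It is not taken from `tts.py`; the helper names `parse_env_lines` and `load_env_file` are hypothetical, and the sketch assumes simple `KEY=VALUE` lines with optional quotes and `#` comments.

```python
import os
from pathlib import Path


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of optional quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env.local"):
    """Hypothetical loader: copy parsed values into os.environ if not already set."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    values = parse_env_lines(env_path.read_text(encoding="utf-8").splitlines())
    for key, value in values.items():
        # Variables already exported in the shell take precedence over the file.
        os.environ.setdefault(key, value)
    return values
```

Using `setdefault` is a design choice in this sketch: it lets a key exported in the shell override the file, which is convenient for one-off runs.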

Usage

Customer Service Mode (LLM + Voice)

Process a customer query through Gemini, then convert the response to speech:

python3 tts.py --gemini-query --text "How do I reset my password?"

Standard Text-to-Speech

python3 tts.py --text "Your order has been shipped" --output notification.mp3

From File

python3 tts.py --input-file response.txt --output ./dist/response.wav --format wav

Test All Connections

python3 tts.py --test

Command-Line Options

| Option | Description |
| --- | --- |
| `--text` | Text string to convert to speech |
| `--input-file` | Path to a text file (MD/TXT/PDF/DOCX) |
| `--output` | Output file path (defaults to `~/Downloads/tts-output.mp3`) |
| `--format` | Audio format: `mp3` or `wav` (default: `mp3`) |
| `--provider` | TTS provider: `openai` or `elevenlabs` |
| `--gemini-query` | Process input through Gemini LLM first (customer service mode) |
| `--gemini-api-key` | Provide Gemini API key directly |
| `--voice` | Override the default voice |
| `--model` | Override the model name |
| `--api-key` | Provide TTS API key directly |
| `--project` | Project ID for OpenAI or ElevenLabs |
| `--choose-file` | Open GUI file picker |
| `--chunk-size` | Override chunking threshold (default: 3400 chars) |
| `--test` | Test API connections |
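
The `--chunk-size` option implies that long input is split before synthesis. The exact splitting logic in `tts.py` is not documented here, so the following is only a sketch of one plausible approach: pack whole sentences into chunks no longer than the threshold, and hard-split any single sentence that exceeds it. The function name `chunk_text` is hypothetical.

```python
import re


def chunk_text(text, chunk_size=3400):
    """Split text into chunks of at most chunk_size characters.

    Assumption: sentence boundaries are preferred split points. This is an
    illustrative strategy, not necessarily what tts.py does.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:chunk_size])
            sentence = sentence[chunk_size:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent to the TTS provider separately and the resulting audio concatenated.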

Environment Variables

Gemini (LLM Understanding)

| Variable | Description |
| --- | --- |
| `GEMINI_API_KEY` | Google AI API key |
| `GEMINI_MODEL` | Model ID (default: `gemini-3-flash-preview`) |

OpenAI (TTS)

| Variable | Description |
| --- | --- |
| `OPENAI_API_KEY` | OpenAI API key (required for OpenAI TTS) |
| `OPENAI_MODEL` | TTS model (default: `gpt-4o-mini-tts`) |
| `OPENAI_VOICE` | Voice ID (default: `alloy`) |
| `OPENAI_PROJECT` | Project ID for project-scoped keys |

ElevenLabs (TTS)

| Variable | Description |
| --- | --- |
| `ELEVENLABS_API` | ElevenLabs API key |
| `ELEVENLABS_MODEL` | Model (default: `eleven_multilingual_v2`) |
| `ELEVENLABS_VOICE` | Voice ID |
| `ELEVENLABS_STABILITY` | Voice stability (0-1) |
| `ELEVENLABS_SIMILARITY` | Similarity boost (0-1) |
| `ELEVENLABS_STYLE` | Style (0-1) |
| `ELEVENLABS_SPEAKER_BOOST` | Speaker boost (`true`/`false`) |
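
As an illustration of how the ElevenLabs tuning variables might be validated before use, the sketch below turns them into a settings dictionary. The function name `elevenlabs_voice_settings` is hypothetical, and the output keys (`stability`, `similarity_boost`, `style`, `use_speaker_boost`) are assumed to match the provider's voice settings fields; check the ElevenLabs documentation before relying on them.

```python
import os


def _unit_float(env, name):
    """Read an optional float in [0, 1] from env; return None when unset."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return None
    value = float(raw)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return value


def elevenlabs_voice_settings(env=os.environ):
    """Hypothetical helper: collect only the settings that are actually set."""
    settings = {
        "stability": _unit_float(env, "ELEVENLABS_STABILITY"),
        "similarity_boost": _unit_float(env, "ELEVENLABS_SIMILARITY"),
        "style": _unit_float(env, "ELEVENLABS_STYLE"),
    }
    boost = env.get("ELEVENLABS_SPEAKER_BOOST")
    if boost is not None and boost.strip():
        # Anything other than "true" (case-insensitive) is treated as false.
        settings["use_speaker_boost"] = boost.strip().lower() == "true"
    return {key: value for key, value in settings.items() if value is not None}
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.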

How It Works

1. Input: Customer query via text, file, or GUI picker
2. LLM Processing (optional): Gemini 3 Flash analyzes the query and generates a helpful response
3. Voice Synthesis: OpenAI or ElevenLabs converts the response to natural speech
4. Output: MP3/WAV audio file ready for playback

The system uses a customer service-optimized prompt that instructs the LLM to be helpful, accurate, and empathetic. The TTS voice is configured for warm, clear delivery suitable for customer interactions.
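
The flow described above can be sketched as a small function that takes the LLM and TTS steps as callables. This is an illustrative outline only: `run_pipeline`, `respond`, and `synthesize` are hypothetical names, and the real provider calls in `tts.py` are not shown or assumed here.

```python
from typing import Callable, Optional


def run_pipeline(
    text: str,
    synthesize: Callable[[str], bytes],
    respond: Optional[Callable[[str], str]] = None,
) -> bytes:
    """Run input text through an optional LLM step, then through speech synthesis.

    respond:    hypothetical stand-in for the Gemini call (customer service mode).
    synthesize: hypothetical stand-in for the OpenAI or ElevenLabs TTS call.
    """
    if not text or not text.strip():
        raise ValueError("input text is empty")
    # Without the LLM step, the input text is spoken as-is.
    reply = respond(text) if respond is not None else text
    return synthesize(reply)


if __name__ == "__main__":
    # Stub providers so the sketch runs without API keys.
    fake_llm = lambda query: f"Thanks for asking: {query}"
    fake_tts = lambda reply: reply.encode("utf-8")
    print(run_pipeline("How do I reset my password?", fake_tts, respond=fake_llm))
```

Keeping the providers behind plain callables makes it straightforward to swap OpenAI for ElevenLabs, or to test the flow offline with stubs.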

Supported Input Formats

- `*.md` / `*.txt` - Read as UTF-8 (Markdown headings stripped for clean narration)
- `*.pdf` - Requires PyPDF2
- `*.docx` - Requires python-docx
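
A minimal sketch of extension-based loading follows. It is not the project's actual reader: `strip_markdown_headings` and `read_input` are hypothetical names, and the sketch assumes that "headings stripped" means removing the leading `#` markers while keeping the heading text. The PDF and DOCX branches import their libraries lazily, so plain-text input works without them installed.

```python
import re
from pathlib import Path

# Matches ATX heading markers ("# ", "## ", ...) at the start of a line.
HEADING_MARKER = re.compile(r"^[ ]{0,3}#{1,6}[ \t]+", flags=re.MULTILINE)


def strip_markdown_headings(text):
    """Remove leading '#' markers (assumption: the heading text itself is kept)."""
    return HEADING_MARKER.sub("", text)


def read_input(path):
    """Return the text content of a supported input file."""
    suffix = Path(path).suffix.lower()
    if suffix in {".md", ".txt"}:
        text = Path(path).read_text(encoding="utf-8")
        return strip_markdown_headings(text) if suffix == ".md" else text
    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # optional dependency

        pages = PdfReader(str(path)).pages
        return "\n".join(page.extract_text() or "" for page in pages)
    if suffix == ".docx":
        from docx import Document  # provided by the python-docx package

        return "\n".join(p.text for p in Document(str(path)).paragraphs)
    raise ValueError(f"Unsupported input format: {suffix or path}")
```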
