Most hosted TTS services run you into a usage wall. ElevenLabs, Murf, and Play.ht all cap monthly usage, and the good voices cost money. Index-TTS on a rented GPU gives you comparable quality with no cap, and you pay only while the pod is running.
What is Index-TTS?
Index-TTS is an open-source text-to-speech model. Fast inference, clean voice quality, and a simple web UI. You upload an audio sample as a voice reference, type your text, and it generates speech in that voice. No cloud subscription required.
The catch: it needs a GPU to run at any decent speed. A CPU will work, but slowly. RunPod lets you rent a GPU by the hour, which makes this cost-effective for anything from short batches to full production workloads.
What you need
- A RunPod account with some credit loaded (a few dollars)
- A HuggingFace account and access token
- A short audio clip to use as a voice reference
Step 1: Rent the GPU pod
Log into RunPod and go to Pods. Select Community Cloud, then choose the RTX A5000 (tested, works well for Index-TTS).
Before deploying, click Edit on the pod template and open the Ports section. Add port 7860 — that's the port the Index-TTS web UI runs on. Disable Jupyter Notebook since you won't need it.
Click Deploy on Demand. The pod starts in 2-3 minutes.
Step 2: Connect via SSH
Once the pod shows as running, copy the SSH command (labeled "SSH over Exposed TCP") from the RunPod dashboard. Paste it into your terminal. Accept the host key. You're in.
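Once you're connected, it can be worth confirming that the pod actually exposes the GPU you rented before installing anything. The sketch below assumes the pod image ships the standard `nvidia-smi` tool; the function name `gpu_summary` is made up for this example.

```shell
# Print the name and total memory of each visible NVIDIA GPU.
# Returns 1 if the nvidia-smi tool is not installed.
gpu_summary() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: no NVIDIA driver visible in this shell"
    return 1
  fi
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
}

# Usage: gpu_summary
```

If the GPU name printed here is not the one you selected, it is cheaper to sort that out now than after the install.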
Step 3: Install Index-TTS
Run these commands in sequence. The installation script is in the video description if you want to copy it directly.
# Update system packages and install dependencies
apt-get update && apt-get install -y build-essential ffmpeg
# Install HuggingFace CLI
pip install huggingface_hub
# Authenticate with HuggingFace
huggingface-cli login
For the HuggingFace token, go to your HuggingFace account settings, find Access Tokens, and create a new token with read access. Paste it in the terminal prompt. Choose "No" when asked about adding it as a git credential.
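If you script the setup, the interactive prompt can be skipped. This is a minimal sketch, assuming you exported the token yourself as `HF_TOKEN` before running it; the function name `hf_login_from_env` is hypothetical. Leaving out the `--add-to-git-credential` flag corresponds to answering "No" at the git credential prompt.

```shell
# Log in to HuggingFace without the interactive prompt.
# Assumes the token was exported beforehand: export HF_TOKEN=<your token>
hf_login_from_env() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set" >&2
    return 1
  fi
  huggingface-cli login --token "$HF_TOKEN"
}

# Usage: hf_login_from_env
```

Keep the token out of shell history and shared scripts; it grants read access to your account.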
# Clone Index-TTS
git clone https://github.com/index-tts/index-tts
# Install Python dependencies
cd index-tts && pip install -r requirements.txt
This install takes a few minutes.
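When the install finishes, a quick check that the tools from this step are on the PATH can save a failed run in Step 4. The helper below is an illustrative sketch (the name `missing_tools` is not part of Index-TTS); it prints only the commands it cannot find.

```shell
# Print every command name from the arguments that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage: missing_tools git ffmpeg python pip huggingface-cli
```

No output means everything was found.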
Step 4: Download models and run the web UI
# Download model checkpoints
python download_models.py
# Disable HF transfer to avoid web UI compatibility issues
export HF_HUB_ENABLE_HF_TRANSFER=0
# Start the web UI
python webui.py
The terminal shows the GPU was detected (RTX A5000 or whichever you rented) and that the server is running on port 7860.
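To confirm from a second SSH session that the server is really accepting connections, you can poll the port. This sketch relies on bash's `/dev/tcp` redirection, so it assumes the pod's shell is bash; `wait_for_port` is a name invented for this example.

```shell
# Succeed once something accepts TCP connections on host:port.
# Arguments: host, port, number of one-second attempts (default 30).
# Uses bash's /dev/tcp redirection, so run it with bash.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    # The subshell opens the connection and closes it again on exit.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Usage: wait_for_port 127.0.0.1 7860 60 && echo "web UI is up"
```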
Step 5: Open the web UI
Back in RunPod, go to My Pods. You'll see port 7860 listed under your pod's ports. Click "HTTP Service." The Index-TTS web UI opens in your browser.
Generating your first audio
- Upload an audio file as your voice reference. Any short clip works — 10-30 seconds is fine.
- Type the text you want to synthesize in the text field.
- Click Synthesize.
The model generates the audio in a few seconds. Play it back. The voice matches the reference clip closely, and delivery is calm and natural.
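If your source recording is longer than the 10-30 seconds suggested above, ffmpeg (installed in Step 3) can cut it down before you upload it. This is a rough sketch: the function name and file names are hypothetical, and converting to mono is an assumption made here for a simpler file, not a documented Index-TTS requirement.

```shell
# Cut the first N seconds of an audio file and save it as mono audio.
# Arguments: input file, output file, duration in seconds (default 20).
trim_reference() {
  local src="$1" dst="$2" secs="${3:-20}"
  if [ ! -f "$src" ]; then
    echo "input file not found: $src" >&2
    return 1
  fi
  # -y overwrites dst, -t limits the output duration, -ac 1 downmixes to mono.
  ffmpeg -y -i "$src" -t "$secs" -ac 1 "$dst"
}

# Usage: trim_reference interview.mp3 reference.wav 20
```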
Cost breakdown
You pay only while the pod is running. An RTX A5000 on RunPod Community Cloud runs around $0.22/hour. For most use cases (generating voiceovers for a batch of videos, creating audio content, building a voice bank) you'd spend under $5 total per session; at that rate, $5 covers roughly 22 hours of GPU time.
Compare that to ElevenLabs Pro at $99/month with a 192,000-character limit. If you exceed the limit, you pay per character. With this setup, there's no limit.
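The arithmetic behind these figures is easy to rerun with your own numbers. The sketch below uses the $0.22/hour rate and the $5 and $99 amounts quoted above; the function names are made up for illustration, and actual RunPod prices vary by GPU and over time.

```shell
# Cost in dollars of running a pod: hourly rate times hours.
pod_cost() {
  awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

# Hours of pod time that a fixed budget covers at a given hourly rate.
pod_hours_for() {
  awk -v budget="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", budget / rate }'
}

pod_cost 0.22 8        # 8-hour session         -> 1.76
pod_hours_for 5 0.22   # hours covered by $5    -> 22.7
pod_hours_for 99 0.22  # hours covered by $99   -> 450.0
```

In other words, the price of one month of the subscription covers about 450 GPU hours at this rate, before any storage charges.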
Terminate the pod when you're done. RunPod charges by uptime. A pod left running overnight costs real money.
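As a safety net against forgetting, you can schedule a stop from inside the pod. This is a sketch under two assumptions: that the `runpodctl` CLI is available in the pod, and that RunPod sets the `RUNPOD_POD_ID` environment variable there. The function name `stop_pod_after` is hypothetical.

```shell
# Stop this pod after a delay, from inside the pod.
# Assumes runpodctl is installed and RUNPOD_POD_ID is set in the pod's
# environment. The delay uses sleep syntax such as 2h or 90m.
stop_pod_after() {
  local delay="${1:-2h}"
  if [ -z "${RUNPOD_POD_ID:-}" ]; then
    echo "RUNPOD_POD_ID is not set; are you inside a RunPod pod?" >&2
    return 1
  fi
  sleep "$delay" && runpodctl stop pod "$RUNPOD_POD_ID"
}

# Usage: stop_pod_after 2h &
```

Stopping is not the same as terminating: a stopped pod may still incur storage charges, so remove it from the dashboard (or with `runpodctl remove pod`) once you no longer need its disk.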
If you're building voice generation into a product or automation pipeline and need it production-ready, get in touch. We build custom AI UGC pipelines end to end.