TTS Bench — Speed — linux-default

Rig: linux-3090 — AMD Ryzen 9 5900XT 16-Core Processor · NVIDIA GeForce RTX 3090 24GB · 63 GB RAM · Linux 6.8.0-117-generic
Label: default voice
TTFA = time to first audio (ms; lower is better). RTF = real-time factor (× realtime; higher is better; e.g. 10× means 10 sec of audio generated per 1 sec of compute). Cold = first run after process start; warm = subsequent runs.
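As a rough illustration of the two metrics defined above, here is a minimal sketch of how they could be computed from raw timings. The function names and the timestamp-based interface are assumptions for illustration, not the benchmark's actual harness.

```python
def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Seconds of audio generated per second of compute (higher is better).

    This follows the definition used in this report (audio / compute).
    """
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


def time_to_first_audio_ms(request_start_s: float, first_audio_s: float) -> float:
    """Milliseconds between issuing the request and receiving the first audio.

    Both arguments are timestamps in seconds from the same clock
    (hypothetical inputs; e.g. values from time.perf_counter()).
    """
    return (first_audio_s - request_start_s) * 1000.0


# 10 s of audio generated in 1 s of compute is 10x realtime.
example_rtf = real_time_factor(10.0, 1.0)
```

An RTF below 1.0 means the model generates audio more slowly than it plays back, which matters for streaming use.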

Speed winners

Fastest predefined-voice: Kokoro (cuda) — 97.68× warm RTF, 73ms warm TTFA

Fastest cloning-capable: OmniVoice (cuda) — 8.35× warm RTF, 759ms warm TTFA

| Model | Device | TTFA cold | TTFA warm | RTF cold | RTF warm | Peak RAM | Peak VRAM | Size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Kokoro | cuda | 491ms | 73ms | 15.15× | 97.68× | 1.85 GB | 879 MB | 82M |
| Piper | cpu | 221ms | 180ms | 28.71× | 37.14× | 477 MB | | ~25MB |
| LuxTTS | cuda | 391ms | 215ms | 13.18× | 24.98× | 2.39 GB | 987 MB | |
| OmniVoice | cuda | 1.31s | 759ms | 4.95× | 8.35× | 1.99 GB | 2.11 GB | ~1B |
| Kokoro | cpu | 1.37s | 1.22s | 5.80× | 6.96× | 1.65 GB | | 82M |
| Supertonic | cpu | 1.28s | 1.28s | 5.87× | 5.89× | 612 MB | | 99M |
| Coqui XTTS-v2 | cuda | 1.95s | 1.76s | 4.19× | 4.88× | 2.08 GB | 2.12 GB | 750M |
| Soprano 80M | cuda | 1.37s | 1.39s | 4.87× | 4.86× | 1.85 GB | 325 MB | 80M |
| Chatterbox Turbo | cuda | 1.88s | 1.54s | 3.68× | 4.66× | 2.12 GB | 3.01 GB | 744M |
| Pocket-TTS | cpu | 163ms | 148ms | 4.07× | 4.05× | 1.75 GB | | 100M |
| KittenTTS | cpu | 2.39s | 2.07s | 3.29× | 3.72× | 330 MB | | <100M |
| Qwen3-TTS 1.7B (CUDA-graph) | cuda | 9.45s | 2.01s | 0.70× | 3.04× | 2.38 GB | 4.89 GB | 1.7B |
| F5-TTS | cuda | 2.01s | 1.43s | 2.22× | 3.03× | 2.27 GB | 802 MB | 330M |
| Soprano 80M | cpu | 2.17s | 2.23s | 3.13× | 3.01× | 1.33 GB | | 80M |
| NeuTTS Nano | cuda | 859ms | 311ms | 2.26× | 3.00× | 3.45 GB | 3.25 GB | 748M |
| VibeVoice Realtime 0.5B | cuda | 3.60s | 3.28s | 2.45× | 2.76× | 1.78 GB | 2.62 GB | 0.5B |
| Chatterbox | cuda | 3.10s | 2.46s | 1.92× | 2.28× | 2.07 GB | 3.26 GB | 1.2B |
| VoxCPM2 2B | cuda | 3.27s | 3.44s | 2.11× | 2.10× | 3.39 GB | 5.56 GB | 2B |
| MOSS-TTS-Nano | cuda | 4.95s | 3.56s | 1.67× | 2.08× | 1.93 GB | 971 MB | 100M |
| NeuTTS Nano | cpu | 925ms | 389ms | 1.67× | 2.03× | 5.61 GB | | 748M |
| Magpie-TTS | cuda | 5.77s | 4.40s | 1.57× | 2.02× | 2.68 GB | 6.56 GB | 357M |
| VibeVoice 1.5B | cuda | 4.17s | 4.69s | 1.53× | 1.83× | 1.97 GB | 5.26 GB | 3B |
| LuxTTS | cpu | 1.94s | 1.90s | 1.68× | 1.71× | 2.30 GB | | |
| NeuTTS Air | cuda | 1.59s | 478ms | 1.29× | 1.68× | 3.88 GB | 3.25 GB | 748M |
| NeuTTS Air | cpu | 1.60s | 554ms | 1.11× | 1.35× | 6.01 GB | | 748M |
| IndexTTS-2 | cpu | 6.72s | 5.67s | 0.97× | 1.20× | 5.67 GB | | 1.5B |
| MOSS-TTS-Nano | cpu | 6.91s | 6.09s | 1.07× | 1.17× | 1.30 GB | | 100M |
| IndexTTS-2 | cuda | 7.25s | 5.95s | 0.91× | 1.08× | 2.77 GB | 7.57 GB | 1.5B |
| Qwen3-TTS 1.7B | cuda | 8.19s | 7.01s | 0.74× | 0.88× | 2.21 GB | 4.64 GB | 1.7B |
| Coqui XTTS-v2 | cpu | 11.98s | 11.11s | 0.79× | 0.80× | 3.15 GB | | 750M |
| Sesame CSM-1B | cuda | 9.30s | 8.66s | 0.71× | 0.76× | 2.08 GB | 3.51 GB | 1B |
| Chatterbox Turbo | cpu | 11.03s | 10.58s | 0.66× | 0.69× | 3.87 GB | | 744M |
| Dia 1.6B | cuda | 16.31s | 13.83s | 0.49× | 0.61× | 2.19 GB | 4.58 GB | 1.6B |
| VibeVoice Realtime 0.5B | cpu | 17.64s | 15.86s | 0.57× | 0.55× | 6.51 GB | | 0.5B |
| ZipVoice 123M (4/5 ok) | cpu | 15.01s | 12.19s | 0.43× | 0.53× | 53.97 GB | | 123M |
| OmniVoice | cpu | 15.83s | 15.35s | 0.40× | 0.41× | 2.99 GB | | ~1B |
| Chatterbox | cpu | 17.21s | 16.53s | 0.36× | 0.37× | 4.16 GB | | 1.2B |
| Magpie-TTS | cpu | 40.47s | 39.78s | 0.25× | 0.26× | 13.55 GB | | 357M |
| VoxCPM2 2B | cpu | 29.43s | 29.16s | 0.24× | 0.24× | 6.44 GB | | 2B |
| VibeVoice 1.5B | cpu | 43.47s | 52.25s | 0.17× | 0.17× | 11.26 GB | | 3B |
| Qwen3-TTS 1.7B | cpu | 38.52s | 35.33s | 0.15× | 0.17× | 9.12 GB | | 1.7B |
| Sesame CSM-1B | cpu | 52.98s | 57.59s | 0.12× | 0.12× | 5.95 GB | | 1B |
| Mars5-TTS | cuda | 60.14s | 58.72s | 0.13× | 0.12× | 2.60 GB | 6.80 GB | 1.2B |
| Mars5-TTS | cpu | 59.06s | 58.01s | 0.12× | 0.12× | 2.64 GB | | 1.2B |
| F5-TTS | cpu | 53.94s | 55.46s | 0.08× | 0.08× | 2.58 GB | | 330M |

ZipVoice 123M (cuda): skipped, out of GPU memory (model exceeds this GPU's VRAM)
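
The table mixes units (TTFA appears in both ms and s, and RTF carries a × suffix), so ranking rows by a column requires normalizing the values first. The following is a minimal sketch of that normalization using a handful of warm-run values copied from the table; the helper names and the tuple layout are assumptions for illustration only.

```python
def parse_duration_ms(text: str) -> float:
    """Convert a cell such as '73ms' or '1.76s' to milliseconds."""
    text = text.strip()
    if text.endswith("ms"):  # check 'ms' before 's', since 'ms' also ends in 's'
        return float(text[:-2])
    if text.endswith("s"):
        return float(text[:-1]) * 1000.0
    raise ValueError(f"unrecognized duration: {text!r}")


def parse_rtf(text: str) -> float:
    """Convert a cell such as '97.68×' to a float."""
    return float(text.strip().rstrip("×"))


# (model, device, TTFA warm, RTF warm): a small subset of the rows above.
rows = [
    ("Kokoro", "cuda", "73ms", "97.68×"),
    ("Piper", "cpu", "180ms", "37.14×"),
    ("OmniVoice", "cuda", "759ms", "8.35×"),
    ("Pocket-TTS", "cpu", "148ms", "4.05×"),
    ("Coqui XTTS-v2", "cuda", "1.76s", "4.88×"),
]

# Lowest warm TTFA first; highest warm RTF first.
by_ttfa = sorted(rows, key=lambda r: parse_duration_ms(r[2]))
by_rtf = sorted(rows, key=lambda r: parse_rtf(r[3]), reverse=True)

for model, device, ttfa, rtf in by_rtf:
    print(f"{model} ({device}): {rtf} warm RTF, {ttfa} warm TTFA")
```

Note that the two orderings differ: within this subset, Pocket-TTS (cpu) ranks second on warm TTFA but last on warm RTF, so which model counts as "fastest" depends on the metric chosen.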