
TTS Bench — Speed — windows-default

Rig: windows-5090 — AMD Ryzen 9 9950X3D 16-Core Processor (16C) · NVIDIA GeForce RTX 5090 32GB · 126 GB RAM · Windows 11
Label: default voice
TTFA = time to first audio (shown in ms or s; lower is better). RTF = real-time factor (× realtime; higher is better; e.g. 10× means 10 s of audio generated per 1 s of compute). Cold = first run after process start; warm = subsequent runs.
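The metric definitions above can be sketched from per-run timestamps. This is an illustration, not the harness used for these results. The `RunTiming` fields are hypothetical. The sketch also makes three assumptions: RTF compute time spans request to synthesis end, the first run is the cold run, and warm figures are the mean of the remaining runs. The page does not say which aggregate it uses. Note that RTF here is audio seconds per compute second. Some papers use the inverse convention (compute divided by audio duration, lower is better).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunTiming:
    """Timings for one synthesis run (hypothetical field names)."""
    start_s: float        # wall-clock time the request was issued
    first_audio_s: float  # wall-clock time the first audio chunk was available
    end_s: float          # wall-clock time synthesis finished
    audio_seconds: float  # duration of the generated audio


def ttfa_ms(run: RunTiming) -> float:
    """Time to first audio in milliseconds (lower is better)."""
    return (run.first_audio_s - run.start_s) * 1000.0


def rtf(run: RunTiming) -> float:
    """Audio seconds produced per second of compute (higher is better).

    Assumption: compute time is measured from request to synthesis end.
    """
    return run.audio_seconds / (run.end_s - run.start_s)


def split_cold_warm(runs: list[RunTiming]) -> tuple[RunTiming, list[RunTiming]]:
    """The first run after process start is cold; the rest are warm."""
    return runs[0], runs[1:]


def summarize(runs: list[RunTiming]) -> dict[str, float]:
    """Cold figures from run 0; warm figures averaged over later runs (assumed aggregate)."""
    cold, warm = split_cold_warm(runs)
    return {
        "ttfa_cold_ms": ttfa_ms(cold),
        "ttfa_warm_ms": mean(ttfa_ms(r) for r in warm),
        "rtf_cold": rtf(cold),
        "rtf_warm": mean(rtf(r) for r in warm),
    }
```

For example, a run that yields 10 s of audio in 1 s of compute has an RTF of 10×, which matches the definition above. A cold run that needs 2 s for the same audio drops to 5×. That cold/warm gap is why the table reports both.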

Speed winners

Fastest predefined-voice: Kokoro (cuda) — 103.68× warm RTF, 67ms warm TTFA

Fastest cloning-capable: OmniVoice (cuda) — 9.30× warm RTF, 757ms warm TTFA
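The two "winners" above appear to be chosen by highest warm RTF within each voice category. The table is sorted by that column. Also, OmniVoice wins the cloning category even though it does not have the lowest warm TTFA among rows that list one. The sketch below expresses that selection rule over a few rows copied from the table. The `can_clone` flag is an assumption, because the table has no such column. The rule itself is inferred, not documented.

```python
from dataclasses import dataclass


@dataclass
class Row:
    model: str
    device: str
    rtf_warm: float
    ttfa_warm_ms: float
    can_clone: bool  # assumption: not a column in the results table


# A few rows copied from the results table; the can_clone values are assumed.
ROWS = [
    Row("Kokoro", "cuda", 103.68, 67, False),
    Row("Piper", "cpu", 58.83, 107, False),
    Row("OmniVoice", "cuda", 9.30, 757, True),
    Row("F5-TTS", "cuda", 5.32, 845, True),
]


def fastest(rows: list[Row], can_clone: bool) -> Row:
    """Return the row with the highest warm RTF within one voice category."""
    candidates = [r for r in rows if r.can_clone == can_clone]
    return max(candidates, key=lambda r: r.rtf_warm)


predefined = fastest(ROWS, can_clone=False)
cloning = fastest(ROWS, can_clone=True)
print(f"{predefined.model} ({predefined.device}), {cloning.model} ({cloning.device})")
```

Under these assumptions, this picks Kokoro (cuda) and OmniVoice (cuda), matching the two callouts.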

| Model | Device | TTFA cold | TTFA warm | RTF cold | RTF warm | Peak RAM | Peak VRAM | Size |
|---|---|---|---|---|---|---|---|---|
| Kokoro | cuda | 895ms | 67ms | 8.32× | 103.68× | 2.43 GB | 925 MB | 82M |
| Piper | cpu | 160ms | 107ms | 38.94× | 58.83× | 470 MB | - | ~25MB |
| Kokoro | cpu | 609ms | 532ms | 12.13× | 14.38× | 1.82 GB | - | 82M |
| Supertonic | cpu | 744ms | 741ms | 9.86× | 9.92× | 570 MB | - | 99M |
| OmniVoice | cuda | 1.12s | 757ms | 6.44× | 9.30× | 2.07 GB | 2.16 GB | ~1B |
| KittenTTS | cpu | 1.22s | 1.20s | 6.40× | 6.39× | 338 MB | - | <100M |
| F5-TTS | cuda | 1.31s | 845ms | 3.47× | 5.32× | 2.67 GB | 802 MB | 330M |
| Coqui XTTS-v2 | cuda | 2.05s | 1.87s | 3.63× | 4.75× | 2.10 GB | 2.14 GB | 750M |
| Chatterbox Turbo | cuda | 2.39s | 1.62s | 2.80× | 4.28× | 2.44 GB | 3.01 GB | 744M |
| Pocket-TTS | cpu | 147ms | 123ms | 3.99× | 4.06× | 1.95 GB | - | 100M |
| Soprano 80M | cuda | 1.74s | 1.77s | 3.77× | 3.76× | 2.12 GB | 326 MB | 80M |
| Qwen3-TTS 1.7B (CUDA-graph) | cuda | 6.51s | 1.60s | 0.90× | 3.76× | 2.48 GB | 4.89 GB | 1.7B |
| Soprano 80M | cpu | 1.97s | 2.00s | 3.40× | 3.40× | 1.34 GB | - | 80M |
| NeuTTS Nano | cuda | 678ms | 258ms | 2.19× | 2.76× | 3.24 GB | 3.26 GB | 748M |
| VibeVoice Realtime 0.5B | cuda | 3.80s | 3.77s | 2.24× | 2.39× | 1.88 GB | 2.62 GB | 0.5B |
| Chatterbox | cuda | 3.35s | 2.61s | 1.66× | 2.24× | 2.80 GB | 3.24 GB | 1.2B |
| NeuTTS Nano | cpu | 698ms | 303ms | 1.73× | 2.00× | 5.03 GB | - | 748M |
| Magpie-TTS | cuda | 5.48s | 4.48s | 1.53× | 1.93× | 3.54 GB | 5.60 GB | 357M |
| NeuTTS Air | cuda | 1.15s | 417ms | 1.34× | 1.62× | 3.60 GB | 3.26 GB | 748M |
| MOSS-TTS-Nano | cuda | 5.83s | 4.92s | 1.36× | 1.61× | 2.42 GB | 777 MB | 100M |
| VibeVoice 1.5B | cuda | 4.99s | 5.32s | 1.44× | 1.61× | 2.04 GB | 5.26 GB | 3B |
| MOSS-TTS | cuda | 5.23s | 4.74s | 1.34× | 1.48× | 2.08 GB | 22.83 GB | 8B |
| VoxCPM2 2B | cuda | 5.36s | 5.19s | 1.35× | 1.35× | 6.18 GB | 5.65 GB | 2B |
| IndexTTS-2 | cpu | 6.48s | 5.42s | 1.09× | 1.31× | 5.56 GB | - | 1.5B |
| NeuTTS Air | cpu | 1.20s | 471ms | 1.16× | 1.29× | 5.37 GB | - | 748M |
| MOSS-TTS-Nano | cpu | 6.38s | 5.64s | 1.08× | 1.21× | 3.14 GB | - | 100M |
| IndexTTS-2 | cuda | 7.20s | 6.03s | 0.93× | 1.11× | 5.88 GB | 7.60 GB | 1.5B |
| Coqui XTTS-v2 | cpu | 9.92s | 9.73s | 0.87× | 0.88× | 3.23 GB | - | 750M |
| Chatterbox Turbo | cpu | 10.65s | 9.92s | 0.70× | 0.73× | 4.16 GB | - | 744M |
| Qwen3-TTS 1.7B | cuda | 10.77s | 8.95s | 0.55× | 0.70× | 2.42 GB | 4.64 GB | 1.7B |
| VibeVoice Realtime 0.5B | cpu | 14.49s | 13.35s | 0.63× | 0.66× | 5.89 GB | - | 0.5B |
| ZipVoice 123M (4/5 ok) | cuda | 68.83s | 86.51s | 0.35× | 0.60× | 25.87 GB | 53.16 GB | 123M |
| Dia 1.6B | cuda | 25.42s | 22.82s | 0.46× | 0.55× | 4.42 GB | 6.32 GB | 1.6B |
| Sesame CSM-1B | cuda | 12.01s | 12.45s | 0.52× | 0.54× | 2.38 GB | 3.51 GB | 1B |
| ZipVoice 123M (3/5 ok) | cpu | 22.96s | 13.44s | 0.27× | 0.45× | 35.45 GB | - | 123M |
| VoxCPM2 2B | cpu | 15.35s | 14.85s | 0.46× | 0.45× | 10.14 GB | - | 2B |
| Chatterbox | cpu | 14.47s | 14.10s | 0.40× | 0.43× | 4.24 GB | - | 1.2B |
| OmniVoice | cpu | 16.45s | 15.98s | 0.38× | 0.39× | 3.05 GB | - | ~1B |
| Magpie-TTS | cpu | 37.17s | 33.72s | 0.29× | 0.30× | 6.10 GB | - | 357M |
| Mars5-TTS | cpu | 31.19s | 30.69s | 0.22× | 0.24× | 4.03 GB | - | 1.2B |
| Mars5-TTS | cuda | 31.20s | 31.06s | 0.23× | 0.23× | 2.25 GB | 6.81 GB | 1.2B |
| VibeVoice 1.5B | cpu | 39.59s | 45.31s | 0.19× | 0.20× | 11.62 GB | - | 3B |
| Qwen3-TTS 1.7B | cpu | 34.81s | 30.07s | 0.18× | 0.19× | 10.40 GB | - | 1.7B |
| Sesame CSM-1B | cpu | 50.90s | 57.54s | 0.11× | 0.12× | 5.77 GB | - | 1B |
| F5-TTS | cpu | 58.77s | 60.21s | 0.07× | 0.07× | 2.59 GB | - | 330M |

LuxTTS (cpu): not benchmarked; install failed (piper-phonemize has no Windows wheels).