TTS Bench — Speed — mac-cloning

Rig: mac-m4 — Apple M4 (10C) · Apple M4 GPU (MPS) · 16 GB RAM · Darwin 25.5.0
Label: cloning · jo — ref jo.wav
TTFA = time to first audio (ms; lower is better). RTF = real-time factor (× realtime; higher is better; e.g. 10× means 10 sec of audio generated per 1 sec of compute). Cold = first run after process start; warm = subsequent runs.
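The two metrics defined above reduce to simple arithmetic. The sketch below is illustrative only: the function names and arguments are hypothetical and are not taken from the benchmark's own code, which is not shown here.

```python
def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Seconds of audio produced per second of compute (higher is faster)."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


def time_to_first_audio_ms(request_start: float, first_chunk_time: float) -> float:
    """Delay between the request and the first audio chunk, in milliseconds.

    Both arguments are timestamps in seconds from the same clock.
    """
    return (first_chunk_time - request_start) * 1000.0


# The example from the definition: 10 s of audio from 1 s of compute.
print(real_time_factor(10.0, 1.0))  # 10.0
```

Under this definition, any RTF below 1× means the model generates audio more slowly than it plays back.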

Speed winners

Fastest (this cloning run): Pocket-TTS (cpu) — 8.01× warm RTF, 44ms warm TTFA

Model                  | Device | TTFA cold | TTFA warm | RTF cold | RTF warm | Peak RAM | Peak VRAM | Size
-----------------------|--------|-----------|-----------|----------|----------|----------|-----------|-----
Pocket-TTS             | cpu    | 46ms      | 44ms      | 7.91×    | 8.01×    | 2.46 GB  | -         | 100M
NeuTTS Nano            | cpu    | 785ms     | 281ms     | 2.19×    | 3.01×    | 5.83 GB  | -         | 748M
NeuTTS Nano            | mps    | 1.54s     | 493ms     | 1.78×    | 2.87×    | 1.38 GB  | -         | 748M
NeuTTS Air             | cpu    | 1.50s     | 359ms     | 1.57×    | 2.14×    | 4.96 GB  | -         | 748M
NeuTTS Air             | mps    | 2.35s     | 562ms     | 1.26×    | 2.10×    | 1.56 GB  | -         | 748M
Coqui XTTS-v2          | mps    | 12.06s    | 6.80s     | 1.04×    | 1.59×    | 1.87 GB  | -         | 750M
Coqui XTTS-v2          | cpu    | 5.67s     | 6.73s     | 1.35×    | 1.36×    | 4.37 GB  | -         | 750M
Chatterbox Turbo       | mps    | 8.12s     | 14.00s    | 0.80×    | 1.10×    | 2.57 GB  | -         | 744M
Chatterbox Turbo       | cpu    | 6.62s     | 6.39s     | 0.98×    | 1.03×    | 4.92 GB  | -         | 744M
Chatterbox             | mps    | 36.45s    | 34.12s    | 0.25×    | 0.33×    | 1000 MB  | -         | 1.2B
Chatterbox             | cpu    | 24.68s    | 22.69s    | 0.30×    | 0.32×    | 5.55 GB  | -         | 1.2B
Qwen3-TTS 1.7B         | cpu    | 26.94s    | 22.01s    | 0.22×    | 0.26×    | 3.66 GB  | -         | 1.7B
Sesame CSM-1B          | cpu    | 32.32s    | 23.57s    | 0.18×    | 0.24×    | 6.44 GB  | -         | 1B
OmniVoice              | cpu    | 36.62s    | 29.89s    | 0.15×    | 0.18×    | 4.15 GB  | -         | ~1B
IndexTTS-2             | cpu    | 79.63s    | 44.94s    | 0.08×    | 0.15×    | 266 MB   | -         | 1.5B
ZipVoice 123M (4/5 ok) | cpu    | 33.60s    | 37.61s    | 0.13×    | 0.10×    | 2.13 GB  | -         | 123M
F5-TTS                 | mps    | 44.88s    | 45.26s    | 0.10×    | 0.09×    | 1.29 GB  | -         | 330M
F5-TTS                 | cpu    | 47.24s    | 47.06s    | 0.09×    | 0.09×    | 2.32 GB  | -         | 330M
LuxTTS                 | cpu    | LuxTTS install failed (piper-phonemize has no Windows wheels)
LuxTTS                 | mps    | LuxTTS install failed (piper-phonemize has no Windows wheels)
OmniVoice (1/5 ok)     | mps    | 26.02s    | -         | 0.18×    | -        | 515 MB   | -         | ~1B
VoxCPM2 2B             | cpu    | Timed out after 10 min (model too slow at this prompt length)
Mars5-TTS              | cpu    | Timed out after 10 min (model too slow at this prompt length)
ZipVoice 123M          | mps    | Skipped: out of memory (model exceeds available RAM)
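The results mix milliseconds and seconds in the TTFA columns, so comparing rows programmatically needs a common unit. The following is a minimal sketch, assuming latency strings always end in `ms` or `s` as they do in the results; `parse_latency_ms` and the `rows` layout are hypothetical helpers, and only a handful of rows are copied in for illustration.

```python
import re


def parse_latency_ms(text: str) -> float:
    """Convert a latency string such as '44ms' or '1.54s' to milliseconds."""
    match = re.fullmatch(r"([0-9.]+)(ms|s)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised latency: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value if unit == "ms" else value * 1000.0


# (model, device, warm TTFA, warm RTF), copied from a few rows of the results.
rows = [
    ("Pocket-TTS", "cpu", "44ms", 8.01),
    ("NeuTTS Nano", "cpu", "281ms", 3.01),
    ("NeuTTS Nano", "mps", "493ms", 2.87),
    ("Coqui XTTS-v2", "mps", "6.80s", 1.59),
]

# Highest warm RTF and lowest warm TTFA among the sampled rows.
fastest = max(rows, key=lambda row: row[3])
lowest_latency = min(rows, key=lambda row: parse_latency_ms(row[2]))

print(fastest[0], fastest[1])
print(lowest_latency[0], lowest_latency[1])
```

Rows that failed, timed out, or were skipped carry no numeric values and would need to be filtered out before a comparison like this.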