Fastest (this cloning run): Pocket-TTS (cpu), with 8.01× warm RTF (real-time factor) and 44ms warm TTFA (time to first audio).

| Model | Device | TTFA cold | TTFA warm | RTF cold | RTF warm | Peak RAM | Peak VRAM | Params |
|---|---|---|---|---|---|---|---|---|
| Pocket-TTS | cpu | 46ms | 44ms | 7.91× | 8.01× | 2.46 GB | — | 100M |
| NeuTTS Nano | cpu | 785ms | 281ms | 2.19× | 3.01× | 5.83 GB | — | 748M |
| NeuTTS Nano | mps | 1.54s | 493ms | 1.78× | 2.87× | 1.38 GB | — | 748M |
| NeuTTS Air | cpu | 1.50s | 359ms | 1.57× | 2.14× | 4.96 GB | — | 748M |
| NeuTTS Air | mps | 2.35s | 562ms | 1.26× | 2.10× | 1.56 GB | — | 748M |
| Coqui XTTS-v2 | mps | 12.06s | 6.80s | 1.04× | 1.59× | 1.87 GB | — | 750M |
| Coqui XTTS-v2 | cpu | 5.67s | 6.73s | 1.35× | 1.36× | 4.37 GB | — | 750M |
| Chatterbox Turbo | mps | 8.12s | 14.00s | 0.80× | 1.10× | 2.57 GB | — | 744M |
| Chatterbox Turbo | cpu | 6.62s | 6.39s | 0.98× | 1.03× | 4.92 GB | — | 744M |
| Chatterbox | mps | 36.45s | 34.12s | 0.25× | 0.33× | 1000 MB | — | 1.2B |
| Chatterbox | cpu | 24.68s | 22.69s | 0.30× | 0.32× | 5.55 GB | — | 1.2B |
| Qwen3-TTS 1.7B | cpu | 26.94s | 22.01s | 0.22× | 0.26× | 3.66 GB | — | 1.7B |
| Sesame CSM-1B | cpu | 32.32s | 23.57s | 0.18× | 0.24× | 6.44 GB | — | 1B |
| OmniVoice | cpu | 36.62s | 29.89s | 0.15× | 0.18× | 4.15 GB | — | ~1B |
| IndexTTS-2 | cpu | 79.63s | 44.94s | 0.08× | 0.15× | 266 MB | — | 1.5B |
| ZipVoice 123M (4/5 ok) | cpu | 33.60s | 37.61s | 0.13× | 0.10× | 2.13 GB | — | 123M |
| F5-TTS | mps | 44.88s | 45.26s | 0.10× | 0.09× | 1.29 GB | — | 330M |
| F5-TTS | cpu | 47.24s | 47.06s | 0.09× | 0.09× | 2.32 GB | — | 330M |
| LuxTTS | cpu | Install failed (piper-phonemize has no Windows wheels) | | | | | | |
| LuxTTS | mps | Install failed (piper-phonemize has no Windows wheels) | | | | | | |
| OmniVoice (1/5 ok) | mps | 26.02s | — | 0.18× | — | 515 MB | — | ~1B |
| VoxCPM2 2B | cpu | Timed out after 10 min (model too slow at this prompt length) | | | | | | |
| Mars5-TTS | cpu | Timed out after 10 min (model too slow at this prompt length) | | | | | | |
| ZipVoice 123M | mps | Skipped: out of memory (model exceeds available RAM) | | | | | | |
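
The table does not say how its figures were collected. The sketch below shows one way such numbers could be measured, under two assumptions. First, the × values are read as speed multiples (seconds of audio produced per second of wall-clock time, so higher is faster), which is consistent with Pocket-TTS at 8.01× being listed as fastest. Second, TTFA is taken as the delay until the first audio chunk arrives. `measure_stream`, its parameters, and the chunk format are hypothetical names for illustration, not part of any benchmark harness or TTS library.

```python
import time
from typing import Callable, Iterable, Sequence, Tuple


def measure_stream(
    chunks: Iterable[Sequence],
    sample_rate: int,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (ttfa_seconds, speed_factor) for one synthesis call.

    `chunks` is any iterable yielding sequences of audio samples as they
    are generated (assumed interface; real models expose different APIs).
    """
    start = clock()
    ttfa = None
    total_samples = 0
    for chunk in chunks:
        if ttfa is None:
            # Time to first audio: delay until the first chunk is available.
            ttfa = clock() - start
        total_samples += len(chunk)
    elapsed = clock() - start
    if ttfa is None or elapsed <= 0:
        raise ValueError("no audio produced")
    audio_seconds = total_samples / sample_rate
    # Speed multiple: values above 1.0 mean faster than real time.
    return ttfa, audio_seconds / elapsed
```

Under the same assumptions, a "cold" figure would come from the first call after loading a model and a "warm" figure from a later call in the same process. The table itself does not define either term.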