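Fastest predefined-voice model: Piper (cpu), 32.14× warm RTF, 208ms warm TTFA

Fastest cloning-capable model: Pocket-TTS (cpu), 7.82× warm RTF, 44ms warm TTFA

RTF (real-time factor) is shown as a speed multiple, so higher is faster; TTFA is time to first audio, so lower is better. Rows are sorted by warm RTF, fastest first.

| Model | Device | TTFA cold | TTFA warm | RTF cold | RTF warm | Peak RAM | Peak VRAM | Size |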
|---|---|---|---|---|---|---|---|---|
| Piper | cpu | 228ms | 208ms | 27.47× | 32.14× | 561 MB | — | ~25MB |
| Kokoro | mps | 920ms | 489ms | 7.58× | 15.28× | 1.26 GB | — | 82M |
| Kokoro | cpu | 914ms | 749ms | 7.78× | 10.04× | 2.49 GB | — | 82M |
| KittenTTS | cpu | 994ms | 1.01s | 8.10× | 8.08× | 385 MB | — | <100M |
| Pocket-TTS | cpu | 89ms | 44ms | 7.50× | 7.82× | 1.69 GB | — | 100M |
| Soprano 80M | cpu | 912ms | 906ms | 7.15× | 7.15× | 1.18 GB | — | 80M |
| Soprano 80M | mps | 2.48s | 1.30s | 2.76× | 5.16× | 1.64 GB | — | 80M |
| Supertonic | cpu | 1.86s | 1.57s | 4.18× | 4.81× | 968 MB | — | 99M |
| NeuTTS Nano | cpu | 740ms | 266ms | 2.30× | 3.10× | 5.83 GB | — | 748M |
| NeuTTS Nano | mps | 1.45s | 464ms | 1.81× | 2.96× | 1.28 GB | — | 748M |
| NeuTTS Air | cpu | 1.38s | 358ms | 1.55× | 2.16× | 6.12 GB | — | 748M |
| NeuTTS Air | mps | 2.32s | 583ms | 1.30× | 2.08× | 1.55 GB | — | 748M |
| Coqui XTTS-v2 | mps | 17.81s | 7.82s | 1.14× | 1.70× | 1.84 GB | — | 750M |
| Chatterbox Turbo | mps | 9.95s | 5.07s | 0.75× | 1.44× | 2.30 GB | — | 744M |
| Coqui XTTS-v2 | cpu | 6.01s | 5.48s | 1.41× | 1.43× | 4.61 GB | — | 750M |
| Chatterbox Turbo | cpu | 6.53s | 6.45s | 1.07× | 1.14× | 4.80 GB | — | 744M |
| VibeVoice Realtime 0.5B | mps | 9.56s | 8.30s | 0.98× | 1.13× | 838 MB | — | 0.5B |
| OmniVoice (4/5 ok) | mps | 5.26s | 5.06s | 0.83× | 0.90× | 981 MB | — | ~1B |
| OmniVoice | cpu | 13.29s | 11.23s | 0.47× | 0.57× | 3.59 GB | — | ~1B |
| Magpie-TTS | cpu | 25.55s | 26.19s | 0.40× | 0.40× | 4.69 GB | — | 357M |
| Chatterbox | cpu | 18.77s | 17.08s | 0.33× | 0.37× | 4.83 GB | — | 1.2B |
| VibeVoice Realtime 0.5B | cpu | 25.45s | 25.10s | 0.38× | 0.36× | 3.17 GB | — | 0.5B |
| Chatterbox | mps | 25.95s | 31.35s | 0.27× | 0.29× | 698 MB | — | 1.2B |
| Qwen3-TTS 1.7B | cpu | 24.52s | 25.11s | 0.25× | 0.24× | 3.93 GB | — | 1.7B |
| Sesame CSM-1B | cpu | 38.67s | 31.30s | 0.19× | 0.24× | 6.08 GB | — | 1B |
| VoxCPM2 2B | cpu | 28.36s | 52.13s | 0.29× | 0.17× | 144 MB | — | 2B |
| IndexTTS-2 | cpu | 80.00s | 48.40s | 0.08× | 0.14× | 318 MB | — | 1.5B |
| F5-TTS | mps | 45.43s | 45.54s | 0.09× | 0.09× | 1.29 GB | — | 330M |
| F5-TTS | cpu | 47.26s | 47.15s | 0.09× | 0.09× | 3.31 GB | — | 330M |
| LuxTTS | cpu | LuxTTS install failed (piper-phonemize has no Windows wheels) | | | | | | |
| LuxTTS | mps | LuxTTS install failed (piper-phonemize has no Windows wheels) | | | | | | |
| ZipVoice 123M | cpu | Reference wav missing for voice cloning | | | | | | |
| ZipVoice 123M | mps | Reference wav missing for voice cloning | | | | | | |
| Mars5-TTS | cpu | Timed out after 10 min: model too slow at this prompt length | | | | | | |
| VibeVoice 1.5B | cpu | Reference wav missing for voice cloning | | | | | | |
| VibeVoice 1.5B | mps | Skipped: out of GPU memory (model exceeds this device's memory) | | | | | | |
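
The table does not say how its numbers were measured, so the following is only a sketch of one plausible way to obtain a TTFA and an RTF-style figure for a single run. It assumes that the RTF column is audio duration divided by synthesis wall time (consistent with higher values ranking as faster) and that TTFA is the delay until the first audio chunk arrives. The names `measure_run`, `RunMetrics`, and `fake_tts` are hypothetical and do not come from any of the listed models or from the benchmark harness.

```python
import time
from typing import Callable, Iterable, NamedTuple, Sequence


class RunMetrics(NamedTuple):
    ttfa_s: float        # seconds from request until the first audio chunk
    speed_factor: float  # seconds of audio produced per second of wall time


def measure_run(
    synthesize: Callable[[str], Iterable[Sequence[float]]],
    text: str,
    sample_rate: int,
    clock: Callable[[], float] = time.perf_counter,
) -> RunMetrics:
    """Time one synthesis call that yields chunks of mono samples.

    `synthesize` is a hypothetical streaming interface: any callable that
    takes text and yields sequences of samples at `sample_rate` Hz.
    """
    start = clock()
    first_chunk_at = None
    total_samples = 0
    for chunk in synthesize(text):
        # Record the arrival time of the first chunk that carries audio.
        if first_chunk_at is None and len(chunk) > 0:
            first_chunk_at = clock()
        total_samples += len(chunk)
    end = clock()

    if first_chunk_at is None:
        raise ValueError("synthesizer produced no audio")

    audio_seconds = total_samples / sample_rate
    return RunMetrics(
        ttfa_s=first_chunk_at - start,
        speed_factor=audio_seconds / (end - start),
    )


# Deterministic demo: a fake synthesizer and a fake clock, so the result
# does not depend on the machine running it.
def fake_tts(text: str):
    yield [0.0] * 12000  # 0.5 s of silence at 24 kHz
    yield [0.0] * 12000  # another 0.5 s


ticks = iter([0.0, 0.25, 0.5])  # start, first chunk, end
m = measure_run(fake_tts, "hello", 24000, clock=lambda: next(ticks))
print(f"TTFA {m.ttfa_s * 1000:.0f}ms, {m.speed_factor:.2f}x")
# prints "TTFA 250ms, 2.00x"
```

Under these assumptions, a "cold" figure would presumably come from the first call after loading a model and a "warm" figure from a later call; the sketch times a single call and leaves that distinction to the caller.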