
TTS Bench — Samples — windows-cloning

Rig: windows-5090 — AMD Ryzen 9 9950X3D 16-Core Processor (16C) · NVIDIA GeForce RTX 5090 32GB · 126 GB RAM · Windows 11
Label: cloning · chris_hemsworth_15s — ref chris_hemsworth_15s.wav
5 prompts · one section per prompt · models ranked by warm TTFA (time to first audio), fastest first, within each section
Each prompt section lists every model's rendering of that prompt in that order.
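The tables below mix millisecond and second units in the TTFA column, so any re-ranking has to normalize the values first. Here is a minimal sketch of that step, assuming each row has the shape "rank model device value+unit" as in the listings on this page. The names `ROW_RE` and `parse_row` are hypothetical helpers and are not part of the benchmark's own tooling.

```python
import re

# One flattened table row: rank, model name (may contain spaces),
# device (cpu or cuda), then a TTFA value with an "ms" or "s" suffix.
ROW_RE = re.compile(r"^(\d+)\s+(.+?)\s+(cpu|cuda)\s+([\d.]+)(ms|s)$")


def parse_row(line):
    """Parse a row such as '11 ZipVoice 123M cuda 1.08s' into a dict."""
    m = ROW_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized row: {line!r}")
    rank, model, device, value, unit = m.groups()
    # Normalize everything to seconds so rows are directly comparable.
    seconds = float(value) / 1000.0 if unit == "ms" else float(value)
    return {"rank": int(rank), "model": model, "device": device, "ttfa_s": seconds}


# Three rows copied from Prompt 1, deliberately out of order.
rows = [
    "11 ZipVoice 123M cuda 1.08s",
    "1 Pocket-TTS cpu 63ms",
    "6 Qwen3-TTS 1.7B (CUDA-graph) cuda 542ms",
]

# Fastest first, matching the ordering used on this page.
parsed = sorted((parse_row(r) for r in rows), key=lambda r: r["ttfa_s"])
for r in parsed:
    print(f'{r["model"]:<30} {r["device"]:<5} {r["ttfa_s"]:.3f}s')
```

Sorting on the normalized `ttfa_s` field rather than on the printed string avoids comparing "1.08s" against "542ms" as text.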

Reference voice

Each model below was given this clip and its transcript as the voice to imitate. Source: chris_hemsworth_15s.wav

Prompt 1

[en]"Open the browser and read my email."
Rank  Model  Device  Warm TTFA
1 Pocket-TTS cpu 63ms
2 NeuTTS Nano cuda 290ms
3 NeuTTS Nano cpu 318ms
4 NeuTTS Air cuda 428ms
5 NeuTTS Air cpu 475ms
6 Qwen3-TTS 1.7B (CUDA-graph) cuda 542ms
7 Coqui XTTS-v2 cuda 563ms
8 Chatterbox Turbo cuda 632ms
9 F5-TTS cuda 728ms
10 OmniVoice cuda 736ms
11 ZipVoice 123M cuda 1.08s
12 Chatterbox cuda 1.25s
13 Dia 1.6B cuda 1.30s
14 VoxCPM2 2B cuda 1.81s
15 IndexTTS-2 cpu 2.09s
16 MOSS-TTS-Nano cuda 2.28s
17 Coqui XTTS-v2 cpu 2.53s
18 IndexTTS-2 cuda 2.57s
19 MOSS-TTS-Nano cpu 2.61s
20 Qwen3-TTS 1.7B cuda 2.79s
21 MOSS-TTS cuda 2.99s
22 Chatterbox Turbo cpu 3.48s
23 Sesame CSM-1B cuda 3.97s
24 VoxCPM2 2B cpu 6.39s
25 Chatterbox cpu 6.52s
26 ZipVoice 123M cpu 7.67s
27 Qwen3-TTS 1.7B cpu 11.80s
28 Sesame CSM-1B cpu 17.17s
29 Mars5-TTS cuda 29.79s
30 Mars5-TTS cpu 30.56s
31 OmniVoice cpu 37.89s
32 F5-TTS cpu 50.19s
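Because most models appear once per device, the Prompt 1 table can also be read as a CPU-versus-CUDA comparison. The sketch below computes the cpu/cuda warm-TTFA ratio for four models, using values copied from the table above and converted to seconds. The dictionary layout and the `cpu_over_cuda` helper are illustrative assumptions, not something the benchmark itself provides.

```python
# Warm TTFA in seconds for Prompt 1, copied from the table above.
prompt1_ttfa_s = {
    ("NeuTTS Nano", "cuda"): 0.290,
    ("NeuTTS Nano", "cpu"): 0.318,
    ("Coqui XTTS-v2", "cuda"): 0.563,
    ("Coqui XTTS-v2", "cpu"): 2.53,
    ("F5-TTS", "cuda"): 0.728,
    ("F5-TTS", "cpu"): 50.19,
    ("IndexTTS-2", "cuda"): 2.57,
    ("IndexTTS-2", "cpu"): 2.09,
}


def cpu_over_cuda(ttfa):
    """Return {model: cpu_ttfa / cuda_ttfa} for models measured on both devices."""
    models = sorted({model for model, _device in ttfa})
    return {
        m: ttfa[(m, "cpu")] / ttfa[(m, "cuda")]
        for m in models
        if (m, "cpu") in ttfa and (m, "cuda") in ttfa
    }


ratios = cpu_over_cuda(prompt1_ttfa_s)
# Largest slowdown on CPU first.
for model, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<15} cpu/cuda = {ratio:.2f}x")
```

A ratio above 1 means the CUDA run reached first audio sooner. A ratio below 1, as for IndexTTS-2 on this prompt, means the CPU run was faster.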

Prompt 2

[en]"I'll start a new git branch, push the changes, and open a pull request when the tests pass."
Rank  Model  Device  Warm TTFA
1 Pocket-TTS cpu 86ms
2 NeuTTS Nano cuda 294ms
3 NeuTTS Nano cpu 333ms
4 NeuTTS Air cuda 456ms
5 NeuTTS Air cpu 480ms
6 OmniVoice cuda 759ms
7 F5-TTS cuda 914ms
8 Coqui XTTS-v2 cuda 1.09s
9 Chatterbox Turbo cuda 1.13s
10 Chatterbox cuda 2.21s
11 MOSS-TTS-Nano cuda 3.38s
12 VoxCPM2 2B cuda 3.43s
13 MOSS-TTS cuda 3.52s
14 MOSS-TTS-Nano cpu 3.97s
15 IndexTTS-2 cpu 4.01s
16 IndexTTS-2 cuda 4.12s
17 Coqui XTTS-v2 cpu 6.01s
18 Chatterbox Turbo cpu 6.28s
19 Dia 1.6B cuda 8.34s
20 Sesame CSM-1B cuda 9.01s
21 Chatterbox cpu 11.99s
22 ZipVoice 123M cpu 13.40s
23 VoxCPM2 2B cpu 14.44s
24 Mars5-TTS cuda 33.89s
25 Qwen3-TTS 1.7B (CUDA-graph) cuda 36.45s
26 Sesame CSM-1B cpu 38.15s
27 Mars5-TTS cpu 43.83s
28 OmniVoice cpu 48.65s
29 F5-TTS cpu 64.77s
30 ZipVoice 123M cuda 133.84s

Prompt 3

[en]"The Parakeet TDT zero point six billion parameter model achieves one point six nine percent word error rate on LibriSpeech test-clean, beating Whisper Large V3 at two point seven percent while running at over two thousand times realtime on a single GPU."
Rank  Model  Device  Warm TTFA
1 Pocket-TTS cpu 105ms
2 NeuTTS Nano cuda 286ms
3 NeuTTS Nano cpu 337ms
4 NeuTTS Air cuda 432ms
5 NeuTTS Air cpu 478ms
6 OmniVoice cuda 934ms
7 F5-TTS cuda 1.71s
8 Chatterbox Turbo cuda 3.27s
9 Coqui XTTS-v2 cuda 3.35s
10 Chatterbox cuda 5.20s
11 MOSS-TTS cuda 8.96s
12 IndexTTS-2 cpu 9.05s
13 MOSS-TTS-Nano cuda 9.17s
14 VoxCPM2 2B cuda 10.29s
15 MOSS-TTS-Nano cpu 10.40s
16 IndexTTS-2 cuda 10.75s
17 Sesame CSM-1B cuda 17.44s
18 Chatterbox Turbo cpu 20.05s
19 Coqui XTTS-v2 cpu 20.33s
20 Dia 1.6B cuda 24.69s
21 Chatterbox cpu 34.74s
22 VoxCPM2 2B cpu 35.11s
23 Qwen3-TTS 1.7B (CUDA-graph) cuda 36.41s
24 Mars5-TTS cpu 54.81s
25 Mars5-TTS cuda 55.86s
26 Sesame CSM-1B cpu 77.75s
27 OmniVoice cpu 78.95s
28 F5-TTS cpu 103.89s
29 ZipVoice 123M cpu 183.07s

Prompt 4

[en]"Run pytest tests slash test underscore voice dot py with verbose flag and capture flag set to no."
Rank  Model  Device  Warm TTFA
1 Pocket-TTS cpu 95ms
2 NeuTTS Nano cuda 294ms
3 NeuTTS Nano cpu 339ms
4 NeuTTS Air cuda 436ms
5 NeuTTS Air cpu 462ms
6 OmniVoice cuda 752ms
7 F5-TTS cuda 870ms
8 Chatterbox Turbo cuda 1.32s
9 Coqui XTTS-v2 cuda 1.59s
10 Chatterbox cuda 2.87s
11 MOSS-TTS-Nano cuda 3.69s
12 IndexTTS-2 cpu 4.32s
13 MOSS-TTS-Nano cpu 4.58s
14 MOSS-TTS cuda 4.69s
15 IndexTTS-2 cuda 5.19s
16 VoxCPM2 2B cuda 5.19s
17 Chatterbox Turbo cpu 7.65s
18 Coqui XTTS-v2 cpu 9.36s
19 Dia 1.6B cuda 11.47s
20 VoxCPM2 2B cpu 14.38s
21 Sesame CSM-1B cuda 15.42s
22 Chatterbox cpu 16.32s
23 ZipVoice 123M cpu 16.61s
24 Qwen3-TTS 1.7B (CUDA-graph) cuda 36.45s
25 Mars5-TTS cuda 38.59s
26 Mars5-TTS cpu 43.71s
27 OmniVoice cpu 49.71s
28 Sesame CSM-1B cpu 61.47s
29 F5-TTS cpu 65.87s
30 ZipVoice 123M cuda 142.27s

Prompt 5

[fr]"Bonjour, je m'appelle Cicero et je vais vous aider avec votre code aujourd'hui."
Rank  Model  Device  Warm TTFA
1 Pocket-TTS cpu 273ms
2 NeuTTS Nano cuda 361ms
3 NeuTTS Nano cpu 364ms
4 OmniVoice cuda 833ms
5 Coqui XTTS-v2 cuda 953ms
6 VoxCPM2 2B cuda 2.87s
7 MOSS-TTS cuda 3.52s
8 MOSS-TTS-Nano cpu 3.61s
9 Coqui XTTS-v2 cpu 5.58s
10 MOSS-TTS-Nano cuda 5.84s
11 VoxCPM2 2B cpu 9.90s
12 ZipVoice 123M cpu 14.56s
13 Qwen3-TTS 1.7B (CUDA-graph) cuda 19.30s
14 Qwen3-TTS 1.7B cpu 24.84s
15 OmniVoice cpu 51.20s
16 ZipVoice 123M cuda 146.78s