TTS Bench — Samples — linux-cloning

Rig: linux-3090 — AMD Ryzen 9 5900XT 16-Core Processor · NVIDIA GeForce RTX 3090 24GB · 63 GB RAM · Linux 6.8.0-117-generic
Label: cloning · chris_hemsworth_15s — ref chris_hemsworth_15s.wav
5 prompts · one section per prompt · all models ranked by warm TTFA (time to first audio), fastest first, within each section
Each prompt section shows every model's audio output in that order.
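The ranking used in each section can be reproduced from the rows themselves. The sketch below is only an illustration, not the benchmark's own tooling: `Row`, `parse_row`, and `rank_by_ttfa` are hypothetical names, and the row layout (optional rank, model name, `cpu` or `cuda`, then a value ending in `ms` or `s`) is inferred from the tables on this page. It converts every TTFA to seconds so that `81ms` and `1.13s` compare correctly, then sorts fastest first.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple


class Row(NamedTuple):
    """One table row (hypothetical structure, inferred from this page)."""
    model: str
    device: str
    ttfa_s: float  # warm TTFA, normalized to seconds


# Optional leading rank, model name, device, then a value with an ms/s unit.
_ROW = re.compile(
    r"^(?:\d+\s+)?(?P<model>.+?)\s+(?P<device>cpu|cuda)\s+"
    r"(?P<value>\d+(?:\.\d+)?)(?P<unit>ms|s)$"
)


def parse_row(line: str) -> Row:
    """Parse a row such as '2 LuxTTS cuda 195ms' into a Row."""
    m = _ROW.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized row: {line!r}")
    value = float(m["value"])
    # Millisecond values are divided by 1000; second values pass through.
    seconds = value / 1000.0 if m["unit"] == "ms" else value
    return Row(m["model"], m["device"], seconds)


def rank_by_ttfa(lines: Iterable[str]) -> List[Tuple[int, Row]]:
    """Return (rank, row) pairs ordered by warm TTFA, fastest first."""
    rows = sorted((parse_row(line) for line in lines), key=lambda r: r.ttfa_s)
    return list(enumerate(rows, start=1))


if __name__ == "__main__":
    # Three rows taken from Prompt 1, deliberately out of order.
    sample = [
        "Chatterbox cuda 1.13s",
        "Pocket-TTS cpu 81ms",
        "LuxTTS cuda 195ms",
    ]
    for rank, row in rank_by_ttfa(sample):
        print(rank, row.model, row.device, f"{row.ttfa_s:.3f}s")
```

Because Python's `sorted` is stable, rows with equal TTFA (for example the two 1.14s entries in Prompt 1) keep their input order; how the page itself breaks such ties is not stated.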

Reference voice

Each model below was given this clip and its transcript as the voice to imitate. Source: chris_hemsworth_15s.wav

Prompt 1

[en] "Open the browser and read my email."
Rank Model Device TTFA (warm)
1 Pocket-TTS cpu 81ms
2 LuxTTS cuda 195ms
3 NeuTTS Nano cuda 359ms
4 NeuTTS Nano cpu 418ms
5 NeuTTS Air cuda 432ms
6 Coqui XTTS-v2 cuda 475ms
7 NeuTTS Air cpu 559ms
8 Chatterbox Turbo cuda 631ms
9 Qwen3-TTS 1.7B (CUDA-graph) cuda 767ms
10 Chatterbox cuda 1.13s
11 OmniVoice cuda 1.14s
12 F5-TTS cuda 1.14s
13 VoxCPM2 2B cuda 1.22s
14 LuxTTS cpu 1.29s
15 Dia 1.6B cuda 1.29s
16 MOSS-TTS-Nano cuda 1.97s
17 IndexTTS-2 cpu 2.80s
18 IndexTTS-2 cuda 3.09s
19 Coqui XTTS-v2 cpu 3.15s
20 Sesame CSM-1B cuda 3.34s
21 MOSS-TTS-Nano cpu 3.52s
22 Chatterbox Turbo cpu 3.91s
23 ZipVoice 123M cpu 6.97s
24 Chatterbox cpu 8.01s
25 VoxCPM2 2B cpu 14.19s
26 Sesame CSM-1B cpu 27.72s
27 OmniVoice cpu 35.63s
28 F5-TTS cpu 45.27s
29 Mars5-TTS cpu 68.86s
30 Mars5-TTS cuda 70.40s

Prompt 2

[en] "I'll start a new git branch, push the changes, and open a pull request when the tests pass."
Rank Model Device TTFA (warm)
1 Pocket-TTS cpu 110ms
2 LuxTTS cuda 219ms
3 NeuTTS Nano cuda 343ms
4 NeuTTS Nano cpu 378ms
5 NeuTTS Air cuda 426ms
6 NeuTTS Air cpu 598ms
7 Coqui XTTS-v2 cuda 998ms
8 Chatterbox Turbo cuda 1.17s
9 OmniVoice cuda 1.35s
10 F5-TTS cuda 1.50s
11 LuxTTS cpu 2.08s
12 Chatterbox cuda 2.21s
13 MOSS-TTS-Nano cuda 2.27s
14 VoxCPM2 2B cuda 2.38s
15 IndexTTS-2 cpu 4.25s
16 IndexTTS-2 cuda 4.62s
17 MOSS-TTS-Nano cpu 5.40s
18 Sesame CSM-1B cuda 6.49s
19 Coqui XTTS-v2 cpu 6.81s
20 Chatterbox Turbo cpu 7.40s
21 Dia 1.6B cuda 7.86s
22 ZipVoice 123M cpu 13.29s
23 Chatterbox cpu 14.36s
24 VoxCPM2 2B cpu 26.35s
25 Sesame CSM-1B cpu 38.09s
26 OmniVoice cpu 44.05s
27 Qwen3-TTS 1.7B (CUDA-graph) cuda 45.49s
28 F5-TTS cpu 57.96s
29 Mars5-TTS cpu 72.55s
30 Mars5-TTS cuda 80.29s

Prompt 3

[en] "The Parakeet TDT zero point six billion parameter model achieves one point six nine percent word error rate on LibriSpeech test-clean, beating Whisper Large V3 at two point seven percent while running at over two thousand times realtime on a single GPU."
Rank Model Device TTFA (warm)
1 Pocket-TTS cpu 135ms
2 LuxTTS cuda 332ms
3 NeuTTS Nano cuda 373ms
4 NeuTTS Nano cpu 449ms
5 NeuTTS Air cuda 531ms
6 NeuTTS Air cpu 594ms
7 OmniVoice cuda 2.23s
8 F5-TTS cuda 3.14s
9 Chatterbox Turbo cuda 3.32s
10 Coqui XTTS-v2 cuda 3.65s
11 Qwen3-TTS 1.7B (CUDA-graph) cuda 3.85s
12 LuxTTS cpu 5.76s
13 Chatterbox cuda 6.05s
14 MOSS-TTS-Nano cuda 6.81s
15 VoxCPM2 2B cuda 7.42s
16 IndexTTS-2 cpu 11.39s
17 IndexTTS-2 cuda 12.24s
18 Sesame CSM-1B cuda 13.52s
19 MOSS-TTS-Nano cpu 14.54s
20 Chatterbox Turbo cpu 21.26s
21 Dia 1.6B cuda 21.73s
22 Coqui XTTS-v2 cpu 25.42s
23 Chatterbox cpu 42.09s
24 VoxCPM2 2B cpu 67.86s
25 OmniVoice cpu 69.74s
26 Sesame CSM-1B cpu 80.94s
27 Mars5-TTS cuda 104.33s
28 Mars5-TTS cpu 111.83s
29 F5-TTS cpu 141.44s

Prompt 4

[en] "Run pytest tests slash test underscore voice dot py with verbose flag and capture flag set to no."
Rank Model Device TTFA (warm)
1 Pocket-TTS cpu 108ms
2 LuxTTS cuda 220ms
3 NeuTTS Nano cuda 366ms
4 NeuTTS Nano cpu 428ms
5 NeuTTS Air cuda 523ms
6 NeuTTS Air cpu 597ms
7 Chatterbox Turbo cuda 1.32s
8 OmniVoice cuda 1.40s
9 F5-TTS cuda 1.56s
10 Coqui XTTS-v2 cuda 1.70s
11 LuxTTS cpu 2.24s
12 MOSS-TTS-Nano cuda 2.99s
13 Chatterbox cuda 3.04s
14 VoxCPM2 2B cuda 3.10s
15 IndexTTS-2 cpu 5.00s
16 MOSS-TTS-Nano cpu 5.89s
17 IndexTTS-2 cuda 6.00s
18 Chatterbox Turbo cpu 8.85s
19 Dia 1.6B cuda 8.86s
20 Coqui XTTS-v2 cpu 8.87s
21 Sesame CSM-1B cuda 9.56s
22 ZipVoice 123M cpu 15.77s
23 Chatterbox cpu 18.60s
24 VoxCPM2 2B cpu 37.91s
25 OmniVoice cpu 45.42s
26 Qwen3-TTS 1.7B (CUDA-graph) cuda 45.62s
27 Sesame CSM-1B cpu 58.88s
28 F5-TTS cpu 59.69s
29 Mars5-TTS cpu 90.10s
30 Mars5-TTS cuda 96.21s

Prompt 5

[fr] "Bonjour, je m'appelle Cicero et je vais vous aider avec votre code aujourd'hui."
Rank Model Device TTFA (warm)
1 Pocket-TTS cpu 320ms
2 NeuTTS Nano cuda 343ms
3 NeuTTS Nano cpu 424ms
4 Coqui XTTS-v2 cuda 808ms
5 OmniVoice cuda 1.33s
6 VoxCPM2 2B cuda 1.37s
7 Qwen3-TTS 1.7B (CUDA-graph) cuda 1.41s
8 MOSS-TTS-Nano cuda 2.06s
9 MOSS-TTS-Nano cpu 3.79s
10 Coqui XTTS-v2 cpu 5.99s
11 ZipVoice 123M cpu 12.85s
12 VoxCPM2 2B cpu 18.72s
13 OmniVoice cpu 42.37s