TTS Bench — Samples — mac-cloning

Rig: mac-m4 — Apple M4 (10C) · Apple M4 GPU (MPS) · 16 GB RAM · Darwin 25.5.0
Label: cloning · jo — ref jo.wav
5 prompts · one section per prompt
Each prompt section lists every model's audio output, ranked by warm TTFA (fastest first).
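The page does not say how warm TTFA is measured. As a rough sketch only, assuming TTFA means the time from request to the first audio chunk and "warm" means the model has already served at least one discarded request, a measurement could look like the following. The names warm_ttfa and fake_synthesize are hypothetical and do not come from TTS Bench; fake_synthesize stands in for a real model call.

```python
import statistics
import time
from typing import Callable, Iterable


def warm_ttfa(
    synthesize: Callable[[str], Iterable[bytes]],
    text: str,
    warmup_runs: int = 1,
    timed_runs: int = 3,
) -> float:
    """Median seconds from request to first audio chunk, after warm-up runs."""
    # Warm-up: run the full synthesis and discard the result so that
    # model loading, caches, and kernel compilation are not timed.
    for _ in range(warmup_runs):
        for _chunk in synthesize(text):
            pass

    samples = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        stream = iter(synthesize(text))
        next(stream)  # blocks until the first audio chunk is available
        samples.append(time.perf_counter() - start)
        for _chunk in stream:  # drain the rest so runs do not overlap
            pass
    return statistics.median(samples)


def fake_synthesize(text: str):
    """Hypothetical stand-in for a model: 10 ms of 'compute', then two chunks."""
    time.sleep(0.01)
    yield b"\x00" * 320
    yield b"\x00" * 320


ttfa_seconds = warm_ttfa(fake_synthesize, "Open the browser and read my email.")
print(f"warm TTFA: {ttfa_seconds * 1000:.0f}ms")
```

Under this sketch, a model that does not stream returns its whole clip as the first chunk, so its TTFA equals its total synthesis time. That would be consistent with the multi-second values below growing with prompt length, though the page itself does not confirm which models stream.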

Reference voice

Each model below was given this clip + transcript as the voice to imitate. Source: jo.wav

Prompt 1

[en]"Open the browser and read my email."
Rank  Model             Device  TTFA warm
1     Pocket-TTS        cpu     29ms
2     NeuTTS Nano       cpu     281ms
3     NeuTTS Air        cpu     354ms
4     NeuTTS Nano       mps     490ms
5     NeuTTS Air        mps     545ms
6     Coqui XTTS-v2     mps     1.38s
7     Chatterbox Turbo  mps     2.03s
8     Coqui XTTS-v2     cpu     2.11s
9     Chatterbox Turbo  cpu     2.63s
10    Chatterbox        mps     5.00s
11    Sesame CSM-1B     cpu     8.51s
12    Chatterbox        cpu     9.24s
13    Qwen3-TTS 1.7B    cpu     9.76s
14    OmniVoice         cpu     23.79s
15    ZipVoice 123M     cpu     24.59s
16    IndexTTS-2        cpu     26.61s
17    F5-TTS            mps     30.70s
18    F5-TTS            cpu     34.44s
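The TTFA column mixes milliseconds and seconds, so sorting the raw strings would not reproduce the ranking. A minimal sketch of normalizing the values to seconds before sorting follows; parse_ttfa is a hypothetical helper, not part of TTS Bench, and the sample rows are copied from Prompt 1.

```python
import re


def parse_ttfa(value: str) -> float:
    """Convert a TTFA string such as '29ms' or '1.38s' to seconds."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(ms|s)\s*", value)
    if match is None:
        raise ValueError(f"unrecognized TTFA value: {value!r}")
    number, unit = match.groups()
    return float(number) / 1000 if unit == "ms" else float(number)


# (model, device, TTFA warm) rows taken from Prompt 1, deliberately shuffled.
rows = [
    ("Coqui XTTS-v2", "mps", "1.38s"),
    ("Pocket-TTS", "cpu", "29ms"),
    ("NeuTTS Air", "mps", "545ms"),
    ("NeuTTS Nano", "cpu", "281ms"),
]

# Rank fastest first by the normalized value, as the tables do.
ranked = sorted(rows, key=lambda row: parse_ttfa(row[2]))
for rank, (model, device, ttfa) in enumerate(ranked, start=1):
    print(rank, model, device, ttfa)
```

Rows with no TTFA value, which appear in some of the later tables, would raise ValueError here; whether to skip them or rank them last is a choice this sketch leaves open.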

Prompt 2

[en]"I'll start a new git branch, push the changes, and open a pull request when the tests pass."
Rank  Model             Device  TTFA warm
1     Pocket-TTS        cpu     31ms
2     NeuTTS Nano       cpu     283ms
3     NeuTTS Air        cpu     360ms
4     NeuTTS Nano       mps     486ms
5     NeuTTS Air        mps     564ms
6     Coqui XTTS-v2     mps     2.74s
7     Chatterbox Turbo  mps     3.53s
8     Coqui XTTS-v2     cpu     3.80s
9     Chatterbox Turbo  cpu     4.64s
10    Chatterbox        cpu     17.23s
11    Sesame CSM-1B     cpu     18.63s
12    Qwen3-TTS 1.7B    cpu     18.92s
13    IndexTTS-2        cpu     23.93s
14    OmniVoice         cpu     28.29s
15    ZipVoice 123M     cpu     33.32s
16    F5-TTS            mps     39.41s
17    F5-TTS            cpu     42.09s
18    Chatterbox        mps     54.25s
19    OmniVoice         mps

Prompt 3

[en]"The Parakeet TDT zero point six billion parameter model achieves one point six nine percent word error rate on LibriSpeech test-clean, beating Whisper Large V3 at two point seven percent while running at over two thousand times realtime on a single GPU."
Rank  Model             Device  TTFA warm
1     Pocket-TTS        cpu     36ms
2     NeuTTS Nano       cpu     281ms
3     NeuTTS Air        cpu     361ms
4     NeuTTS Nano       mps     513ms
5     NeuTTS Air        mps     581ms
6     Chatterbox Turbo  cpu     12.27s
7     Coqui XTTS-v2     cpu     12.98s
8     Coqui XTTS-v2     mps     20.49s
9     Chatterbox        mps     26.06s
10    Qwen3-TTS 1.7B    cpu     40.14s
11    Sesame CSM-1B     cpu     40.76s
12    OmniVoice         cpu     41.11s
13    Chatterbox        cpu     42.22s
14    Chatterbox Turbo  mps     45.81s
15    F5-TTS            cpu     68.43s
16    F5-TTS            mps     70.52s
17    IndexTTS-2        cpu     93.07s

Prompt 4

[en]"Run pytest tests slash test underscore voice dot py with verbose flag and capture flag set to no."
Rank  Model             Device  TTFA warm
1     Pocket-TTS        cpu     35ms
2     NeuTTS Nano       cpu     279ms
3     NeuTTS Air        cpu     359ms
4     NeuTTS Nano       mps     495ms
5     NeuTTS Air        mps     556ms
6     Chatterbox Turbo  mps     4.65s
7     Chatterbox Turbo  cpu     6.04s
8     Coqui XTTS-v2     cpu     6.45s
9     Coqui XTTS-v2     mps     7.32s
10    Chatterbox        cpu     22.05s
11    Qwen3-TTS 1.7B    cpu     25.41s
12    Sesame CSM-1B     cpu     26.38s
13    OmniVoice         cpu     29.09s
14    IndexTTS-2        cpu     36.14s
15    F5-TTS            mps     40.42s
16    F5-TTS            cpu     43.27s
17    Chatterbox        mps     51.18s
18    ZipVoice 123M     cpu

Prompt 5

[fr]"Bonjour, je m'appelle Cicero et je vais vous aider avec votre code aujourd'hui."
Rank  Model             Device  TTFA warm
1     Pocket-TTS        cpu     87ms
2     NeuTTS Nano       cpu     281ms
3     NeuTTS Nano       mps     479ms
4     Coqui XTTS-v2     mps     2.08s
5     Coqui XTTS-v2     cpu     8.29s
6     Qwen3-TTS 1.7B    cpu     15.83s
7     OmniVoice         cpu     27.16s
8     ZipVoice 123M     cpu     54.92s