Rig: mac-m4 · Apple M4 (10 cores) · Apple M4 GPU (MPS) · 16 GB RAM · Darwin 25.5.0
Label: cloning · jo · reference: jo.wav
5 prompts, one section per prompt. Within each section, every model/device combination is ranked by warm time-to-first-audio (TTFA), fastest first.
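The ranking rule stated above (warm TTFA, fastest first) mixes millisecond and second values, and a few rows have no TTFA value at all. The Python below is a minimal sketch of how such a ranking could be reproduced from the table cells; it is not the benchmark harness's code. The helpers `parse_ttfa` and `rank_by_ttfa` are hypothetical, and the sketch assumes a missing result should sort last, which matches where those rows appear in the tables.

```python
import re

def parse_ttfa(value: str) -> float:
    """Convert a TTFA cell such as '29ms' or '1.38s' to seconds.

    Assumption: a missing result (a dash, 'n/a', or an empty cell)
    maps to infinity so that it sorts after every measured value.
    """
    value = value.strip()
    if value in ("\u2014", "n/a", "no result", ""):
        return float("inf")
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)\s*(ms|s)", value)
    if match is None:
        raise ValueError(f"unrecognized TTFA value: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    # Normalize milliseconds to seconds so both units compare correctly.
    return number / 1000.0 if unit == "ms" else number

def rank_by_ttfa(rows):
    """Sort (model, device, ttfa) rows fastest first and prepend 1-based ranks."""
    ordered = sorted(rows, key=lambda row: parse_ttfa(row[2]))
    return [(rank, *row) for rank, row in enumerate(ordered, start=1)]

# A few rows taken from the Prompt 2 results, deliberately shuffled.
rows = [
    ("F5-TTS", "cpu", "42.09s"),
    ("OmniVoice", "mps", "n/a"),
    ("Pocket-TTS", "cpu", "31ms"),
    ("Coqui XTTS-v2", "mps", "2.74s"),
    ("NeuTTS Nano", "cpu", "283ms"),
]

if __name__ == "__main__":
    for rank, model, device, ttfa in rank_by_ttfa(rows):
        print(f"{rank:<4} {model:<16} {device:<6} {ttfa}")
```

Converting everything to seconds before sorting avoids the trap of comparing "31ms" and "2.74s" as plain numbers, where 31 would wrongly rank after 2.74.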
Reference voice
Each model was given the reference clip jo.wav and its transcript as the voice to imitate.
Prompt 1
[en]"Open the browser and read my email."
Rank  Model             Device  Warm TTFA
1     Pocket-TTS        cpu     29ms
2     NeuTTS Nano       cpu     281ms
3     NeuTTS Air        cpu     354ms
4     NeuTTS Nano       mps     490ms
5     NeuTTS Air        mps     545ms
6     Coqui XTTS-v2     mps     1.38s
7     Chatterbox Turbo  mps     2.03s
8     Coqui XTTS-v2     cpu     2.11s
9     Chatterbox Turbo  cpu     2.63s
10    Chatterbox        mps     5.00s
11    Sesame CSM-1B     cpu     8.51s
12    Chatterbox        cpu     9.24s
13    Qwen3-TTS 1.7B    cpu     9.76s
14    OmniVoice         cpu     23.79s
15    ZipVoice 123M     cpu     24.59s
16    IndexTTS-2        cpu     26.61s
17    F5-TTS            mps     30.70s
18    F5-TTS            cpu     34.44s
Prompt 2
[en]"I'll start a new git branch, push the changes, and open a pull request when the tests pass."
Rank  Model             Device  Warm TTFA
1     Pocket-TTS        cpu     31ms
2     NeuTTS Nano       cpu     283ms
3     NeuTTS Air        cpu     360ms
4     NeuTTS Nano       mps     486ms
5     NeuTTS Air        mps     564ms
6     Coqui XTTS-v2     mps     2.74s
7     Chatterbox Turbo  mps     3.53s
8     Coqui XTTS-v2     cpu     3.80s
9     Chatterbox Turbo  cpu     4.64s
10    Chatterbox        cpu     17.23s
11    Sesame CSM-1B     cpu     18.63s
12    Qwen3-TTS 1.7B    cpu     18.92s
13    IndexTTS-2        cpu     23.93s
14    OmniVoice         cpu     28.29s
15    ZipVoice 123M     cpu     33.32s
16    F5-TTS            mps     39.41s
17    F5-TTS            cpu     42.09s
18    Chatterbox        mps     54.25s
19    OmniVoice         mps     no result
Prompt 3
[en]"The Parakeet TDT zero point six billion parameter model achieves one point six nine percent word error rate on LibriSpeech test-clean, beating Whisper Large V3 at two point seven percent while running at over two thousand times realtime on a single GPU."
Rank  Model             Device  Warm TTFA
1     Pocket-TTS        cpu     36ms
2     NeuTTS Nano       cpu     281ms
3     NeuTTS Air        cpu     361ms
4     NeuTTS Nano       mps     513ms
5     NeuTTS Air        mps     581ms
6     Chatterbox Turbo  cpu     12.27s
7     Coqui XTTS-v2     cpu     12.98s
8     Coqui XTTS-v2     mps     20.49s
9     Chatterbox        mps     26.06s
10    Qwen3-TTS 1.7B    cpu     40.14s
11    Sesame CSM-1B     cpu     40.76s
12    OmniVoice         cpu     41.11s
13    Chatterbox        cpu     42.22s
14    Chatterbox Turbo  mps     45.81s
15    F5-TTS            cpu     68.43s
16    F5-TTS            mps     70.52s
17    IndexTTS-2        cpu     93.07s
Prompt 4
[en]"Run pytest tests slash test underscore voice dot py with verbose flag and capture flag set to no."
Rank  Model             Device  Warm TTFA
1     Pocket-TTS        cpu     35ms
2     NeuTTS Nano       cpu     279ms
3     NeuTTS Air        cpu     359ms
4     NeuTTS Nano       mps     495ms
5     NeuTTS Air        mps     556ms
6     Chatterbox Turbo  mps     4.65s
7     Chatterbox Turbo  cpu     6.04s
8     Coqui XTTS-v2     cpu     6.45s
9     Coqui XTTS-v2     mps     7.32s
10    Chatterbox        cpu     22.05s
11    Qwen3-TTS 1.7B    cpu     25.41s
12    Sesame CSM-1B     cpu     26.38s
13    OmniVoice         cpu     29.09s
14    IndexTTS-2        cpu     36.14s
15    F5-TTS            mps     40.42s
16    F5-TTS            cpu     43.27s
17    Chatterbox        mps     51.18s
18    ZipVoice 123M     cpu     no result
Prompt 5
[fr]"Bonjour, je m'appelle Cicero et je vais vous aider avec votre code aujourd'hui." (English: "Hello, my name is Cicero and I'm going to help you with your code today.")