
Listen

One clip per model per voice mode. Audio is rig-independent (same weights → same output), so each sample is sourced once, from the highest-fidelity rig available: the Windows RTX 5090 where possible, otherwise Linux, otherwise Mac. The small tag on each row shows the source as rig·device. Default voice is the model's own preset or built-in speaker; Cloning is the model imitating a single reference voice (chris_hemsworth_15s). Two views are available: By prompt (compare every model on one sentence) and By model (audition one model across prompts); only one clip plays at a time. Per-rig speed is on the Speed page.
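The rig fallback described above (Windows first, then Linux, then Mac) can be sketched as a simple priority lookup. This is only an illustration, not the site's actual code: the function name `pick_source_rig`, the `clips` mapping, and the `model-a`/`model-b`/`model-c` names are hypothetical; only the rig labels `win`, `linux`, and `mac` come from the source tags on this page.

```python
from typing import Iterable, Optional

# Priority order from the description: Windows first, then Linux, then Mac.
RIG_PRIORITY = ("win", "linux", "mac")


def pick_source_rig(available: Iterable[str]) -> Optional[str]:
    """Return the highest-priority rig that has a clip, or None if none do."""
    have = set(available)
    for rig in RIG_PRIORITY:
        if rig in have:
            return rig
    return None


# Hypothetical availability map: model name -> rigs that hold a rendered clip.
clips = {
    "model-a": ["linux", "win"],
    "model-b": ["mac", "linux"],
    "model-c": ["mac"],
}

# One source rig per model, chosen by the fallback order.
sources = {model: pick_source_rig(rigs) for model, rigs in clips.items()}
```

Under this rule a model is sourced from Linux or Mac only when no Windows clip exists for it, which would match rows tagged linux·cuda or mac·mps in the tables below.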
Prompt 1 [en] "Open the browser and read my email."
Default voice (34)
| Model | Size | Source |
|---|---|---|
| Chatterbox | 1.2B | win·cuda |
| Chatterbox Turbo | 744M | win·cuda |
| Coqui XTTS-v2 | 750M | win·cuda |
| Dia 1.6B | 1.6B | win·cuda |
| dots.tts (voice varies) | 2B | linux·cuda |
| DramaBox | 3.3B | win·cuda |
| F5-TTS | 330M | win·cuda |
| Higgs Audio v3 TTS (voice varies) | 4B | linux·cuda |
| IndexTTS-2 | 1.5B | win·cuda |
| KittenTTS | <100M | win·cpu |
| Kokoro | 82M | win·cuda |
| LuxTTS | 123M | linux·cuda |
| Magpie-TTS | 357M | win·cuda |
| Mars5-TTS | 1.2B | win·cuda |
| Maya1 | 3B | win·cuda |
| MeloTTS | ~52M | win·cuda |
| MiraTTS | 0.5B | linux·cuda |
| NeuTTS Air | 748M | win·cuda |
| NeuTTS Nano | 229M | win·cuda |
| OmniVoice | ~1B | win·cuda |
| OuteTTS 1.0 1B | 1B | win·cuda |
| Parler-TTS Mini v1 | 878M | win·cuda |
| Piper | ~25MB | win·cpu |
| Pocket-TTS | 100M | win·cpu |
| Qwen3-TTS 1.7B | 1.7B | win·cuda |
| Qwen3-TTS 1.7B (CUDA-graph) | 1.7B | win·cuda |
| Sesame CSM-1B | 1B | win·cuda |
| Soprano 80M | 80M | win·cuda |
| Step-Audio-EditX | 3B | linux·cuda |
| StyleTTS 2 | ~148M | win·cuda |
| Supertonic | 99M | win·cpu |
| VibeVoice Realtime 0.5B | 0.5B | win·cuda |
| VoxCPM2 2B | 2B | win·cuda |
| Voxtral 4B TTS | 4B | mac·mps |

Cloning — chris_hemsworth (34)
Reference voice (the target each clone imitates): chris_hemsworth_15s
| Model | Size | Source |
|---|---|---|
| Chatterbox | 1.2B | win·cuda |
| Chatterbox Turbo | 744M | win·cuda |
| Coqui XTTS-v2 | 750M | win·cuda |
| Dia 1.6B | 1.6B | win·cuda |
| dots.tts | 2B | linux·cuda |
| DramaBox | 3.3B | win·cuda |
| Echo-TTS | 2.8B | win·cuda |
| F5-TTS | 330M | win·cuda |
| Fish Speech 1.5 | ~500M | win·cuda |
| Fish Speech S2-Pro | 4B | linux·cuda |
| Higgs Audio v3 TTS | 4B | linux·cuda |
| IndexTTS-2 | 1.5B | win·cuda |
| LuxTTS | 123M | linux·cuda |
| Mars5-TTS | 1.2B | win·cuda |
| MetaVoice-1B | 1.2B | linux·cuda |
| MiraTTS | 0.5B | win·cuda |
| MOSS-TTS v1.0 | 8B | win·cuda |
| MOSS-TTS v1.5 | 8B | linux·cuda |
| MOSS-TTS-Nano | 100M | win·cuda |
| NeuTTS Air | 748M | win·cuda |
| NeuTTS Nano | 229M | win·cuda |
| OmniVoice | ~1B | win·cuda |
| OpenVoice v2 | ~100M | win·cuda |
| OuteTTS 1.0 1B | 1B | win·cuda |
| Pocket-TTS | 100M | win·cpu |
| Qwen3-TTS 1.7B (CUDA-graph) | 1.7B | win·cuda |
| Sesame CSM-1B | 1B | win·cuda |
| Step-Audio-EditX | 3B | linux·cuda |
| StyleTTS 2 | ~148M | win·cuda |
| VibeVoice 1.5B | 1.5B | win·cuda |
| VibeVoice 7B | 7B | win·cuda |
| VoxCPM2 2B | 2B | win·cuda |
| ZipVoice 123M | 123M | win·cuda |
| Zonos v0.1 | 1.6B | win·cuda |

Prompt 2 [en] "I'll start a new git branch, push the changes, and open a pull request when the tests pass."
Default voice (34)
| Model | Size | Source |
|---|---|---|
| Chatterbox | 1.2B | win·cuda |
| Chatterbox Turbo | 744M | win·cuda |
| Coqui XTTS-v2 | 750M | win·cuda |
| Dia 1.6B | 1.6B | win·cuda |
| dots.tts (voice varies) | 2B | linux·cuda |
| DramaBox | 3.3B | win·cuda |
| F5-TTS | 330M | win·cuda |
| Higgs Audio v3 TTS (voice varies) | 4B | linux·cuda |
| IndexTTS-2 | 1.5B | win·cuda |
| KittenTTS | <100M | win·cpu |
| Kokoro | 82M | win·cuda |
| LuxTTS | 123M | linux·cuda |
| Magpie-TTS | 357M | win·cuda |
| Mars5-TTS | 1.2B | win·cuda |
| Maya1 | 3B | win·cuda |
| MeloTTS | ~52M | win·cuda |
| MiraTTS | 0.5B | linux·cuda |
| NeuTTS Air | 748M | win·cuda |
| NeuTTS Nano | 229M | win·cuda |
| OmniVoice | ~1B | win·cuda |
| OuteTTS 1.0 1B | 1B | win·cuda |
| Parler-TTS Mini v1 | 878M | win·cuda |
| Piper | ~25MB | win·cpu |
| Pocket-TTS | 100M | win·cpu |
| Qwen3-TTS 1.7B | 1.7B | win·cuda |
| Qwen3-TTS 1.7B (CUDA-graph) | 1.7B | win·cuda |
| Sesame CSM-1B | 1B | win·cuda |
| Soprano 80M | 80M | win·cuda |
| Step-Audio-EditX | 3B | linux·cuda |
| StyleTTS 2 | ~148M | win·cuda |
| Supertonic | 99M | win·cpu |
| VibeVoice Realtime 0.5B | 0.5B | win·cuda |
| VoxCPM2 2B | 2B | win·cuda |
| Voxtral 4B TTS | 4B | mac·mps |

Cloning — chris_hemsworth (34)
Reference voice (the target each clone imitates): chris_hemsworth_15s
| Model | Size | Source |
|---|---|---|
| Chatterbox | 1.2B | win·cuda |
| Chatterbox Turbo | 744M | win·cuda |
| Coqui XTTS-v2 | 750M | win·cuda |
| Dia 1.6B | 1.6B | win·cuda |
| dots.tts | 2B | linux·cuda |
| DramaBox | 3.3B | win·cuda |
| Echo-TTS | 2.8B | win·cuda |
| F5-TTS | 330M | win·cuda |
| Fish Speech 1.5 | ~500M | win·cuda |
| Fish Speech S2-Pro | 4B | linux·cuda |
| Higgs Audio v3 TTS | 4B | linux·cuda |
| IndexTTS-2 | 1.5B | win·cuda |
| LuxTTS | 123M | linux·cuda |
| Mars5-TTS | 1.2B | win·cuda |
| MetaVoice-1B | 1.2B | linux·cuda |
| MiraTTS | 0.5B | win·cuda |
| MOSS-TTS v1.0 | 8B | win·cuda |
| MOSS-TTS v1.5 | 8B | linux·cuda |
| MOSS-TTS-Nano | 100M | win·cuda |
| NeuTTS Air | 748M | win·cuda |
| NeuTTS Nano | 229M | win·cuda |
| OmniVoice | ~1B | win·cuda |
| OpenVoice v2 | ~100M | win·cuda |
| OuteTTS 1.0 1B | 1B | win·cuda |
| Pocket-TTS | 100M | win·cpu |
| Qwen3-TTS 1.7B (CUDA-graph) | 1.7B | win·cuda |
| Sesame CSM-1B | 1B | win·cuda |
| Step-Audio-EditX | 3B | linux·cuda |
| StyleTTS 2 | ~148M | win·cuda |
| VibeVoice 1.5B | 1.5B | win·cuda |
| VibeVoice 7B | 7B | win·cuda |
| VoxCPM2 2B | 2B | win·cuda |
| ZipVoice 123M | 123M | win·cuda |
| Zonos v0.1 | 1.6B | win·cuda |

Prompt 3 [en] "The Parakeet TDT zero point six billion parameter model achieves one point six nine percent word error rate on LibriSpeech test-clean, beating Whisper Large V3 at two point seven percent while running at over two thousand times realtime on a single GPU."
Default voice (34)
| Model | Size | Source |
|---|---|---|
| Chatterbox | 1.2B | win·cuda |
| Chatterbox Turbo | 744M | win·cuda |
| Coqui XTTS-v2 | 750M | win·cuda |
| Dia 1.6B | 1.6B | win·cuda |
| dots.tts (voice varies) | 2B | linux·cuda |
| DramaBox | 3.3B | win·cuda |
| F5-TTS | 330M | win·cuda |
| Higgs Audio v3 TTS (voice varies) | 4B | linux·cuda |
| IndexTTS-2 | 1.5B | win·cuda |
| KittenTTS | <100M | win·cpu |
| Kokoro | 82M | win·cuda |
| LuxTTS | 123M | linux·cuda |
| Magpie-TTS | 357M | win·cuda |
| Mars5-TTS | 1.2B | win·cuda |
| Maya1 | 3B | win·cuda |
| MeloTTS | ~52M | win·cuda |
| MiraTTS | 0.5B | linux·cuda |
| NeuTTS Air | 748M | win·cuda |
| NeuTTS Nano | 229M | win·cuda |
| OmniVoice | ~1B | win·cuda |
| OuteTTS 1.0 1B | 1B | win·cuda |
| Parler-TTS Mini v1 | 878M | win·cuda |
| Piper | ~25MB | win·cpu |
| Pocket-TTS | 100M | win·cpu |
| Qwen3-TTS 1.7B | 1.7B | win·cuda |
| Qwen3-TTS 1.7B (CUDA-graph) | 1.7B | win·cuda |
| Sesame CSM-1B | 1B | win·cuda |
| Soprano 80M | 80M | win·cuda |
| Step-Audio-EditX | 3B | linux·cuda |
| StyleTTS 2 | ~148M | win·cuda |
| Supertonic | 99M | win·cpu |
| VibeVoice Realtime 0.5B | 0.5B | win·cuda |
| VoxCPM2 2B | 2B | win·cuda |
| Voxtral 4B TTS | 4B | mac·mps |

Cloning — chris_hemsworth (34)
Reference voice (the target each clone imitates): chris_hemsworth_15s
| Model | Size | Source |
|---|---|---|
| Chatterbox | 1.2B | win·cuda |
| Chatterbox Turbo | 744M | win·cuda |
| Coqui XTTS-v2 | 750M | win·cuda |
| Dia 1.6B | 1.6B | win·cuda |
| dots.tts | 2B | linux·cuda |
| DramaBox | 3.3B | win·cuda |
| Echo-TTS | 2.8B | win·cuda |
| F5-TTS | 330M | win·cuda |
| Fish Speech 1.5 | ~500M | win·cuda |
| Fish Speech S2-Pro | 4B | linux·cuda |
| Higgs Audio v3 TTS | 4B | linux·cuda |
| IndexTTS-2 | 1.5B | win·cuda |
| LuxTTS | 123M | linux·cuda |
| Mars5-TTS | 1.2B | win·cuda |
| MetaVoice-1B | 1.2B | linux·cuda |
| MiraTTS | 0.5B | win·cuda |
| MOSS-TTS v1.0 | 8B | win·cuda |
| MOSS-TTS v1.5 | 8B | linux·cuda |
| MOSS-TTS-Nano | 100M | win·cuda |
| NeuTTS Air | 748M | win·cuda |
| NeuTTS Nano | 229M | win·cuda |
| OmniVoice | ~1B | win·cuda |
| OpenVoice v2 | ~100M | win·cuda |
| OuteTTS 1.0 1B | 1B | win·cuda |
| Pocket-TTS | 100M | win·cpu |
| Qwen3-TTS 1.7B (CUDA-graph) | 1.7B | win·cuda |
| Sesame CSM-1B | 1B | win·cuda |
| Step-Audio-EditX | 3B | linux·cuda |
| StyleTTS 2 | ~148M | win·cuda |
| VibeVoice 1.5B | 1.5B | win·cuda |
| VibeVoice 7B | 7B | win·cuda |
| VoxCPM2 2B | 2B | win·cuda |
| ZipVoice 123M | 123M | win·cpu |
| Zonos v0.1 | 1.6B | win·cuda |

Prompt 4 [en] "Run pytest tests slash test underscore voice dot py with verbose flag and capture flag set to no."
Default voice (34)
| Model | Size | Source |
|---|---|---|
| Chatterbox | 1.2B | win·cuda |
| Chatterbox Turbo | 744M | win·cuda |
| Coqui XTTS-v2 | 750M | win·cuda |
| Dia 1.6B | 1.6B | win·cuda |
| dots.tts (voice varies) | 2B | linux·cuda |
| DramaBox | 3.3B | win·cuda |
| F5-TTS | 330M | win·cuda |
| Higgs Audio v3 TTS (voice varies) | 4B | linux·cuda |
| IndexTTS-2 | 1.5B | win·cuda |
| KittenTTS | <100M | win·cpu |
| Kokoro | 82M | win·cuda |
| LuxTTS | 123M | linux·cuda |
| Magpie-TTS | 357M | win·cuda |
| Mars5-TTS | 1.2B | win·cuda |
| Maya1 | 3B | win·cuda |
| MeloTTS | ~52M | win·cuda |
| MiraTTS | 0.5B | linux·cuda |
| NeuTTS Air | 748M | win·cuda |
| NeuTTS Nano | 229M | win·cuda |
| OmniVoice | ~1B | win·cuda |
| OuteTTS 1.0 1B | 1B | win·cuda |
| Parler-TTS Mini v1 | 878M | win·cuda |
| Piper | ~25MB | win·cpu |
| Pocket-TTS | 100M | win·cpu |
| Qwen3-TTS 1.7B | 1.7B | win·cuda |
| Qwen3-TTS 1.7B (CUDA-graph) | 1.7B | win·cuda |
| Sesame CSM-1B | 1B | win·cuda |
| Soprano 80M | 80M | win·cuda |
| Step-Audio-EditX | 3B | linux·cuda |
| StyleTTS 2 | ~148M | win·cuda |
| Supertonic | 99M | win·cpu |
| VibeVoice Realtime 0.5B | 0.5B | win·cuda |
| VoxCPM2 2B | 2B | win·cuda |
| Voxtral 4B TTS | 4B | mac·mps |

Cloning — chris_hemsworth (34)
Reference voice (the target each clone imitates): chris_hemsworth_15s
| Model | Size | Source |
|---|---|---|
| Chatterbox | 1.2B | win·cuda |
| Chatterbox Turbo | 744M | win·cuda |
| Coqui XTTS-v2 | 750M | win·cuda |
| Dia 1.6B | 1.6B | win·cuda |
| dots.tts | 2B | linux·cuda |
| DramaBox | 3.3B | win·cuda |
| Echo-TTS | 2.8B | win·cuda |
| F5-TTS | 330M | win·cuda |
| Fish Speech 1.5 | ~500M | win·cuda |
| Fish Speech S2-Pro | 4B | linux·cuda |
| Higgs Audio v3 TTS | 4B | linux·cuda |
| IndexTTS-2 | 1.5B | win·cuda |
| LuxTTS | 123M | linux·cuda |
| Mars5-TTS | 1.2B | win·cuda |
| MetaVoice-1B | 1.2B | linux·cuda |
| MiraTTS | 0.5B | win·cuda |
| MOSS-TTS v1.0 | 8B | win·cuda |
| MOSS-TTS v1.5 | 8B | linux·cuda |
| MOSS-TTS-Nano | 100M | win·cuda |
| NeuTTS Air | 748M | win·cuda |
| NeuTTS Nano | 229M | win·cuda |
| OmniVoice | ~1B | win·cuda |
| OpenVoice v2 | ~100M | win·cuda |
| OuteTTS 1.0 1B | 1B | win·cuda |
| Pocket-TTS | 100M | win·cpu |
| Qwen3-TTS 1.7B (CUDA-graph) | 1.7B | win·cuda |
| Sesame CSM-1B | 1B | win·cuda |
| Step-Audio-EditX | 3B | linux·cuda |
| StyleTTS 2 | ~148M | win·cuda |
| VibeVoice 1.5B | 1.5B | win·cuda |
| VibeVoice 7B | 7B | win·cuda |
| VoxCPM2 2B | 2B | win·cuda |
| ZipVoice 123M | 123M | win·cuda |
| Zonos v0.1 | 1.6B | win·cuda |

Prompt 5 [fr] "Bonjour, je m'appelle Cicero et je vais vous aider avec votre code aujourd'hui."
Default voice (15)
| Model | Size | Source |
|---|---|---|
| Coqui XTTS-v2 | 750M | win·cuda |
| dots.tts (voice varies) | 2B | linux·cuda |
| Higgs Audio v3 TTS (voice varies) | 4B | linux·cuda |
| Kokoro | 82M | win·cuda |
| Magpie-TTS | 357M | win·cuda |
| NeuTTS Nano | 229M | win·cuda |
| OmniVoice | ~1B | win·cuda |
| OuteTTS 1.0 1B | 1B | win·cuda |
| Piper | ~25MB | win·cpu |
| Pocket-TTS | 100M | win·cpu |
| Qwen3-TTS 1.7B | 1.7B | win·cuda |
| Qwen3-TTS 1.7B (CUDA-graph) | 1.7B | win·cuda |
| Supertonic | 99M | win·cpu |
| VoxCPM2 2B | 2B | win·cuda |
| Voxtral 4B TTS | 4B | mac·mps |

Cloning — chris_hemsworth (16)
Reference voice (the target each clone imitates): chris_hemsworth_15s
| Model | Size | Source |
|---|---|---|
| Coqui XTTS-v2 | 750M | win·cuda |
| dots.tts | 2B | linux·cuda |
| Fish Speech 1.5 | ~500M | win·cuda |
| Higgs Audio v3 TTS | 4B | linux·cuda |
| MOSS-TTS v1.0 | 8B | win·cuda |
| MOSS-TTS v1.5 | 8B | linux·cuda |
| MOSS-TTS-Nano | 100M | win·cuda |
| NeuTTS Nano | 229M | win·cuda |
| OmniVoice | ~1B | win·cuda |
| OpenVoice v2 | ~100M | win·cuda |
| OuteTTS 1.0 1B | 1B | win·cuda |
| Pocket-TTS | 100M | win·cpu |
| Qwen3-TTS 1.7B (CUDA-graph) | 1.7B | win·cuda |
| VoxCPM2 2B | 2B | win·cuda |
| ZipVoice 123M | 123M | win·cuda |
| Zonos v0.1 | 1.6B | win·cuda |