The Humanness Index™

Which voice model sounds the most human?

Name: The Humanness Index™ leaderboard
Creator: Vapi
License: https://creativecommons.org/licenses/by/4.0/

Sounding human is hard to measure, but it's what decides whether a call works. We clone one voice onto every model and play them blind against a real human, so you can hear which ones pass.


Which voice sounds more human?

Same voice, different models.

Read along

So I can see here that the package was marked as delivered on Tuesday, but if you're saying it never arrived then what we'll do is... let me just. Yeah, I'm going to open a lost package investigation for you. That usually takes about forty-eight hours to resolve.


How it works

Step 1
Same voice, every model
We clone one conversational voice onto every model, so you're judging the model, not its demo reel.
Step 2
You listen blind
Two voices, same line, no labels. Pick the one that sounds more human.
Step 3
A real human sets the bar
Blind votes are fit into a rating, with a real human at 100. The higher the score, the more human the model sounds.
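The page doesn't spell out how blind votes become a rating, so the following is only a minimal sketch under stated assumptions. It fits a Bradley–Terry model to (winner, loser) vote pairs using the standard minorization–maximization update. It then rescales the result so the human scores 100 by reporting 200 × P(model beats the human). The function names, the iteration count, and that rescaling are illustrative assumptions, not Vapi's published method. The sketch also assumes every contestant has at least one win and one loss; without that, the maximum-likelihood fit doesn't exist.

```python
from collections import defaultdict


def fit_bradley_terry(votes, iters=200):
    """Fit Bradley-Terry strengths to (winner, loser) pairs from blind A/B votes.

    Minorization-maximization update:
        p_i <- W_i / sum_j n_ij / (p_i + p_j)
    Assumes every contestant has at least one win and one loss.
    """
    players = {p for pair in votes for p in pair}
    wins = defaultdict(int)    # W_i: total wins per contestant
    games = defaultdict(int)   # n_ij: battles between each unordered pair
    for winner, loser in votes:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    strength = {p: 1.0 for p in players}
    for _ in range(iters):
        updated = {}
        for i in players:
            denom = sum(
                n / (strength[i] + strength[j])
                for pair, n in games.items() if i in pair
                for j in pair - {i}
            )
            updated[i] = wins[i] / denom
        # Strengths are only defined up to a constant factor; fix the scale.
        total = sum(updated.values())
        strength = {p: s * len(players) / total for p, s in updated.items()}
    return strength


def humanness_scores(strength, anchor="human"):
    """Assumed scaling: 200 * P(model beats the anchor), so the anchor scores 100."""
    h = strength[anchor]
    return {m: round(200 * s / (s + h)) for m, s in strength.items()}


# Hypothetical votes: ("human", "A") means the listener picked the human.
votes = (
    [("human", "A")] * 6 + [("A", "human")] * 4
    + [("human", "B")] * 8 + [("B", "human")] * 2
    + [("A", "B")] * 6 + [("B", "A")] * 4
)
scores = humanness_scores(fit_bradley_terry(votes))
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Under this scaling, a model that listeners preferred over the human more than half the time would score above 100. Bradley–Terry strengths can also be placed on an Elo-like scale with 400·log10(strength) plus an arbitrary offset. The page doesn't say whether the Index's rating column is computed that way.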

Humanness Rankings

Humanness distribution

21 models · 9 providers · 10,250 unique votes

Why latency matters. A voice that lags breaks the conversation, no matter how human it sounds.

Likely Rank	Provider	Model	Humanness	Rating	Latency	Price	Votes
Baseline	Human	Homo sapiens	100	1301	—	—	546
#1–5	xAI	Grok TTS	94	1283	460 ms	$15	525
#1–6	MiniMax	Speech 2.5	93	1278	325 ms	$60	502
#1–6	ElevenLabs	Eleven v3	92	1276	758 ms	$100	507
#1–6	Canopy Labs	Orpheus	91	1271	—	Open source	495
#1–7	MiniMax	Speech 2 HD	88	1263	357 ms	$100	487
#2–7	xAI	Grok TTS (Streaming)	88	1261	285 ms	$15	490
#6–11	Inworld	TTS-1.5-max	78	1231	337 ms	$35	428
#7–12	ElevenLabs	Flash v2	76	1225	226 ms	$50	425
#7–15	ElevenLabs	Multilingual v2	75	1221	1006 ms	$100	423
#7–15	ElevenLabs	Turbo v2	73	1216	302 ms	$50	414
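Because a lagging voice breaks the conversation, one practical way to read the table is to take the most human model that fits a latency budget. The sketch below does that over the humanness and latency columns as listed. The function name and budget values are hypothetical, and rows that show "—" for latency are skipped. The "Likely Rank" ranges overlap, so a gap of a point or two shouldn't be treated as decisive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    provider: str
    model: str
    humanness: int
    latency_ms: Optional[int]  # None where the table shows "—"


# Humanness and latency columns from the rankings table above.
ROWS = [
    Entry("xAI", "Grok TTS", 94, 460),
    Entry("MiniMax", "Speech 2.5", 93, 325),
    Entry("ElevenLabs", "Eleven v3", 92, 758),
    Entry("Canopy Labs", "Orpheus", 91, None),
    Entry("MiniMax", "Speech 2 HD", 88, 357),
    Entry("xAI", "Grok TTS (Streaming)", 88, 285),
    Entry("Inworld", "TTS-1.5-max", 78, 337),
    Entry("ElevenLabs", "Flash v2", 76, 226),
    Entry("ElevenLabs", "Multilingual v2", 75, 1006),
    Entry("ElevenLabs", "Turbo v2", 73, 302),
]


def most_human_within(budget_ms, rows=ROWS):
    """Return the highest-humanness entry whose latency fits the budget, or None."""
    fits = [r for r in rows if r.latency_ms is not None and r.latency_ms <= budget_ms]
    # max() keeps the first of equal scores, i.e. table order on ties.
    return max(fits, key=lambda r: r.humanness, default=None)


for budget in (300, 400, 500):
    pick = most_human_within(budget)
    print(budget, pick.model if pick else None)
```

For example, a 300 ms budget selects Grok TTS (Streaming). Raising it to 500 ms brings Grok TTS itself into range.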

The Index only includes models that support voice cloning: each battle plays the same cloned source voice through both models, so the comparison is head to head and fair. Don't see your model on this list? Contact us at humannessindex@vapi.ai.
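A single battle, as described here, can be sketched in three steps: pick two distinct contestants at random, let the random order decide which one plays on which side, and store the listener's pick as a (winner, loser) pair. This sketch assumes the script line has already been rendered in the same cloned voice for every model, and it treats the human recording as one more contestant. `Battle`, `make_battle`, and `record_vote` are hypothetical names, not part of any Vapi API.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Battle:
    line: str   # script line both contestants read in the same cloned voice
    left: str   # contestant played on the left; hidden from the listener
    right: str  # contestant played on the right; hidden from the listener


def make_battle(models, lines, rng=random):
    """Pick two distinct contestants; sample() returns them in random order,
    which also randomizes which one plays on which side."""
    left, right = rng.sample(models, 2)
    return Battle(line=rng.choice(lines), left=left, right=right)


def record_vote(battle, picked_left):
    """Turn the listener's pick into a (winner, loser) pair."""
    if picked_left:
        return (battle.left, battle.right)
    return (battle.right, battle.left)


rng = random.Random(7)  # seeded so the example is reproducible
battle = make_battle(
    ["human", "Grok TTS", "Eleven v3"],
    ["That usually takes about forty-eight hours to resolve."],
    rng,
)
print(battle, record_vote(battle, picked_left=False))
```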

What we listen for

What makes a voice sound human?

Humanness doesn't break down into features. You either believe there's a person on the other end, or you don't. When that belief breaks, it's usually because of one of these.

Expressiveness

Emotion and emphasis. Stressing the right words, sounding like it means what it says instead of reading text aloud.

Tone & prosody

The intonation, rhythm, and melody of speech. The natural rise and fall of how people actually talk.

Artifacts

The little human sounds: breaths, stutters, natural pauses. A voice with none of them sounds too clean to be real.

Why trust this benchmark?

Any model can sound good on its own demo voice. The real test is how it handles your use case. We clone one voice across every model so the comparison is fair. Models that can't clone a voice can't be tested fairly, so they're not listed.

Most Human Models

#1 · Humanness leader

xAI

Grok TTS

Humanness

Latency: 460 ms
Languages: 20
Votes: 525

xAI's TTS is the voice to beat for naturalness. In blind, side-by-side comparisons, listeners pick it as the more human-sounding option more often than any other model on the Index, and it holds that edge at phone quality, where most voices start to sound synthetic. For teams where sounding human is the whole point, it's where we'd start.

MiniMax

Speech 2.5

Humanness

Latency: 325 ms
Languages: 40
Votes: 502

ElevenLabs

Eleven v3

Humanness

Latency: 758 ms
Languages: 70+
Votes: 507

Voting in progress

Rankings are provisional

We're keeping the podium under wraps while the votes come in. Listen and vote above; the most human models will be revealed once the standings settle.

Why this exists

Picking a TTS model for a voice agent comes down to one thing: does it sound human enough that people forget they're talking to software? You can't get that from demos or vendor claims. So we made it measurable and took the call out of our own hands: one voice cloned onto every model, played blind with no names attached, scored against a real human by the people who hear it.

Find the most human-sounding voice for your agent.

Compare the models in blind tests, read the methodology, or get in touch.


Built a TTS model? Add yours to the Index.
