The Humanness Index™

Which voice model sounds the most human?

Sounding human is hard to measure, but it's what decides whether a call works. We clone one voice onto every model and play them blind against a real human, so you can hear which ones pass.

Read the whitepaper

Which voice sounds more human?

Same voice, different models.

Read along
So I can see here that the package was marked as delivered on Tuesday, but if you're saying it never arrived then what we'll do is... let me just. Yeah, I'm going to open a lost package investigation for you. That usually takes about forty-eight hours to resolve.

How it works

  1. Same voice, every model

    We clone one conversational voice onto every model, so you're judging the model, not its demo reel.

  2. You listen blind

    Two voices, same line, no labels. Pick the one that sounds more human.

  3. A real human sets the bar

    Blind votes are fit into a rating, with a real human at 100. The higher the score, the more human the model sounds.
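The steps above don't say which model turns blind votes into a rating. One common choice for pairwise votes is a Bradley-Terry fit, and the sketch below shows how that could work. Everything in it is an assumption for illustration, not the Index's published method: the function names, the toy votes, and the rescaling (200 times the modeled chance of beating the human, which puts the human at exactly 100).

```python
from collections import defaultdict


def fit_bradley_terry(votes, iterations=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic minorize-maximize update. Assumes winner != loser
    in every vote and that every contestant is linked to the others
    through some chain of battles.
    """
    wins = defaultdict(float)         # total wins per contestant
    pair_counts = defaultdict(float)  # battles per unordered pair
    names = set()
    for winner, loser in votes:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        names.update((winner, loser))

    strength = {name: 1.0 for name in names}
    for _ in range(iterations):
        updated = {}
        for i in names:
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            updated[i] = wins[i] / denom
        # Strengths are only defined up to scale, so renormalize each pass.
        total = sum(updated.values())
        strength = {k: v * len(names) / total for k, v in updated.items()}
    return strength


def humanness_scores(strength, baseline="human"):
    """Hypothetical rescaling: 200 * P(model beats baseline).

    The baseline beats itself with probability 0.5, so it scores 100.
    """
    base = strength[baseline]
    return {name: 200.0 * s / (s + base) for name, s in strength.items()}


if __name__ == "__main__":
    # Toy votes (made up): each tuple is (winner, loser) of one blind battle.
    votes = (
        [("human", "model_a")] * 6 + [("model_a", "human")] * 4
        + [("human", "model_b")] * 8 + [("model_b", "human")] * 2
        + [("model_a", "model_b")] * 7 + [("model_b", "model_a")] * 3
    )
    scores = humanness_scores(fit_bradley_terry(votes))
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.1f}")
```

Under this rescaling, a model that listeners pick as often as the real human would also score 100, and a model that loses to the human more often scores lower.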

Humanness Rankings

Humanness distribution

21 models · 9 providers · 10,250 unique votes

[Chart: humanness (0 to 100, worse to better) versus latency in ms (log scale, 100 to 1600, faster to slower). Color = rank. Labels mark "Average" and "Above Average".]

Why latency matters. A voice that lags breaks the conversation, no matter how human it sounds.

Likely rank  Provider     Model                 Humanness  (unlabeled)  Latency  Price        (unlabeled)
Baseline     Human        Homo Sapien           100        1301         -        -            546
#1–5         xAI          Grok TTS              94         1283         460 ms   $15          525
#1–6         MiniMax      Speech 2.5            93         1278         325 ms   $60          502
#1–6         ElevenLabs   Eleven v3             92         1276         758 ms   $100         507
#1–6         Canopy Labs  Orpheus               91         1271         -        Open source  495
#1–7         MiniMax      Speech 2 HD           88         1263         357 ms   $100         487
#2–7         xAI          Grok TTS (Streaming)  88         1261         285 ms   $15          490
#6–11        Inworld      TTS-1.5-max           78         1231         337 ms   $35          428
#7–12        ElevenLabs   Flash v2              76         1225         226 ms   $50          425
#7–15        ElevenLabs   Multilingual v2       75         1221         1006 ms  $100         423
#7–15        ElevenLabs   Turbo v2              73         1216         302 ms   $50          414

The Index only includes models that support voice cloning: each battle plays the same cloned source voice through both models, so the comparison is head to head and fair. Don't see your model on this list? Contact us at humannessindex@vapi.ai.

What we listen for

What makes a voice sound human?

Humanness doesn't break down into features. You either believe there's a person on the other end, or you don't. When that belief breaks, it's usually because of one of these.

Expressiveness

Emotion and emphasis. Stressing the right words, sounding like it means what it says instead of reading text aloud.

Tone & prosody

The intonation, rhythm, and melody of speech. The natural rise and fall of how people actually talk.

Artifacts

The little human sounds: breaths, stutters, natural pauses. A voice with none of them sounds too clean to be real.

Why trust this benchmark?

Any model can sound good on its own demo voice. The real test is how it handles your use case. We clone one voice across every model so the comparison is fair. Models that can't clone a voice can't be tested fairly, so they're not listed.

Most Human Models

Voting in progress

Rankings are provisional

We're keeping the podium under wraps while the votes come in. Listen and vote above, and the most human models will be revealed once the standings settle.

Why this exists

Picking a TTS model for a voice agent comes down to one thing: does it sound human enough that people forget they're talking to software? You can't get that from demos or vendor claims. So we made it measurable and took the call out of our own hands: one voice cloned onto every model, played blind with no names attached, scored against a real human by the people who hear it.

Find the most human-sounding voice for your agent.

Compare the models in blind tests, read the methodology, or get in touch.

Build a TTS model? Add yours to the Index.