Technology

Go beyond the 3-step pipeline.

Most voice AI platforms rely on a 3-step pipeline: one transcription model, one non-thinking LLM, one text-to-speech engine. Each step is a single point of failure. ThunderPhone reduces mistakes by orchestrating many models at once.

Caller said “I rent.”
ASR A: “I rent”
ASR B: “I’m Brent”
Audio LLM: “I rent”
Reconciled: “I rent” (2 of 3 paths agree)
The integrated architecture

Hear it many ways, then reconcile.

ThunderPhone orchestrates many models at once, letting them correct each other's mistakes.

Other platforms · 3-step pipeline

Each step trusts the last.

Caller audio
“I rent.” · mumbled
STT A: “I’m Brent”
STT B
Audio LLM
No backup when the single model mishears
LLM
A fast model sees only the transcript.
→ replies “Hi, Brent — how can I help?”
Fast LLMs make mistakes following instructions
TTS
The answer is spoken back.
→ speaks “Hi, Brent — how can I help?” ✗
1 transcript + 1 LLM = wrong answer
ThunderPhone · orchestrated

Many paths, one coordinated answer.

Caller audio
“I rent.” · mumbled
STT A: “I rent”
STT B: “I’m Brent”
Audio LLM: “I rent”
Three transcripts + raw audio kept as evidence
Reasoning layer
Fast + thinking LLMs weigh text and audio evidence.
→ 2 of 3 paths agree: “I rent”
Thinking + fast LLMs cross-check before answering
TTS
Curated voices minimize hallucinations on spellings and alphanumerics.
→ “Got it — so you rent.” ✓
Gets the hard details right the first time.
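
The reconciliation step in the diagram above can be sketched as a plain plurality vote over the transcripts. This is a minimal illustration only, not ThunderPhone's implementation: the function names `normalize` and `reconcile` are hypothetical, and a real reasoning layer would also weigh the raw audio rather than count matching strings.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so near-identical transcripts compare equal."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def reconcile(transcripts: dict[str, str]) -> tuple[str, int, int]:
    """Return (winning transcript, votes for it, total paths) by simple plurality.

    Hypothetical helper: ties fall to whichever transcript was seen first.
    """
    votes = Counter(normalize(t) for t in transcripts.values())
    winner, count = votes.most_common(1)[0]
    return winner, count, len(transcripts)


# The three paths from the example call.
paths = {
    "STT A": "I rent",
    "STT B": "I'm Brent",
    "Audio LLM": "I rent.",
}
winner, count, total = reconcile(paths)
print(f"{count} of {total} paths agree: {winner!r}")
```

A plurality vote is the simplest possible rule. It shows why a second and third path help: one misheard transcript is outvoted instead of being passed straight to the LLM.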
Performance proof

Storm sets intelligence records on Big Bench Audio.

BBA tests whether a model can understand spoken prompts and answer difficult questions correctly. Our strongest Storm configuration reaches 99.4% on our public evaluation set — 0.6% mistakes, 4× fewer than the next best public score.

Model | Mistake rate (lower is better)
ThunderPhone Storm · Extra Intelligence (4× fewer mistakes) | 0.6%
Step-Audio R1.1 | 2.4%
Grok Voice Think Fast | 2.9%
Ultravox v0.7 Thinking | 3.0%
GPT-Realtime-2 High | 3.4%
Gemini 3.1 Flash Live High | 3.4%
View the BBA-Storm evaluation dataset on Hugging Face

Mistake rate is 100% minus the BBA success rate. ThunderPhone's result is from our public evaluation dataset, linked above; other public scores (Artificial Analysis leaderboard) are listed for orientation. BBA doesn't fully capture phone-call performance, but it's a useful signal for reasoning over spoken prompts.
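
The arithmetic behind the "4× fewer mistakes" figure can be checked in a few lines. This is a sketch using only the numbers quoted on this page; the helper name `mistake_rate` is made up for illustration.

```python
def mistake_rate(success_pct: float) -> float:
    """Mistake rate is 100% minus the success rate, rounded to one decimal."""
    return round(100.0 - success_pct, 1)


storm = mistake_rate(99.4)   # Storm's success rate quoted on this page
next_best = 2.4              # Step-Audio R1.1 mistake rate from the table
ratio = next_best / storm

print(f"Storm mistake rate: {storm}%")
print(f"Next best has {ratio:.1f}x as many mistakes")
```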

Opinionated beats configurable

You shouldn't have to assemble a voice stack by hand.

Other voice AI platforms, such as Vapi, Retell, Pipecat, and LiveKit, make you work through extensive configuration decisions before your calls work. At ThunderPhone, we believe tuning is our job, and that you should do as little work as possible to get calls running smoothly.

Other voice AI platforms · many knobs to tune
STT · LLM · TTS · Endpointing · Denoising · Interruptions · Fallbacks · Tool schema · Latency · VAD
ThunderPhone tunes all of this for you
ThunderPhone · three decisions
1. Pick a tier: Spark, Bolt, or Storm.
2. Pick a voice: curated, tested on real calls.
3. Write your prompt: behavior, policy, tools, in plain language.
Three decisions. That's the setup — orchestration, fallbacks, and tuning ship built in.
We pick the best models for the job and stitch them together. Frontier models from OpenAI, Anthropic, and Google, working alongside open-source models — orchestrated for the best performance and price on every call, so you don't have to worry about it.
Single prompt, less graph sprawl

One prompt, not a graph of nodes.

Flow builders exist because weaker systems can't follow complex prompts. Every conversation step becomes a node you have to build and maintain manually. Storm follows rich instructions from a single prompt, so you don't have to worry about nodes and edges.

Flow-builder platforms

Hand-wire every branch.

Greeting: prompt · voice
Collect phone: prompt · validation
Confirm phone: re-prompt ×2
enroll(): tool · retries
Escalate: to human
Fallback: catch-all
Edges: invalid ×2 · error
Six nodes for one enrollment flow — each with its own prompts, tools, transitions, and failure handling.
ThunderPhone Storm

Describe the behavior once.

Behavior prompt
Greet the caller and collect their name and phone.
Confirm consent before enrolling — required.
If they ask about billing, follow the billing policy below.
Call enroll() only once every field is confirmed.
If the caller asks for a person, transfer to a human.
Complex branching · Optional watchdogs · Easier iteration
One prompt to write, test, and iterate — branching included.
Why it matters

Production calls are messy. ThunderPhone is ready.

Mumbly speech, background chatter, names and numbers, mixed languages — ThunderPhone is built for the situations that break other systems.

Mumbly speech

Compare audio and text signals instead of trusting one transcript.

Bad phone audio

Noise ID and reduction models clean up the signal.

Background chatter

Identify side conversations before they derail the agent.

Names & addresses

Cross-check proper nouns, spellings, numbers, and corrections.

Mixed-language speech

Route through multilingual audio and voice paths when needed.

Long prompts

Use stronger reasoning paths when behavior can't reduce to one rule.

Latency, quality & cost

A faster wrong answer isn't a better call.

Phone AI has to answer quickly — but speed without correctness just ships fast mistakes. Today's LLMs need a moment to think to stay reliable, so ThunderPhone balances the two per tier.

Spark
~3s
to first response
Bolt
~2s
to first response
Storm
~2–3s
~2s with acknowledgements · ~3s without
Measurement: Other vendors claim 500ms latency under ideal conditions, but usually come in around 2s on a real call. We measure our latency under real conditions.
Resilience: AI vendors experience latency spikes that can derail calls. ThunderPhone's orchestrated stack fails over to faster options when this happens.
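
Failing over on a latency spike can be sketched as a call with a time budget and a fallback. This is a generic pattern under stated assumptions, not ThunderPhone's stack: `slow_primary`, `fast_fallback`, and `answer_with_failover` are hypothetical stand-ins, and the sleep simulates a slow model call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def slow_primary(prompt: str) -> str:
    """Hypothetical stand-in for a model call hit by a latency spike."""
    time.sleep(0.5)
    return f"primary: {prompt}"


def fast_fallback(prompt: str) -> str:
    """Hypothetical stand-in for a faster backup option."""
    return f"fallback: {prompt}"


def answer_with_failover(prompt, primary, fallback, budget_s):
    """Use the primary's answer if it arrives within budget_s, else the fallback's."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary, prompt)
    try:
        return future.result(timeout=budget_s)
    except FutureTimeout:
        return fallback(prompt)
    finally:
        # Do not block the caller on the slow primary; its thread finishes on its own.
        pool.shutdown(wait=False)


print(answer_with_failover("hello", slow_primary, fast_fallback, budget_s=0.05))
```

The design choice here is the budget: the caller waits for the preferred path only as long as the call can afford, then takes the faster option instead of stalling.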
Generations of voice AI

The next generation of voice AI is here

Automated phone calls have been frustrating experiences... until now. Each wave of technology improved on the last, but none made calls truly smooth. That changes with ThunderPhone.

1990s · 2010s · 2024 · 2026 (now)
Generation 1

IVR

Press one for sales, press two for support. Menus can only route.

Generation 2

Intent bots

Say “sales” or “billing” and hope the slot parser catches it.

Generation 3

Three-step AI

Transcribe, ask one fast LLM, then speak — each step trusting the last.

Generation 4

Integrated AI

Audio, text, reasoning, tools, voices, and guardrails as one system.

Built for the calls that break everything else

See the integrated stack on your hardest calls.

Live in ten minutes. From 2¢/min.