Our Technology
ThunderPhone's ensemble architecture delivers the best possible performance at the best possible price.
User Audio
System Prompt
Ensemble Engine
Multiple Transcribers: Fast & smart
Thinking LLMs: Slow but accurate
Audio-input LLMs: Nuance detection
Fast LLMs: Quick responses
Continuous Parallel Routing
Synthesized Voice
Tool Execution
How it works
While other voice AI platforms work on a rigid “three-step” pipeline, our system takes advantage of numerous models at once to achieve the best performance possible. ThunderPhone's ensemble engine simultaneously uses:
- Commercial and open-source models.
- Thinking and non-thinking models.
- Text and audio input LLMs.
- Multiple transcription models.
- Multiple voice activity detection, noise detection, and other conversation-quality models.
While this is a complex architecture to implement, it allows for the best possible tradeoff between cost, quality, and latency.
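The "continuous parallel routing" idea can be sketched with a small asyncio example. The model stubs, timings, and fallback policy below are illustrative assumptions, not ThunderPhone's actual implementation: several transcribers run at once, and the system prefers the more accurate result when it arrives within a latency deadline.

```python
import asyncio

# Hypothetical stand-ins for real transcription model calls;
# names and timings are illustrative only.
async def fast_transcriber(audio: bytes) -> str:
    await asyncio.sleep(0.01)   # low latency, lower accuracy
    return "fast transcript"

async def accurate_transcriber(audio: bytes) -> str:
    await asyncio.sleep(0.05)   # higher latency, higher accuracy
    return "accurate transcript"

async def transcribe_ensemble(audio: bytes, deadline: float = 0.2) -> str:
    """Run both transcribers in parallel; prefer the accurate result if it
    arrives before the deadline, otherwise fall back to the fast one."""
    fast = asyncio.create_task(fast_transcriber(audio))
    accurate = asyncio.create_task(accurate_transcriber(audio))
    try:
        return await asyncio.wait_for(asyncio.shield(accurate), timeout=deadline)
    except asyncio.TimeoutError:
        accurate.cancel()
        return await fast
```

The key design point is that the slow, accurate path never blocks the conversation: the deadline bounds latency, and the fast path is always ready as a fallback.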
We offer three configurations of this architecture as our products:
- Spark, optimized for low cost.
- Bolt, optimized for high speed and moderate cost.
- Storm, optimized for intelligence and high speed.
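The three products map to different priorities, which could be expressed as a simple lookup. The table and field names below are illustrative, paraphrasing the descriptions above; they are not an actual ThunderPhone API.

```python
# Hypothetical tier table; optimization goals paraphrase the product
# descriptions, and the structure is illustrative, not a real API.
TIERS = {
    "spark": {"optimizes": ["low cost"]},
    "bolt":  {"optimizes": ["high speed", "moderate cost"]},
    "storm": {"optimizes": ["intelligence", "high speed"]},
}

def pick_tier(priority: str) -> str:
    """Return the first tier whose optimization goals mention the priority."""
    for name, spec in TIERS.items():
        if any(priority in goal for goal in spec["optimizes"]):
            return name
    raise ValueError(f"no tier optimizes for {priority!r}")
```

For example, a caller prioritizing cost would land on Spark, while one prioritizing intelligence would land on Storm.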
What we're replacing
Caller Audio
Single STT Model: Transcription errors cascade
Single Fast LLM: Prone to mistakes
TTS Model
Most voice AI platforms use a simple pipeline: one speech-to-text model transcribes the caller, one LLM generates a response, and one text-to-speech model speaks it back. It looks clean on paper, but it's brittle in real call conditions.
A single transcription model introduces errors — especially on proper nouns, uncommon words, and sound-alikes. With no second opinion, those mistakes cascade through the rest of the system.
The LLM layer has the same failure mode. A single fast model isn't smart enough for many real-world prompts, so teams resort to flow builders, prompt chains, and brittle workarounds — a headache to build and harder to maintain.
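The traditional pipeline described above reduces, in sketch form, to three sequential calls where each stage trusts the previous one. The function bodies below are placeholder stubs, not a real vendor API; the point is the structure, in which an early error propagates unchecked.

```python
# Placeholder stubs for the three single models in a traditional pipeline.
def single_stt(audio: bytes) -> str:
    return "transcript with an uncorrected error"

def single_llm(text: str) -> str:
    return f"reply based on: {text}"

def single_tts(text: str) -> bytes:
    return text.encode()

def handle_turn(caller_audio: bytes) -> bytes:
    """Each stage consumes the previous stage's output, so errors cascade."""
    transcript = single_stt(caller_audio)   # one STT model, no second opinion
    reply_text = single_llm(transcript)     # one LLM treats the transcript as ground truth
    return single_tts(reply_text)           # one TTS speaks the possibly wrong reply
```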
We believe agents should follow instructions reliably enough that a single prompt can configure most use cases. That is exactly what our architecture unlocks.
Industry-leading audio accuracy
Mistake rate comparison against leading speech-to-speech models. Lower is better.
ThunderPhone Storm
0.6%
GPT-4o Realtime
3.6%
Gemini 2.0 Flash
5.2%
Claude 3.5 Haiku
7.1%
Gemini 1.5 Flash
9.4%
GPT-4o mini Realtime
11.8%
We ran our smartest configuration, Storm with extra intelligence, on the Big Bench Audio benchmark and found that it scores higher than any of today's models, at 99.4% accuracy.
This may not seem like much, but at a 5% mistake rate a model would make one mistake per call on a 20-turn call. All else being equal, Storm would make roughly one mistake every eight calls of the same length, and likely fewer on real calls.
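The arithmetic behind that comparison, assuming one independent chance of a mistake per turn:

```python
TURNS = 20  # turns per call, as in the comparison above

def expected_mistakes(mistake_rate: float, turns: int = TURNS) -> float:
    """Expected mistakes per call, assuming independent per-turn errors."""
    return mistake_rate * turns

baseline = expected_mistakes(0.05)   # 5% rate  -> 1.0 mistake per 20-turn call
storm = expected_mistakes(0.006)     # 0.6% rate -> 0.12 mistakes per call,
                                     # i.e. roughly one mistake every ~8 calls
```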
That improvement in reliability can make a real difference when it comes to delivering a consistent experience to end users and following compliance guidelines. We have not yet benchmarked Spark and Bolt but expect them to perform similarly to other vendors' models, as they are optimized for cost and speed.
Our approach
We believe it should be our job to handle the hard technical details of making voice AI work, so you can focus on serving your customers' needs.
In line with that, we've optimized the model stack so you can choose whether you want to prioritize cost, speed, or intelligence.
We've also set up a platform that makes it easy to configure, test, and monitor agents so you can be sure they're doing the right thing and improve them along the way.
We also take great pains to squeeze every possible cost out of ThunderPhone's platform. And we pass the savings along — we aim for only a 20% gross margin on all of our products, so you can be confident you're getting the best deal possible.
If you're building voice agents for the real world, you should be focused on serving your customers' needs — not rebuilding the voice AI stack so it doesn't fall over on them or cost you a fortune. We're here to help you with that.
Ready to build your first phone agent?
Choose the model that fits your use case and get started in minutes.