Vapi vs Retell: which voice agent platform is cheaper?

Retell is usually cheaper and more predictable at $0.07/min flat with HIPAA included, while Vapi's real cost depends on configuration. Vapi's $0.05-0.13/min headline only holds on its bundled defaults; once you bring your own LLM, STT, and TTS (BYOK), effective cost climbs to roughly $0.23-0.33/min because you pay Vapi's platform fee plus each vendor separately. Retell bundles the orchestration and a usable model into one flat rate, which makes budgeting easier and tends to win for high-volume support workloads. Vapi wins when you need fine-grained control over which exact STT, LLM, and TTS providers run, and you are willing to manage that pricing complexity yourself.

What is the latency budget for a real-time voice agent?

Aim for under 800ms of total turn latency, with the widely cited target being roughly 300ms to feel conversational. That budget is split across speech-to-text (STT) finalization, LLM time-to-first-token, any tool or database calls, and text-to-speech (TTS) first audio. AssemblyAI and most voice vendors treat ~300ms STT as the threshold below which interruptions and turn-taking feel natural. The killer is tool calls: a single 400ms database lookup blows the budget on its own. Budget each stage explicitly, stream tokens into TTS instead of waiting for the full completion, and cache or pre-warm anything you can before the user finishes speaking.

When should I build on LiveKit or Pipecat instead of using a managed platform?

Build on LiveKit or Pipecat when you exceed roughly 50,000 voice minutes per month, where the per-minute savings of 60-80% over a managed platform outpace the engineering cost of owning the stack. Below 10,000 minutes/month, managed platforms like Vapi or Retell are almost always the right call because their per-minute premium is small in absolute dollars and they save weeks of telephony, latency, and infrastructure work. Between 10K and 50K minutes, a hybrid model often wins: keep the managed orchestrator but bring your own cheaper STT/TTS, or run a self-hosted pipeline for your highest-volume flow while managed handles the long tail.

Is Retell HIPAA compliant out of the box?

Yes, Retell includes HIPAA compliance in its standard $0.07/min pricing with a Business Associate Agreement available, which is unusual in this category. Most managed voice platforms gate HIPAA behind enterprise tiers or charge a premium, so for healthcare voice workloads Retell's bundled compliance is a meaningful cost and procurement advantage. Vapi offers HIPAA on higher tiers. If you build on LiveKit or Pipecat, compliance becomes your responsibility end to end: you must sign BAAs with every downstream STT, LLM, and TTS vendor, run them in compliant regions, and handle audio retention and redaction yourself. That control is the point of building, but it is also real work.

How much does an AI voice agent cost per minute in 2026?

Managed AI voice agents run roughly $0.05 to $0.33 per minute in 2026, depending heavily on configuration. Retell sits around $0.07/min flat. Vapi ranges from $0.05-0.13/min on bundled defaults to $0.23-0.33/min on full bring-your-own-key setups. Self-hosted stacks on LiveKit or Pipecat push raw per-minute cost toward the underlying STT/LLM/TTS vendor rates, often well under $0.05/min at volume, but you add infrastructure and engineering overhead. ElevenLabs cut its Conversational AI pricing roughly 50% in February 2026, which dragged the whole market's TTS economics down. Always model your own minute volume against a specific provider combination rather than trusting headline rates.

What is the difference between Vapi, Retell, LiveKit, and Pipecat?

Vapi and Retell are managed voice orchestrators: you configure an agent through an API or dashboard and they handle STT, LLM, TTS, telephony, and turn-taking for you. LiveKit and Pipecat are open frameworks you self-host: they give you the real-time transport and pipeline primitives but you wire the components and run the infrastructure. Vapi emphasizes provider flexibility and BYOK; Retell emphasizes a flat all-in price with HIPAA. LiveKit ships a production WebRTC backbone with v1.5 adaptive interruption and dynamic endpointing; Pipecat is a Python-first pipeline framework popular for custom voice flows. Pick managed for speed to launch, framework for cost and control at scale.

Can I migrate off a managed voice platform once I outgrow it?

Yes, and you should plan for it from day one because the migration is mostly about reimplementing orchestration logic, not data. The hard parts are recreating turn-taking and interruption handling, re-establishing telephony numbers and SIP trunks, and re-tuning the latency budget on infrastructure you now own. Keep your prompts, tool definitions, and conversation logic in your own codebase rather than locked in a vendor dashboard, so the agent's behavior is portable. Run the new LiveKit or Pipecat pipeline in parallel on a small traffic slice, compare latency and transcription quality against the managed baseline, then cut over flow by flow rather than all at once.

Vapi vs Retell vs LiveKit vs Pipecat: Picking a Voice Agent Stack

Vapi runs $0.05-0.13/min but $0.23-0.33 BYOK; Retell is flat $0.07 with HIPAA. The real cut is the 300ms latency budget and your monthly minutes.

Sebastian Mondragon · MAY 14, 2026 · 12 MIN READ

A voice agent lives or dies on a number most teams never budget for: the gap between when a caller stops talking and when they hear a reply. AssemblyAI and most of the voice ecosystem put the conversational threshold at roughly 300ms. Cross it consistently and callers start talking over the agent, the turn-taking breaks, and the whole thing feels like a bad IVR from 2009. Every architecture decision in this space, including the choice between Vapi, Retell, LiveKit, and Pipecat, is downstream of that latency budget and the per-minute cost it takes to hit it.

The four platforms split cleanly into two camps. Vapi and Retell are managed orchestrators: you describe an agent and they run the speech-to-text, the LLM, the text-to-speech, the telephony, and the turn-taking for you. LiveKit and Pipecat are frameworks you operate yourself: they hand you a production-grade real-time backbone and pipeline primitives, and you own everything above the metal. The managed camp trades money for speed to launch. The framework camp trades engineering hours for unit economics that hold up at scale. Picking wrong in either direction is expensive, and the crossover point is more predictable than most build-vs-buy decisions.

This post is the decision framework we use to scope voice agent stacks: the real pricing once you strip away the headline rates, the 300ms latency budget broken down stage by stage, the volume thresholds where each option wins, the telephony and compliance cuts that quietly decide vendor selection, and the migration plan for when you outgrow managed. The voice agent platform you should pick depends on three inputs: your monthly minutes, your latency tolerance, and your compliance posture. Everything else is detail.

01 · The Four-Platform Landscape: Managed vs Framework

Before pricing, understand what each product actually is, because the category determines what you are responsible for.

Vapi is a managed orchestrator built around provider flexibility. You configure an assistant through its API or dashboard, choose your STT, LLM, and TTS providers, and Vapi handles the real-time plumbing, telephony, and turn-taking. Its signature feature is bring-your-own-key (BYOK): you can plug in your own OpenAI, Deepgram, ElevenLabs, or Cartesia accounts. That flexibility is also where its pricing gets complicated, which we will get to.

Retell is a managed orchestrator built around simplicity and compliance. It bundles orchestration plus a usable model into one flat per-minute rate, and it includes HIPAA in the standard pricing rather than gating it behind an enterprise tier. Where Vapi gives you knobs, Retell gives you a predictable bill. For high-volume support and healthcare workloads, that predictability is often worth more than provider choice.

LiveKit is an open real-time infrastructure framework. Its core is a production WebRTC backbone used well beyond voice agents, and its Agents framework sits on top for building voice and multimodal agents. LiveKit v1.5 shipped adaptive interruption handling and dynamic endpointing, which are exactly the turn-taking primitives that are painful to build yourself. You self-host or run it on LiveKit Cloud, and you wire the STT/LLM/TTS components.

Pipecat is an open, Python-first pipeline framework. Originally from Daily, it models a voice agent as a pipeline of processors (transcription, LLM, synthesis, transport) that you compose in code. It is the most flexible of the four and the most hands-on. Teams that want full control over every stage of the audio loop, or that need custom processing the managed platforms do not expose, tend to land here or on LiveKit.

The honest way to read this landscape: Vapi and Retell sell you time, LiveKit and Pipecat sell you ceiling. Softcery's widely shared 12-platform comparison makes the same split, and the more platforms you evaluate, the more the decision collapses back to managed-versus-framework rather than feature checklists. This is the same build-versus-buy axis we cover in when to build vs buy AI, applied to the specific economics of real-time voice.

02 · Pricing Reality: Headline Rates vs What You Actually Pay

The per-minute numbers on these landing pages are real but incomplete. The table at the end of this section shows what we see once a configuration is fully specified, as of Q2 2026.

A few patterns matter more than the exact cents:

Vapi's headline rate assumes bundled defaults. The $0.05-0.13/min range holds when you use Vapi's bundled providers. The moment you switch to BYOK to control quality or compliance, you pay Vapi's platform fee plus every downstream vendor separately, and the effective rate climbs to roughly $0.23-0.33/min. BYOK is not a discount; it is a control lever you pay for. Choose it when provider selection is a hard requirement, not to save money.

Retell's flat $0.07/min is the predictability play. One number, HIPAA included, no per-vendor reconciliation. For a support line doing tens of thousands of minutes a month, a flat rate that already covers compliance is frequently the lowest total cost of ownership even if a hand-tuned BYOK setup could shave cents per minute in theory.

LiveKit and Pipecat are infrastructure-only. Your per-minute cost is whatever your STT, LLM, and TTS vendors charge plus your compute. At volume that pushes well under $0.05/min, but you add infrastructure, on-call, and latency-tuning labor that the managed platforms absorb.

The market is moving under all of this. ElevenLabs cut its Conversational AI pricing roughly 50% in February 2026, which dragged TTS economics down across every stack that uses it. Voice pricing is one of the faster-moving corners of AI infrastructure right now, so model your own minute volume against a specific provider combination and re-check rates before you commit a budget.

Platform	Effective cost/min	Pricing model	HIPAA	You operate
Vapi (bundled)	$0.05-0.13	Platform fee + bundled providers	Higher tiers	No
Vapi (BYOK)	$0.23-0.33	Platform fee + each vendor billed separately	Higher tiers	No
Retell	~$0.07 flat	All-in bundle	Included	No
LiveKit	Infra + vendor rates	Self-host / Cloud + STT/LLM/TTS	Your responsibility	Yes
Pipecat	Infra + vendor rates	Self-host + STT/LLM/TTS	Your responsibility	Yes
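
To turn the table into a monthly bill, multiply each rate range by your own minute volume. The sketch below does that for the three managed rows only, using the per-minute figures from this post. The function names are ours, and the framework rows are left out because their cost depends on vendor rates you have to supply yourself.

```python
from typing import Dict, Tuple

# Per-minute (low, high) rates in USD, copied from the table above.
# These are the Q2 2026 figures quoted in this post; re-check before budgeting.
MANAGED_RATES: Dict[str, Tuple[float, float]] = {
    "vapi_bundled": (0.05, 0.13),
    "vapi_byok": (0.23, 0.33),
    "retell": (0.07, 0.07),
}


def monthly_cost_range(minutes: int, rate: Tuple[float, float]) -> Tuple[float, float]:
    """Return the (low, high) monthly bill in USD for a per-minute rate range."""
    low, high = rate
    return (round(minutes * low, 2), round(minutes * high, 2))


def compare(minutes: int) -> Dict[str, Tuple[float, float]]:
    """Monthly cost range for each managed option at the given volume."""
    return {name: monthly_cost_range(minutes, rate) for name, rate in MANAGED_RATES.items()}


if __name__ == "__main__":
    for name, (low, high) in compare(20_000).items():
        print(f"{name:13s} ${low:>9,.2f} to ${high:>9,.2f}")
```

Running it at a few volumes makes the spread between a bundled and a BYOK configuration concrete before you open a vendor's pricing page.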

03 · The 300ms Latency Budget, Stage by Stage

Latency is the metric that decides whether your agent feels like a conversation or a phone tree. The target most of the voice ecosystem converges on is roughly 300ms of perceived turn latency, with anything past ~800ms total reading as sluggish. That budget is not one number; it is a chain, and every link spends part of it.

The table at the end of this section breaks down a single turn, from the caller finishing a sentence to hearing the first word back.

Add those stages naively and you are already over a second. Hitting a conversational feel means overlapping the stages, not running them in series:

Stream STT, do not wait for the final transcript. Start the LLM on partial transcripts and reconcile when the final lands.

Stream LLM tokens directly into TTS. Synthesize the first clause while the model is still generating the rest. This single change often saves 300-500ms of perceived latency.

Treat tool calls as the prime suspect. A 400ms database lookup is the most common budget-killer we see. Pre-fetch likely data while the caller is still speaking, cache aggressively, and move anything non-blocking off the critical path.

Pre-warm models. Cold starts on the LLM or TTS path can add seconds. Keep the path hot.
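
The "stream LLM tokens directly into TTS" step can be sketched as a small clause chunker: buffer tokens as they arrive and hand each clause to synthesis as soon as it is complete. This is a minimal illustration, not any platform's implementation. The punctuation set, the `min_chars` threshold, and the `fake_llm_stream` stand-in are all assumptions for the example.

```python
from typing import Iterable, Iterator

# Punctuation that ends a speakable clause. This set is an assumption; tune it per voice.
CLAUSE_END = (".", "!", "?", ",", ";", ":")


def clauses(tokens: Iterable[str], min_chars: int = 12) -> Iterator[str]:
    """Group streamed LLM tokens into clauses that can be synthesized early.

    A clause is flushed once the buffer ends in clause punctuation and holds at
    least `min_chars` characters, so TTS can start on the first clause while
    the model is still generating the rest.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        stripped = buffer.rstrip()
        if len(stripped) >= min_chars and stripped.endswith(CLAUSE_END):
            yield stripped.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends


def fake_llm_stream() -> Iterator[str]:
    """Stand-in for a streaming LLM response (hypothetical tokens)."""
    yield from ["Your ", "appointment ", "is ", "confirmed, ", "and ", "we ",
                "will ", "text ", "you ", "a ", "reminder."]


if __name__ == "__main__":
    for clause in clauses(fake_llm_stream()):
        # Replace print with your TTS client's streaming call.
        print("to TTS:", clause)
```

The `min_chars` floor keeps very short fragments such as "Yes," from being synthesized on their own, which tends to sound choppy. Where to set it is a quality trade-off you would tune by ear.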

This is where LiveKit's v1.5 work earns its keep: adaptive interruption and dynamic endpointing tune that first 100-300ms VAD window in real time instead of using a fixed threshold, which is the difference between an agent that talks over people and one that waits a beat too long. Managed platforms hide this tuning from you, which is convenient until you need to change it. If your broader stack is fighting latency outside the voice loop too, the same principles in our LLM latency fixes for production apps guide apply directly to the LLM and tool-call stages of this budget.

Stage	What it does	Typical spend	Where it goes wrong
Endpointing / VAD	Detect the caller stopped talking	100-300ms	Too eager interrupts; too slow feels dead
STT finalization	Final transcript of the utterance	~100-300ms	Waiting for full transcript instead of streaming
LLM time-to-first-token	Model starts generating	200-600ms	Cold model, long prompt, no streaming
Tool / DB calls	Lookups the answer depends on	0-500ms+	One slow call blows the whole budget
TTS first audio	First synthesized audio out	~100-300ms	Waiting for full text before synthesizing
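
A quick way to see why serial execution fails is to add up the table's ranges. The sketch below uses the stage figures above; treating "500ms+" as 500 is our simplification, and `biggest_spender` is a hypothetical helper for finding the stage to attack first in your own measurements.

```python
from typing import Dict, Tuple

# (low_ms, high_ms) per stage, copied from the table above.
# The tool-call row is listed as "0-500ms+"; 500 is used here as a stand-in upper value.
STAGES: Dict[str, Tuple[int, int]] = {
    "endpointing_vad": (100, 300),
    "stt_finalization": (100, 300),
    "llm_ttft": (200, 600),
    "tool_calls": (0, 500),
    "tts_first_audio": (100, 300),
}

SLUGGISH_MS = 800  # past this total, the post treats a turn as sluggish


def serial_total(stages: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Best and worst case if every stage runs strictly one after another."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


def midpoint_total(stages: Dict[str, Tuple[int, int]]) -> float:
    """Sum of each stage's midpoint, a rough 'typical' serial turn."""
    return sum((lo + hi) / 2 for lo, hi in stages.values())


def biggest_spender(measured_ms: Dict[str, int]) -> str:
    """Name of the stage that consumed the most time in one measured turn."""
    return max(measured_ms, key=measured_ms.get)


if __name__ == "__main__":
    low, high = serial_total(STAGES)
    typical = midpoint_total(STAGES)
    print(f"serial range: {low}-{high}ms, typical: {typical:.0f}ms")
    print("over the sluggish line:", typical > SLUGGISH_MS)
```

Even the best serial case already exceeds the 300ms target, which is why the streaming and pre-fetching techniques in this section are required rather than optional.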

04 · When Each Platform Wins: The Volume Thresholds

The cleanest way to decide is by monthly voice minutes, because that is what flips the cost math from favoring managed to favoring framework.

Under 10,000 minutes per month: stay managed

At this volume, the per-minute premium of a managed platform is trivial in absolute dollars. Ten thousand minutes at even $0.30/min is $3,000 a month. The engineering cost of building, tuning, and operating a LiveKit or Pipecat stack dwarfs that several times over in the first quarter alone. Use Vapi if you need specific provider control, Retell if you want a flat HIPAA-included rate. The goal here is to launch, learn, and validate the use case before you spend on infrastructure. This is the same logic that makes managed the default for early-stage agents generally, as we argue in our pillar on AI agents.

10,000 to 50,000 minutes per month: go hybrid

This is the band where teams thrash, and where a hybrid posture usually wins. Two patterns work well:

Stay on the managed orchestrator but bring your own cheaper STT/TTS where quality allows, trimming the per-minute cost without giving up turn-taking and telephony.
Self-host your single highest-volume flow (say, the appointment-reminder bot doing 80% of your minutes) on LiveKit or Pipecat, and leave the long tail of lower-volume flows on the managed platform.

The point of hybrid is to capture most of the savings on the minutes that matter while deferring the operational cost of owning everything. Do not migrate the whole estate at once; migrate the flow whose volume justifies the work.
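
The arithmetic behind the self-host-one-flow pattern can be sketched as a blended rate. The $0.04/min self-hosted figure below is a placeholder assumption, not a quoted price, and it leaves out infrastructure and engineering overhead. The $0.07/min managed figure is the flat Retell rate cited in this post.

```python
def blended_monthly_cost(
    minutes: int,
    self_hosted_share: float,
    self_hosted_rate: float,
    managed_rate: float,
) -> float:
    """Monthly USD cost when a share of minutes runs self-hosted and the rest managed."""
    if not 0.0 <= self_hosted_share <= 1.0:
        raise ValueError("self_hosted_share must be between 0 and 1")
    self_hosted_minutes = minutes * self_hosted_share
    managed_minutes = minutes - self_hosted_minutes
    total = self_hosted_minutes * self_hosted_rate + managed_minutes * managed_rate
    return round(total, 2)


if __name__ == "__main__":
    minutes = 30_000
    all_managed = blended_monthly_cost(minutes, 0.0, 0.04, 0.07)
    hybrid = blended_monthly_cost(minutes, 0.8, 0.04, 0.07)  # 0.04 is an assumed vendor rate
    print(f"all managed: ${all_managed:,.2f}  hybrid: ${hybrid:,.2f}")
```

Plug in your real vendor rates and your real traffic split. The savings only count once they exceed what it costs to operate the self-hosted flow.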

Above 50,000 minutes per month: build on a framework

Past roughly 50K minutes/month, the math tips decisively. Building on LiveKit or Pipecat saves an estimated 60-80% on per-minute cost versus a managed platform at that scale, because you are paying vendor rates directly instead of a platform markup on top of them. At 100K minutes, an $0.18/min difference is $18,000 a month, which funds the engineering and on-call required to run the stack with room to spare. You take on latency tuning, telephony, and uptime, but the unit economics now reward that ownership. This crossover at roughly 50K minutes/month, above which a build path saves an estimated 60-80%, is the single most important number in the decision.

Monthly minutes	Recommendation	Primary reason
Under 10K	Managed (Vapi or Retell)	Per-minute premium is trivial; speed to launch wins
10K-50K	Hybrid	Capture savings on top-volume flows, defer ops cost
Over 50K	Framework (LiveKit or Pipecat)	60-80% per-minute savings funds owning the stack
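
The thresholds above reduce to a few lines of code, which helps when you want to drop them into a planning spreadsheet or script. This is a sketch of the rule of thumb only. How the exact 10K and 50K boundaries are treated is our assumption, and the rates are passed in whole cents to keep the arithmetic exact.

```python
def recommend(monthly_minutes: int) -> str:
    """Map monthly voice minutes to the tier suggested in the table above."""
    if monthly_minutes < 10_000:
        return "managed"
    if monthly_minutes <= 50_000:
        return "hybrid"
    return "framework"


def monthly_dollars(monthly_minutes: int, cents_per_minute: int) -> float:
    """Monthly USD amount for a per-minute rate (or rate difference) in whole cents."""
    return monthly_minutes * cents_per_minute / 100


if __name__ == "__main__":
    # The two worked figures from this section.
    print(monthly_dollars(10_000, 30))   # managed bill at $0.30/min
    print(monthly_dollars(100_000, 18))  # savings from an $0.18/min difference
    for minutes in (5_000, 30_000, 120_000):
        print(minutes, recommend(minutes))
```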

05 · Telephony, HIPAA, and SOC 2: The Cuts That Decide It

Cost and latency narrow the field, but compliance and telephony often make the final call, especially for regulated or high-volume phone workloads.

Telephony. Phone calls mean SIP trunks, phone number provisioning, and carrier-grade reliability. Vapi and Retell handle this for you, which is a large part of what you are paying for. On LiveKit or Pipecat you wire telephony yourself (LiveKit has SIP support; Pipecat integrates with transport providers), and getting reliable, low-latency PSTN connectivity is non-trivial work. If your agent is web or app-only and never touches the phone network, this cut matters less and the framework path gets easier.

HIPAA. Retell includes HIPAA in its $0.07/min standard pricing with a BAA available, which is genuinely unusual and a strong reason to shortlist it for healthcare voice. Vapi offers HIPAA on higher tiers. On LiveKit or Pipecat, compliance is end-to-end your responsibility: you sign BAAs with every downstream STT, LLM, and TTS vendor, run them in compliant regions, and own audio retention, redaction, and access controls. That is the price of control. For the full picture on building voice and other AI in regulated healthcare environments, see our guide on HIPAA-compliant AI healthcare implementation.
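
On a self-hosted stack, the "BAA with every downstream vendor" requirement is easy to lose track of as providers get swapped. One lightweight guard is a pre-deploy check like the sketch below. The `Vendor` record, its fields, and the vendor names are hypothetical, and passing this check does not make a pipeline compliant.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vendor:
    name: str               # hypothetical vendor label
    role: str               # "stt", "llm", or "tts"
    baa_signed: bool        # a signed Business Associate Agreement is on file
    compliant_region: bool  # traffic is pinned to a region you have approved


def compliance_gaps(vendors: List[Vendor]) -> List[str]:
    """List every vendor-level gap that should block a healthcare deployment."""
    gaps: List[str] = []
    for vendor in vendors:
        if not vendor.baa_signed:
            gaps.append(f"{vendor.name} ({vendor.role}): no BAA on file")
        if not vendor.compliant_region:
            gaps.append(f"{vendor.name} ({vendor.role}): not pinned to a compliant region")
    return gaps


if __name__ == "__main__":
    pipeline = [
        Vendor("example-stt", "stt", baa_signed=True, compliant_region=True),
        Vendor("example-llm", "llm", baa_signed=False, compliant_region=True),
        Vendor("example-tts", "tts", baa_signed=True, compliant_region=False),
    ]
    for gap in compliance_gaps(pipeline):
        print("BLOCKER:", gap)
```

Audio retention, redaction, and access controls still need their own review. This only covers the vendor paperwork and region items.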

SOC 2. Managed platforms carry their own SOC 2 attestations, which simplifies your vendor review. Self-hosting shifts the burden to your own infrastructure and your chosen vendors. For enterprise procurement, a managed platform with the right certifications can shave weeks off the security review, which is a real cost even when it does not show up on the per-minute rate.

The pattern across these cuts: managed platforms sell you compliance and telephony as bundled features. Frameworks make you assemble them. Below scale, bundled is cheaper in total cost. Above scale, the assembly cost is worth it because the per-minute savings dominate.

06 · The Migration Plan for When You Outgrow Managed

The most expensive mistake in this space is not picking the wrong managed platform on day one. It is building your agent in a way that locks the orchestration logic inside a vendor dashboard, so that outgrowing the platform means rewriting the agent from scratch. Plan the exit before you need it.

Keep your logic portable from the start. Your prompts, tool definitions, conversation flow, and business rules should live in your own codebase and be callable from any orchestrator, not buried in a managed platform's UI. The managed platform should be running your logic, not owning it. When portability is designed in, migration becomes reimplementing the real-time plumbing, which is bounded work, rather than reverse-engineering your own agent's behavior.
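
One way to keep that logic portable is to define the agent as plain data plus handlers in your own repository, and have thin per-platform adapters translate it. The sketch below is illustrative: `AgentSpec`, `Tool`, and `lookup_appointment` are hypothetical names, and no vendor's actual payload format is shown.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    parameters: Dict[str, Any]   # JSON Schema describing the arguments
    handler: Callable[..., Any]  # your business logic, owned by your codebase


@dataclass
class AgentSpec:
    system_prompt: str
    tools: List[Tool] = field(default_factory=list)

    def tool_schemas(self) -> List[Dict[str, Any]]:
        """Vendor-neutral tool descriptions; an adapter reshapes these per platform."""
        return [
            {"name": t.name, "description": t.description, "parameters": t.parameters}
            for t in self.tools
        ]

    def call_tool(self, name: str, **kwargs: Any) -> Any:
        """Dispatch a tool call from any orchestrator to the local handler."""
        for tool in self.tools:
            if tool.name == name:
                return tool.handler(**kwargs)
        raise KeyError(f"unknown tool: {name}")


def lookup_appointment(patient_id: str) -> Dict[str, str]:
    """Hypothetical handler; a real one would query your scheduling system."""
    return {"patient_id": patient_id, "slot": "2026-06-01T09:00"}


SPEC = AgentSpec(
    system_prompt="You are a scheduling assistant. Confirm details before booking.",
    tools=[
        Tool(
            name="lookup_appointment",
            description="Find a patient's next appointment.",
            parameters={
                "type": "object",
                "properties": {"patient_id": {"type": "string"}},
                "required": ["patient_id"],
            },
            handler=lookup_appointment,
        )
    ],
)
```

With this shape, moving from a managed platform to LiveKit or Pipecat means writing a new adapter around `SPEC`, while the prompt and handlers stay untouched.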

Migrate the metrics, not just the feature parity. When you move a flow to LiveKit or Pipecat, the thing to protect is the latency budget and transcription quality you had on managed. Run the new pipeline in parallel on a small traffic slice, measure turn latency and word error rate against the managed baseline, and only cut over when the self-hosted path matches or beats it. Reliability, not raw accuracy, is what breaks first in production agents, a pattern we dig into in why agent reliability lags accuracy.
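
The parallel-run comparison can be expressed as a simple gate: compute word error rate and tail latency for both paths and only cut over when the new path is no worse. The sketch below uses the standard word-level edit distance for WER and a nearest-rank 95th percentile. Gating on p95 and requiring "no worse on both" are our assumptions, not a rule from any platform.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


def p95(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of turn latencies."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def ready_to_cut_over(
    managed_latency_ms: Sequence[float],
    self_hosted_latency_ms: Sequence[float],
    managed_wer: float,
    self_hosted_wer: float,
) -> bool:
    """True only when the self-hosted path matches or beats the managed baseline."""
    latency_ok = p95(self_hosted_latency_ms) <= p95(managed_latency_ms)
    return latency_ok and self_hosted_wer <= managed_wer
```

Feed it the same calls on both paths where possible, so the comparison reflects the pipelines and not a difference in traffic.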

Cut over flow by flow. Move your highest-volume, simplest flow first, since that is where the savings are largest and the risk is lowest. Leave complex or low-volume flows on managed until the framework pipeline is hardened. A staged migration lets you capture most of the cost savings early while limiting blast radius.
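
A staged cutover needs a router that sends a fixed share of each flow's calls to the new pipeline and makes the same decision if the same call is seen again. The sketch below hashes a call ID into one of 100 buckets. The flow names, percentages, and the `ROLLOUT` table are hypothetical.

```python
import hashlib
from typing import Dict

# Percent of calls per flow routed to the self-hosted pipeline.
# Flow names and percentages are hypothetical.
ROLLOUT: Dict[str, int] = {
    "appointment_reminder": 10,  # highest-volume, simplest flow goes first
    "billing_support": 0,        # stays fully on managed for now
}


def bucket(call_id: str) -> int:
    """Map a call ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(flow: str, call_id: str, rollout: Dict[str, int] = ROLLOUT) -> str:
    """Return which stack should handle this call."""
    percent = rollout.get(flow, 0)  # unknown flows default to managed
    return "self_hosted" if bucket(call_id) < percent else "managed"


if __name__ == "__main__":
    sample = [f"call-{n}" for n in range(1000)]
    moved = sum(route("appointment_reminder", cid) == "self_hosted" for cid in sample)
    print(f"{moved} of {len(sample)} reminder calls routed to self-hosted")
```

Raising a flow's percentage only adds calls to the self-hosted side, because a call's bucket never changes. Setting it back to 0 is the rollback.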

Re-establish telephony deliberately. Phone numbers, SIP trunks, and carrier reliability are the least glamorous and most failure-prone part of the move. Provision and test telephony on the new stack well before cutover, and keep the managed numbers live as a fallback during the transition.

If your voice agent is one node in a larger multi-step or multi-agent system, the orchestration choices interact, and the framework you pick for voice should sit comfortably alongside the rest. Our comparison of Mastra vs LangGraph vs Vercel AI SDK for TypeScript agents covers the application-layer framework decision that pairs with the voice transport layer described here. At Particula Tech, the voice engagements we scope usually start exactly here: modeling real minute volume against a provider combination, mapping the 300ms budget across the actual tool calls, and writing the migration runbook before the managed bill makes it urgent.

07 · Recommendation by Scenario

We close every voice agent scoping conversation with one of a few concrete starting points. They are imperfect, since every workload has wrinkles, but they hold up most often:

Early stage, validating the use case, under 10K minutes/month. Managed. Retell if you want a flat HIPAA-included rate and predictable bill; Vapi if you need specific provider control via BYOK and accept the pricing complexity.

Healthcare or regulated voice, any volume below the framework threshold. Retell first, for bundled HIPAA at $0.07/min. Confirm the BAA covers your exact flow before committing.

Growing support line, 10K-50K minutes/month. Hybrid. Keep managed orchestration, bring your own cheaper STT/TTS where quality allows, or self-host only your single highest-volume flow.

High-volume voice, over 50K minutes/month, in-house engineering. Build on LiveKit for its production WebRTC backbone and v1.5 turn-taking, or Pipecat for maximum Python-side control. The 60-80% per-minute savings funds the team that runs it.

Web/app-only agent with no PSTN. The framework path gets easier because you skip telephony. LiveKit or Pipecat become attractive at lower volumes than they would for phone workloads.

Pick by your minutes, your latency budget, and your compliance posture, in that order. Keep your agent logic portable so the managed platform is a tenant, not a landlord. And model your real per-minute cost against a specific provider combination before you trust any headline rate, because in voice the headline and the invoice are rarely the same number.

08 · FAQ

Quick answers to the questions this post tends to raise.
