Deepgram

What is Deepgram?

Deepgram is a speech AI platform for product and engineering teams that turns audio into transcripts, analytics, and conversational responses through real-time and batch APIs. Its stack includes Speech to Text, Text to Speech, Voice Agent, and Audio Intelligence, with features such as diarization, smart formatting, and keyterm prompting. It integrates with Twilio, Vapi, LiveKit, and OpenAI, and its customers include Twilio, Cloudflare, Sierra, and NASA. Plans include Pay As You Go (free $200 credit), Growth ($4K+ per year), and Enterprise (contact sales).

Last verified: May 17, 2026


At a glance

Best for: Deepgram is best for product and engineering teams that need low-latency voice APIs for transcription, synthesis, and agents.
Pricing: Pay As You Go (free $200 credit); Growth ($4K+ per year); Enterprise (custom pricing)
API: Yes. Deepgram offers a speech-to-text API, with model-specific docs and feature docs covering its transcription capabilities.

What does Deepgram do?

Deepgram handles speech-to-text, text-to-speech, and voice-agent workflows through real-time and batch APIs, so teams can turn audio into transcripts, analytics, and conversational responses. Its speech stack includes Flux for live voice agents, Nova for transcription, and Audio Intelligence for extracting insights from conversational audio, while the Voice Agent API ties speech recognition, orchestration, and synthesis into one flow. The product pages list built-in turn detection, natural interruption handling, diarization, smart formatting, and keyterm prompting as part of that pipeline. At scale, Deepgram says it processes millions of audio minutes each day for hundreds of enterprises and supports thousands of concurrent sessions. The company was founded in 2015 and offers the platform for cloud and self-hosted deployment, with dedicated options for larger deployment or support needs. Customers and partners named on the site include Twilio, Cloudflare, Sierra, NASA, Five9, and Vapi, and the developer docs include model-specific transcription documentation plus feature docs for speech capabilities.

Why use Deepgram?

Deepgram combines speech recognition, synthesis, and orchestration in one stack, reducing the integration work of assembling separate vendors.
Its self-hosted option gives teams a deployment path for stricter infrastructure or data requirements.
The platform is built for scale, with support for thousands of concurrent sessions and millions of audio minutes processed each day.
Named customers such as Twilio, Cloudflare, NASA, and Five9 indicate it is already used for production voice workloads at enterprise scale.
The API-first approach and model-specific docs make it easier for developers to move from experimentation to production.

Who is Deepgram for?

Product engineers who need to add real-time speech features without stitching together multiple vendors.
Voice AI teams who need conversational control, low latency, and synchronized speech-to-speech flows.
Contact center builders who need transcription and analytics for high-volume customer conversations.
Platform teams who need cloud or self-hosted deployment options for voice workloads.
Developers who need API-first speech tools with model-specific documentation.

What are Deepgram's key features?

Speech to Text

Transcribe audio with Deepgram's API across streaming and batch workflows, including diarization and smart formatting for cleaner transcripts at scale.
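A minimal Python sketch of a batch (pre-recorded) transcription request, using only the standard library. The endpoint `https://api.deepgram.com/v1/listen`, the `Token` authorization scheme, and the `model`, `smart_format`, `diarize`, and `keyterm` query parameters reflect Deepgram's public REST API as we understand it; verify them against the current API reference. The helper name `build_transcription_request` is hypothetical. The request is built but not sent, so the sketch runs offline.

```python
import urllib.parse
import urllib.request

# Assumed endpoint for batch (pre-recorded) transcription.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_transcription_request(
    audio: bytes,
    api_key: str,
    model: str = "nova-3",
    diarize: bool = True,
    smart_format: bool = True,
    keyterms: tuple[str, ...] = (),
    content_type: str = "audio/wav",
) -> urllib.request.Request:
    """Build (but do not send) a POST request that uploads raw audio bytes."""
    params = [
        ("model", model),
        # Boolean feature flags are sent as lowercase "true"/"false".
        ("smart_format", str(smart_format).lower()),
        ("diarize", str(diarize).lower()),
    ]
    # Keyterm prompting: one `keyterm` parameter per term (assumed Nova-3 style).
    params += [("keyterm", term) for term in keyterms]
    url = f"{LISTEN_URL}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )
```

To actually transcribe, pass the returned request to `urllib.request.urlopen` and parse the JSON body; keeping construction separate from sending makes the parameters easy to inspect and test.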

Text to Speech

Generate natural-sounding speech with sub-200 ms streaming latency and 40+ English voices, useful for low-latency voice experiences.
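A minimal sketch of a text-to-speech request, assuming Deepgram's REST endpoint `https://api.deepgram.com/v1/speak`, a JSON body of the form `{"text": ...}`, and a voice selected via the `model` query parameter. The default voice name `aura-2-thalia-en` is an assumption; check Deepgram's voice list for current names. `build_speech_request` is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for text-to-speech synthesis.
SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speech_request(
    text: str, api_key: str, model: str = "aura-2-thalia-en"
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for synthesized audio."""
    # The voice is chosen via the `model` query parameter (assumed voice name).
    url = f"{SPEAK_URL}?{urllib.parse.urlencode({'model': model})}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending this request with `urllib.request.urlopen` would return audio bytes rather than JSON, which a voice app can stream to the caller.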

Voice Agent

Build real-time voice agents with Flux models, built-in turn detection, natural interruption handling, and ultra-low latency for live conversations.

Audio Intelligence

Extract summaries, sentiment analysis, intent recognition, and topic detection from audio, helping teams turn calls into searchable operational data.
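Audio Intelligence features are requested as extra query parameters on the transcription endpoint. A minimal sketch, assuming the parameter names `summarize=v2`, `sentiment`, `intents`, and `topics` as they appear in Deepgram's REST docs (verify against the current reference); `intelligence_url` is a hypothetical helper.

```python
import urllib.parse

LISTEN_URL = "https://api.deepgram.com/v1/listen"

# Assumed query parameters that switch on each Audio Intelligence feature.
INTELLIGENCE_PARAMS = {
    "summarize": "v2",
    "sentiment": "true",
    "intents": "true",
    "topics": "true",
}


def intelligence_url(model: str = "nova-3", **extra: str) -> str:
    """Return a transcription URL with all Audio Intelligence features enabled."""
    # Later keys win, so callers can override or add parameters via **extra.
    params = {"model": model, **INTELLIGENCE_PARAMS, **extra}
    return f"{LISTEN_URL}?{urllib.parse.urlencode(params)}"
```

Because these are just flags on the same request, a team can start with plain transcription and turn on summaries or sentiment later without changing its upload path.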

Build with APIs

Use Deepgram's speech-to-text API and model-specific docs to integrate transcription features directly into products and workflows.

Integrate Deepgram

Connect Deepgram with Twilio, Vapi, LiveKit, Genesys, Five9, Vonage, and OpenAI for call, agent, and app integrations.

Flexible deployment options

Choose cloud, self-hosted, or dedicated deployment to match security, reliability, and data-control requirements for enterprise speech workloads.

Custom models

Train custom models for specialized domains such as medical transcription and contact centers, improving accuracy on niche vocabulary.

What does Deepgram integrate with?

Twilio
Daily
Granola
Vapi
LiveKit
Cloudflare
Amazon AWS
AudioCodes
Cognigy
Enterprise Bot
Five9
Genesys
Kore.ai
OneReach
Replicant
Vercel
Vonage
OpenAI

What are Deepgram's use cases?

Voice agents for product engineers

Product engineers use Deepgram to add real-time speech features without stitching together multiple vendors, using the Voice Agent API and Deepgram's developer APIs to ship conversational experiences faster. They can pair Speech to Text with Text to Speech for a tighter loop between user input and spoken responses.

Contact center analytics

Contact center builders use Deepgram to transcribe high-volume customer conversations and surface what matters, using Audio Intelligence and Speech Analytics to capture sentiment, intent, and topics. Diarization helps separate speakers cleanly, making QA and coaching easier to act on.
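Diarized output can be collapsed into readable speaker turns for QA and coaching review. A minimal sketch, assuming the response shape `results.channels[0].alternatives[0].words`, where each word carries `word`, an optional `punctuated_word`, and an integer `speaker` label when diarization is enabled; `speaker_turns` is a hypothetical helper.

```python
from itertools import groupby
from typing import Any


def speaker_turns(response: dict[str, Any]) -> list[tuple[int, str]]:
    """Collapse consecutive words from the same speaker into (speaker, text) turns."""
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    turns = []
    # groupby only merges *consecutive* words, so each change of speaker starts a new turn.
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        # Prefer the punctuated form when formatting is on; fall back to the raw word.
        text = " ".join(w.get("punctuated_word", w["word"]) for w in group)
        turns.append((speaker, text))
    return turns
```

With a hand-made sample where speaker 0 says "Hi, there.", speaker 1 says "hello", and speaker 0 replies "again", the function returns `[(0, 'Hi, there.'), (1, 'hello'), (0, 'again')]`.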

Deployment choices for platform teams

Platform teams use Deepgram to run voice workloads in the environment that fits their architecture, choosing self-hosted deployment when they need tighter security or control. They can keep the same speech stack while opting for cloud or dedicated deployment instead.

API-first speech for developers

Developers use Deepgram to build speech workflows directly into their products, relying on its APIs, partner integrations, and model-specific documentation. Custom models let them tune transcription for domain-specific language, while Speech to Text keeps implementation focused on one API surface.

How does Deepgram work?

Connect your first audio source through Deepgram's APIs, then choose Speech to Text or Voice Agent for the workflow you want to ship. Use the model-specific docs to match latency, accuracy, and language needs.
Pick a model (Flux for live voice agents, Nova for transcription), then refine the output with smart formatting, numerals, filler-word handling, and diarization. This gives your app cleaner text for search, QA, or downstream automation.
Add Text to Speech when you need spoken responses, choosing from its natural-sounding voices with context-aware delivery. For agentic flows, use the unified Voice Agent API, which handles conversational control such as turn detection and interruptions.
Run the workload on cloud, self-hosted, or dedicated deployment based on your security and infrastructure needs, keeping the same API surface as usage scales.
Expand into Audio Intelligence features such as summarization, sentiment analysis, intent recognition, and topic detection, and use integrations with partners such as Twilio, LiveKit, or OpenAI to fit existing systems.
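If the API surface really stays the same across deployments, switching between cloud and self-hosted can come down to changing the base URL. The sketch below assumes exactly that. The self-hosted address `http://localhost:8080` is a placeholder assumption, not a documented default, and `listen_url` and `BASE_URLS` are hypothetical names. The example flags `numerals`, `filler_words`, and `diarize` correspond to the output refinements named in the steps above.

```python
import urllib.parse

# Hypothetical base URLs: the hosted API and an assumed self-hosted address.
BASE_URLS = {
    "cloud": "https://api.deepgram.com",
    "self_hosted": "http://localhost:8080",
}


def listen_url(deployment: str, **params: object) -> str:
    """Build a /v1/listen URL for the chosen deployment with identical parameters."""
    base = BASE_URLS[deployment]
    # Python booleans become lowercase "true"/"false"; other values pass through.
    query = urllib.parse.urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    )
    return f"{base}/v1/listen" + (f"?{query}" if query else "")
```

For example, `listen_url("self_hosted", model="nova-3", numerals=True)` and `listen_url("cloud", model="nova-3", numerals=True)` differ only in the host, which keeps application code unchanged when a workload moves.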

How much does Deepgram cost?

Pay As You Go

Free $200 Credit

No minimums. No expiration.
No credit card required.
All endpoints in public models
Up to 50 for the REST API
Up to 5 for Deepgram Whisper Cloud
Up to 45 for the WSS API
Up to 10 for the REST API
Community & Discord
Standard Uptime

Growth

$4K+ / year

Includes prepaid credits for the year.
Credits are redeemed against actual usage.
All endpoints in public models
Up to 50 for the REST API
Up to 5 for Deepgram Whisper Cloud
Up to 60 for the WSS API
Up to 10 for the REST API
Community & Discord
Standard Uptime

Enterprise

Contact Sales

For businesses with large volumes, data or deployment requirements, or support needs.

Speech to Text: Flux English

$0.0065/min

Conversational speech recognition for real-time voice agents with built-in turn detection, natural interruption handling and ultra-low latency.

Speech to Text: Flux Multilingual

$0.0078/min

Conversational speech recognition for real-time voice agents that handle multiple languages within a single conversation, with built-in turn detection, natural interruption handling and ultra-low latency.

Speech to Text: Nova-3 Monolingual

$0.0048/min

Deepgram's highest-performing model. Recommended for most use cases, especially audio with background noise, crosstalk, and far-field input.

Speech to Text: Nova-3 Multilingual

$0.0058/min

Deepgram's highest-accuracy multilingual model, with automatic language detection. Recommended for audio with multiple languages, background noise, crosstalk, and far-field input.

Speech to Text: Custom

Contact Sales

Custom speech-to-text models trained on proprietary or novel datasets for maximum accuracy in edge-case scenarios.

Speech to Text Add-ons: Redaction

$0.0020/min

Automatically identify and remove sensitive PII such as Social Security numbers, credit card numbers, and phone numbers.

Speech to Text Add-ons: Keyterm Prompting

$0.0013/min

Boost accuracy for domain-specific jargon, product names, or acronyms important to your use case.

Speech to Text Add-ons: Smart Formatting

Included

Automatically format punctuation, casing, dates and currency for readability.

Speech to Text Add-ons: Speaker Diarization

$0.0020/min

Detect multiple speakers and label who spoke when in the transcript.
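The per-minute rates above can be combined into a rough speech-to-text budget. This sketch assumes add-on prices simply stack on top of the base model rate and that billing is per audio minute at exactly the listed rates; plan discounts, billing increments, and minimums are not modeled. `stt_cost` and the dictionary keys are hypothetical names.

```python
from decimal import Decimal

# Per-minute rates as listed on this page (USD).
BASE_RATES = {
    "flux-english": Decimal("0.0065"),
    "flux-multilingual": Decimal("0.0078"),
    "nova-3-monolingual": Decimal("0.0048"),
    "nova-3-multilingual": Decimal("0.0058"),
}
ADDON_RATES = {
    "redaction": Decimal("0.0020"),
    "keyterm_prompting": Decimal("0.0013"),
    "smart_formatting": Decimal("0"),  # listed as included
    "diarization": Decimal("0.0020"),
}


def stt_cost(minutes: int, model: str, addons: tuple[str, ...] = ()) -> Decimal:
    """Estimate cost in USD, assuming add-on rates stack on the base per-minute rate."""
    rate = BASE_RATES[model] + sum((ADDON_RATES[a] for a in addons), Decimal("0"))
    # Decimal avoids float rounding surprises; round the total to whole cents.
    return (rate * minutes).quantize(Decimal("0.01"))
```

For example, 100,000 minutes of Nova-3 Monolingual with diarization and redaction comes to $0.0088/min, or $880.00. At the base Nova-3 Monolingual rate alone, the $4,000 Growth minimum would cover roughly 833,000 minutes, assuming Growth credits are redeemed at these same rates.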

Voice Agent API: Standard

$0.075/min

Calculated based on websocket connection time.

Voice Agent API: Standard - BYO TTS

$0.065/min

Calculated based on websocket connection time.

Voice Agent API: Custom - BYO LLM

$0.056/min

Calculated based on websocket connection time.

Voice Agent API: Custom - BYO LLM + TTS

$0.050/min

Calculated based on websocket connection time.

Voice Agent API: Advanced

$0.163/min

Calculated based on websocket connection time.

Voice Agent API: Advanced - BYO TTS

$0.122/min

Calculated based on websocket connection time.
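Because Voice Agent pricing is based on WebSocket connection time, session length drives cost. A minimal sketch, assuming per-second proration of the listed per-minute rates (the page does not state the billing increment); the tier keys and `agent_session_cost` are hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-minute Voice Agent rates as listed on this page (USD).
AGENT_RATES = {
    "standard": Decimal("0.075"),
    "standard_byo_tts": Decimal("0.065"),
    "custom_byo_llm": Decimal("0.056"),
    "custom_byo_llm_tts": Decimal("0.050"),
    "advanced": Decimal("0.163"),
    "advanced_byo_tts": Decimal("0.122"),
}


def agent_session_cost(connected_seconds: int, tier: str) -> Decimal:
    """Cost of one session, assuming per-second proration of the per-minute rate."""
    minutes = Decimal(connected_seconds) / Decimal(60)
    # Keep four decimal places so small per-call differences stay visible.
    return (AGENT_RATES[tier] * minutes).quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
```

Under this assumption, a three-minute call costs $0.2250 on Standard and $0.1500 on Custom BYO LLM + TTS, before whatever you pay separately for the LLM and TTS you bring.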

Frequently asked questions

What is Deepgram?

How much does Deepgram cost? Is it free?

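Deepgram is free to start: Pay As You Go includes a $200 credit with no credit card required. Paid options include Growth at $4K+ per year and Enterprise with custom pricing, and usage is billed per minute (for example, $0.0065/min for Flux English speech-to-text).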

What is Deepgram used for? Who is it for?

Deepgram is used for Speech to Text, Text to Speech, and Voice Agent workflows. It's built for product engineers, voice AI teams, and contact center builders.

Does Deepgram have an API and what does it integrate with?

Deepgram offers a speech-to-text API with model-specific docs and feature docs for transcription capabilities. It integrates with Twilio, Daily, Granola, Vapi, LiveKit, and 13 more.

Editor's read

Check whether your workload needs self-hosted or dedicated deployment before committing. Those options are positioned for larger deployment, data, or support requirements, so confirm the plan and operational path match your infrastructure and compliance needs.

Filed under: Voice AI Agents, freemium, GDPR, HIPAA, self-hosted, SOC 2

Explore other Voice AI Agents


Synthflow

AI voice automation for phone and chat workflows.

Voice AI Agents

Synthflow automates phone and chat workflows with AI agents, Flow Designer, and Test Calls. Plans start free, with custom Enterprise pricing.

Vapi

Voice-agent deployment with orchestration, monitoring, and low latency.

Voice AI Agents

Vapi deploys voice agents with ultra-low latency, live call data access, and API/SDK support. Plans start with the usage-based Build plan.

Bland AI

AI phone and messaging platform for voice, SMS, iMessage, and web chat.

Voice AI Agents

Bland AI automates voice, SMS, iMessage, and web chat conversations. Plans start at $0.14/min, with Build, Scale, and Enterprise options.

Retell AI

Voice automation for phone calls with routing, booking, and call handling.

Voice AI Agents

Retell AI automates phone calls with IVR routing, appointment booking, and CRM syncing. Plans start free, with custom Enterprise pricing.

Voiceflow

Build chat and voice AI agents without code using Voiceflow.

No-Code/Low-Code Builders, Voice AI Agents

Voiceflow is a no-code platform for building AI agents with drag-and-drop tools, multi-LLM support, and omnichannel deployment for chat and voice.