Deepgram

What is Deepgram?

Deepgram is a speech AI platform for product and engineering teams that turns audio into transcripts, analytics, and conversational responses through real-time and batch APIs. Its stack includes Speech to Text, Text to Speech, Voice Agent, Audio Intelligence, and features like diarization, smart formatting, and keyterm prompting. It integrates with Twilio, Vapi, LiveKit, and OpenAI, and powers customers such as Twilio, Cloudflare, Sierra, and NASA. Plans include Pay As You Go (with a free $200 credit), Growth ($4K+ / year), and Enterprise (contact sales).

At a glance

Best for
Deepgram is best for product and engineering teams that need low-latency voice APIs for transcription, synthesis, and agents.
Pricing
Pay As You Go (free $200 credit); Growth ($4K+ / year); Enterprise (contact sales)
API
Yes. Deepgram offers a speech-to-text API with model-specific docs and feature docs for transcription capabilities.

What does Deepgram do?

Deepgram handles speech-to-text, text-to-speech, and voice-agent workflows through real-time and batch APIs, so teams can turn audio into transcripts, analytics, and conversational responses. Its speech stack includes Flux for live voice agents, Nova for transcription, and Audio Intelligence for extracting insights from conversational audio, while the voice-agent API ties speech recognition, orchestration, and synthesis into one flow. The product pages show built-in turn detection, natural interruption handling, diarization, smart formatting, and keyterm prompting as part of that pipeline. At scale, Deepgram says it powers millions of audio minutes each day for hundreds of enterprises and supports thousands of concurrent sessions. The company was founded in 2015 and positions the platform for cloud and self-hosted deployment, with dedicated options available for larger deployment or support needs. Customers and partners named on the site include Twilio, Cloudflare, Sierra, NASA, Five9, and Vapi, and the developer docs note model-specific transcription documentation plus feature docs for speech capabilities.

Why use Deepgram?

  • Deepgram combines speech recognition, synthesis, and orchestration in one stack, reducing the integration work of assembling separate vendors.
  • Its self-hosted option gives teams a deployment path for stricter infrastructure or data requirements.
  • The platform is built for scale, with support for thousands of concurrent sessions and millions of audio minutes processed each day.
  • Named customers like Twilio, Cloudflare, NASA, and Five9 show it fits both startup and enterprise voice workloads.
  • The API-first approach and model-specific docs make it easier for developers to move from experimentation to production.

Who is Deepgram for?

  • Product engineers who need to add real-time speech features without stitching together multiple vendors.
  • Voice AI teams who need conversational control, low latency, and synchronized speech-to-speech flows.
  • Contact center builders who need transcription and analytics for high-volume customer conversations.
  • Platform teams who need cloud or self-hosted deployment options for voice workloads.
  • Developers who need API-first speech tools with model-specific documentation.

What are Deepgram's key features?

Speech to Text

Transcribe audio with Deepgram's API across streaming and batch workflows, including diarization and smart formatting for cleaner transcripts at scale.

Text to Speech

Generate natural-sounding speech with sub-200ms streaming text-to-speech and 40+ English voices, useful for low-latency voice experiences.

Voice Agent

Build real-time voice agents with Flux models, built-in turn detection, natural interruption handling, and ultra-low latency for live conversations.

Audio Intelligence

Extract summaries, sentiment analysis, intent recognition, and topic detection from audio, helping teams turn calls into searchable operational data.

Build with APIs

Use Deepgram's speech-to-text API and model-specific docs to integrate transcription features directly into products and workflows.

Integrate Deepgram

Connect Deepgram with Twilio, Vapi, LiveKit, Genesys, Five9, Vonage, and OpenAI for call, agent, and app integrations.

Flexible deployment options

Choose cloud, self-hosted, or dedicated deployment to match security, reliability, and data-control requirements for enterprise speech workloads.

Custom models

Train field-tuned models for specialized speech tasks, including medical transcripts and contact centers, to improve accuracy on niche vocabulary.

What does Deepgram integrate with?

  • Twilio
  • Daily
  • Granola
  • Vapi
  • LiveKit
  • Cloudflare
  • Amazon AWS
  • AudioCodes
  • Cognigy
  • Enterprise Bot
  • Five9
  • Genesys
  • Kore.ai
  • OneReach
  • Replicant
  • Vercel
  • Vonage
  • OpenAI

What are Deepgram's use cases?

Voice agents for product engineers

Product engineers use Deepgram to add real-time speech features without stitching together multiple vendors, using Voice Agent and Build with APIs to ship conversational experiences faster. They can pair Speech to Text with Text to Speech for a tighter loop between user input and spoken responses.

Contact center analytics

Contact center builders use Deepgram to transcribe high-volume customer conversations and surface what matters, using Audio Intelligence and Speech Analytics to capture sentiment, intent, and topics. Diarization helps separate speakers cleanly, making QA and coaching easier to act on.
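
To make the diarization point concrete, here is a minimal sketch of turning per-word speaker labels into readable speaker turns for QA review. The input shape (a list of words, each with a `word` and an integer `speaker` field) is an assumption for illustration and is not documented on this page; `to_turns` is a hypothetical helper, and the sample data is made up.

```python
from itertools import groupby

# Hypothetical diarized words; the "word" and "speaker" field names are assumed.
words = [
    {"word": "hello", "speaker": 0},
    {"word": "there", "speaker": 0},
    {"word": "hi", "speaker": 1},
    {"word": "thanks", "speaker": 0},
]


def to_turns(words):
    """Merge consecutive words from the same speaker into (speaker, text) turns."""
    return [
        (speaker, " ".join(w["word"] for w in run))
        for speaker, run in groupby(words, key=lambda w: w["speaker"])
    ]


for speaker, text in to_turns(words):
    print(f"Speaker {speaker}: {text}")
```

Grouping only consecutive words keeps the conversation order intact, so a speaker who talks twice appears as two separate turns.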

Deployment choices for platform teams

Platform teams use Deepgram to run voice workloads in the environment that fits their architecture, using Flexible deployment options and Self-Hosted to meet security or control requirements. They can keep the same speech stack while choosing Cloud or Dedicated deployment paths.

API-first speech for developers

Developers use Deepgram to build speech workflows directly into their products, using Integrate Deepgram and Build with APIs alongside model-specific documentation. Custom models help them tune transcription for field language, while Speech to Text keeps implementation focused on one API surface.

How does Deepgram work?

  1. Connect your first audio source through Build with APIs, then choose Speech to Text or Voice Agent for the workflow you want to ship. Use the model-specific docs to match latency, accuracy, and language needs.
  2. Configure transcription behavior with a Flux or Nova transcription model, then refine output using Smart formatting, Numerals, Filler words, and Diarization. This gives your app cleaner text for search, QA, or downstream automation.
  3. Add Text to Speech when you need spoken responses, and tune delivery with Authentic, Natural Voices and Context-aware delivery. For agentic flows, use the Unified Voice Agent API and Conversational control.
  4. Route the workload through Cloud, Self-Hosted, or Dedicated deployment options based on your security and infrastructure needs. Keep the same API surface while scaling usage and preserving control.
  5. Expand into Audio Intelligence features like Summarization, Sentiment analysis, Intent recognition, and Topic detection. Use Integrate Deepgram with partners such as Twilio, LiveKit, or OpenAI to fit existing systems.
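
As a rough illustration of steps 1 and 2, the sketch below assembles a request URL for a transcription call without sending anything over the network. The endpoint `https://api.deepgram.com/v1/listen`, the query parameter names (`model`, `smart_format`, `diarize`, `keyterm`), and the model identifier `nova-3` are assumptions that do not come from this page; verify them against Deepgram's API reference before use. `build_listen_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed base endpoint; verify against Deepgram's API reference.
LISTEN_ENDPOINT = "https://api.deepgram.com/v1/listen"


def build_listen_url(model, smart_format=True, diarize=False, keyterms=()):
    """Return a transcription request URL (hypothetical helper, no network I/O)."""
    params = [("model", model)]
    if smart_format:
        params.append(("smart_format", "true"))
    if diarize:
        params.append(("diarize", "true"))
    # One query parameter per key term (assumed parameter name).
    params.extend(("keyterm", term) for term in keyterms)
    return f"{LISTEN_ENDPOINT}?{urlencode(params)}"


url = build_listen_url("nova-3", diarize=True, keyterms=["LiveKit"])
print(url)
```

Keeping URL construction separate from the HTTP call makes the chosen options easy to log and unit test before any audio is sent.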

How much does Deepgram cost?

Pay As You Go

Free $200 Credit
  • No minimums. No expiration.
  • No credit card required.
  • All endpoints in public models
  • Up to 50 for the REST API
  • Up to 5 for Deepgram Whisper Cloud
  • Up to 45 for the WSS API
  • Up to 10 for the REST API
  • Community & Discord
  • Standard Uptime

Growth

$4K+ / year
  • With pre-paid credits for the year.
  • Credits are redeemed against actual usage.
  • All endpoints in public models
  • Up to 50 for the REST API
  • Up to 5 for Deepgram Whisper Cloud
  • Up to 60 for the WSS API
  • Up to 10 for the REST API
  • Community & Discord
  • Standard Uptime

Enterprise

Contact Sales
  • For businesses with large volumes, data or deployment requirements, or support needs.

Speech to Text: Flux English

$0.0065/min
  • Conversational speech recognition for real-time voice agents with built-in turn detection, natural interruption handling and ultra-low latency.

Speech to Text: Flux Multilingual

$0.0078/min
  • Conversational speech recognition for real-time voice agents that handle multiple languages within a single conversation, with built-in turn detection, natural interruption handling and ultra-low latency.

Speech to Text: Nova-3 Monolingual

$0.0048/min
  • Our highest performing model. Recommended for most use cases, especially audio with multiple languages, background noise, crosstalk and far field audio.

Speech to Text: Nova-3 Multilingual

$0.0058/min
  • Our highest-accuracy multilingual model with automatic language detection. Recommended for audio with multiple languages, background noise, crosstalk and far-field input.

Speech to Text: Custom

Contact Sales
  • Custom speech-to-text models trained on proprietary or novel datasets for maximum accuracy in edge-case scenarios.

Speech to Text Add-ons: Redaction

$0.0020/min
  • Automatically identify and remove sensitive PII such as social security numbers, credit cards and phone numbers.

Speech to Text Add-ons: Keyterm Prompting

$0.0013/min
  • Boost accuracy for field-specific jargon, product names, or acronyms important to your use case.

Speech to Text Add-ons: Smart Formatting

Included
  • Automatically format punctuation, casing, dates and currency for readability.

Speech to Text Add-ons: Speaker Diarization

$0.0020/min
  • Detect multiple speakers and label who spoke when in the transcript.
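
For budgeting, the per-minute prices listed above can be combined into a quick estimate. The sketch below assumes that add-on prices are simply added to the model's per-minute rate and that no volume discounts apply; neither assumption is confirmed on this page. The dictionary keys and `estimate_stt_cost` are hypothetical names chosen for this example.

```python
from decimal import Decimal

# Per-minute list prices copied from the pricing entries above (USD).
STT_RATES = {
    "flux-english": Decimal("0.0065"),
    "flux-multilingual": Decimal("0.0078"),
    "nova-3-monolingual": Decimal("0.0048"),
    "nova-3-multilingual": Decimal("0.0058"),
}
ADDON_RATES = {
    "redaction": Decimal("0.0020"),
    "keyterm-prompting": Decimal("0.0013"),
    "smart-formatting": Decimal("0"),  # listed as "Included"
    "speaker-diarization": Decimal("0.0020"),
}


def estimate_stt_cost(model, minutes, addons=()):
    """Estimate cost in USD, assuming add-on rates stack on the model rate."""
    per_minute = STT_RATES[model] + sum(
        (ADDON_RATES[name] for name in addons), Decimal("0")
    )
    return per_minute * Decimal(str(minutes))


cost = estimate_stt_cost(
    "nova-3-monolingual", 10_000, ["speaker-diarization", "redaction"]
)
print(f"${cost:.2f}")  # prints $88.00
```

`Decimal` is used instead of floats so that small per-minute rates do not accumulate rounding error over large volumes.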

Voice Agent API: Standard

$0.075/min
  • Calculated based on websocket connection time.

Voice Agent API: Standard - BYO TTS

$0.065/min
  • Calculated based on websocket connection time.

Voice Agent API: Custom - BYO LLM

$0.056/min
  • Calculated based on websocket connection time.

Voice Agent API: Custom - BYO LLM + TTS

$0.050/min
  • Calculated based on websocket connection time.

Voice Agent API: Advanced

$0.163/min
  • Calculated based on websocket connection time.

Voice Agent API: Advanced - BYO TTS

$0.122/min
  • Calculated based on websocket connection time.
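
Because Voice Agent API pricing is calculated from websocket connection time, a session's cost can be estimated from how long the connection stays open. This sketch assumes billing is prorated by the second with no rounding or minimum charge, which this page does not state. The tier keys and `voice_agent_cost` are hypothetical names for illustration.

```python
from decimal import Decimal

# Per-minute list prices copied from the Voice Agent API entries above (USD).
VOICE_AGENT_RATES = {
    "standard": Decimal("0.075"),
    "standard-byo-tts": Decimal("0.065"),
    "custom-byo-llm": Decimal("0.056"),
    "custom-byo-llm-tts": Decimal("0.050"),
    "advanced": Decimal("0.163"),
    "advanced-byo-tts": Decimal("0.122"),
}


def voice_agent_cost(tier, connection_seconds):
    """Estimate cost in USD for one websocket session (assumes per-second proration)."""
    minutes = Decimal(str(connection_seconds)) / Decimal(60)
    return VOICE_AGENT_RATES[tier] * minutes


print(voice_agent_cost("standard", 90))  # prints 0.1125
```

Since the meter runs on connection time and not on speech time, closing idle websockets promptly is one way to keep session costs down.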

Frequently asked questions

What is Deepgram?

Deepgram is a speech AI platform for product and engineering teams that turns audio into transcripts, analytics, and conversational responses through real-time and batch APIs. Its stack includes Speech to Text, Text to Speech, Voice Agent, Audio Intelligence, and features like diarization, smart formatting, and keyterm prompting. It integrates with Twilio, Vapi, LiveKit, and OpenAI, and powers customers such as Twilio, Cloudflare, Sierra, and NASA. Plans include Pay As You Go (with a free $200 credit), Growth ($4K+ / year), and Enterprise (contact sales).

How much does Deepgram cost? Is it free?

Deepgram's Pay As You Go plan starts with a free $200 credit and has no minimums. Paid options include Growth at $4K+ / year and Enterprise (contact sales). Usage is priced per minute; for example, Speech to Text: Flux English costs $0.0065/min.

What is Deepgram used for? Who is it for?

Deepgram is used for Speech to Text, Text to Speech, and Voice Agent workflows. It's built for product engineers, voice AI teams, and contact center builders.

Does Deepgram have an API and what does it integrate with?

Deepgram offers a speech-to-text API with model-specific docs and feature docs for transcription capabilities. It integrates with Twilio, Daily, Granola, Vapi, LiveKit, and 13 more.

Editor's read

Check whether your workload needs self-hosted or dedicated deployment before committing. Those options are positioned for larger deployment, data, or support requirements, so confirm the plan and operational path match your infrastructure and compliance needs.
