LiteLLM
What is LiteLLM?
LiteLLM is an OpenAI-compatible gateway for platform teams that routes LLM requests across many providers without rewriting app integrations. It includes Model Access, LLM Fallbacks, Spend Tracking, Budgets & Rate Limits, Virtual Keys, and LLM Observability. It integrates with Langfuse, Arize Phoenix, Langsmith, OTEL, Datadog, and OpenTelemetry, and is used by Netflix and Lemonade. Plans run Open Source $0 and Enterprise custom.
Last verifiedHow we evaluate
At a glance
- LiteLLM is best for platform teams who need one gateway for multi-model access, spend control, and fallbacks.
- Open Source $0; Enterprise Get In Touch
- 30 days, no credit card
What does LiteLLM do?
LiteLLM routes model requests through an OpenAI-compatible gateway so teams can point apps at many providers without rewriting every integration. It handles model access, fallbacks, spend tracking, budgets, rate limits, guardrails, prompt management, and pass-through endpoints, while also supporting batches and logging to s3. The result is a single control plane for developers and platform teams that need to standardize how LLM traffic is sent, measured, and governed. At scale, LiteLLM says it has served 1B+ requests, seen 242M+ docker pulls, and maintained 96% uptime with 1,010+ contributors. It supports 100+ LLMs and 100+ provider integrations across OpenAI, Azure, Bedrock, and GCP, and it can be deployed self-hosted, on-prem, or in your cloud. Netflix and Lemonade are named users, and the product also connects with observability tools like Langfuse, Arize Phoenix, Langsmith, OTEL, Datadog, and OpenTelemetry.
Why use LiteLLM?
- OpenAI-compatible endpoints let teams switch providers without rebuilding every application integration.
- Usage controls like budgets, rate limits, and virtual keys help teams govern spend at the gateway layer.
- Observability hooks into tools like Langfuse, Datadog, and OpenTelemetry make production monitoring easier to centralize.
- Named users like Netflix and Lemonade show the product is used for real multi-model operations at scale.
- The platform reports 1B+ requests served and 96% uptime, which signals operational maturity for production traffic.
Who is LiteLLM for?
- Platform teams that need to standardize LLM access across many providers.
- Engineering leaders who want usage controls, budgets, and rate limits around model traffic.
- Developers shipping AI features who need OpenAI-compatible endpoints and fallback routing.
- Organizations with self-hosting requirements that want to run the gateway on-prem or in their cloud.
- Teams operating production LLM workloads that need observability and auditability.
What are LiteLLM's key features?
Model Access
Route requests across OpenAI, Azure, Bedrock, and GCP through one API, so teams can switch providers without rewriting application code.
LLM Fallbacks
Set fallback paths across 100+ LLMs to keep requests moving when a provider fails, reducing downtime for production workloads.
Spend Tracking
Track usage and cost across 100+ LLM Provider Integrations, giving finance and engineering a shared view of model spend.
Budgets & Rate Limits
Apply budgets plus RPM/TPM limits to control usage by team or org, helping prevent surprise bills and runaway traffic.
LLM Observability
Send logs to Langfuse, Arize Phoenix, Langsmith, OTEL, Prometheus, Datadog, or OpenTelemetry to trace requests and debug model behavior.
OpenAI-Compatible
Expose an OpenAI-format gateway for 100+ LLMs, so existing SDKs and apps can connect with minimal code changes.
Virtual Keys
Issue virtual keys with RBAC and usage tracking by key, team, and org, making access control and billing attribution easier.
s3 Logging
Store request logs in s3 or gcs for later review and audit trails, which helps teams retain model activity outside the app.
What does LiteLLM integrate with?
- OpenAI
- Azure
- Bedrock
- Google Cloud
- s3
- gcs
- Langfuse
- Arize Phoenix
- Langsmith
- OTEL
- Prometheus
- Calendly
- Datadog
- OpenTelemetry
- Slack
- Discord
What are LiteLLM's use cases?
Platform teams standardize access
Platform teams use LiteLLM to route internal apps through one OpenAI-Compatible gateway, using Model Access to keep many providers behind a single interface. They can add LLM Fallbacks so production traffic keeps moving when a preferred model slows down or fails.
Engineering leaders control spend
Engineering leaders use LiteLLM to put guardrails around model usage, using Spend Tracking and Budgets & Rate Limits to cap runaway traffic before it becomes a surprise bill. They can also apply Virtual Keys to separate usage by team or service.
Developers ship resilient AI features
Developers shipping AI features use LiteLLM to call models through OpenAI-Compatible endpoints, so they can swap providers without rewriting application code. With LLM Fallbacks and Load Balancing, they keep user-facing features responsive during outages or spikes.
Production teams audit LLM traffic
Teams operating production LLM workloads use LiteLLM to monitor requests and investigate issues, using LLM Observability and s3 Logging to keep a durable record of model activity. That makes it easier to trace incidents, review usage, and support audits.
How does LiteLLM work?
- Connect your first model provider in Model Access, then point apps at LiteLLM's OpenAI-Compatible endpoint so existing SDKs can start sending traffic without code rewrites.
- Create Virtual Keys for teams, services, or environments, and use Budgets & Rate Limits to set guardrails before usage grows beyond what you planned.
- Turn on LLM Fallbacks and Load Balancing to route requests across providers, keeping responses flowing when one model is slow, unavailable, or over limit.
- Enable LLM Observability and Spend Tracking to watch request patterns, costs, and failures in one place, then review s3 Logging for durable records.
- Add JWT Auth, SSO, or Audit Logs for production controls, and keep tuning Budgets, Rate Limits, and Guardrails as your traffic changes.
How much does LiteLLM cost?
Open Source
$0- 100+ LLM Provider Integrations
- Langfuse, Arize Phoenix, Langsmith, OTEL Logging
- Virtual Keys, Budgets, Teams
- Load Balancing, RPM/TPM limits
- LLM Guardrails
Enterprise
Get In Touch- Everything in OSS
- Enterprise Support + Custom SLAs
- JWT Auth, SSO, Audit Logs
- All Enterprise Features - Docs
Frequently asked questions
What is LiteLLM?
LiteLLM is an OpenAI-compatible gateway for platform teams that routes LLM requests across many providers without rewriting app integrations. It includes Model Access, LLM Fallbacks, Spend Tracking, Budgets & Rate Limits, Virtual Keys, and LLM Observability. It integrates with Langfuse, Arize Phoenix, Langsmith, OTEL, Datadog, and OpenTelemetry, and is used by Netflix and Lemonade. Plans run Open Source $0 and Enterprise custom.
How much does LiteLLM cost? Is it free?
LiteLLM has a free plan, with paid tiers including Enterprise at Get In Touch. A 30-day free trial is available.
What is LiteLLM used for? Who is it for?
LiteLLM is used for Model Access, LLM Fallbacks, and Spend Tracking. It's built for Platform teams that need to standardize LLM access across many providers, Engineering leaders, and Developers shipping AI features.
Does LiteLLM have an API and what does it integrate with?
LiteLLM doesn't publish a public API. It integrates with OpenAI, Azure, Bedrock, Google Cloud, s3, and 11 more.
Editor's read
Check whether the Open Source tier covers the governance features you need, since Enterprise adds JWT auth, SSO, and audit logs. If those controls are required for your rollout, the free tier alone will not be enough.
