LiteLLM

What is LiteLLM?

LiteLLM is an OpenAI-compatible gateway for platform teams that routes LLM requests across many providers without rewriting app integrations. It includes Model Access, LLM Fallbacks, Spend Tracking, Budgets & Rate Limits, Virtual Keys, and LLM Observability. It integrates with Langfuse, Arize Phoenix, Langsmith, OTEL, Datadog, and OpenTelemetry, and is used by Netflix and Lemonade. Plans run Open Source $0 and Enterprise custom.

Last verifiedMay 17, 2026How we evaluate

Visit LiteLLM

At a glance

Best for: LiteLLM is best for platform teams who need one gateway for multi-model access, spend control, and fallbacks.
Pricing: Open Source $0; Enterprise Get In Touch
Free trial: 30 days, no credit card

What does LiteLLM do?

LiteLLM routes model requests through an OpenAI-compatible gateway so teams can point apps at many providers without rewriting every integration. It handles model access, fallbacks, spend tracking, budgets, rate limits, guardrails, prompt management, and pass-through endpoints, while also supporting batches and logging to s3. The result is a single control plane for developers and platform teams that need to standardize how LLM traffic is sent, measured, and governed. At scale, LiteLLM says it has served 1B+ requests, seen 242M+ docker pulls, and maintained 96% uptime with 1,010+ contributors. It supports 100+ LLMs and 100+ provider integrations across OpenAI, Azure, Bedrock, and GCP, and it can be deployed self-hosted, on-prem, or in your cloud. Netflix and Lemonade are named users, and the product also connects with observability tools like Langfuse, Arize Phoenix, Langsmith, OTEL, Datadog, and OpenTelemetry.

Why use LiteLLM?

OpenAI-compatible endpoints let teams switch providers without rebuilding every application integration.
Usage controls like budgets, rate limits, and virtual keys help teams govern spend at the gateway layer.
Observability hooks into tools like Langfuse, Datadog, and OpenTelemetry make production monitoring easier to centralize.
Named users like Netflix and Lemonade show the product is used for real multi-model operations at scale.
The platform reports 1B+ requests served and 96% uptime, which signals operational maturity for production traffic.

Who is LiteLLM for?

Platform teams that need to standardize LLM access across many providers.
Engineering leaders who want usage controls, budgets, and rate limits around model traffic.
Developers shipping AI features who need OpenAI-compatible endpoints and fallback routing.
Organizations with self-hosting requirements that want to run the gateway on-prem or in their cloud.
Teams operating production LLM workloads that need observability and auditability.

What are LiteLLM's key features?

Model Access

Route requests across OpenAI, Azure, Bedrock, and GCP through one API, so teams can switch providers without rewriting application code.

LLM Fallbacks

Set fallback paths across 100+ LLMs to keep requests moving when a provider fails, reducing downtime for production workloads.

Spend Tracking

Track usage and cost across 100+ LLM Provider Integrations, giving finance and engineering a shared view of model spend.

Budgets & Rate Limits

Apply budgets plus RPM/TPM limits to control usage by team or org, helping prevent surprise bills and runaway traffic.

LLM Observability

Send logs to Langfuse, Arize Phoenix, Langsmith, OTEL, Prometheus, Datadog, or OpenTelemetry to trace requests and debug model behavior.

OpenAI-Compatible

Expose an OpenAI-format gateway for 100+ LLMs, so existing SDKs and apps can connect with minimal code changes.

Virtual Keys

Issue virtual keys with RBAC and usage tracking by key, team, and org, making access control and billing attribution easier.

s3 Logging

Store request logs in s3 or gcs for later review and audit trails, which helps teams retain model activity outside the app.

What does LiteLLM integrate with?

OpenAI
Azure
Bedrock
Google Cloud
s3
gcs
Langfuse
Arize Phoenix
Langsmith
OTEL
Prometheus
Calendly
Datadog
OpenTelemetry
Slack
Discord

What are LiteLLM's use cases?

Platform teams standardize access

Platform teams use LiteLLM to route internal apps through one OpenAI-Compatible gateway, using Model Access to keep many providers behind a single interface. They can add LLM Fallbacks so production traffic keeps moving when a preferred model slows down or fails.

Engineering leaders control spend

Engineering leaders use LiteLLM to put guardrails around model usage, using Spend Tracking and Budgets & Rate Limits to cap runaway traffic before it becomes a surprise bill. They can also apply Virtual Keys to separate usage by team or service.

Developers ship resilient AI features

Developers shipping AI features use LiteLLM to call models through OpenAI-Compatible endpoints, so they can swap providers without rewriting application code. With LLM Fallbacks and Load Balancing, they keep user-facing features responsive during outages or spikes.

Production teams audit LLM traffic

Teams operating production LLM workloads use LiteLLM to monitor requests and investigate issues, using LLM Observability and s3 Logging to keep a durable record of model activity. That makes it easier to trace incidents, review usage, and support audits.

How does LiteLLM work?

Connect your first model provider in Model Access, then point apps at LiteLLM's OpenAI-Compatible endpoint so existing SDKs can start sending traffic without code rewrites.
Create Virtual Keys for teams, services, or environments, and use Budgets & Rate Limits to set guardrails before usage grows beyond what you planned.
Turn on LLM Fallbacks and Load Balancing to route requests across providers, keeping responses flowing when one model is slow, unavailable, or over limit.
Enable LLM Observability and Spend Tracking to watch request patterns, costs, and failures in one place, then review s3 Logging for durable records.
Add JWT Auth, SSO, or Audit Logs for production controls, and keep tuning Budgets, Rate Limits, and Guardrails as your traffic changes.

How much does LiteLLM cost?

Open Source

100+ LLM Provider Integrations
Langfuse, Arize Phoenix, Langsmith, OTEL Logging
Virtual Keys, Budgets, Teams
Load Balancing, RPM/TPM limits
LLM Guardrails

Enterprise

Get In Touch

Everything in OSS
Enterprise Support + Custom SLAs
JWT Auth, SSO, Audit Logs
All Enterprise Features - Docs

Frequently asked questions

What is LiteLLM?

How much does LiteLLM cost? Is it free?

LiteLLM has a free plan, with paid tiers including Enterprise at Get In Touch. A 30-day free trial is available.

What is LiteLLM used for? Who is it for?

LiteLLM is used for Model Access, LLM Fallbacks, and Spend Tracking. It's built for Platform teams that need to standardize LLM access across many providers, Engineering leaders, and Developers shipping AI features.

Does LiteLLM have an API and what does it integrate with?

LiteLLM doesn't publish a public API. It integrates with OpenAI, Azure, Bedrock, Google Cloud, s3, and 11 more.

Editor's read

Check whether the Open Source tier covers the governance features you need, since Enterprise adds JWT auth, SSO, and audit logs. If those controls are required for your rollout, the free tier alone will not be enough.

Filed under:Agent Tools & Integrations free-trial freemium open-source self-hosted

Explore other Agent Tools & Integrations

Browse Agent Tools & Integrations

mcp.run

Enterprise AI connectivity with governed access and audit controls.

Agent Tools & Integrations

Mcp.run runs a standards-compliant MCP gateway with audit controls, OIDC identity support, and self-hosted or cloud-ready deployment.

Maxim AI

AI workflow platform for prompt testing, simulation, and monitoring.

Agent Tools & Integrations

Maxim AI tests prompts and agents with simulations, observability, and Bifrost gateway routing. Plans start free, then $29/seat/month.

Mastra

TypeScript framework for building, observing, and deploying AI agents.

Agent Tools & Integrations

Mastra is a TypeScript framework for AI agents with Observability, Studio, and Memory Gateway. Plans run Free, Pro custom, and Enterprise custom.

LLM Guard

Open-source filters for safer prompts and model outputs.

Agent Tools & Integrations

LLM Guard filters prompts and outputs with scanners, CPU inference, and model-agnostic support across Azure OpenAI, Bedrock, and Langchain.

LlamaIndex

Open-source document AI for turning files into agent-ready data.

Agent Tools & Integrations

LlamaIndex turns PDFs, scans, and forms into structured data with OCR and extraction. Plans run Free, Starter Custom, Pro Custom, and Enterprise Custom.