Langfuse
Langfuse is an open-source LLM observability platform for tracing, evaluation, and iteration, with self-hosted and cloud deployment options.
Reviewed by Mathijs Bronsdijk · Updated Apr 13, 2026

What is Langfuse?
Langfuse is an open-source LLM engineering platform for observability, tracing, evaluation, and iteration in LLM applications. It captures traces for LLM calls, embeddings, retrieval steps, and multi-turn sessions, and shows agent flows as graphs. It also includes prompt management, LLM-as-a-Judge evaluations, metrics, an LLM playground for testing changes, and custom dashboards. Langfuse is for developers and enterprise teams that need to debug, monitor, and improve LLM apps in production. It stands apart from general application performance monitoring tools because it focuses on LLM-specific data such as token usage, model parameters, and evaluation scores, and it supports both self-hosted and cloud deployment.
Key Features
- LLM Application Observability: Langfuse captures traces from LLM calls, retrieval, embeddings, and agent actions, so teams can track latency, costs, and token usage and debug production issues across complex logs and user sessions.
- Prompt Management: Prompt Management gives teams a central place to version, update, and reuse prompts with server-side and client-side caching, which helps keep prompt changes consistent across development and production without adding latency during experiments.
- LLM Playground: LLM Playground lets users test prompts and model settings directly from traces, which shortens the path from a detected issue to a live prompt experiment.
- Sessions: Sessions groups observations across traces into replayable user sessions with sharing, bookmarking, annotations, and session-level scoring through the UI, SDK, or API, so teams can review behavior over time instead of one trace at a time.
- Evaluations: Evaluations supports LLM-as-a-Judge methods, custom scores, and datasets, so Langfuse users can test output quality in a structured way and connect results back to traces and prompts.
- Langfuse SDKs: Langfuse SDKs include Python v3 and JS/TS v4, plus OpenTelemetry support for other languages, so teams can instrument traces, prompts, observations, and evaluations in existing apps.
- Queued Trace Ingestion: Queued Trace Ingestion batches traces to S3 with Redis queuing before worker processing into ClickHouse, which helps avoid timeouts during load spikes and supports production-scale usage.
- Complete API: Complete API includes an OpenAPI spec, Postman collection, and typed SDKs for custom workflows such as scores and session management, which is useful for teams that need Langfuse data inside their own tools or processes.
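The Queued Trace Ingestion feature above follows a common batch-and-queue pattern. This is an illustrative stdlib sketch only, not Langfuse's implementation: its real pipeline stages events through S3 and Redis before workers write to ClickHouse, whereas here an in-memory deque and a list stand in for those systems.

```python
# Sketch of the batch-and-queue ingestion pattern (assumption: simplified,
# synchronous stand-in for Langfuse's S3/Redis/ClickHouse pipeline).
from collections import deque

class TraceIngestor:
    """Buffers incoming trace events and flushes them in batches,
    so a burst of traffic never blocks the caller."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.queue = deque()   # stand-in for the Redis queue
        self.store = []        # stand-in for the ClickHouse store

    def ingest(self, event):
        # Enqueue immediately; heavy processing happens later in flush().
        self.queue.append(event)
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self):
        # Drain the queue in one batch, as a background worker would.
        batch = [self.queue.popleft() for _ in range(len(self.queue))]
        if batch:
            self.store.append(batch)

ingestor = TraceIngestor(batch_size=3)
for i in range(7):
    ingestor.ingest({"trace_id": i})
ingestor.flush()  # flush the final partial batch
print(len(ingestor.store))                   # 3 batches
print(sum(len(b) for b in ingestor.store))   # 7 events total
```

The point of the pattern is that callers only pay the cost of an enqueue during load spikes; the expensive write happens once per batch rather than once per event.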
Use Cases
- Co-founder at an AI design tool startup: Uses Langfuse for live debugging during support tickets by checking prompts, responses, cost, and latency in the trace viewer. Customer support resolutions average under 8 minutes.
- Head of Product at a design platform: Uses Langfuse to review traces from AI help features with both engineers and non-technical domain experts. The team uses shared filtered views to support cross-functional prompt optimization.
- Staff Software Engineer at an edtech nonprofit: Instruments tutoring feature LLM calls with the Langfuse SDK and reviews latency and quality during development cycles. Developers get extremely fast feedback on AI implementations.
- Chief Data and AI Officer at a pharmaceutical company: Uses Langfuse across LLM deployments to track prompts, responses, costs, and latencies in real-time dashboards and historical traces. The team uses that data for audit and optimization work.
- Head of Operations Data and AI at a payments company: Builds support AI deflection workflows on Langfuse traces and measures results over time. The company saves 30% of external BPO costs by deflecting 50% of support conversations to AI.
Strengths and Weaknesses
Strengths:
- Product Hunt users (41 reviews, date not provided) rate Langfuse 5.0 and praise its observability features and integrations.
Weaknesses:
- Public review data in the research is limited. G2 lacks review data for Langfuse, and the available sentiment summary only cites Product Hunt (date not provided).
Pricing
- Hobby: $0 forever. Includes traces, observations, and scores tracking, 30 days data retention, 2 users, 1,000 requests/min, and community support. Limited to 50,000 units/month, with no overage.
- Core: $29/month. Includes everything in Hobby, 90 days data access, unlimited users, and in-app support. Includes 100,000 units/month, then overage starts at $8 per 100,000 additional units, with lower per-unit rates at higher volume.
- Pro: $199/month. Includes everything in Core, 3 years data access, data retention management, unlimited annotation queues, high rate limits, SOC2 and ISO27001 reports, BAA available (HIPAA), and prioritized in-app support. Includes 100,000 units/month, with the same graduated overage model as Core.
- Enterprise: $2,499+/month. Includes everything in Pro, audit logs, SCIM, SSO, 1,000 API requests/min, 20,000 ingestion requests/min, 99.9% uptime SLA, a dedicated support engineer, and custom contracts. Includes 100,000 units/month, with the same overage model and custom limits available.
Self-hosting is available for free at all plan levels through the open-source version. Langfuse also lists discount programs for startups, EDU, and OSS.
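The graduated overage model described above can be sketched as a tiered calculator. Note the tier boundaries and the lower high-volume rate below are hypothetical placeholders; the pricing section only confirms the $29 base fee, the 100,000 included units, and the $8-per-100,000 starting rate.

```python
# Hypothetical sketch of the Core plan's graduated overage billing.
# ASSUMPTION: the 900k cutoff and $7 rate are made-up placeholders;
# only the base fee, included units, and $8 starting rate come from the text.
def core_monthly_cost(units, base_fee=29.0, included=100_000):
    """Estimate a Core-plan bill: the flat base fee covers `included` units,
    and overage is billed per 100k at rates that drop with volume."""
    tiers = [                 # (overage units covered, $ per 100k)
        (900_000, 8.0),       # first 900k overage units at $8/100k
        (float("inf"), 7.0),  # everything beyond at a lower rate (placeholder)
    ]
    overage = max(0, units - included)
    cost = base_fee
    for tier_units, rate in tiers:
        take = min(overage, tier_units)
        cost += take / 100_000 * rate
        overage -= take
        if overage <= 0:
            break
    return round(cost, 2)

print(core_monthly_cost(100_000))  # 29.0 (no overage)
print(core_monthly_cost(300_000))  # 29 + 2 * 8 = 45.0
```

Check the vendor's current pricing page for the real tier boundaries before budgeting.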
Who Is It For?
Ideal for:
- Engineering manager or individual contributor building core LLM features at a mid-market or enterprise company: Langfuse fits teams where multi-step LLM calls and agent workflows are central to the product. It helps with debugging and iteration through traces that show nested interactions.
- Technical product manager at a growth or scale-up company overseeing AI integrations: It suits product leads who need prompt versioning, A/B testing, and shared workflows for changing LLM app behavior. It can reduce full dependence on engineering for day-to-day prompt operations, but it still assumes some technical skill.
- ML engineer or data scientist working on agentic apps in mid-market or enterprise settings: Langfuse is a match for production LLM apps that need observability, token and cost tracking, evaluations such as LLM-as-a-Judge, and datasets for repeatable testing.
Not ideal for:
- Non-technical business users or marketers: Langfuse requires coding and does not target no-code use cases, so tools like Retool AI or Bubble are a better fit.
- Teams building a simple chatbot or one-off prototype: If the LLM feature is not core and does not need agent graphs or evaluations, a lighter option such as the OpenAI or Anthropic playgrounds is likely enough.
Langfuse is best for engineering-led teams of 5 to 50 engineers at growth or scale-up companies that treat LLM apps as a core product area. Use it when you need traces, evaluations, prompt management, and cost visibility for complex production systems. Skip it if you do not have in-house developers or if you are only testing a basic MVP chatbot.
Alternatives and Comparisons
- Braintrust: Langfuse does self-hosting better, with Docker and Kubernetes deployment, native OpenTelemetry support, and no per-seat pricing for more predictable team costs. Braintrust does managed production workflows better, with CI/CD deployment blocking, zero-code proxy traffic capture, and a free tier that includes 1M spans per month and unlimited users. Choose Langfuse if you need self-hosted control or data sovereignty; choose Braintrust if you want a managed setup with evals tied closely to production workflows. Switching difficulty from Braintrust is medium.
- LangSmith: Langfuse does open-source and self-hosting better, with an MIT license, no enterprise fee for self-hosting, and broader integrations beyond LangChain. LangSmith does built-in evaluation and prompt tooling better, and public comparisons describe it as more approachable for non-technical users and stronger for teams already centered on LangChain. Choose Langfuse if open-source flexibility and lower-cost self-hosting matter more; choose LangSmith if your app stack is deeply tied to LangChain.
- Helicone: Langfuse does full observability better, with native evaluations, annotation workflows, and agent visualization alongside tracing. Helicone does lightweight monitoring better through a zero-setup proxy model that tracks costs across multiple providers without adding SDKs to each service. Choose Langfuse if you need observability plus eval workflows in one tool; choose Helicone if you want fast proxy-based monitoring with less setup.
Getting Started
Setup:
- Signup: Langfuse supports email-only signup, has a free trial available, does not require a credit card, and supports team signup.
- Time to first result: Public information points to an onboarding wizard, then creating a project for public and secret keys and integrating the SDK, with first results in about 5 to 10 minutes.
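The key-based setup above usually reduces to a few environment variables that the Langfuse SDKs read on startup. The key values below are placeholders, and the exact host depends on your cloud region or self-hosted deployment:

```shell
# Typical environment setup before instrumenting an app with the Langfuse SDK.
# Keys come from the project settings page; values here are placeholders.
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted URL
```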
Learning curve:
- The initial setup is light if you already work with SDKs and have basic Python or Node.js coding skills. Full LLM engineering work takes more setup beyond basic tracing.
- Beginner: Day 1 for basic tracing. Experienced: immediate for traces, 1 to 2 days for evals.
Where to get help:
- Official help starts with the docs, getting-started guide, and FAQ. The hello-world path requires minimal interaction, but no sample templates are listed.
- GitHub Discussions is described as the best place to ask questions and give feedback, and answers come from a mix of maintainers, staff, and community members. Weekly Community Hours on Google Meet give real-time help during scheduled calls.
- In-app support routes requests by email, and enterprise users can get Slack Connect or MS Teams access with a stated response SLA within 24 hours. The wider community appears small to medium, organized, and growing.
Watch out for:
- The tracing step in setup expects you to generate traces from your own app, and there is no sample or playground in the onboarding flow.
- Users report confusion and idle waiting during early setup.
Integration Ecosystem
Users describe Langfuse as broad and framework agnostic, with public documentation and user reports pointing to connections across 80+ tools and strong fit for multi-library stacks. They often say the integrations work reliably, with positive notes on setup, detailed analytics, and production use. Its integration approach is API-first, and the research data does not note an MCP server.
- LangChain: Users praise the LangChain integration for tracing and monitoring chains and agents, with native support that helps debug production issues.
- OpenAI SDK: Users often highlight real-time token usage, cost tracking, and output logging, and they frequently mention that setup is easy.
- OpenTelemetry: Users praise OpenTelemetry support for exporting traces from Python and TypeScript stacks into Langfuse for custom dashboards and analytics.
- LlamaIndex: Users say it works well for observability across RAG pipelines, embeddings, and retrieval calls, with detailed latency breakdowns.
- Vercel AI SDK: Users report smooth tracing in TypeScript apps and say it fits full-stack LLM monitoring needs.
The research data does not list commonly requested missing integrations, and no user-reported gaps stood out across the sources provided.
Developer Experience
Langfuse has a developer-focused surface with SDKs for Python, TypeScript and Node.js, lightweight client-side JavaScript, and a REST API for tracing prompts and responses, evals, and datasets in LLM apps. Public feedback describes the docs as excellent, with clear quickstarts, schema references, and integration guides that map well to production use. Reports suggest a basic chain can be instrumented and visible in the dashboard in 5 to 15 minutes, and one developer said OpenAI calls were logging in under 10 minutes with the Python quickstart.
What developers like:
- Developers often praise the Python and TypeScript SDKs for feeling polished and for having good error handling.
- Public feedback highlights simple integration, including "drop-in decorator" workflows, plus a real-time dashboard with cost breakdowns.
- The open-source core is a recurring positive point, and some developers note it is easy to fork or extend.
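The "drop-in decorator" workflow praised above can be sketched with nothing but the standard library. This is not the real Langfuse SDK decorator (the SDK ships its own, with backend export); it only shows the shape of the pattern: wrap a function once and every call is recorded as a span with timing, input, and output.

```python
# Stdlib illustration of a drop-in tracing decorator (assumption: a toy
# in-memory span buffer stands in for the SDK's trace export).
import functools
import time

SPANS = []  # stand-in for the SDK's trace buffer

def observe(fn):
    """Record each call's name, duration, input, and output as a span."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        SPANS.append({
            "name": fn.__name__,
            "duration_s": time.perf_counter() - start,
            "input": {"args": args, "kwargs": kwargs},
            "output": result,
        })
        return result
    return wrapper

@observe
def summarize(text):
    # Placeholder for an actual LLM call.
    return text[:10]

summarize("hello world, this is a trace")
print(SPANS[0]["name"], SPANS[0]["output"])  # summarize hello worl
```

The appeal of the pattern is that instrumentation stays out of the function body, so existing code is traced by adding one line above each function of interest.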
Common frustrations:
- Some developers report hitting rate limits quickly on the cloud free tier during testing.
- Public issue threads mention occasional SDK bugs around nested traces or custom scorers.
- Self-hosting with Docker can be finicky on ARM systems, including M1 Macs.
Security and Privacy
- Certifications: Langfuse states that it is SOC 2 Type 2 certified and ISO 27001 certified, per its security page.
- GDPR: The vendor states GDPR compliance, per its security page.
- Encryption: Langfuse states that data is encrypted at rest with AES-256 and in transit with TLS 1.2+, per its security page.
- Access control: The vendor states support for role-based access control, multi-factor authentication via TOTP, WebAuthn, and SMS, and SAML SSO with Google, GitHub, and Azure, per its security page.
- Data training: Langfuse states that it does not train on user data, per its security page.
- HIPAA: The vendor states that a BAA is available for HIPAA use cases, per its security page.
Product Momentum
- Release pace: Public activity points to weekly releases and rapid iteration, and the roadmap is maintained in public.
- Recent releases: The project shipped Web release v3.167.0 on April 9, 2026. Public activity on the same date also shows routine dependency updates.
- Growth: Momentum appears stable, and Langfuse is a YC W23-backed open-source project with integrations across OpenTelemetry, LangChain, OpenAI SDK, and LiteLLM.
- Search interest: Google Trends data in the research is effectively empty, with latest and peak scores of 0/100 and a +0.0% change between the first and second halves of the period, so no direction can be read from it.
- Risks: No notable viability risks appear in the research. Minor UI critiques show up in roadmap discussions, and the project’s open-source, multi-repo activity suggests low abandonment risk.
FAQ
What is Langfuse used for?
Langfuse is an open-source observability platform for AI applications. It covers tracing, evaluation, and prompt management for teams that monitor, debug, and improve LLM applications.
How much does Langfuse cost?
Self-hosted Langfuse is free under the MIT license. Langfuse Cloud uses usage-based pricing, and June 2025 updates added automatic volume discounts on tracing units.
Is Langfuse completely free?
The open-source self-hosted version is free and includes core observability features. Langfuse Cloud also has a free tier with usage limits, and paid usage starts above that allowance.
Does Langfuse have a free tier on Cloud?
Yes. Langfuse Cloud includes a free tier with usage limits, and users pay for usage above the free allowance.
Can Langfuse be self-hosted?
Yes. Langfuse is fully open-source and self-hostable under the MIT license, and self-hosters can run version 3.65.0 or later to access all product features at no cost.
What features are included in the free open-source version?
As of version 3.65.0 in June 2025, the self-hosted version includes the core platform, LLM-as-a-Judge evaluations, Playground, Prompt Experiments, Annotation and Data Labeling, S3 Exports, and PostHog Integration. These features are available under the MIT license.
Is Langfuse part of LangChain?
No. Langfuse is a separate open-source project, though it integrates with LangChain applications for tracing and observability.
What integrations does Langfuse support?
Langfuse integrates with LangChain, OpenAI, PostHog, and other tools. It also provides SDKs for Python and JavaScript or TypeScript.
How long does it take to get started with Langfuse?
Langfuse highlights fast onboarding, and the getting started flow points to a first result in 5 to 10 minutes. Teams begin by creating a project, getting public and secret keys, and integrating the SDK.
What is the difference between Langfuse Cloud and self-hosted Langfuse?
Langfuse Cloud is a managed service with usage-based pricing and HIPAA compliance options. Self-hosted Langfuse is free under the MIT license and gives teams full control over their data.
Does Langfuse comply with healthcare regulations?
Yes. Langfuse offers HIPAA-compliant Cloud instances for healthcare organizations.
What data privacy and security features does Langfuse offer?
Langfuse offers HIPAA-compliant Cloud instances for regulated use cases, and self-hosted deployments give teams full data control. The security summary also states encryption at rest with AES-256.
What is the Langfuse Python SDK v3?
Langfuse Python SDK v3 became generally available for production use in June 2025. Langfuse also provides an upgrade guide for teams on earlier versions.
What are common troubleshooting issues with Langfuse?
Common issues include authentication errors, missing traces, incorrect span nesting, and LangChain or OpenAI integration problems. Langfuse provides troubleshooting guidance for both Python and JS or TS SDKs.
Does Langfuse have recent prompt management updates?
Yes. In June 2025, Langfuse added array insertion to Prompt Management, which supports arrays of messages at any point within chat prompts.