LangWatch

LangWatch is an open-source LLMOps platform for testing, evaluation, and observability of AI agents in development and production.

Reviewed by Mathijs Bronsdijk · Updated Apr 13, 2026

Tool · Free + Paid Plans · Updated 1 month ago

What is LangWatch?

LangWatch is an open-source LLMOps platform for monitoring, evaluating, and testing AI agents in development and production. It captures full traces of LLM calls, tool usage, and conversation flows, giving teams visibility into what their agents are actually doing. LangWatch is built for AI engineers and product teams at growth-stage companies shipping multi-step agent workflows. Where most observability tools stop at logging, LangWatch adds agent simulations, automated evaluations, prompt versioning, and cost tracking in a single platform.

Key Features

  • Full Trace Observability: Captures end-to-end traces across multi-step agent workflows, logging inputs, outputs, metadata, tokens, and costs with real-time drill-down for debugging failures (see the sketch after this list)
  • Agent Simulation Testing (LangWatch Scenario): Creates and runs simulation tests directly in the platform without code, defining user personas and pass/fail criteria against HTTP endpoints or voice agents
  • Prompt Playground: Write, test, and version prompts connected to real trace data and evaluations, linking prompt changes directly to observed performance impacts
  • Graph Alerts: Triggers Slack notifications when custom thresholds are met on metrics like latency or error rates, catching production issues before users notice
  • Annotations and Queues: Domain experts manually label edge cases in traces, combining human oversight with automated evals to target improvements on rare failure modes
  • GitHub Integration: Links prompt versions to Git commits and tags traces for GitOps workflows, treating prompts as versioned code with reproducible debugging
  • Self-Hosting via Docker: Deploys with a single Docker Compose command for data residency in VPCs, suitable for regulated sectors needing full infrastructure control
  • RBAC Custom Roles: Enforces fine-grained access control with customizable role sets, limiting eval or production workflow access by team domain
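
To make the trace capture described in the first bullet concrete, here is a minimal Python sketch. It assumes the decorator-style SDK this review mentions; the exact names (`langwatch.setup`, `langwatch.trace`, `autotrack_openai_calls`) may differ between SDK versions, so treat it as illustrative rather than canonical usage.

```python
import os

import langwatch  # pip install langwatch
from openai import OpenAI

# Assumed setup call; the SDK may also read LANGWATCH_API_KEY from the environment.
langwatch.setup(api_key=os.environ["LANGWATCH_API_KEY"])

client = OpenAI()


@langwatch.trace()  # wraps the whole request in a single trace
def answer_question(question: str) -> str:
    # Assumed auto-instrumentation hook: LLM calls made through this client
    # are then recorded as spans with token counts and cost attached.
    langwatch.get_current_trace().autotrack_openai_calls(client)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


print(answer_question("Summarize today's open support tickets."))
```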

Use Cases

  • AI engineering teams shipping multi-step agents: Run daily scenario tests in CI/CD to catch prompt regressions and hallucinations before they reach production
  • B2B platform builders: Track per-customer AI performance with customer_id filtering and provide custom observability dashboards to clients running LLM apps (see the sketch after this list)
  • QA leads debugging agent failures: Use trace filtering, labels, and cross-team annotations to isolate root causes in black-box agent workflows
  • MLOps engineers in production: Monitor cost attribution, quality signals, and user event feedback across OpenTelemetry-instrumented agent pipelines
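
As a sketch of the per-customer tagging mentioned in the B2B bullet above: attaching a customer_id to the current trace lets dashboards and filters group results by tenant. The `update(metadata=...)` call follows the same assumed SDK pattern as the earlier sketch and may differ by version.

```python
import langwatch


@langwatch.trace()
def run_agent(prompt: str, customer_id: str) -> str:
    # Tag the trace with the tenant so per-customer dashboards and filters
    # can pick it up. Method and argument names are assumptions; check the
    # SDK docs for your version.
    langwatch.get_current_trace().update(metadata={"customer_id": customer_id})
    ...  # multi-step agent workflow goes here
    return "agent output"
```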

Strengths and Weaknesses

Strengths:

  • Combines monitoring, evaluations, and experimentation in one platform instead of stitching together separate tools
  • Python SDK is lightweight with good async support; TypeScript SDK is type-safe and simple for Node apps
  • Scenario testing with simulated users catches regressions that unit tests miss in non-deterministic agent behavior
  • ISO 27001 certified and SOC 2 Type II compliant with self-hosting option for data sovereignty
  • Free tier includes all platform features with 50,000 events per month

Weaknesses:

  • Free-tier rate limits can kick in during heavy testing cycles
  • Error messages for authentication and missing spans could be more specific
  • Some SDK breaking changes (v1.2) required decorator rewrites with limited migration guidance
  • Documentation is practical for setup but thin on advanced evaluation configurations
  • Community presence is limited compared to alternatives like Langfuse

Pricing

  • Developer (Free): $0 forever, all platform features, 50,000 events/month, 14-day data retention, 2 users, 1 GB storage, 3 scenarios/simulations/custom evaluations
  • Growth: ~$34/core seat/month (EUR 29), 200,000 events included, then EUR 0.0005 per extra event, 30-day retention, unlimited lite-users, unlimited evals/simulations/prompts, private Slack/Teams support, 14-day free trial (see the worked example after this list)
  • Enterprise: Custom pricing, on-prem or hosted deployment, OpenTelemetry support for Java/Go/custom, custom metrics and dashboards, premium support

Volume discounts are available above 20 Growth seats.
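
A quick worked example of the Growth-tier math, using only the numbers listed above (EUR 29 per core seat per month and EUR 0.0005 per event beyond the included 200,000):

```python
# Hypothetical team: 3 core seats, 500,000 events in a month.
seats, events = 3, 500_000

seat_cost = seats * 29                        # EUR 87
overage = max(0, events - 200_000) * 0.0005   # 300,000 extra events -> EUR 150

print(f"EUR {seat_cost + overage:.2f}/month")  # EUR 237.00/month
```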

FAQ

Is LangWatch open source?

Yes. LangWatch is fully open-source and available on GitHub. Teams can self-host it locally or in their own VPC with no data lock-in, and export all data at any time.

How does LangWatch work?

LangWatch automatically tracks every LLM call, tool usage, and user interaction with detailed traces, spans, and metadata. It layers evaluations, agent simulations, prompt experiments, and dashboards for cost and quality metrics on top of that observability data.
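
Individual steps such as tool calls can be captured as child spans inside a trace. A hedged sketch, assuming a span decorator alongside the trace decorator used in the sketches above (names and span types vary by SDK version):

```python
import langwatch


@langwatch.span(type="tool")  # assumed: records this step as a span on the current trace
def search_docs(query: str) -> list[str]:
    return ["result 1", "result 2"]  # stand-in for a real retrieval call


@langwatch.trace()
def agent_turn(user_message: str) -> str:
    docs = search_docs(user_message)  # appears as a nested span in the trace
    return f"Found {len(docs)} documents"
```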

Is LangWatch safe for regulated industries?

LangWatch holds ISO 27001 and SOC 2 Type II certifications. It supports VPC self-hosting to keep sensitive traces and datasets private, offers data residency in EU and US regions, and encrypts data at rest (AES-256) and in transit (TLS 1.2+).

How does LangWatch compare to Langfuse?

Both are open-source LLM observability platforms. LangWatch bundles monitoring, evaluations, and experimentation into one tool. Langfuse focuses more on self-managed observability with a strong open-source community. Choose LangWatch if you need integrated scenario testing and prompt experiments alongside tracing.

How long does it take to set up LangWatch?

Most teams report logging their first traces within 10 to 20 minutes. Setup requires adding an API key and installing the Python or TypeScript SDK. No credit card is needed for the free tier.
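
For reference, that setup amounts to installing the SDK and exporting an API key; the package and environment-variable names below are the same assumptions used in the sketches above:

```python
# pip install langwatch
# export LANGWATCH_API_KEY="..."  (copied from the project settings page)
import langwatch

langwatch.setup()  # assumed to read LANGWATCH_API_KEY from the environment
```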
