TruLens

What is TruLens?

TruLens is an open-source AI evaluation tool that measures agent quality by tracing execution flow and scoring retrieved context, tool calls, plans, and outputs. Its Evaluate, Iterate, and Test workflow helps teams compare versions and catch regressions using metrics such as groundedness, context relevance, coherence, and answer relevance. It works through the Python SDK or by ingesting OpenTelemetry traces, and is used by teams at Equinix, Snowflake, and KBC Group.

Last verified: May 17, 2026

At a glance

Best for: AI teams that need trace-level evaluation of agent behavior across versions.
API: Yes — TruLens can be used via the Python SDK or by ingesting OpenTelemetry traces.

What does TruLens do?

TruLens measures AI agent quality by tracing execution flow and scoring the parts that matter: retrieved context, tool calls, plans, and downstream outputs. Its Evaluate, Iterate, and Test workflow helps teams compare versions, inspect trace-level regressions, and tune prompts or hyperparameters with metrics like groundedness, context relevance, coherence, and answer relevance. The result is a faster path from experimentation to production-ready agent behavior. At scale, TruLens is used by thousands of developers and is trusted by teams at Equinix, Snowflake, KBC Group, tribble.ai, CubeServ, and Datec. It works with any AI agent through the Python SDK or by ingesting OpenTelemetry traces, so it fits into an existing observability stack instead of forcing a new one. The project is community-driven open source and shepherded by Snowflake, which gives it a public, inspectable development model for teams that want credible LLM apps without a closed black box.

Why use TruLens?

OpenTelemetry support lets it fit into existing observability pipelines without a separate tracing format.
Thousands of developers use it, and named adopters include Equinix, Snowflake, KBC Group, tribble.ai, CubeServ, and Datec.

Who is TruLens for?

AI engineers who need to compare agent versions and catch trace-level regressions.
ML teams who want metrics for groundedness, relevance, and other quality signals.
Platform teams who need observability-friendly tracing for agent workflows.
Product teams shipping RAG or summarization apps that need faster iteration loops.

What are TruLens's key features?

Evaluate

Run evals on agent outputs through the Python SDK or OpenTelemetry traces, so teams can score behavior against repeatable checks before shipping changes.
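As a rough illustration of what a repeatable check over one recorded run could look like, here is a minimal sketch using only the Python standard library. It is not the TruLens SDK: the record layout, the function names, and the token-overlap scoring are all assumptions made for illustration, and a lexical overlap is only a crude proxy for metrics like context relevance and groundedness.

```python
import re


def tokens(text: str) -> set:
    """Lowercase word tokens; a crude stand-in for real text analysis."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(claim: str, evidence: str) -> float:
    """Fraction of the claim's tokens that also appear in the evidence (0.0 to 1.0)."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(evidence)) / len(claim_tokens)


def score_record(record: dict) -> dict:
    """Score one run. The question/context/answer keys are a hypothetical layout."""
    return {
        # How much of the question is covered by the retrieved context.
        "context_relevance": overlap_score(record["question"], record["context"]),
        # How much of the answer is supported by the retrieved context.
        "groundedness": overlap_score(record["answer"], record["context"]),
    }


record = {
    "question": "When was the bridge opened?",
    "context": "The bridge was opened in 1937 after four years of construction.",
    "answer": "The bridge was opened in 1937.",
}
scores = score_record(record)
print(scores)
```

Because the check is deterministic, running it on the same record always yields the same scores, which is the property that makes before-and-after comparisons meaningful.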

Iterate

Compare trace-backed runs and refine prompts or workflows faster, using the same Python SDK data to see what changed and why.

Test

Build test cases for agent behavior and replay them against captured traces, helping catch regressions before they reach production systems.
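The regression-catching idea can be sketched as a comparison between stored baseline scores and scores from a candidate release. This is a stdlib-only sketch under stated assumptions: the case IDs, the metric names as dictionary keys, the `find_regressions` helper, and the 0.05 tolerance are hypothetical and do not come from the TruLens API.

```python
def find_regressions(baseline: dict, candidate: dict, tolerance: float = 0.05) -> list:
    """Return (case_id, metric, old, new) for each score that dropped by more than tolerance."""
    regressions = []
    for case_id, old_scores in baseline.items():
        new_scores = candidate.get(case_id, {})
        for metric, old in old_scores.items():
            new = new_scores.get(metric)
            # A missing score counts as a regression: the case could not be checked.
            if new is None or old - new > tolerance:
                regressions.append((case_id, metric, old, new))
    return regressions


# Hypothetical per-case scores for a baseline and a candidate version.
baseline = {
    "case-1": {"groundedness": 0.90, "answer_relevance": 0.80},
    "case-2": {"groundedness": 0.70, "answer_relevance": 0.90},
}
candidate = {
    "case-1": {"groundedness": 0.92, "answer_relevance": 0.60},
    "case-2": {"groundedness": 0.68, "answer_relevance": 0.90},
}
regressions = find_regressions(baseline, candidate)
print(regressions)
```

The tolerance keeps small score fluctuations from failing a release, while a missing score is flagged so that a silently skipped case is not mistaken for a pass.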

Interoperable tracing

Ingest OpenTelemetry traces and keep observability data portable across tools, which matters when teams already standardize on OpenTelemetry.
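To make the portability point concrete: OpenTelemetry spans carry a span ID and a parent span ID, so any tool that receives the flat span list can rebuild the execution tree. The sketch below assumes a simplified span shape (plain dicts with snake_case keys) and hypothetical span names; it does not use the OpenTelemetry or TruLens libraries.

```python
from collections import defaultdict


def build_tree(spans: list) -> dict:
    """Group spans by parent_span_id; root spans are stored under the key None."""
    children = defaultdict(list)
    for span in spans:
        children[span.get("parent_span_id")].append(span)
    return children


def render(children: dict, parent_id=None, depth: int = 0) -> list:
    """Return indented span names in depth-first order."""
    lines = []
    for span in children.get(parent_id, []):
        lines.append("  " * depth + span["name"])
        lines.extend(render(children, span["span_id"], depth + 1))
    return lines


# Hypothetical spans from one agent run, flattened as an exporter might emit them.
spans = [
    {"span_id": "a", "parent_span_id": None, "name": "agent.run"},
    {"span_id": "b", "parent_span_id": "a", "name": "retrieve_context"},
    {"span_id": "c", "parent_span_id": "a", "name": "call_tool"},
    {"span_id": "d", "parent_span_id": "a", "name": "generate_answer"},
]
outline = render(build_tree(spans))
print("\n".join(outline))
```

Each step of the run appears as its own node, which is what allows scoring the retrieval, the tool call, and the final answer separately.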

Scalable, trusted evals

Support evaluation workflows at larger scale with trace ingestion and Python SDK access, giving teams a consistent way to trust results across many runs.

What does TruLens integrate with?

OpenTelemetry

What are TruLens's use cases?

Agent version regression checks

AI engineers use TruLens to compare agent versions and catch trace-level regressions before they ship. They lean on Evaluate to score outputs consistently and Interoperable tracing to inspect where a new prompt or tool call changed behavior.

Grounded RAG quality review

ML teams use TruLens to measure groundedness and relevance in RAG pipelines, then use Test to validate changes against quality signals. Scalable, trusted evals helps them keep review results consistent as datasets and experiments grow.

Observability for agent workflows

Platform teams use TruLens to add observability-friendly tracing to agent workflows and debug failures across services. With Interoperable tracing and Iterate, they can follow a request end to end and shorten the time from incident to fix.

Faster summarization iteration

Product teams shipping summarization apps use TruLens to tighten their iteration loop on output quality. They use Iterate and Evaluate to spot weak summaries quickly, then Test to confirm the next release improves relevance without introducing new errors.

How does TruLens work?

Install the Python SDK or connect OpenTelemetry traces to start capturing agent activity in TruLens. This gives you a first trace stream to inspect without changing your whole stack.
Open the documentation to wire up Evaluate and Interoperable tracing for your workflow. Map the spans, prompts, and outputs you want to score so each run is measurable.
Run Test on a small set of examples to establish a baseline for groundedness, relevance, and other quality signals. Use the results to spot regressions early.
Iterate on prompts, tools, or retrieval settings, then re-run Evaluate to compare versions side by side. Keep the best-performing configuration as your working release.
Use the community and ongoing evals to expand coverage as your app grows. Scalable, trusted evals helps teams keep checks reliable across more traces and releases.
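The baseline, iterate, and compare loop in the steps above can be sketched as averaging each metric per version and keeping the best performer. This is a stdlib-only illustration, not TruLens code: the version labels, the score values, and the `summarize` and `best_version` helpers are hypothetical.

```python
from statistics import mean


def summarize(runs: dict) -> dict:
    """Average each metric across the runs recorded for each version."""
    summary = {}
    for version, records in runs.items():
        metrics = records[0].keys()
        summary[version] = {m: mean(r[m] for r in records) for m in metrics}
    return summary


def best_version(summary: dict, metric: str) -> str:
    """Pick the version with the highest average for the chosen metric."""
    return max(summary, key=lambda version: summary[version][metric])


# Hypothetical scores for two prompt versions over the same two examples.
runs = {
    "prompt-v1": [
        {"groundedness": 0.6, "answer_relevance": 0.8},
        {"groundedness": 0.8, "answer_relevance": 0.8},
    ],
    "prompt-v2": [
        {"groundedness": 0.9, "answer_relevance": 0.7},
        {"groundedness": 0.9, "answer_relevance": 0.9},
    ],
}
summary = summarize(runs)
winner = best_version(summary, "groundedness")
print(winner, summary[winner])
```

Selecting on a single metric is a simplification; in practice a team would likely look at several metrics side by side before promoting a configuration.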

Frequently asked questions

What is TruLens used for? Who is it for?

TruLens is used to evaluate, iterate on, and test AI agents and LLM apps. It's built for AI engineers, ML teams, platform teams, and product teams shipping RAG or summarization apps.

Does TruLens have an API and what does it integrate with?

TruLens can be used via the Python SDK or by ingesting OpenTelemetry traces. It integrates with OpenTelemetry.

Filed under: Agent Tools & Integrations, open-source

Explore other Agent Tools & Integrations

AgentMail

Email API for AI agents that handles inboxes and threaded replies.

Agent Tools & Integrations

AgentMail is an email API for AI agents with threaded replies, semantic search, and data extraction. Plans start with a free tier at $0/month.

AgentPhone

Phone infrastructure for AI agents handling calls and texts.

Agent Tools & Integrations

AgentPhone routes calls and texts through one webhook, with real-time transcription and native MCP support. Plans start at $3 per number per month.

AgentQL

Query-based web extraction and automation for changing sites.

Agent Tools & Integrations

AgentQL turns web pages into structured outputs with queries, plus a debugger and browserless API. Starter is $0/month; Professional is $99/month.

Agentverse

Browse agents, filter results, and chat in one marketplace.

Agent Tools & Integrations

Agentverse is an agent marketplace with chat, filters, and 2.81M agents for browsing ready-made workflows.

Apify

Web data automation with Actors, proxies, and integrations.

Agent Tools & Integrations

Apify scrapes websites and processes data with Actors, proxies, and integrations. Plans include Free ($0) and Starter ($29/month).