TruLens
What is TruLens?
TruLens is an open-source tool that helps teams evaluate AI agents by tracing their execution flow and scoring retrieved context, tool calls, plans, and outputs. Its Evaluate, Iterate, and Test workflow helps teams compare versions and catch regressions using metrics such as groundedness, context relevance, coherence, and answer relevance. It works through the Python SDK or with OpenTelemetry traces, and it is used by teams at Equinix, Snowflake, and KBC Group.
At a glance
- TruLens is best for AI teams who need trace-level evaluation of agent behavior across versions.
- TruLens has a Python SDK and can also ingest OpenTelemetry traces.
What does TruLens do?
TruLens measures AI agent quality by tracing execution flow and scoring the parts that matter: retrieved context, tool calls, plans, and downstream outputs. Its Evaluate, Iterate, and Test workflow helps teams compare versions, inspect regressions at the trace level, and tune prompts or hyperparameters using metrics such as groundedness, context relevance, coherence, and answer relevance. The goal is a shorter path from experimentation to production-ready agent behavior.
TruLens is used by thousands of developers, and adopters include teams at Equinix, Snowflake, KBC Group, tribble.ai, CubeServ, and Datec. It works with any AI agent through the Python SDK or by ingesting OpenTelemetry traces, so it fits into an existing observability stack rather than requiring a new one.
The project is community-driven open source, shepherded by Snowflake. Its development happens in public and its code can be inspected, which suits teams that want credible LLM apps without relying on a closed black box.
Why use TruLens?
- OpenTelemetry support lets it fit into existing observability pipelines without a separate tracing format.
- Thousands of developers use it, and named adopters include Equinix, Snowflake, KBC Group, tribble.ai, CubeServ, and Datec.
Who is TruLens for?
- AI engineers who need to compare agent versions and catch trace-level regressions.
- ML teams who want metrics for groundedness, relevance, and other quality signals.
- Platform teams who need observability-friendly tracing for agent workflows.
- Product teams shipping RAG or summarization apps that need faster iteration loops.
What are TruLens's key features?
Evaluate
Run evals on agent outputs through the Python SDK or OpenTelemetry traces, so teams can score behavior against repeatable checks before shipping changes.
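To make the idea of repeatable checks concrete, here is a minimal Python sketch of an evaluation pass: named scoring functions applied to every recorded run. It is not TruLens code. The record shape and the names `evaluate`, `answer_relevance_proxy`, and `context_relevance_proxy` are hypothetical, and the token-overlap scoring is a crude stand-in for the model-based feedback functions TruLens provides.

```python
import re
from statistics import mean
from typing import Callable

# Hypothetical record shape (not TruLens's schema):
# {"input": str, "contexts": list[str], "output": str}
Record = dict
Feedback = Callable[[Record], float]


def _tokens(text: str) -> set:
    """Lowercased alphanumeric tokens; a deliberately crude normalization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _coverage(reference: str, candidate: str) -> float:
    """Fraction of the reference's tokens that also appear in the candidate (0.0 to 1.0)."""
    ref = _tokens(reference)
    return len(ref & _tokens(candidate)) / len(ref) if ref else 0.0


def answer_relevance_proxy(rec: Record) -> float:
    # How much of the question does the answer touch? Token coverage, not an LLM judge.
    return _coverage(rec["input"], rec["output"])


def context_relevance_proxy(rec: Record) -> float:
    # Average query coverage over the retrieved chunks; 0.0 when nothing was retrieved.
    chunks = rec["contexts"]
    return mean(_coverage(rec["input"], c) for c in chunks) if chunks else 0.0


def evaluate(records: list, feedbacks: dict) -> list:
    """Apply every named feedback function to every record, one score dict per record."""
    return [{name: fn(rec) for name, fn in feedbacks.items()} for rec in records]


rec = {
    "input": "capital of France",
    "contexts": ["Paris is the capital of France.", "Berlin is in Germany."],
    "output": "Paris is the capital of France.",
}
print(evaluate([rec], {"answer_relevance": answer_relevance_proxy,
                       "context_relevance": context_relevance_proxy}))
```

Because the same dictionary of feedback functions is applied to every record, two runs scored this way are directly comparable, which is the property the Evaluate step depends on.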
Iterate
Compare trace-backed runs side by side and refine prompts or workflows faster, using the data captured through the Python SDK to see what changed and why.
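At its simplest, comparing versions means averaging each metric per run and looking at the change. The sketch below assumes each run is a list of per-record score dicts (for example, the output of an evaluation pass); `summarize` and `compare_versions` are hypothetical helpers, not part of the TruLens SDK.

```python
from statistics import mean


def summarize(run: list) -> dict:
    """Mean score per metric for one run (a list of per-record score dicts)."""
    metrics = run[0].keys()
    return {m: mean(row[m] for row in run) for m in metrics}


def compare_versions(baseline: list, candidate: list) -> dict:
    """Per-metric (baseline mean, candidate mean, delta); a positive delta is an improvement."""
    before, after = summarize(baseline), summarize(candidate)
    return {m: (before[m], after[m], after[m] - before[m]) for m in before}


v1 = [{"groundedness": 0.6, "answer_relevance": 0.8},
      {"groundedness": 0.8, "answer_relevance": 0.6}]
v2 = [{"groundedness": 0.9, "answer_relevance": 0.7},
      {"groundedness": 0.9, "answer_relevance": 0.7}]

for metric, (old, new, delta) in compare_versions(v1, v2).items():
    print(f"{metric}: {old:.2f} -> {new:.2f} ({delta:+.2f})")
```

Reporting per-metric deltas rather than one blended number shows when a change helps one metric (here, groundedness) while leaving another flat.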
Test
Build test cases for agent behavior and replay them against captured traces, helping catch regressions before they reach production systems.
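A regression check can be sketched as a per-test-case comparison against a stored baseline. This is an illustration under an assumed data shape (`{case_id: {metric: score}}`), not TruLens's Test implementation; `find_regressions`, the case names, and the 0.05 tolerance are hypothetical choices.

```python
def find_regressions(baseline: dict, candidate: dict, tolerance: float = 0.05) -> list:
    """Report (case_id, metric, old, new) wherever the candidate score drops by more
    than `tolerance`, or is missing a score the baseline had (new is then None)."""
    regressions = []
    for case_id, old_scores in baseline.items():
        new_scores = candidate.get(case_id, {})
        for metric, old in old_scores.items():
            new = new_scores.get(metric)
            if new is None or old - new > tolerance:
                regressions.append((case_id, metric, old, new))
    return regressions


baseline = {"refund_policy": {"groundedness": 0.90}, "shipping_times": {"groundedness": 0.80}}
candidate = {"refund_policy": {"groundedness": 0.88}, "shipping_times": {"groundedness": 0.60}}

failures = find_regressions(baseline, candidate)
if failures:
    print("Regressions found:", failures)
```

In a CI job, a non-empty result would be the signal to fail the build. Treating a missing score as a regression is deliberate: a test case that silently stops producing results should not pass.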
Interoperable tracing
Ingest OpenTelemetry traces and keep observability data portable across tools, which matters when teams already standardize on OpenTelemetry.
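OpenTelemetry spans link to each other through span IDs and parent-span IDs, which is what lets a flat export be reassembled into the execution flow of one request. The sketch below uses plain dicts loosely modeled on the OpenTelemetry span data model; the field names and the `span_tree_lines` helper are simplifications for illustration. It does not use the OpenTelemetry SDK and is not how TruLens ingests traces internally.

```python
from collections import defaultdict


def span_tree_lines(spans: list) -> list:
    """Rebuild the parent/child structure of one trace from flat span records and
    return it as indented lines, with siblings ordered by start time."""
    known_ids = {s["span_id"] for s in spans}
    children = defaultdict(list)
    roots = []
    for span in sorted(spans, key=lambda s: s["start_time"]):
        parent = span.get("parent_span_id")
        # A span with no parent, or whose parent is not in this batch, becomes a root.
        if parent is None or parent not in known_ids:
            roots.append(span)
        else:
            children[parent].append(span)

    lines = []

    def walk(span: dict, depth: int) -> None:
        lines.append("  " * depth + span["name"])
        for child in children[span["span_id"]]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


spans = [
    {"span_id": "b", "parent_span_id": "a", "name": "retrieve", "start_time": 2},
    {"span_id": "a", "parent_span_id": None, "name": "agent_run", "start_time": 1},
    {"span_id": "d", "parent_span_id": "a", "name": "generate", "start_time": 5},
    {"span_id": "c", "parent_span_id": "b", "name": "vector_search", "start_time": 3},
]
print("\n".join(span_tree_lines(spans)))
```

Promoting orphaned spans to roots keeps partial exports readable instead of silently dropping the spans whose parents were not captured.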
Scalable, trusted evals
Run evaluations at larger scale through trace ingestion and the Python SDK, applying the same checks across many runs so teams can trust that results are consistent.
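One way to judge whether a score difference across many runs is signal or noise is to report an interval alongside the mean. This sketch assumes scores for a single metric collected across repeated runs and uses a normal-approximation 95% interval; `score_interval` and the sample values are hypothetical, not a TruLens feature.

```python
import math
from statistics import mean, stdev


def score_interval(scores: list, z: float = 1.96) -> tuple:
    """Mean of one metric across runs plus an approximate 95% interval
    (normal approximation: mean +/- z * standard error of the mean)."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate spread")
    m = mean(scores)
    half_width = z * stdev(scores) / math.sqrt(len(scores))
    return m, m - half_width, m + half_width


groundedness_runs = [0.82, 0.79, 0.85, 0.80, 0.84, 0.78, 0.83, 0.81]
m, low, high = score_interval(groundedness_runs)
print(f"groundedness {m:.3f} (95% CI {low:.3f} to {high:.3f})")
```

With only a handful of runs, a Student's t critical value is a better choice than the fixed 1.96, and comparing whether two intervals overlap is a rough heuristic rather than a formal significance test.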
What does TruLens integrate with?
- OpenTelemetry
What are TruLens's use cases?
Agent version regression checks
AI engineers use TruLens to compare agent versions and catch trace-level regressions before they ship. They lean on Evaluate to score outputs consistently and Interoperable tracing to inspect where a new prompt or tool call changed behavior.
Grounded RAG quality review
ML teams use TruLens to measure groundedness and relevance in RAG pipelines, then use Test to validate changes against quality signals. Scalable, trusted evals helps them keep review results consistent as datasets and experiments grow.
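Groundedness asks whether each claim in an answer is supported by the retrieved sources. The sketch below approximates that idea at the sentence level with token overlap. TruLens's own groundedness feedback typically relies on a model such as an LLM judge, so treat `groundedness_proxy` and its 0.6 threshold as hypothetical illustrations only.

```python
import re


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groundedness_proxy(answer: str, contexts: list, support_threshold: float = 0.6) -> tuple:
    """Split the answer into sentences; a sentence counts as supported when at least
    `support_threshold` of its tokens appear somewhere in the retrieved context.
    Returns (share of supported sentences, list of unsupported sentences)."""
    source = set().union(*(_tokens(c) for c in contexts))
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    unsupported = []
    for sentence in sentences:
        toks = _tokens(sentence)
        if toks and len(toks & source) / len(toks) < support_threshold:
            unsupported.append(sentence)
    score = 1 - len(unsupported) / len(sentences) if sentences else 0.0
    return score, unsupported


score, flagged = groundedness_proxy(
    "Paris is the capital of France. It is also the largest city in Spain.",
    ["Paris is the capital of France."],
)
print(score, flagged)
```

Returning the unsupported sentences, not just the score, is what makes a review actionable: the second sentence is flagged because almost none of its words appear in the retrieved context.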
Observability for agent workflows
Platform teams use TruLens to add observability-friendly tracing to agent workflows and debug failures across services. With Interoperable tracing and Iterate, they can follow a request end to end and shorten the time from incident to fix.
Faster summarization iteration
Product teams shipping summarization apps use TruLens to tighten their iteration loop on output quality. They use Iterate and Evaluate to spot weak summaries quickly, then Test to confirm the next release improves relevance without introducing new errors.
How does TruLens work?
- Install the Python SDK or connect OpenTelemetry traces to start capturing agent activity in TruLens. This gives you a first trace stream to inspect without changing your whole stack.
- Open the documentation to wire up Evaluate and Interoperable tracing for your workflow. Map the spans, prompts, and outputs you want to score so each run is measurable.
- Run Test on a small set of examples to establish a baseline for groundedness, relevance, and other quality signals. Use the results to spot regressions early.
- Iterate on prompts, tools, or retrieval settings, then re-run Evaluate to compare versions side by side. Keep the best-performing configuration as your working release.
- Expand coverage with more evals as your app grows, drawing on the community for guidance. Scalable, trusted evals help keep checks reliable across more traces and releases.
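The compare-and-keep-the-best steps above can be expressed as a selection rule with quality floors. This is a sketch under assumed inputs; `pick_release`, the guardrail dict, the configuration names, and the unweighted average are hypothetical choices rather than TruLens behavior.

```python
from statistics import mean


def pick_release(results: dict, guardrails: dict):
    """results: {config_name: [per-example score dicts]}.
    guardrails: {metric: minimum acceptable mean score}.
    Returns the config with the highest average metric mean among configs that
    clear every guardrail, or None if no config qualifies."""
    best_name, best_overall = None, float("-inf")
    for name, rows in results.items():
        means = {m: mean(row[m] for row in rows) for m in rows[0]}
        # Skip configs below any floor; a missing metric also fails the guardrail.
        if any(means.get(m, float("-inf")) < floor for m, floor in guardrails.items()):
            continue
        overall = mean(list(means.values()))
        if overall > best_overall:
            best_name, best_overall = name, overall
    return best_name


results = {
    "baseline": [{"groundedness": 0.70, "answer_relevance": 0.70}],
    "prompt_v2": [{"groundedness": 0.80, "answer_relevance": 0.70}],
    "prompt_v3": [{"groundedness": 0.55, "answer_relevance": 1.00}],
}
print(pick_release(results, {"groundedness": 0.6}))
```

Without the groundedness floor, `prompt_v3` would win on average score despite its weak groundedness; the guardrail keeps a quality regression in one metric from being traded away for gains in another.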
Frequently asked questions
What is TruLens used for? Who is it for?
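TruLens is used to evaluate, iterate on, and test AI agents and LLM apps, comparing versions and catching regressions before release. It's built for AI engineers, ML teams, platform teams, and product teams shipping RAG or summarization apps.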
Does TruLens have an API and what does it integrate with?
TruLens can be used through its Python SDK or by ingesting OpenTelemetry traces; OpenTelemetry is its listed integration.
