
Arize Phoenix

What is Arize Phoenix?

Arize Phoenix is an open-source agent tracing and evaluation platform for AI engineers. It shows each step an agent takes, then turns those traces into datasets, experiments, and scored evaluations. Its feature areas are Tracing, Evaluation, Iteration, and Agent Integrations. It supports OpenTelemetry and integrates with LlamaIndex. The site lists GitHub, Microsoft, and Atlassian among its users, and Phoenix can run locally, in Docker, on Kubernetes, or in the cloud.


At a glance

Best for
Phoenix is best for AI engineers who need trace-level visibility and repeatable evaluation for agents.

What does Arize Phoenix do?

Phoenix handles agent tracing and evaluation. It records each step an agent takes, then turns those traces into datasets, experiments, and scored evaluations. The workflow is organized around tracing, evaluation, iteration, and agent integrations, so teams can inspect prompts, retrievals, tool calls, and outputs, then test changes against evidence instead of guesswork. The site cites 9.7k GitHub stars, 2M+ downloads, 2.5M+ monthly downloads, and a 7k+ community. Phoenix is open source and self-hostable, with local, Docker, Kubernetes, and cloud deployment options. Customers shown on the site include GitHub, Twilio, Shopee, eBay, Moloco, Garena, Atlassian, Microsoft, ByteDance, MongoDB, Rippling, and Bill, and Phoenix supports LlamaIndex integrations.

Why use Arize Phoenix?

  • Open-source and self-hostable, so teams can keep agent data on their own infrastructure.
  • Trace, annotate, hypothesize, experiment, and measure in one workflow instead of stitching together separate tools.
  • Built around agent development rather than generic observability, which keeps debugging tied to actual model behavior.
  • Backed by a large community and strong GitHub adoption, which makes it easier to find examples and support.
  • Supports Docker, Kubernetes, local, and cloud deployment, giving teams flexibility in how they run it.

Who is Arize Phoenix for?

  • AI engineers who need to debug agent behavior from prompts through tool calls.
  • ML teams who want to score outputs before changes reach users.
  • Product teams building LLM features that need evidence-based iteration.
  • Platform teams that prefer self-hosted observability for internal AI workflows.

What are Arize Phoenix's key features?

Tracing

Capture LLM traces with OpenTelemetry instrumentation (which the site credits with 2.4M+ monthly downloads), so teams can inspect prompts, tool calls, and latency in one place.
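To make the idea of a trace concrete, here is a minimal, stdlib-only Python sketch of what tracing records: nested spans that share a trace ID, point to their parent, carry attributes such as the prompt or tool name, and time each step. `ToyTracer` and `Span` are hypothetical names for illustration only; this is not Phoenix's API, which relies on OpenTelemetry instrumentation instead.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Span:
    """One recorded step of an agent run (hypothetical structure)."""
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    attributes: Dict[str, Any] = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0

    @property
    def latency_ms(self) -> float:
        return (self.end - self.start) * 1000.0


class ToyTracer:
    """Records nested spans in memory for a single trace."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex
        self.spans: List[Span] = []   # finished spans, in completion order
        self._stack: List[Span] = []  # currently open spans

    @contextmanager
    def span(self, name: str, **attributes: Any) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        current = Span(
            name=name,
            trace_id=self.trace_id,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            attributes=dict(attributes),
            start=time.perf_counter(),
        )
        self._stack.append(current)
        try:
            yield current
        finally:
            # Close the span even if the step raised, so failures are traced too.
            current.end = time.perf_counter()
            self._stack.pop()
            self.spans.append(current)


# Usage: an agent step that makes one tool call.
tracer = ToyTracer()
with tracer.span("agent", prompt="What is the weather in Paris?") as root:
    with tracer.span("tool_call", tool="get_weather", city="Paris") as tool:
        tool.attributes["result"] = "18C, cloudy"  # stand-in for a real tool
    root.attributes["output"] = "It is 18C and cloudy in Paris."

for s in tracer.spans:
    print(f"{s.name:<10} parent={s.parent_id} {s.latency_ms:.3f} ms")
```

Each span is appended when it finishes, so the tool call is recorded before the agent step that contains it; the parent ID is what lets a UI rebuild the tree.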

Evaluation

Run evaluations on traces and datasets to compare outputs against expected behavior, helping teams catch regressions before shipping changes.
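As an illustration of scoring outputs against expected behavior, the sketch below runs two simple evaluators over a tiny dataset. The `Example` type, the evaluator functions, and `evaluate` are hypothetical; real evaluations often use an LLM as a judge or task-specific checks rather than string matching.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    input: str
    expected: str


Evaluator = Callable[[str, str], float]


def exact_match(output: str, expected: str) -> float:
    """1.0 if the normalized output equals the expected answer, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def contains_expected(output: str, expected: str) -> float:
    """Looser check: 1.0 if the expected answer appears anywhere in the output."""
    return 1.0 if expected.strip().lower() in output.lower() else 0.0


def evaluate(
    examples: List[Example],
    outputs: Dict[str, str],
    evaluators: Dict[str, Evaluator],
) -> List[Dict[str, float]]:
    """Score each example's output with every evaluator (one row per example)."""
    return [
        {name: fn(outputs[ex.input], ex.expected) for name, fn in evaluators.items()}
        for ex in examples
    ]


examples = [
    Example(input="Capital of France?", expected="Paris"),
    Example(input="2 + 2?", expected="4"),
]
# Outputs captured from traced runs of the system under test.
outputs = {"Capital of France?": "The capital is Paris.", "2 + 2?": "4"}
scores = evaluate(examples, outputs, {"exact": exact_match, "contains": contains_expected})
print(scores)
```

Keeping one score per evaluator per example shows which check an output failed, not just that it failed.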

Iteration

Use the OBSERVE, ANNOTATE, HYPOTHESIZE, EXPERIMENT, and MEASURE workflow to refine prompts and agents with a repeatable review loop.
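The ANNOTATE step is what turns raw traces into something a team can experiment against. A minimal sketch, assuming each trace carries a reviewer's label and an optional correction: labeled traces become dataset examples with an expected answer, and unlabeled ones are left out. `TraceRecord`, `DatasetExample`, and `build_dataset` are hypothetical names, not part of Phoenix.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TraceRecord:
    trace_id: str
    input: str
    output: str
    label: Optional[str] = None       # set during ANNOTATE: "good" or "bad"
    correction: Optional[str] = None  # what a reviewer says the output should be


@dataclass
class DatasetExample:
    input: str
    expected: str
    source_trace: str  # keeps a link back to the trace it came from


def build_dataset(traces: List[TraceRecord]) -> List[DatasetExample]:
    """Turn annotated traces into dataset examples for later experiments.

    Good traces keep their own output as the expected answer; bad traces are
    kept only when a reviewer supplied a correction. Unlabeled traces are skipped.
    """
    dataset: List[DatasetExample] = []
    for t in traces:
        if t.label == "good":
            dataset.append(DatasetExample(t.input, t.output, t.trace_id))
        elif t.label == "bad" and t.correction is not None:
            dataset.append(DatasetExample(t.input, t.correction, t.trace_id))
    return dataset


traces = [
    TraceRecord("t1", "Refund window?", "30 days", label="good"),
    TraceRecord("t2", "Ship to Canada?", "We do not ship there.",
                label="bad", correction="Yes, within 5 to 7 business days."),
    TraceRecord("t3", "Store hours?", "9 to 5"),  # not reviewed yet
]
dataset = build_dataset(traces)
for ex in dataset:
    print(ex)
```

The resulting examples can then feed the HYPOTHESIZE and EXPERIMENT steps: run a changed prompt against the same inputs and score it against `expected`.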

Agent Integrations

Connect LlamaIndex to Phoenix for agent tracing and evaluation, making it easier to debug retrieval and tool-use behavior in existing stacks.

Privacy

Keep sensitive data under control with self-hosting options and local deployment, which matters for teams handling private model inputs and outputs.

OSS Community

Join a 7k+ community around an open-source project with 9.7k GitHub stars and 2M+ downloads, useful for shared examples and support.

Built on Standards

Instrument systems with OpenTelemetry and other standard tracing formats, reducing lock-in and making telemetry easier to move across tools.
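One way to see why standard attribute names reduce lock-in: if a span is a plain record whose keys follow a shared convention, any backend that understands the convention can read it. The sketch below builds such a record and round-trips it through JSON. The keys are modeled on the OpenInference semantic conventions that Arize maintains for use with OpenTelemetry, but treat the exact names as an assumption to check against that spec; `llm_span` and the model name are hypothetical.

```python
import json
from typing import Any, Dict

# Attribute keys modeled on the OpenInference semantic conventions; check the
# spec for the authoritative names before relying on them.
SPAN_KIND = "openinference.span.kind"
INPUT_VALUE = "input.value"
OUTPUT_VALUE = "output.value"
LLM_MODEL_NAME = "llm.model_name"


def llm_span(name: str, model: str, prompt: str, completion: str) -> Dict[str, Any]:
    """Build a plain span record for one LLM call using shared attribute keys."""
    return {
        "name": name,
        "attributes": {
            SPAN_KIND: "LLM",
            LLM_MODEL_NAME: model,
            INPUT_VALUE: prompt,
            OUTPUT_VALUE: completion,
        },
    }


record = llm_span("chat", model="example-model", prompt="Say hi", completion="Hi!")
payload = json.dumps(record, sort_keys=True)  # a tool-neutral serialized form
restored = json.loads(payload)
print(payload)
```

Nothing in the record names a vendor; the convention, not the backend, defines what each key means.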

Vendor Agnostic

Use the same tracing and evaluation workflow across models and frameworks without tying telemetry to one provider, which helps avoid migration pain.

What does Arize Phoenix integrate with?

  • LlamaIndex

What are Arize Phoenix's use cases?

Agent debugging for AI engineers

AI engineers use Arize Phoenix to trace an agent from prompt to tool call, using Tracing to pinpoint where behavior drifts and Agent Integrations to capture the full workflow. They can then compare runs in Evaluation and Iteration to fix failures before they reach users.

Pre-release scoring for ML teams

ML teams use Arize Phoenix to score model outputs before shipping changes, using Evaluation to catch regressions and MEASURE to track whether a prompt or retrieval tweak actually improves results. That helps them block weak releases and keep quality consistent.
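A pre-release check like this usually comes down to comparing a candidate's scores with a baseline's on the same dataset. The sketch below is a hypothetical gate, not a Phoenix feature: it blocks the release when the candidate's mean score falls more than a chosen tolerance below the baseline, and it lists individual examples that got worse so a flat average cannot hide them. `release_gate` and the default tolerance are assumptions for illustration.

```python
from statistics import mean
from typing import Any, Dict, List


def release_gate(
    baseline: List[float],
    candidate: List[float],
    tolerance: float = 0.02,
) -> Dict[str, Any]:
    """Decide whether a candidate may ship, given per-example eval scores.

    Both lists must score the same examples in the same order. The candidate
    passes if its mean score is at most `tolerance` below the baseline mean.
    """
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("baseline and candidate must score the same non-empty dataset")
    base_mean = mean(baseline)
    cand_mean = mean(candidate)
    # Per-example drops are reported even when the average holds steady.
    regressed = [i for i, (b, c) in enumerate(zip(baseline, candidate)) if c < b]
    return {
        "baseline_mean": base_mean,
        "candidate_mean": cand_mean,
        "regressed_examples": regressed,
        "ship": cand_mean >= base_mean - tolerance,
    }


baseline = [1.0, 1.0, 0.0, 1.0]
same_average = release_gate(baseline, [1.0, 0.0, 1.0, 1.0])
worse = release_gate(baseline, [0.0, 0.0, 1.0, 1.0])
print(same_average)
print(worse)
```

The tolerance is a design choice: zero blocks any drop in the average, while a small positive value allows for noise in LLM-graded scores.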

Evidence-based iteration for product teams

Product teams building LLM features use Arize Phoenix to turn user feedback into experiments, using ANNOTATE and HYPOTHESIZE to label problem cases and test new prompts or workflows. They rely on EXPERIMENT and Iteration to prove which change improves the experience.

Self-hosted observability for platform teams

Platform teams use Arize Phoenix to monitor internal AI workflows on infrastructure they control, choosing among its deployment options: local, Docker, Kubernetes, or cloud. That gives them observability without giving up control over sensitive traces and evaluation data.

How does Arize Phoenix work?

  1. Connect your first LLM workflow with Tracing or an Agent Integrations setup so Phoenix can capture prompts, tool calls, and outputs as runs flow through the system.
  2. Review traces in the UI, then use ANNOTATE to mark failures, edge cases, and good examples so your team has concrete evidence instead of guesswork.
  3. Set up Evaluation to score outputs against your criteria, and use MEASURE to compare baseline behavior with new prompt or retrieval changes.
  4. Move into HYPOTHESIZE and EXPERIMENT to test improvements, then use Iteration to keep the best-performing version moving toward production.
  5. Deploy Phoenix in the environment that meets your privacy needs, whether local, Docker, Kubernetes, or cloud, so observability fits your security and infrastructure requirements.

Frequently asked questions

What is Arize Phoenix used for? Who is it for?

Arize Phoenix is used for tracing, evaluation, and iteration on LLM applications and agents. It's built for AI engineers, ML teams, product teams building LLM features that need evidence-based iteration, and platform teams that prefer self-hosted observability.

Does Arize Phoenix have an API and what does it integrate with?

The vendor pages reviewed for this listing do not document a public API. The integration they name is LlamaIndex.

Editor's read

Check whether your team needs self-hosting or local deployment before adopting Phoenix. The listing shows those options, so confirm your infrastructure can support them and that your workflow benefits from tracing, evaluation, and iteration on private agent data.

Every listing on AgentsIndex passes the same public editorial bar. Listings are built from a structured read of the vendor's own pages rather than first-hand product trials. Pricing and features are checked against the live site at the date of last verification.

Verified against phoenix.arize.com on . Spotted something out of date? Tell us.

Disclosure: AgentsIndex earns revenue from premium listings and may earn a commission when you sign up for tools via our outbound links. This does not affect inclusion, ranking, or editorial judgment.
Source policy: Listings are built from first-party vendor pages by default; third-party references are used only when they add verifiable context not available on the vendor site.
Sources consulted:
  1. phoenix.arize.com
