Promptfoo
Promptfoo is an open-source CLI tool for LLM evaluation, red teaming, and AI security testing. Used by 127 Fortune 500 companies. Free tier available.
Reviewed by Mathijs Bronsdijk · Updated Apr 13, 2026

What is Promptfoo?
Promptfoo is an open-source CLI tool and library for evaluating, testing, and red-teaming LLM applications before they reach production. Built on a test-driven development philosophy, it helps developers move away from trial-and-error prompt engineering toward systematic benchmarking with defined test cases and assertions. The tool runs entirely locally for privacy, integrates with CI/CD pipelines, and fits both individual developers and enterprise security teams. It is used by 127 of the Fortune 500 and powers production LLM applications serving over 10 million users. Promptfoo is now part of OpenAI.
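As a minimal sketch of the test-driven workflow described above, an evaluation is defined in a `promptfooconfig.yaml` file (the provider ID, prompt, and test data here are illustrative, not from this review):

```yaml
# promptfooconfig.yaml — a minimal sketch; provider ID and test values are illustrative
prompts:
  - "Reply to this customer message in a friendly tone: {{message}}"

providers:
  - openai:gpt-4o-mini

tests:
  - vars:
      message: "Where is my order?"
    assert:
      # Deterministic string check on the model output
      - type: contains
        value: "order"
```

Running `npx promptfoo@latest eval` then executes every prompt/provider/test combination and reports pass/fail results, and `promptfoo view` opens the results in the web UI.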
Key Features
- Red Teaming: Automated adversarial testing that scans for prompt injections, jailbreaks, PII leaks, toxic outputs, and business rule violations, drawing on threat intelligence from a community of over 300,000 users.
- Guardrails: Real-time protection against jailbreaks and adversarial attacks during inference.
- Evaluations: Benchmark prompts, models, and RAG pipelines using custom metrics such as accuracy and safety, with support for assertions like `contains` and `llm-rubric`.
- Code Scanning: Detects LLM vulnerabilities directly in your IDE and CI/CD pipeline before deployment.
- Model Security: Security testing and monitoring across AI models, with CVSS-based risk scoring adapted for LLM vulnerabilities.
- MCP Proxy: A secure proxy for Model Context Protocol communications, with an MCP server that exposes evaluations as tools for AI agents such as Cursor and Claude Desktop.
- Multi-Provider Support: Integrates with OpenAI, Anthropic, Google (Vertex, AI Studio), HuggingFace, Llama, and custom API providers, including support for tool use, function calling, temperature, and token settings.
- Performance Features: Includes caching, concurrency controls, live reloading with `--watch`, retry handling, and the ability to resume interrupted evaluations.
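The performance features above can also be tuned in configuration; a sketch, assuming the `evaluateOptions` block behaves as in promptfoo's configuration schema:

```yaml
# Sketch: throughput tuning in promptfooconfig.yaml
# (evaluateOptions is assumed to match promptfoo's config schema)
evaluateOptions:
  maxConcurrency: 4   # run up to 4 provider calls in parallel
```

On the command line, `promptfoo eval --watch` re-runs the evaluation when prompts or tests change, and caching of provider responses speeds up repeated runs.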
Use Cases
- LLM Developers refining prompts: Developers define test cases, run evaluations across multiple providers, and compare results in a matrix view or web UI to identify the most reliable prompt configuration.
- QA Engineers running systematic model tests: QA teams use batch assertions and structured output formats (HTML, CSV, JSONL, YAML) to document model behavior and catch regressions across updates.
- Security Researchers pentesting AI applications: Security teams run automated red teaming scans against deployed models to surface injection vulnerabilities, unsafe outputs, and insecure agent tool use before those issues reach users.
- Enterprise teams meeting compliance requirements: Organizations in financial services, insurance, telecommunications, and real estate use industry-specific configurations, such as FINRA-aligned testing and fair housing compliance checks, to meet regulatory requirements.
- Development teams integrating AI testing into CI/CD: Teams embed Promptfoo into their build pipelines so that every code change triggers automated LLM evaluations, keeping quality gates consistent across releases.
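As a sketch of the CI/CD use case, a GitHub Actions job could run the evaluation on every push; the workflow name, Node version, and secret name below are illustrative:

```yaml
# .github/workflows/llm-eval.yml — illustrative CI sketch
name: llm-eval
on: [push]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # A failed assertion exits non-zero, failing the build
      - run: npx promptfoo@latest eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

This makes every code change trigger the same quality gate, matching the pattern described above.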
Strengths and Weaknesses
Strengths:
- Free and open-source under the MIT license, with no cost barrier to start using core evaluation and red teaming features.
- Supports concurrent testing and caching, which speeds up evaluation runs significantly compared to sequential testing approaches.
- Works with a wide range of LLM providers out of the box, including OpenAI, Anthropic, Google, HuggingFace, and self-hosted models like Llama.
- Runs entirely locally, keeping evaluation data off third-party servers when privacy is a requirement.
- The team has been noted for responsiveness and smooth onboarding, with support that continues after initial setup.
Weaknesses:
- Known bugs have been reported around assertion result handling, including "assertion result mock pollution" that can produce unreliable evaluation results in some test configurations.
- UI errors can occur on the evaluation page when derived metrics return null scores, causing display crashes during review.
- With 320 open GitHub issues at the time of review, stability work is ongoing, which may affect teams relying on newer or less common configurations.
Pricing
- Community: Free forever, includes all LLM evaluation features, all model providers and integrations, red teaming with 10,000 probes per month, custom app integration, local or self-hosted deployment, vulnerability scanning, and community support.
- Enterprise: Custom pricing, includes everything in Community plus custom red teaming limits, team sharing and collaboration, continuous monitoring, a centralized security and compliance dashboard, customizable attack profiles, SSO and granular permission profiles, Promptfoo API access, managed cloud deployment, professional services support, and priority support with SLA guarantees.
- On-Premise: Custom pricing, includes all Enterprise features plus deployment on your own infrastructure, complete data isolation, a dedicated runner, and an assigned deployment engineer.
Contact Promptfoo directly for Enterprise and On-Premise quotes.
FAQ
What is Promptfoo used for?
Promptfoo is an open-source CLI tool and library for evaluating, testing, and red-teaming LLM applications before they reach production. It helps developers benchmark prompts, scan for vulnerabilities, and run automated adversarial tests against AI models. It also integrates with CI/CD pipelines for continuous AI safety testing.
Is Promptfoo free?
Yes. Promptfoo is open-source, and the Community tier is free forever, including core evaluation features and red teaming with 10,000 probes per month. The tool runs locally and is accessible to individual developers as well as enterprise security teams.
Who is the CEO of Promptfoo?
This page does not cover Promptfoo's executive leadership.
Who owns Promptfoo?
Promptfoo is now part of OpenAI.
Why is "foo" commonly used in software tools?
"Foo" is a long-standing placeholder term in programming, originating from developer culture and used widely in examples and tool names. Promptfoo uses it as part of its name to reflect its roots in developer tooling.
Is Promptfoo good?
Promptfoo is used by 127 of the Fortune 500 and powers production LLM applications serving over 10 million users. Its red teaming capability draws on threat intelligence from a community of over 300,000 users.
What kinds of vulnerabilities does Promptfoo detect?
Promptfoo scans for prompt injections, jailbreaks, PII leaks, toxic outputs, and business rule violations. It also detects LLM vulnerabilities directly in your IDE and CI/CD pipeline before deployment.
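A red team scan is configured in the same YAML file; the sketch below shows the general shape, though the specific plugin and strategy names are illustrative examples and may differ from the current catalog:

```yaml
# Sketch of a redteam section in promptfooconfig.yaml;
# plugin and strategy names below are illustrative examples
redteam:
  purpose: "Customer support agent for a retail bank"
  plugins:
    - pii        # probe for PII leakage
    - harmful    # probe for toxic or unsafe outputs
  strategies:
    - jailbreak
    - prompt-injection
```

The scan is then launched with `promptfoo redteam run`, which generates adversarial test cases against the target application.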
Does Promptfoo work with multiple AI providers?
Yes, Promptfoo integrates with OpenAI, Anthropic, Google (Vertex, AI Studio), HuggingFace, Llama, and custom API providers. It supports tool use, function calling, temperature, and token settings across providers.
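Providers can be compared side by side in one config, with per-provider settings such as temperature and token limits; a sketch, with illustrative model IDs:

```yaml
# Sketch: comparing two providers in promptfooconfig.yaml
# (model IDs and config keys are illustrative)
providers:
  - id: openai:gpt-4o-mini
    config:
      temperature: 0
      max_tokens: 256
  - id: anthropic:messages:claude-3-5-sonnet-latest
    config:
      temperature: 0
```

Each test case then runs against every listed provider, producing the matrix view described under Use Cases.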
Does Promptfoo run in the cloud or locally?
Promptfoo runs entirely locally, keeping evaluation data off third-party servers during testing and evaluation.
Can Promptfoo be used in CI/CD pipelines?
Yes, Promptfoo integrates with CI/CD pipelines, letting development teams embed AI testing directly into their build and deployment workflows.
Who uses Promptfoo?
Promptfoo is used by individual developers, QA engineers, security researchers, and enterprise teams in industries such as financial services, insurance, telecommunications, and real estate.
Does Promptfoo support compliance testing?
Yes, Promptfoo offers industry-specific configurations for compliance requirements, including FINRA-aligned testing and fair housing compliance checks.
What is Promptfoo's MCP Proxy?
The MCP Proxy is a secure proxy for Model Context Protocol communications. It includes an MCP server that exposes evaluations as tools for AI agents such as Cursor and Claude Desktop.