Zep
What is Zep?
Zep is a context infrastructure platform for AI teams that assembles memory, business data, and user behavior into reusable context blocks for agents. It includes Agent Memory, Graph RAG, Intelligent Context Assembly, and Custom Entity and Edge Filtering, and integrates with LangChain, LangGraph, OpenAI, Anthropic, Azure OpenAI, and Google Gemini. Plans are Flex at $125/month, Flex Plus at $375/month, and Enterprise at custom pricing.
At a glance
- Zep is best for AI teams who need persistent, low-latency context for agents.
- Pricing: Flex $125/mo; Flex Plus $375/mo; Enterprise custom
- API: Yes. The page advertises a Graph Search API and context template creation via the client, with API documentation linked.
What does Zep do?
Zep assembles optimized context from chat history, business data, and user behavior so agents can respond with the right information at the right time. Its context assembly flow pulls from memory and business records, then formats the result into reusable context blocks using Dynamic Relevance Ranking, Memory Context Blocks, and Context Templates. For teams that need more control, the Graph Search API and Custom Entity and Edge Filtering let you shape what gets surfaced without hand-crafting prompts.
At scale, Zep is built for low-latency retrieval over changing data: the site cites under-200ms P95 retrieval latency and real-time incremental updates instead of batch recomputation. A unified context graph tracks facts as they evolve, and the site attributes 100%+ accuracy improvements, a 90% latency reduction, and 98% fewer tokens required for processing to the open-source Graphiti layer. Customers shown on the site include Twin Health, Praktika.ai, Writer, Samsung, and HoneyBook.
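The evolving-facts idea can be sketched in a few lines of Python. This is a hypothetical illustration, not Zep's or Graphiti's API: `Fact`, its `valid_at`/`invalid_at` fields, and `TemporalFactStore` are invented names. It shows how an incremental update can close out the old fact and append the new one, so nothing is recomputed in batch and the earlier value stays queryable as history.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


# Hypothetical model for illustration only; not Zep's or Graphiti's schema.
@dataclass
class Fact:
    subject: str
    relation: str
    value: str
    valid_at: datetime                      # when the fact became true
    invalid_at: Optional[datetime] = None   # set when a newer fact supersedes it


@dataclass
class TemporalFactStore:
    facts: list[Fact] = field(default_factory=list)

    def _open(self, subject: str, relation: str) -> Optional[Fact]:
        # The currently valid fact for this (subject, relation), if any.
        for fact in self.facts:
            if fact.subject == subject and fact.relation == relation and fact.invalid_at is None:
                return fact
        return None

    def upsert(self, subject: str, relation: str, value: str, at: datetime) -> None:
        """Incremental update: close the open fact (keeping it as history), then append the new one."""
        current = self._open(subject, relation)
        if current is not None:
            if current.value == value:
                return  # no change, nothing to write
            current.invalid_at = at
        self.facts.append(Fact(subject, relation, value, valid_at=at))

    def current(self, subject: str, relation: str) -> Optional[str]:
        fact = self._open(subject, relation)
        return fact.value if fact is not None else None

    def as_of(self, subject: str, relation: str, when: datetime) -> Optional[str]:
        """What was true at `when`? Checks each fact's validity window."""
        for fact in self.facts:
            if (fact.subject == subject and fact.relation == relation
                    and fact.valid_at <= when
                    and (fact.invalid_at is None or when < fact.invalid_at)):
                return fact.value
        return None


store = TemporalFactStore()
store.upsert("user:ana", "lives_in", "Lisbon", datetime(2024, 1, 1))
store.upsert("user:ana", "lives_in", "Porto", datetime(2024, 6, 1))
print(store.current("user:ana", "lives_in"))                      # Porto
print(store.as_of("user:ana", "lives_in", datetime(2024, 3, 1)))  # Lisbon
```

Keeping the superseded fact with an `invalid_at` timestamp, rather than overwriting it, is what lets the same store answer both "what is true now" and "what was true then".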
Why use Zep?
- It replaces manual prompt crafting with automated context assembly, reducing the work needed to keep agent inputs relevant.
- Its temporal graph approach keeps historical context and current facts together, which helps agents handle changing user and business states.
- Low-latency retrieval under 200ms supports voice and other latency-sensitive experiences.
- Enterprise deployment options include managed, BYOK (bring your own key), BYOM (bring your own model), and BYOC (bring your own cloud), giving teams control over security and infrastructure.
- The platform supports real-time incremental updates, so changing records do not require batch recomputation.
Who is Zep for?
- AI product teams who need agents to remember users across sessions and interactions.
- Developers building context-aware workflows who want graph-based retrieval instead of manual prompt assembly.
- Engineering leaders who need low-latency, production-ready context infrastructure for dynamic business data.
- Teams handling sensitive data who need flexible enterprise deployment and compliance controls.
What are Zep's key features?
Agent Memory
Stores long-term memory from chat history and user behavior and retrieves it with under-200ms P95 latency, so agents keep context across sessions.
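Cross-session recall can be illustrated with a hypothetical per-user store. `UserMemory` and `recall` are made-up names, and ranking by shared words is a deliberately crude stand-in for the semantic and graph retrieval Zep describes; the point is only that memory is keyed by user, not by session.

```python
import re


def words(text: str) -> set[str]:
    # Lowercased word set; a crude stand-in for semantic similarity.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class UserMemory:
    """Hypothetical per-user memory that outlives any single session."""

    def __init__(self) -> None:
        # user_id -> [(session_id, message), ...] in arrival order
        self._log: dict[str, list[tuple[str, str]]] = {}

    def add(self, user_id: str, session_id: str, message: str) -> None:
        self._log.setdefault(user_id, []).append((session_id, message))

    def recall(self, user_id: str, query: str, limit: int = 3) -> list[str]:
        """Past messages from ANY session of this user, ranked by words shared with the query."""
        q = words(query)
        scored = [
            (len(q & words(message)), position, message)
            for position, (_session, message) in enumerate(self._log.get(user_id, []))
        ]
        hits = [s for s in scored if s[0] > 0]
        # Most shared words first; ties go to the more recent message.
        hits.sort(key=lambda s: (s[0], s[1]), reverse=True)
        return [message for _, _, message in hits[:limit]]


memory = UserMemory()
memory.add("u1", "session-1", "I am vegetarian and allergic to peanuts")
memory.add("u1", "session-2", "Book a table for Friday")
# A later session for the same user still sees what was said earlier.
print(memory.recall("u1", "Any peanuts in this dish?"))
# ['I am vegetarian and allergic to peanuts']
```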
Graph RAG
Uses a unified context graph and Graph Search API to retrieve related facts and relationships, improving answer quality with millisecond query responses.
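Graph RAG retrieval often starts from an entity and walks its relationships. The sketch below assumes a hypothetical list of `(source, relation, target)` triples and collects everything within a fixed number of hops using breadth-first search; it is not the Graph Search API, whose parameters the page does not spell out.

```python
from collections import deque

# Hypothetical edge representation: (source entity, relation, target entity).
Triple = tuple[str, str, str]


def neighborhood(triples: list[Triple], seed: str, max_hops: int = 2) -> list[Triple]:
    """Every triple reachable from `seed` within `max_hops` edges, found breadth-first.

    Edges are walked in both directions, since a relationship is relevant
    to the entities at either end.
    """
    adjacency: dict[str, list[Triple]] = {}
    for triple in triples:
        source, _, target = triple
        adjacency.setdefault(source, []).append(triple)
        adjacency.setdefault(target, []).append(triple)

    visited = {seed}
    found: list[Triple] = []
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # reached the hop limit; don't expand further
        for triple in adjacency.get(node, []):
            if triple not in found:
                found.append(triple)
            source, _, target = triple
            neighbor = target if source == node else source
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found


triples = [
    ("Ana", "WORKS_AT", "Acme"),
    ("Acme", "HAS_CONTRACT", "Renewal-2024"),
    ("Renewal-2024", "OWNED_BY", "Bob"),
    ("Carol", "WORKS_AT", "Globex"),
]
for source, relation, target in neighborhood(triples, "Ana", max_hops=2):
    print(source, relation, target)
# Ana WORKS_AT Acme
# Acme HAS_CONTRACT Renewal-2024
```

The hop limit is the main lever: two hops from `Ana` reaches her employer's contract but not its owner, and unrelated entities such as `Carol` never enter the result.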
Intelligent Context Assembly
Assembles context from memory, business data, and user behavior with dynamic relevance ranking; the site cites up to 98% fewer tokens than sending raw history.
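The rank-then-pack idea behind context assembly can be sketched as follows. Everything here is an assumption for illustration: the `Candidate` fields, the 30-day recency half-life in `score`, the one-token-per-word estimate, and the `<CONTEXT>` wrapper are not Zep's Dynamic Relevance Ranking or its context block format.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    relevance: float  # assumed 0..1 similarity to the current request
    age_days: float   # how old the underlying fact is


def score(c: Candidate, half_life_days: float = 30.0) -> float:
    # Hypothetical ranking: relevance decayed by age with a 30-day half-life.
    return c.relevance * 0.5 ** (c.age_days / half_life_days)


def assemble_context(candidates: list[Candidate], token_budget: int) -> str:
    """Greedily pack the best-scoring facts into a context block under a token budget."""
    chosen: list[str] = []
    used = 0
    for c in sorted(candidates, key=score, reverse=True):
        cost = len(c.text.split())  # crude token estimate: one token per word
        if used + cost > token_budget:
            continue  # doesn't fit; a smaller, lower-ranked fact still might
        chosen.append(f"- {c.text}")
        used += cost
    return "<CONTEXT>\n" + "\n".join(chosen) + "\n</CONTEXT>"


facts = [
    Candidate("Prefers email over phone", relevance=0.9, age_days=3),
    Candidate("Asked about a refund in March", relevance=0.8, age_days=90),
    Candidate("Currently on the annual plan with two seats", relevance=0.7, age_days=1),
]
print(assemble_context(facts, token_budget=12))
# <CONTEXT>
# - Prefers email over phone
# - Currently on the annual plan with two seats
# </CONTEXT>
```

The stale refund question ranks last (its score decays to 0.1) and is dropped once the budget is spent, which is the kind of trimming that keeps token use down.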
Knowledge Graph MCP
Connects context graph data through a Knowledge Graph MCP workflow, with context templates created via the client for structured retrieval.
Managed Enterprise
Offers SOC 2 Type II compliance, a HIPAA BAA, audit logs, and 30+ day API logs on the Enterprise plan, giving teams the controls needed for regulated deployments.
Bring Your Own Key
Lets teams use their own keys through AWS KMS, with CloudTrail recording key activity for audit. BYOK is one of four deployment options, alongside managed, BYOM, and BYOC.
Framework Integration
Works with LangChain, LangGraph, OpenAI, Anthropic, Azure OpenAI, and Google Gemini so teams can plug memory into existing agent stacks.
Custom Entity and Edge Filtering
Lets teams define custom entity types and edge types, then filter graph retrieval for cleaner results across complex business relationships.
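Filtering by custom types can be pictured as a predicate over typed edges. In this hypothetical sketch, `ENTITY_TYPES`, `EDGES`, and `filter_edges` are invented names, and the types (`Customer`, `PURCHASED`, and so on) are example values a team might define; Zep's own filter syntax is not shown on the page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    edge_type: str  # a custom edge type, e.g. "PURCHASED"


# Hypothetical custom entity types assigned to each node.
ENTITY_TYPES = {
    "Ana": "Customer",
    "Acme Pro": "Product",
    "Ticket-42": "SupportTicket",
    "Bob": "Employee",
}

EDGES = [
    Edge("Ana", "Acme Pro", "PURCHASED"),
    Edge("Ana", "Ticket-42", "OPENED"),
    Edge("Ticket-42", "Bob", "ASSIGNED_TO"),
]


def filter_edges(
    edges: list[Edge],
    edge_types: Optional[set[str]] = None,
    target_types: Optional[set[str]] = None,
) -> list[Edge]:
    """Keep edges whose edge type and target entity type are allowed; None disables that filter."""
    kept = []
    for edge in edges:
        if edge_types is not None and edge.edge_type not in edge_types:
            continue
        if target_types is not None and ENTITY_TYPES.get(edge.target) not in target_types:
            continue
        kept.append(edge)
    return kept


# Allow PURCHASED and OPENED edges, but only when they point at a Product.
print(filter_edges(EDGES, edge_types={"PURCHASED", "OPENED"}, target_types={"Product"}))
# [Edge(source='Ana', target='Acme Pro', edge_type='PURCHASED')]
```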
What does Zep integrate with?
- LangChain
- LangGraph
- OpenAI
- Anthropic
- AWS KMS
- CloudTrail
- Azure
- Claude
- Cursor
- Neo4j
- Azure OpenAI
- Google Gemini
What are Zep's use cases?
Agent memory for product teams
AI product teams who need agents to remember users across sessions use Zep to keep prior preferences, goals, and decisions available in later conversations. They rely on Agent Memory and Persistent Context to make follow-up answers feel continuous, reducing repeated questions and improving task completion.
Graph retrieval for developers
Developers building context-aware workflows use Zep to replace manual prompt assembly with Graph RAG and Intelligent Context Assembly. They can pull the most relevant relationships and facts into each request, which helps agents answer with better grounding and fewer missed dependencies.
Production context for engineering leaders
Engineering leaders use Zep to power low-latency context infrastructure for dynamic business data, combining Dynamic Relevance Ranking with Lightning Fast Retrieval. That lets production agents respond quickly while keeping context fresh as records, events, and relationships change.
Enterprise controls for sensitive data
Teams handling sensitive data use Zep to deploy context infrastructure with Managed Enterprise and Bring Your Own Key (BYOK). They can keep governance tighter while still supporting retrieval and memory for regulated workflows and internal assistants.
How does Zep work?
- Connect your first data source or chat stream, then define what Zep should remember using Agent Memory and Context Templates. Start with the conversations, records, or events that matter most.
- Map entities and relationships with Custom Entity Types and Custom Entity and Edge Filtering, so Zep can build a useful graph instead of storing raw text alone.
- Let Intelligent Context Assembly and Dynamic Relevance Ranking select the right facts for each request, reducing manual prompt assembly and keeping token usage focused.
- Use the Graph Search API or Framework Integration with LangChain, LangGraph, or OpenAI to wire context into your app's runtime and agent workflows.
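- Monitor retrieval quality, latency, and updates in the API logs and analytics, then refine templates, filters, and data sources as your product evolves. Flex keeps API logs for 1 day and does not list analytics; Flex Plus adds analytics and 7-day logs.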
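For the monitoring step, one simple check is whether P95 latency stays under the 200ms figure the site cites. This hypothetical sketch works on latencies you record yourself (for example, client-side timings around each retrieval call) and uses the nearest-rank percentile definition; it does not parse Zep's API logs, whose format the page does not describe.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def check_p95(latencies_ms: list[float], target_ms: float = 200.0) -> tuple[float, bool]:
    """Return (P95 in ms, whether it is under the target)."""
    p95 = percentile(latencies_ms, 95)
    return p95, p95 < target_ms


# Hypothetical client-side timings for 100 retrieval calls: 101 ms to 200 ms.
samples = [float(ms) for ms in range(101, 201)]
print(check_p95(samples))  # (195.0, True)
```

Other percentile definitions (for example, linear interpolation) can give slightly different numbers on small samples, so compare like with like when checking against a published figure.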
How much does Zep cost?
Flex
$125/month
- 50,000 credits per month
- Auto-topup at 20%. 30-day rollover.
- 600 requests per minute
- 5 Projects
- 10 custom entity & edge types
- API logs (1 day)
- Unlimited memories, retrieval & users
Flex Plus
$375/month
- 200,000 credits per month
- Auto-topup at 20%. 60-day rollover.
- 1,000 requests per minute
- 10 Projects
- 20 custom entity & edge types
- Custom extraction instructions
- Webhooks
- Analytics
- Observations (coming soon)
- API logs (7 days)
- Unlimited memories, retrieval & users
Enterprise
Custom pricing
- Custom credits with negotiated rates
- Guaranteed rate limits with SLA
- Unlimited projects and entity & edge types
- SOC 2 Type II & HIPAA BAA
- Audit logs & 30+ day API logs
- Teams and Slack support & dedicated account manager
- Managed, BYOK, BYOM, or BYOC deployment
Frequently asked questions
How much does Zep cost? Is it free?
No free plan is listed. Zep has three paid plans: Flex at $125/month, Flex Plus at $375/month, and Enterprise with custom pricing.
What is Zep used for? Who is it for?
Zep is used for Agent Memory, Graph RAG, and Intelligent Context Assembly. It is built for AI product teams, developers building context-aware workflows, engineering leaders, and teams handling sensitive data.
Does Zep have an API and what does it integrate with?
The page advertises a Graph Search API and context template creation via the client, with API documentation linked. It integrates with LangChain, LangGraph, OpenAI, Anthropic, AWS KMS, and 7 more.
Editor's read
Check the request-rate and log-retention limits on the tier you plan to buy. Flex includes 600 requests per minute and 1-day API logs, while Flex Plus raises that to 1,000 requests per minute and 7-day logs; Enterprise is the only tier with SLA-backed limits and 30+ day API logs.
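If your traffic runs close to those caps, a client-side limiter keeps bursts inside the per-minute budget. This sliding-window sketch is an assumption about how you might guard your own calls; the page does not say how Zep measures the window or what it returns when a limit is exceeded, so treat 600 and 1,000 only as the configured `limit`.

```python
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard: at most `limit` calls in any `window_s`-second window."""

    def __init__(self, limit: int, window_s: float = 60.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._stamps: deque[float] = deque()  # send times of calls still inside the window

    def try_acquire(self, now: float) -> bool:
        """Record a call at time `now` (seconds) if the window has room; otherwise refuse it."""
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()  # this call has aged out of the window
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False


# Flex lists 600 requests per minute. Simulate 700 calls spread over 35 seconds.
limiter = SlidingWindowLimiter(limit=600)
accepted = sum(limiter.try_acquire(now=i * 0.05) for i in range(700))
print(accepted)  # 600
```

In an application you would pass `time.monotonic()` as `now`; taking the clock as a parameter keeps the sketch deterministic and easy to test.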
