Replicate

What is Replicate?

Replicate is an AI model inference and fine-tuning platform for software teams. It exposes models through a simple API for generating text, images, speech, music, video, and more, and includes a Playground, model comparison, and docs for Node, Python, and HTTP workflows. The catalog spans over 50,000 AI models, and the site lists Alibaba, OpenAI, Anthropic, and Google among its customers and model creators. Pricing is usage-based, starting at $0.000025/sec for CPU (Small).

Last verified: May 17, 2026

At a glance

Best for: Software teams who want to ship AI features through an API without managing model infrastructure.
Pricing: CPU (Small) $0.000025/sec; CPU $0.000100/sec; Nvidia A100 (80GB) GPU $0.001400/sec; 2x Nvidia A100 (80GB) GPU $0.002800/sec…

What does Replicate do?

Replicate handles model inference and fine-tuning by exposing AI models through a simple API: you send a prompt and get back generated media or text. The Playground helps you compare models and iterate quickly, while the docs and code examples cover Node, Python, and HTTP workflows for calling models from your app. The product spans image generation, speech, music, image restoration, video generation, captioning, and large language models, so teams can move from experimentation to production without changing tools.

Replicate hosts over 50,000 AI models and says thousands of open-source models are contributed by the community. The site shows individual models with 154.9M, 91.6M, 42M, and 24.6M runs, which signals real usage across the catalog. Enterprise pages highlight deployment at scale, logging and monitoring, and deploying custom models with Cog. Customers and model creators shown on the site include Alibaba, OpenAI, Anthropic, Google, ByteDance, xAI, Black Forest Labs, Recraft, ElevenLabs, and IBM Granite.
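The send-a-prompt, get-a-result flow described above can be sketched with Python's standard library. This is a minimal illustration, not Replicate's official client: the endpoint URL and payload shape are assumptions based on Replicate's HTTP API for creating a prediction, and the token, version ID, and helper name `build_prediction_request` are placeholders. The request is built but never sent.

```python
import json
import urllib.request

# Assumption: endpoint and payload shape follow Replicate's HTTP API for
# creating a prediction; check the current docs before relying on them.
API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(token: str, version: str, model_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks a model to run."""
    body = json.dumps({"version": version, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values: neither the token nor the version ID is real.
request = build_prediction_request(
    token="YOUR_API_TOKEN",
    version="MODEL_VERSION_ID",
    model_input={"prompt": "a watercolor fox in a snowy forest"},
)
# To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect and test before any billed compute is used.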

Why use Replicate?

It removes most of the infrastructure burden of model deployment, including API servers, dependencies, batching, and GPU setup.
The catalog spans public and proprietary models, so teams can test many approaches without switching platforms.
Cog lets teams package and deploy custom models instead of being limited to hosted public models.
The Playground supports rapid comparison and iteration, which shortens the path from idea to working prototype.
A catalog of over 50,000 AI models suggests the platform can support both broad experimentation and production use.

Who is Replicate for?

Product engineers who need to add AI generation to an app with minimal backend work.
ML developers who want to run, compare, and refine models before production.
Startup teams who need fast experimentation across image, video, speech, and LLM workflows.
Platform teams who want monitoring and scalable model deployment for custom AI services.
Builders who prefer calling models through Node, Python, or HTTP instead of managing GPUs.

What are Replicate's key features?

Generate images

Create images through Replicate's API using models from OpenAI, Black Forest Labs, and Recraft, with outputs at roughly 2048px resolution.

Generate speech

Turn text into speech with models from ElevenLabs and OpenAI, including support for 30 voices and 70+ languages.

Generate music

Produce music with API-accessible models, letting teams automate audio generation in Node, Python, or HTTP workflows.

Restore images

Repair and enhance damaged photos with image restoration models, then route results into Slack or Notion for review.

Generate videos

Create videos from images or text with models like PixVerse, including 720p and 1080p output, 3-15 second durations, and five aspect ratios.
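The limits quoted above (720p or 1080p output, 3-15 second durations) are the kind of constraint worth checking client-side before paying for a run. The sketch below is illustrative only: the function and parameter names are hypothetical, actual input names vary by model, and the five aspect ratios are not checked because the listing does not name them.

```python
ALLOWED_RESOLUTIONS = {"720p", "1080p"}  # from the listing
MIN_SECONDS, MAX_SECONDS = 3, 15         # from the listing


def validate_video_request(resolution: str, duration_seconds: int) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds")
    return problems


ok = validate_video_request("1080p", 10)
bad = validate_video_request("4k", 20)
```

Here `ok` is an empty list, while `bad` contains one message for the resolution and one for the duration.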

Large Language Models

Run LLMs from OpenAI, Anthropic, Google, xAI, and IBM Granite through one API, including models with a 262k context window and 2k output.

Compare models

Test multiple models side by side to choose the best fit, using Replicate's catalog of over 50,000 AI models and community-contributed options.

Rapidly prototype

Build and ship AI features quickly with Node, Python, HTTP, and Zapier integrations, then connect outputs to Stripe, HubSpot, or Shopify.

What does Replicate integrate with?

Node
Python
HTTP
Slack
Salesforce
Google Calendar
Stripe
HubSpot
Klaviyo
Shopify
Notion
Zapier
QuickBooks
Calendly

What are Replicate's use cases?

Product engineers ship AI features

Product engineers who need to add AI generation to an app with minimal backend work use Replicate to call models through Node, Python, or HTTP and ship features faster. They can start with rapid prototyping, then move on to large language models or image generation without managing GPUs.

ML developers compare model outputs

ML developers who want to run, compare, and refine models before production use Replicate to test candidates side by side with model comparison. They then tweak and refine prompts or settings until the output quality is ready for a broader rollout.

Startup teams test multimodal workflows

Startup teams who need fast experimentation across image, video, speech, and LLM workflows use Replicate to prototype new product ideas quickly. They can mix video generation, speech generation, and large language models in one workflow, which helps them validate concepts before committing engineering time.

Platform teams deploy custom AI

Platform teams who want monitoring and scalable model deployment for custom AI services use Replicate to operationalize models without building GPU infrastructure from scratch. They rely on the site's Popular models and Official models to standardize what gets deployed, and use model comparison to choose the best-performing option.

How does Replicate work?

Connect your first model through Node, Python, or HTTP, then open the model page and choose a hosted option from Popular models or Official models.
Run a first prompt to verify inputs, outputs, and latency before wiring the model into your app or internal workflow.
Compare alternatives side by side, then adjust prompts, parameters, or model choice until results match your target.
Expand into image, speech, or video generation, or large language models, as your product needs grow, keeping the same API-based workflow.
Monitor usage and iterate on the model you ship, using the same deployment path for ongoing experimentation and production updates.
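The verify-then-compare steps above can be approximated in code with a small harness that sends one prompt to several candidates and records output and latency. This is a sketch under stated assumptions: `compare_models` is a hypothetical helper, and the two runner functions are local stand-ins, where a real setup would call hosted models over the API.

```python
import time
from typing import Callable


def compare_models(runners: dict[str, Callable[[str], str]], prompt: str) -> list[dict]:
    """Run one prompt through each candidate and record output and latency."""
    results = []
    for name, run in runners.items():
        start = time.perf_counter()
        output = run(prompt)
        elapsed = time.perf_counter() - start
        results.append({"model": name, "output": output, "seconds": elapsed})
    # Fastest first, so the quickest candidate is easy to spot.
    return sorted(results, key=lambda r: r["seconds"])


# Stand-in runners; in practice each would call a hosted model over the API.
def fake_fast(prompt: str) -> str:
    return prompt.upper()


def fake_slow(prompt: str) -> str:
    time.sleep(0.05)  # simulate a slower model
    return prompt[::-1]


results = compare_models({"model-a": fake_slow, "model-b": fake_fast}, "hello")
for row in results:
    print(f'{row["model"]}: {row["seconds"]:.3f}s -> {row["output"]}')
```

Latency is only one axis; output quality still needs a human or automated review of the `output` field for each candidate.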

How much does Replicate cost?

CPU (Small)

$0.000025/sec ($0.09/hr)
GPU: none, CPU: 1x, GPU RAM: none, RAM: 2GB

CPU

$0.000100/sec ($0.36/hr)
GPU: none, CPU: 4x, GPU RAM: none, RAM: 8GB

Nvidia T4 GPU

$0.000225/sec ($0.81/hr)
GPU: 1x, CPU: 4x, GPU RAM: 16GB, RAM: 16GB

Nvidia L40S GPU

$0.000975/sec ($3.51/hr)
GPU: 1x, CPU: 10x, GPU RAM: 48GB, RAM: 65GB

2x Nvidia L40S GPU

$0.001950/sec ($7.02/hr)
GPU: 2x, CPU: 20x, GPU RAM: 96GB, RAM: 144GB

4x Nvidia L40S GPU

$0.003900/sec ($14.04/hr)
Additional multi-GPU L40S capacity is available with committed spend contracts.

8x Nvidia L40S GPU

$0.007800/sec ($28.08/hr)
Additional multi-GPU L40S capacity is available with committed spend contracts.

Nvidia A100 (80GB) GPU

$0.001400/sec ($5.04/hr)
GPU: 1x, CPU: 10x, GPU RAM: 80GB, RAM: 144GB

2x Nvidia A100 (80GB) GPU

$0.002800/sec ($10.08/hr)
GPU: 2x, CPU: 20x, GPU RAM: 160GB, RAM: 288GB

4x Nvidia A100 (80GB) GPU

$0.005600/sec ($20.16/hr)
Additional multi-GPU A100 capacity is available with committed spend contracts.

8x Nvidia A100 (80GB) GPU

$0.011200/sec ($40.32/hr)
Additional multi-GPU A100 capacity is available with committed spend contracts.

Nvidia H100 GPU

$0.001525/sec ($5.49/hr)
GPU: 1x, CPU: 13x, GPU RAM: 80GB, RAM: 72GB

2x Nvidia H100 GPU

$0.003050/sec ($10.98/hr)
Additional multi-GPU H100 capacity is available with committed spend contracts.

4x Nvidia H100 GPU

$0.006100/sec ($21.96/hr)
Additional multi-GPU H100 capacity is available with committed spend contracts.

8x Nvidia H100 GPU

$0.012200/sec ($43.92/hr)
Additional multi-GPU H100 capacity is available with committed spend contracts.
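The hourly figures follow directly from the per-second rates (rate × 3,600), and the same arithmetic gives the cost of a single run. The sketch below uses a few rates copied from the pricing list; the helper names and the 90-second run length are illustrative assumptions.

```python
from decimal import Decimal

# Per-second rates copied from the pricing list.
RATES_PER_SECOND = {
    "CPU (Small)": Decimal("0.000025"),
    "Nvidia T4 GPU": Decimal("0.000225"),
    "Nvidia A100 (80GB) GPU": Decimal("0.001400"),
    "Nvidia H100 GPU": Decimal("0.001525"),
}


def hourly_rate(tier: str) -> Decimal:
    """Convert a per-second rate to dollars per hour."""
    return RATES_PER_SECOND[tier] * 3600


def run_cost(tier: str, seconds: int) -> Decimal:
    """Cost in dollars of one run lasting `seconds` on the given tier."""
    return RATES_PER_SECOND[tier] * seconds


a100_hourly = hourly_rate("Nvidia A100 (80GB) GPU")  # 0.001400 * 3600 = 5.04
t4_run = run_cost("Nvidia T4 GPU", 90)               # 0.000225 * 90 = 0.02025
```

`Decimal` is used instead of floats so that very small per-second rates multiply out without rounding noise.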

Frequently asked questions

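What is Replicate?

Replicate is an AI model inference and fine-tuning platform that exposes models through a simple API for generating text, images, speech, music, video, and more.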

How much does Replicate cost? Is it free?

Replicate's pricing is usage-based, with 15 paid hardware tiers priced per second, including CPU (Small) at $0.000025/sec, CPU at $0.000100/sec, and Nvidia A100 (80GB) GPU at $0.001400/sec.

What is Replicate used for? Who is it for?

Replicate is used for generating images, speech, and music. It's built for product engineers, ML developers, and startup teams.

Does Replicate have an API and what does it integrate with?

Yes. Replicate exposes its models through an API, with docs for Node, Python, and HTTP workflows. It integrates with Node, Python, HTTP, Slack, Salesforce, and 9 more.

Editor's read

Check whether your workload will stay within the usage-based GPU and CPU rates as model traffic grows. If you expect sustained inference or multi-GPU runs, confirm the committed-spend contract terms for larger A100, H100, or L40S configurations before committing.
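One way to run that check is to project monthly spend at several traffic levels. The sketch below is a rough estimate, not a billing calculator: the A100 rate comes from this listing, while the 4-second run time, the traffic levels, and the 30-day month are hypothetical inputs to replace with your own measurements.

```python
from decimal import Decimal

A100_RATE = Decimal("0.001400")  # $/sec for Nvidia A100 (80GB) GPU, from the listing


def monthly_cost(rate_per_second: Decimal, runs_per_day: int,
                 seconds_per_run: int, days: int = 30) -> Decimal:
    """Projected spend, assuming you pay only for the seconds each run takes."""
    return rate_per_second * runs_per_day * seconds_per_run * days


# Hypothetical growth scenario: each run takes 4 seconds of A100 time.
projection = {
    runs: monthly_cost(A100_RATE, runs_per_day=runs, seconds_per_run=4)
    for runs in (1_000, 10_000, 100_000)
}
for runs, cost in projection.items():
    print(f"{runs:>7} runs/day -> ${cost:,.2f}/month")
```

Under these assumptions the estimate scales linearly from $168 to $16,800 per month, which is the range where committed-spend terms become worth asking about.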

Filed under: AI Model Providers, open-source

Explore other AI Model Providers

Hugging Face

A shared hub for models, datasets, and AI apps.

Hugging Face is an AI hub for models, datasets, and apps, with Spaces, Inference Providers, and Inference Endpoints. PRO starts at $9/month.

Groq

AI inference platform with OpenAI-compatible APIs and prompt caching.

Groq serves low-latency AI inference with OpenAI-compatible APIs, prompt caching, and usage tiers starting at $0.05.

Gemma

Open models for generation, retrieval, and safety checks.

Gemma offers open models for text, audio, image, retrieval, and safety checks, with deployment across devices and cloud.

Fireworks AI

Open-model inference, tuning, and deployment in one cloud stack.

Fireworks AI runs open-model inference, tuning, and deployment with Code Assistance, Search, and Agentic Systems. Starts with $1 in free credits.

DeepSeek

AI model platform for chat, mobile, and API workflows.

DeepSeek offers chat, mobile app, and API access to latest models, with free chat and docs for developers.