Replicate

by Replicate (Cloudflare)

Run and fine-tune open-source AI models with a cloud API, billed per second

Agent Platform, Assistant

Last reviewed 2026-06-20

Replicate is a cloud platform for running machine-learning models through a hosted API, without provisioning GPUs or managing infrastructure. Its catalog lists 50,000+ community-contributed and curated models for image generation, video, audio, speech, and language (including FLUX, Stable Diffusion, Whisper, and hosted LLMs), each with a web playground and a one-line API call from Python, Node, or HTTP. Most models are billed only for the seconds they run (some are billed per output instead), and the service scales from zero to many GPUs on demand. Beyond running public models, developers can fine-tune models on custom data and deploy their own models packaged with Cog, Replicate's open-source tool for turning a model into a production container with a generated API server. Replicate is infrastructure and developer tooling, not a finished agent: how anything built on it behaves is up to the developer. Replicate was founded in 2019 by Ben Firshman and Andreas Jansson; Cloudflare announced an agreement to acquire it on November 17, 2025, with Replicate continuing to operate as a distinct brand and its API unchanged.

What it can do

Run 50,000+ open-source models via API
Assistant
Provides one-line API access (Python, Node, HTTP) and a per-model web playground to run a catalog of 50,000+ community and curated models across image, video, audio, and language, without managing GPUs. This is infrastructure that executes a model on request, not an autonomous actor.
Auto-scale inference and serverless GPUs
Assistant
Scales from zero to many GPUs based on demand and bills per second of compute (cold boots load model weights and are billed at the same rate); deployments let you keep instances warm with custom hardware and minimum-instance settings.
Fine-tune models on custom data
Assistant
Lets developers train and fine-tune models (for example image LoRAs and language models) on their own data through Replicate's training workflows, producing a new model that can be served via the same API.
Package and deploy custom models with Cog
Assistant
Cog, Replicate's open-source tool, packages a model into a production container from a cog.yaml environment and a predict.py, generating an API server and deploying it on Replicate's cloud. The developer defines behavior; Replicate handles serving.

Strengths

+ Huge catalog of open-source models runnable with a single API call, no GPU provisioning
+ Transparent per-second (or per-output) usage billing that scales to zero when idle
+ Cog lets you package and deploy your own models on the same managed infrastructure

Limitations

− It is inference infrastructure and tooling, not a turnkey agent; you build the application around it
− Cold boots can take tens of seconds to minutes for rarely-used models and are billed at the running rate, so latency and cost can be unpredictable without warm deployments
− Future direction is tied to Cloudflare following the November 2025 acquisition agreement, so roadmap and integration details may shift

Overview

Replicate is a cloud API platform for running machine-learning models without managing GPUs or infrastructure. Its tagline is "Run AI with an API." The catalog lists 50,000+ models contributed by the community plus a curated set of official models, spanning image generation, video, audio, speech, and language. Each model has a web playground and a one-line API call. Replicate is best understood as inference infrastructure and developer tooling, not a finished product and not an autonomous agent.

What it does

Replicate does three core jobs.

Run models: call any model from Python, Node, or raw HTTP, or try it in the browser playground; each run is a prediction object with inputs, outputs, status, and metadata.

Fine-tune: train models (image LoRAs, language models, and others) on your own data through Replicate's training workflows, producing a new model served on the same API.

Deploy your own: package a model with Cog, Replicate's open-source tool that builds a production container from a cog.yaml environment and a predict.py, generates an API server, and deploys it on Replicate's cloud.

Under the hood, the platform auto-scales from zero to many GPUs on demand and bills per second of compute; deployments let you keep warm instances to avoid cold boots. Everything is request/response: the platform executes a model when called, so autonomy is whatever the developer builds around it.
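The request/response flow above, where each run is a prediction object with a status, can be sketched as a small polling loop. This is a minimal sketch and not Replicate's client code: `fetch_prediction` is a hypothetical callable standing in for whatever retrieves the prediction (an SDK call or an HTTP GET), and the status names used here (`starting`, `processing`, `succeeded`, `failed`, `canceled`) are assumptions based on Replicate's public API documentation, not on this profile.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal statuses; check the API reference before relying on them.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    fetch_prediction: Callable[[], Dict[str, Any]],
    poll_interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the prediction reaches a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        prediction = fetch_prediction()
        if prediction.get("status") in TERMINAL_STATUSES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction did not finish in time")
        sleep(poll_interval)


if __name__ == "__main__":
    # Simulated prediction objects so the sketch runs offline.
    states = iter([
        {"id": "demo", "status": "starting", "output": None},
        {"id": "demo", "status": "processing", "output": None},
        {"id": "demo", "status": "succeeded", "output": ["result.png"]},
    ])
    final = wait_for_prediction(lambda: next(states), poll_interval=0.0)
    print(final["status"], final["output"])
```

Passing the fetcher and the sleep function in as arguments keeps the loop independent of any particular client and easy to exercise without network access. For long-running jobs, a webhook is usually a better fit than polling.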

Integrations & setup

Official SDKs cover Python and Node.js, with a language-agnostic HTTP API and webhooks for async results. Custom models are built and pushed with Cog (open source on GitHub at replicate/cog). The docs include guides for Node.js, Python, Google Colab, pushing custom models, CI/CD, webhooks, ComfyUI, and LoRAs. Following the November 2025 Cloudflare acquisition agreement, Replicate's catalog is slated to integrate with Cloudflare's Workers AI, while the existing Replicate API continues to work.
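As a rough illustration of the language-agnostic HTTP API with a webhook for async results, the sketch below builds (but does not send) a request to create a prediction. The endpoint URL, the `Authorization: Bearer` header, and the `version`, `input`, `webhook`, and `webhook_events_filter` body fields are assumptions recalled from Replicate's public HTTP API documentation, not details stated in this profile; the version ID, token, and webhook URL are placeholders. Verify everything against the current API reference before use.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Optional

API_URL = "https://api.replicate.com/v1/predictions"  # assumed endpoint


def build_prediction_request(
    version: str,
    model_input: Dict[str, Any],
    token: str,
    webhook_url: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that would create a prediction."""
    body: Dict[str, Any] = {"version": version, "input": model_input}
    if webhook_url is not None:
        # Ask for a callback when the prediction finishes instead of polling.
        body["webhook"] = webhook_url
        body["webhook_events_filter"] = ["completed"]
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_prediction_request(
        version="<model-version-id>",  # placeholder, not a real version
        model_input={"prompt": "an astronaut riding a horse"},
        token=os.environ.get("REPLICATE_API_TOKEN", "<your-token>"),
        webhook_url="https://example.com/replicate-webhook",  # placeholder
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```

Only the standard library is used here so the shape of the request is visible; in practice the official Python or Node.js SDK wraps these details.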

Pricing

Usage-based, no seat fees for core use. Most public models are billed by run time at a per-second hardware rate: roughly $0.000025/sec for a small CPU, $0.000225/sec for an Nvidia T4, $0.000975/sec for an L40S, $0.001400/sec for an A100 80GB, and $0.001525/sec for an H100. Some models are billed per output instead (for example FLUX Pro at $0.04 per image, or hosted LLMs priced per input/output token). Cold boots load model weights and are billed at the same rate. Private and dedicated deployments bill for setup, idle, and active time depending on configuration. Enterprise volume discounts, higher GPU limits, and support are available. (Figures as of June 2026; verify on the pricing page.)
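To make the per-second figures concrete, here is a small estimator built on the rates quoted above. It is a sketch under stated assumptions: the hardware keys (`t4`, `h100`, and so on) are labels invented for this example, the 60-second cold boot is an illustrative value, and the rates should be re-checked against the pricing page.

```python
# Per-second rates quoted in this profile (USD); verify on the pricing page.
# The dictionary keys are hypothetical labels, not official hardware IDs.
RATES_PER_SECOND = {
    "cpu-small": 0.000025,
    "t4": 0.000225,
    "l40s": 0.000975,
    "a100-80gb": 0.001400,
    "h100": 0.001525,
}


def estimate_cost(hardware: str, run_seconds: float, cold_boot_seconds: float = 0.0) -> float:
    """Cost of one run; cold-boot time is billed at the same per-second rate."""
    return RATES_PER_SECOND[hardware] * (run_seconds + cold_boot_seconds)


def hourly_rate(hardware: str) -> float:
    """Convert the per-second rate into an hourly figure for easier comparison."""
    return RATES_PER_SECOND[hardware] * 3600


if __name__ == "__main__":
    for name in RATES_PER_SECOND:
        print(f"{name}: ${hourly_rate(name):.2f}/hour")
    warm = estimate_cost("a100-80gb", run_seconds=8)
    cold = estimate_cost("a100-80gb", run_seconds=8, cold_boot_seconds=60)
    print(f"warm run: ${warm:.4f}  cold run: ${cold:.4f}")
```

At these rates a T4 works out to about $0.81 per hour and an H100 to about $5.49 per hour. An 8-second run on an A100 80GB costs about $0.011 when the model is warm and about $0.095 if it first needs an assumed 60-second cold boot, which is why cold boots dominate the cost of short, infrequent runs.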

Best for / not for

Best for developers and product teams who want to add AI features (image, video, audio, or text) by calling open-source models over an API, or who want to deploy their own models without building serving infrastructure. Less suited to non-technical buyers wanting a turnkey end-user agent, latency-critical workloads that cannot tolerate cold boots without paying for warm capacity, or teams needing fully predictable flat-rate pricing.

Alternatives

Hugging Face offers a larger open-model hub with its own Inference Endpoints and Spaces; OpenRouter focuses specifically on a unified API across hosted LLMs. Other serverless-GPU and inference vendors compete on price, cold-start latency, and model coverage. Replicate's distinguishing combination is a very broad model catalog plus Cog for shipping your own models on the same per-second-billed infrastructure.

FAQ

Is Replicate an AI agent?

No. Replicate is a platform for running, fine-tuning, and deploying machine-learning models through a cloud API. It executes models on request and provides serverless GPU infrastructure; it does not act autonomously on its own. Anything agent-like is built by the developer on top of it.

How does Replicate pricing work?

Most public models are billed by the time they run, at a per-second rate that depends on the hardware (for example roughly $0.000225/sec on an Nvidia T4 and $0.001525/sec on an H100). Some models are instead billed per output (per image, or per input/output token), such as FLUX Pro at $0.04 per image. Compute scales to zero when idle, and enterprise volume discounts are available. (Rates as of June 2026; check the pricing page.)

What is Cog?

Cog is Replicate's open-source tool for packaging a machine-learning model into a production-ready container. You define the environment in cog.yaml and the prediction logic in predict.py, and Cog generates an API server you can run locally or deploy on Replicate's cloud.
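A predict.py for Cog might look roughly like the sketch below. It assumes the `BasePredictor` class and `Input` helper from the `cog` Python package, as described in Cog's public documentation; the `try`/`except` fallback is a stand-in added only so the example runs without Cog installed, and the "model" is a toy string transform rather than real weights. A matching cog.yaml would typically point at this class with a line such as `predict: "predict.py:Predictor"` (also an assumption to check against the Cog docs).

```python
try:
    from cog import BasePredictor, Input  # real Cog runtime, if installed
except ImportError:
    # Minimal stand-ins so this sketch can run without Cog; not part of Cog itself.
    class BasePredictor:
        def setup(self) -> None:
            pass

    def Input(default=None, **_kwargs):
        return default


class Predictor(BasePredictor):
    def setup(self) -> None:
        # Runs once when the container starts; real code would load weights here.
        self.suffix = "!"

    def predict(
        self,
        text: str = Input(description="Text to shout", default="hello"),
        repeat: int = Input(description="How many times to repeat", default=1, ge=1, le=5),
    ) -> str:
        # Runs for every request; the typed inputs become the generated API's schema.
        return (text.upper() + self.suffix) * repeat


if __name__ == "__main__":
    predictor = Predictor()
    predictor.setup()
    print(predictor.predict(text="hi", repeat=2))
```

Separating `setup` from `predict` means expensive loading happens once per container rather than once per request, which is the part of a cold boot that warm deployments avoid.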

Did Cloudflare acquire Replicate?

Cloudflare announced an agreement to acquire Replicate on November 17, 2025. The deal was expected to close within about two months, with the goal of folding Replicate's model catalog into Cloudflare's Workers AI stack. Replicate continues to operate as a distinct brand and states that its API does not change and existing models keep working.