SYSTEM
Replicate Cog¶
Definition¶
Cog (cog.run, github.com/replicate/cog) is Replicate's open-source ML-model containerisation format. A model author writes two files:
- `cog.yaml` — declarative build spec: Python version, OS + CUDA dependencies, a Python `requirements.txt`, and the entry point `predict.py:Predictor`.
- `predict.py` — a Python class subclassing `cog.BasePredictor` with a `setup()` method (loads weights into memory once, at container warm-up) and a `predict(...)` method (runs a single prediction; typed inputs via `cog.Input`, typed outputs via the Python return annotation).
Running `cog build` produces a Docker image containing the model,
weights, and inference code — ready to serve over HTTP against the
Cog-defined `predict` signature.
From the 2026-04-16 Cloudflare post's minimal example:
```yaml
# cog.yaml
build:
  python_version: "3.13"
  python_requirements: requirements.txt
predict: "predict.py:Predictor"
```

```python
# predict.py
from cog import BasePredictor, Path, Input
import torch


class Predictor(BasePredictor):
    def setup(self):
        """Load the model into memory to make running multiple predictions efficient"""
        self.net = torch.load("weights.pth")

    def predict(self,
                image: Path = Input(description="Image to enlarge"),
                scale: float = Input(description="Factor to scale image by", default=1.5),
                ) -> Path:
        """Run a single prediction on the model"""
        # ... pre-processing ...
        output = self.net(input)
        # ... post-processing ...
        return output
```
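The built image speaks Cog's standard HTTP interface: the container exposes `POST /predictions` taking a JSON body of the form `{"input": {...}}`, with input names mirroring the `predict()` signature. A minimal sketch (not from the post; the helper names are illustrative) of calling such a container locally, assuming it was started with something like `docker run -p 5000:5000 <image>`:

```python
# Sketch of calling a built Cog image over its HTTP prediction API.
# Cog containers serve POST /predictions with {"input": {...}}; file
# inputs are sent as data URIs. Helper names here are illustrative.
import base64
import json
from urllib import request


def build_prediction_payload(image_bytes: bytes, scale: float = 1.5) -> dict:
    """Build the JSON body Cog's HTTP API expects for the example model."""
    data_uri = ("data:application/octet-stream;base64,"
                + base64.b64encode(image_bytes).decode())
    return {"input": {"image": data_uri, "scale": scale}}


def predict(image_bytes: bytes, scale: float = 1.5,
            host: str = "http://localhost:5000") -> dict:
    """POST one prediction to the running container and parse the reply."""
    body = json.dumps(build_prediction_payload(image_bytes, scale)).encode()
    req = request.Request(host + "/predictions", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)
```

This is the same HTTP surface Workers AI fronts when it deploys the container, which is why the `predict` signature becomes the model's API.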
Why it matters — Cog as the Workers AI BYO-model substrate¶
The 2026-04-16 Cloudflare post announces Cog as the packaging format for bringing a custom or fine-tuned model to Workers AI. Post framing:
- "Cog is designed to be quite simple: all you need to do is write down dependencies in a cog.yaml file, and your inference code in a Python file. Cog abstracts away all the hard things about packaging ML models, such as CUDA dependencies, Python versions, weight loading, etc."
- "Then, you can run cog build to build your container image, and push your Cog container to Workers AI. We will deploy and serve the model for you, which you then access through your usual Workers AI APIs."
The workflow Cloudflare is productising:
- Customer writes `cog.yaml` + `predict.py`; `cog build` produces a container.
- Push the container to Workers AI ("customer-facing APIs and wrangler commands" are on the roadmap, not shipped yet).
- Cloudflare deploys and serves on its GPU fleet.
- The model appears in the customer's AI Gateway catalog alongside `@cf/…` models and third-party providers, callable through the same `env.AI.run(...)` binding.
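Once served, the model would be reachable like any other Workers AI model. A hedged sketch using the documented Workers AI REST API, which mirrors the `env.AI.run(model, inputs)` Worker binding — note the model id `@yourorg/upscaler` is hypothetical, since the post does not specify how pushed Cog models are named:

```python
# Sketch of calling a Workers AI model over the REST API that mirrors
# the env.AI.run(model, inputs) Worker binding. The model id
# "@yourorg/upscaler" below is hypothetical: BYO-model naming for
# pushed Cog containers has not been announced.
import json
from urllib import request

API_BASE = "https://api.cloudflare.com/client/v4/accounts"


def run_url(account_id: str, model: str) -> str:
    """Workers AI REST endpoint for a given account and model id."""
    return f"{API_BASE}/{account_id}/ai/run/{model}"


def ai_run(account_id: str, api_token: str, model: str, inputs: dict) -> dict:
    """REST equivalent of env.AI.run(model, inputs) inside a Worker."""
    req = request.Request(
        run_url(account_id, model),
        data=json.dumps(inputs).encode(),
        headers={"Authorization": f"Bearer {api_token}",
                 "Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


# e.g. ai_run(acct, token, "@yourorg/upscaler", {"image": "...", "scale": 2.0})
```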
Currently Enterprise + design-partner access only: "The overwhelming majority of our traffic comes from dedicated instances for Enterprise customers who are running custom models on our platform." Post: "We've been testing this internally with Cloudflare teams and some external customers who are guiding our vision. If you're interested in being a design partner with us, please reach out! Soon, anyone will be able to package their model and use it through Workers AI."
Strategic context — Replicate is now the AI Platform team¶
The 2026-04-16 post notes: "The Replicate team has officially joined our AI Platform team, so much so that we don't even consider ourselves separate teams anymore. We've been hard at work on integrations between Replicate and Cloudflare, which include bringing all the Replicate models onto AI Gateway and replatforming the hosted models onto Cloudflare infrastructure. Soon, you'll be able to access the models you loved on Replicate through AI Gateway, and host the models you deployed on Replicate on Workers AI as well."
Cog is not a new technology — it has been Replicate's open-source hosting format for years — but this post repositions it as the Workers AI BYO-model substrate, which explains the catalog expansion from text-LLM-dominated to multimodal (image, video, speech) in the same announcement.
Roadmap¶
- Customer-facing APIs to push Cog containers (not shipped).
- `wrangler` CLI commands to push from a dev workstation (not shipped).
- Faster cold starts via GPU snapshotting — the post names GPU snapshotting as a work-in-progress technique to make warm-from-weights-already-in-GPU faster than the load-weights-then-predict path a freshly scheduled Cog container would otherwise take. (Not shipped; no cold-start latency numbers disclosed.)
Seen in¶
- sources/2026-04-16-cloudflare-ai-platform-an-inference-layer-designed-for-agents — canonical wiki introduction. Cog as the BYO-model format for Workers AI; `cog.yaml` + `predict.py` worked example; Enterprise + design-partner access today, with a public rollout roadmap (APIs + wrangler commands + GPU-snapshot cold starts).
Related¶
- systems/workers-ai — the platform Cog containers are pushed to and served on.
- systems/cloudflare-ai-gateway — the unified catalog surface through which Cog-packaged models appear alongside first-party + third-party models.
- patterns/byo-model-via-container — the pattern Cog productises: customer packages the model + inference code as a container, platform handles the rest.
- concepts/container-ephemerality — Cog containers sit inside the broader ephemeral-compute substrate of Workers AI.
- companies/cloudflare — post-Replicate-acquisition operator.