Skipper (Cloudflare AI data agent)¶
Skipper is Cloudflare's AI data agent that runs on top of Town Lake — a conversational interface that turns natural-language questions into validated SQL, executed against the lakehouse, with results returned as tables, charts, or shareable dashboards. Introduced publicly alongside Town Lake in the 2026-05-28 launch post.
"Skipper is our take on a conversational AI agent that goes from natural-language question to validated answer, grounded in the company's actual data, code, and institutional knowledge."
Built entirely on Cloudflare's customer Developer Platform — Workers, Workers AI, Durable Objects, D1, R2, Workflows, KV.
Interaction shape¶
The interface is a chat box. Example:
"Show me the top 10 customers by R2 storage cost in the last 30 days, and the change versus the previous 30 days."
Skipper's loop:
- Find the right tables (DataHub search).
- Pull schemas + lineage.
- Write the SQL.
- Submit to Trino.
- Poll for results.
- Render a table or chart.
- Carry context across follow-ups: "now break it down by region, and ignore internal Cloudflare accounts" refines and reruns the query.
- Closed-loop reasoning: "if something looks wrong, e.g., a join produced zero rows or a filter excluded what you expected, then Skipper investigates, adjusts, and tries again."
Skipper can also package charts into dashboards that get embedded into other internal tools, drive transformation graphs via Transformer, and check access and permissions via Lifeguard.
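The loop in this section, including the retry on a suspicious empty result, can be sketched as follows. This is a minimal illustration, not Skipper's implementation: `answerQuestion`, the `tools` object, and all of its methods are hypothetical stand-ins, and submit plus poll are collapsed into a single synchronous `runQuery` call for brevity.

```javascript
// Hypothetical sketch of the question-to-answer loop. Every function on
// `tools` is a stand-in for illustration, not Skipper's real API.
function answerQuestion(question, tools, maxAttempts = 3) {
  const tables = tools.findTables(question);             // catalog search
  const schemas = tables.map((t) => tools.getSchema(t)); // schemas + lineage
  let sql = tools.writeSql(question, schemas, null);     // first draft

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const rows = tools.runQuery(sql); // submit + poll, collapsed into one call
    if (rows.length > 0) {
      return { sql, rows, attempts: attempt };
    }
    // Zero rows is treated as suspicious: request a revised query and retry.
    if (attempt < maxAttempts) {
      sql = tools.writeSql(question, schemas, { previousSql: sql, problem: "zero rows" });
    }
  }
  return { sql, rows: [], attempts: maxAttempts };
}

// Stub tools for a dry run: the first draft has an over-strict filter.
const stubTools = {
  findTables: () => ["fct.billings_allocated"],
  getSchema: (table) => ({ table, columns: ["account_id", "alloc_amount"] }),
  writeSql: (_question, _schemas, feedback) =>
    feedback ? "SELECT 1 /* relaxed filter */" : "SELECT 1 WHERE 1 = 0",
  runQuery: (sql) =>
    sql.includes("1 = 0") ? [] : [{ account_id: "a1", alloc_amount: 10 }],
};

const outcome = answerQuestion("top customers by R2 storage cost", stubTools);
console.log(outcome.attempts, outcome.rows.length);
```

In the dry run the first draft returns no rows, so the sketch revises the SQL once and succeeds on the second attempt. A real agent would also need to distinguish a legitimately empty result from a broken join or filter, which this sketch does not attempt.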
Layered grounded context (the load-bearing architecture)¶
The post explicitly names "the hard part" of building a data agent: "An LLM, given a SQL prompt and a list of table names, can hallucinate joins, misuse columns, and confidently produce a number that is completely wrong. We learned this the hard way during early experiments. The fix is multiple layers of grounded context that the model can pull from at retrieval time."
Skipper's five layers, canonicalised at concepts/layered-grounded-context-for-data-agent:
- Schema + usage metadata — DataHub knows every column, type, primary key, foreign key for every table; also which tables are commonly joined together based on historical query patterns. Skipper's `search_datasets` and `get_entity_details` tools surface this directly.
- Human annotations — table descriptions like "Account-level entity. One row per account_id. Every account belongs to exactly one customer (via customer_id FK)" live in DataHub. Tags like `curated` mark validated tables that Skipper should prefer over scratch space.
- Code-derived knowledge — "the most valuable context is not in any catalog: it's in the SQL that produces the table. The Transformer pipeline emits per-node `.meta.json` documentation to DataHub on every successful run. So when Skipper looks at `fct.billings_allocated`, it doesn't just see the schema; it sees that this is a pre-joined fact table built from `dim.accounts`, `dim.customers`, and `seed.product_classification`, with its `alloc_amount` column computed as `billed_amount / 12 for annual; billed_amount for monthly`. That's the kind of nuance that separates a correct answer from a confidently wrong one." Canonicalised at concepts/code-as-context-for-data-agents.
- Curated data-model pages — short human-written documents surfaced as MCP resources: "Prefer tables tagged 'curated'. Avoid `scratch_r2` and tables tagged 'internal'. Search with data model terms (e.g., 'billing product revenue') not natural language."
- Runtime introspection — "When everything else fails, Skipper can issue live queries to Trino: `DESCRIBE table`, `SELECT DISTINCT col LIMIT 20`, `SELECT COUNT(*)`. It uses these sparingly as runtime context is expensive, but it's the safety net that makes the rest of the system robust."
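One way to picture the five layers is as an ordered list of lookups, with runtime introspection consulted only when the catalog-backed layers return nothing. The sketch below assumes that ordering and that fallback rule; `gatherContext`, the `layers` array, the `live` flag, and the stub texts are all hypothetical, and the runtime layer returns the statement it would issue rather than a real Trino result.

```javascript
// Stub lookup that only knows about one table (illustrative data, not real catalog content).
const only = (table, text) => (t) => (t === table ? text : null);

// Hypothetical layer list, ordered from cheapest to most expensive.
const layers = [
  { name: "schema", live: false, lookup: only("fct.billings_allocated", "columns: account_id, alloc_amount") },
  { name: "annotations", live: false, lookup: only("fct.billings_allocated", "tag: curated") },
  { name: "code", live: false, lookup: only("fct.billings_allocated", "built from dim.accounts, dim.customers") },
  { name: "data-model pages", live: false, lookup: only("fct.billings_allocated", "Prefer tables tagged 'curated'.") },
  // Returns the statement it would run; a real layer would execute it.
  { name: "runtime", live: true, lookup: (t) => `DESCRIBE ${t}` },
];

function gatherContext(table, orderedLayers, maxLiveQueries = 1) {
  const context = [];
  let liveLeft = maxLiveQueries;
  for (const layer of orderedLayers) {
    if (layer.live) {
      // Live queries are the safety net: skip them when an earlier layer
      // already answered, or when the live-query budget is spent.
      if (context.length > 0 || liveLeft === 0) continue;
      liveLeft -= 1;
    }
    const text = layer.lookup(table);
    if (text !== null) context.push({ layer: layer.name, text });
  }
  return context;
}

const known = gatherContext("fct.billings_allocated", layers);
const unknown = gatherContext("scratch.new_table", layers);
console.log(known.map((c) => c.layer), unknown.map((c) => c.layer));
```

For the documented table the four catalog-backed layers answer and no live query is issued. For the undocumented one, only the runtime layer contributes.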
Code Mode as the MCP integration¶
Skipper exposes its tool surface to the LLM via Code Mode: instead of defining 30+ individual tools, two tools — `search` and `execute` — let the model write a JavaScript snippet that calls Skipper's full toolset programmatically:
const datasets = await skipper.search_datasets({ query: "billing product revenue" })
const queryId = await skipper.start_query({ sql: "SELECT ..." })
const results = await skipper.fetch_results({ queryId, mode: "inject" })
return skipper.create_chart({ chartType: "bar", data: results.rows, ... })
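To make the two-tool shape concrete, here is a rough sketch of what an `execute` tool might do with a snippet like the one above: run it once against the toolset and return the result together with a trace of which tools it called. Everything here is an assumption for illustration: the stub `skipper` object, the `execute` function, and the `calls` trace are hypothetical. Building the function with the `AsyncFunction` constructor provides no isolation at all, so it only stands in for the real execution environment.

```javascript
// Stand-in toolset with canned answers; real tools would talk to the
// catalog and the query engine.
const skipper = {
  search_datasets: async ({ query }) => [{ name: "fct.billings_allocated", query }],
  start_query: async ({ sql }) => "q-1",
  fetch_results: async ({ queryId, mode }) => ({ rows: [{ customer: "acme", cost: 42 }] }),
  create_chart: async ({ chartType, data }) => ({ chartType, points: data.length }),
};

// AsyncFunction lets the snippet body use `await`. This is NOT a sandbox.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Hypothetical `execute` tool: one call runs the whole model-written workflow.
async function execute(snippet, tools) {
  const calls = [];
  // Wrap each tool so every invocation is recorded by name.
  const traced = Object.fromEntries(
    Object.entries(tools).map(([name, fn]) => [
      name,
      async (args) => {
        calls.push(name);
        return fn(args);
      },
    ])
  );
  const run = new AsyncFunction("skipper", snippet);
  const result = await run(traced);
  return { result, calls }; // `calls` doubles as a simple audit trail
}

const snippet = `
  const datasets = await skipper.search_datasets({ query: "billing product revenue" });
  const queryId = await skipper.start_query({ sql: "SELECT 1" });
  const results = await skipper.fetch_results({ queryId, mode: "inject" });
  return skipper.create_chart({ chartType: "bar", data: results.rows });
`;

const pending = execute(snippet, skipper);
pending.then(({ result, calls }) => console.log(calls.length, result.chartType));
```

Four tool calls happen inside a single `execute` invocation, so the model sees one result instead of four intermediate ones.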
The snippet runs in a sandboxed Dynamic Worker isolate via WorkerLoader. A five-tool workflow collapses to one round-trip, and "the workflows it produces are auditable as code." Canonicalised at patterns/code-mode-mcp-for-data-agent.
Of the conventional approach, where each operation is a separate tool call, the post says: "This is fine, but it is chatty: a five-tool workflow is five model round-trips, each of which has to re-establish context." The argument is structural: code as a single artefact carries the entire workflow.
Distribution surfaces¶
"Skipper meets its users wherever they are."
- A Worker-backed agentic harness powered by Workers AI — the chat-box surface.
- An MCP server — "many of our internal users work via local agentic flows, and Skipper's tools are additionally available via an MCP server."
Security model = data model¶
"Everything Skipper does runs as the calling user. If you don't have access to a table, Skipper can't query it for you. If you ask for PII, your permissions are checked. If a query you save is shared with a teammate, their access is checked at view time, not at save time, because group membership changes."
Canonicalised at concepts/security-model-as-data-model. The agent has no privilege escalation — it inherits exactly the calling user's authority.
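The view-time rule is easy to get wrong, so a small sketch may help. It assumes a toy in-memory model in which each table is guarded by exactly one group; `groupsByUser`, `groupsByTable`, `canRead`, `saveQuery`, and `viewQuery` are hypothetical names, not Skipper's or Lifeguard's API.

```javascript
// Hypothetical in-memory group membership and per-table access groups.
const groupsByUser = new Map([
  ["alice", new Set(["billing-analysts"])],
  ["bob", new Set()],
]);
const groupsByTable = new Map([["fct.billings_allocated", "billing-analysts"]]);

function canRead(user, table) {
  const needed = groupsByTable.get(table);
  if (needed === undefined) return false; // unknown tables are denied
  return groupsByUser.get(user)?.has(needed) ?? false;
}

// A saved query stores SQL and table names only, never a cached decision.
function saveQuery(owner, sql, tables) {
  if (!tables.every((t) => canRead(owner, t))) {
    throw new Error("owner lacks access to a referenced table");
  }
  return { owner, sql, tables };
}

// Access is evaluated for the viewer, at the moment of viewing.
function viewQuery(viewer, saved) {
  const allowed = saved.tables.every((t) => canRead(viewer, t));
  return allowed ? { allowed, sql: saved.sql } : { allowed };
}

const saved = saveQuery("alice", "SELECT 1", ["fct.billings_allocated"]);
const before = viewQuery("bob", saved).allowed;
groupsByUser.get("bob").add("billing-analysts"); // membership changes later
const after = viewQuery("bob", saved).allowed;
console.log(before, after);
```

Because nothing about the viewer is stored with the saved query, the same object yields a denial before the group change and access after it.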
Embedded dashboards¶
Single-tag iframe contract:
<div data-skipper-dashboard="dash-123"></div>
<script src="https://skipper.cloudflare.com/embed.js" async></script>
Stack: iframe auto-resize → CSP `frame-ancestors` (corporate domain only) → Cloudflare Access gates iframe contents → underlying-table permission re-check at view time → fallback link to RBAC group request. Canonicalised at patterns/embedded-dashboard-with-zero-trust-iframe.
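The server-side part of that stack can be sketched as an ordered series of checks behind a `frame-ancestors` header. This is an assumption-laden illustration: `renderEmbed`, the `checks` object, the `https://*.corp.example` origin, and the request-link format are hypothetical. Only the `Content-Security-Policy: frame-ancestors` directive itself is standard.

```javascript
// Hypothetical parent origins; the page only says "corporate domain only".
const ALLOWED_PARENTS = ["https://*.corp.example"];

function renderEmbed(viewer, dashboard, checks) {
  // Browsers refuse to frame the response outside the listed origins.
  const headers = {
    "Content-Security-Policy": `frame-ancestors ${ALLOWED_PARENTS.join(" ")}`,
  };
  // Identity gate first (stand-in for the Access check).
  if (!checks.passesAccess(viewer)) {
    return { status: 403, headers, body: "sign-in required" };
  }
  // Then re-check every underlying table for this viewer, at view time.
  const blocked = dashboard.tables.filter((t) => !checks.canRead(viewer, t));
  if (blocked.length > 0) {
    // Fallback: point the viewer at an access request instead of a dead end.
    return { status: 403, headers, body: `request access: ${checks.requestLink(blocked[0])}` };
  }
  return { status: 200, headers, body: `dashboard ${dashboard.id}` };
}

// Stub checks for a dry run.
const checks = {
  passesAccess: (viewer) => viewer.signedIn,
  canRead: (viewer, table) => viewer.tables.includes(table),
  requestLink: (table) => `https://rbac.example.internal/request?table=${encodeURIComponent(table)}`,
};
const dashboard = { id: "dash-123", tables: ["fct.billings_allocated"] };

const ok = renderEmbed({ signedIn: true, tables: ["fct.billings_allocated"] }, dashboard, checks);
const denied = renderEmbed({ signedIn: true, tables: [] }, dashboard, checks);
console.log(ok.status, denied.status, denied.body);
```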
Self-serve permission requests as agent UX¶
If a user queries a table they can't access, the error is "this table needs review, click here to request one" — not "permission denied." Skipper "will even suggest the right RBAC group to request and link you straight to it." Canonicalised at patterns/error-message-as-self-serve-permission-request.
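A minimal sketch of such a message builder follows. The table-to-group mapping, the `accessDeniedMessage` function, and the request URL shape are all hypothetical; the post does not describe how Skipper picks the group to suggest.

```javascript
// Hypothetical mapping from table to the RBAC group that grants access.
const groupForTable = { "fct.billings_allocated": "billing-analysts" };

function accessDeniedMessage(table, requestBaseUrl) {
  const group = groupForTable[table];
  if (group === undefined) {
    // No known group: still avoid a bare "permission denied".
    return `Table ${table} needs review. No matching group was found.`;
  }
  const link = `${requestBaseUrl}/groups/${encodeURIComponent(group)}/request`;
  return `Table ${table} needs review. Suggested group: ${group}. Request access: ${link}`;
}

const message = accessDeniedMessage("fct.billings_allocated", "https://rbac.example.internal");
console.log(message);
```

The design point is that the failure path carries the next action with it, so the user never has to work out which group to ask for.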
Design lessons (named explicitly in the post)¶
- Less prompting is more — concepts/less-prompting-is-more-for-analytical-agents. Prescriptive multi-step prompts backfired ("Quality went down."); high-level guidance plus free model agency wins.
- Tool overlap is poison — concepts/tool-overlap-poisons-agent-accuracy. Three different `fetch_results` tools became one with a `mode` parameter (`inject` / `display` / `both`).
- Code, not metadata, captures meaning — concepts/code-as-context-for-data-agents. The biggest accuracy wins came from ingesting the SQL that produces a table, not its schema.
- Memory matters more than expected — sibling to concepts/agent-infrastructure-memory (Grafana Assistant). "Without a memory layer, the agent rediscovers and re-learns these every conversation. With one, it gets monotonically better at the recurring questions a team actually asks."
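The tool-consolidation lesson can be illustrated with a single function that takes a `mode` argument. The semantics below are assumed, not taken from the post: `inject` is read as "return rows for the model's context" and `display` as "return a render payload for the user". The `fetchResults` signature, the `store` map, and the output shape are hypothetical.

```javascript
const MODES = new Set(["inject", "display", "both"]);

// One tool with a mode switch instead of three overlapping tools.
// Assumed semantics: "inject" returns rows for the model's context,
// "display" returns a render payload for the user, "both" returns both.
function fetchResults({ queryId, mode }, store) {
  if (!MODES.has(mode)) throw new Error(`unknown mode: ${mode}`);
  const rows = store.get(queryId);
  if (rows === undefined) throw new Error(`unknown queryId: ${queryId}`);

  const out = { queryId };
  if (mode === "inject" || mode === "both") out.rows = rows;
  if (mode === "display" || mode === "both") {
    out.display = { kind: "table", rowCount: rows.length };
  }
  return out;
}

const store = new Map([["q-1", [{ customer: "acme", cost: 42 }]]]);
const injected = fetchResults({ queryId: "q-1", mode: "inject" }, store);
const shown = fetchResults({ queryId: "q-1", mode: "display" }, store);
const both = fetchResults({ queryId: "q-1", mode: "both" }, store);
console.log(Object.keys(injected), Object.keys(shown), Object.keys(both));
```

With one entry point the model has no near-duplicate tools to choose between. The only decision left is the value of `mode`.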
Operational signal¶
- "top 100 customers by revenue" — "about three seconds in Skipper now. So does 'how many domains that signed up today are in the top 100.'"
- Customer-support example: "Find the top 100 billing support tickets from customers who have spent >$100" — "used to be a multi-day project. Now it's a Skipper query."
- The 324 distinct Cloudflare employees running 91,760 Town Lake queries in a recent measurement period are the user base Skipper's UI fans into.
Naming disambiguation¶
Three distinct "Skipper" systems exist in this wiki:
- Skipper (Cloudflare) — this page; the AI data agent on Town Lake.
- Skipper (Airbnb) — Airbnb's embedded workflow engine.
- Skipper (Zalando) — Zalando's HTTP reverse-proxy / route server.
Seen in¶
- sources/2026-05-28-cloudflare-how-we-built-cloudflares-data-platform-and-an-ai-agent-on-top-of-it — canonical wiki source.
Related¶
- systems/cloudflare-town-lake — the platform Skipper sits on.
- systems/cloudflare-lifeguard — feeds basic access info to Skipper so users get blocked at the front door.
- systems/cloudflare-transformer-elt — emits the `.meta.json` documentation that powers Layer 3 grounded context.
- systems/datahub — the metadata catalog Skipper queries.
- systems/code-mode — the MCP code-generation surface.
- systems/dynamic-workers — WorkerLoader runs the generated JavaScript snippets.
- systems/model-context-protocol — MCP-server distribution surface.
- concepts/layered-grounded-context-for-data-agent — the structural shape.
- concepts/security-model-as-data-model — the agent's authority model.
- concepts/code-as-context-for-data-agents — the SQL-as-context insight.
- concepts/tool-overlap-poisons-agent-accuracy — design lesson 2.
- concepts/less-prompting-is-more-for-analytical-agents — design lesson 1.
- concepts/agent-infrastructure-memory — sibling at Grafana.
- patterns/code-mode-mcp-for-data-agent
- patterns/embedded-dashboard-with-zero-trust-iframe
- patterns/error-message-as-self-serve-permission-request
- companies/cloudflare