Octopus Energy
Octopus Energy is a UK-headquartered retail energy supplier with 8M+ customers, known for time-of-use smart tariffs that pass the half-hourly wholesale price signal through to consumers (electric vehicles charging when the grid is cheapest/cleanest, heat-pump optimisation, etc.). It is not an engineering-blog publisher in the sysdesign-wiki sense; its presence on the wiki comes via a 2026-05-23 Databricks customer-success post covering the rebuild of its margin data pipeline for the UK regulatory change to Market-wide Half-Hourly Settlement (MHHS).
Why on the wiki
The Octopus Energy case is interesting for system-design readers on three axes:
- Regulatory event multiplying data volume at a finer grain is a generalisable forcing function, not unique to energy. MHHS moves every UK household from 2 meter reads per month → 48 reads per day — a 48× increase in data points driving every margin and settlement calculation. The same dynamics apply "any time a system moves from monthly to daily, daily to real-time, or aggregate to transactional". (Source: sources/2026-05-23-databricks-scaling-for-mhhs-octopus-energy-50x-cost-reduction)
- Grain misalignment as the central cost driver — the legacy pipeline ran everything at a single (monthly) grain because billing was monthly; MHHS introduced a fundamental split (industry settlement = half-hourly, smart-tariff revenue = half-hourly, standard-tariff revenue = monthly), and processing all three through one pipeline meant "processing the entire dataset on every run, regardless of what had actually changed". Canonicalised on the wiki as concepts/grain-misalignment.
- Regulatory rebuild that ended cheaper than the legacy. The post-rebuild cost per settlement date is $0.48: not just ~50× below the projected MHHS cost of $23.63, but also roughly a third below the legacy cost ($0.71, about 1.5× the rebuilt figure), despite processing 48× more data points. Annualised cost avoidance: ~$1M, with the upstream-incremental savings additional. Three engineers, three months. The architectural lesson is that "you can't just throw more compute at a problem like this. You have to rebuild and rethink your logic from the ground up" (Saad Ali, Lead of the Margin Data Team).
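The cost comparison above is simple arithmetic on the three per-settlement-date figures quoted from the source. A minimal sketch follows; the constant and function names are illustrative, not taken from the post.

```python
# Figures quoted above, in USD per settlement date.
LEGACY_COST = 0.71            # legacy pipeline, pre-MHHS volumes
PROJECTED_MHHS_COST = 23.63   # projected cost at MHHS volumes without the rebuild
REBUILT_COST = 0.48           # rebuilt pipeline at MHHS volumes


def times_cheaper(before: float, after: float) -> float:
    """Return how many times cheaper `after` is than `before`."""
    return before / after


vs_projected = times_cheaper(PROJECTED_MHHS_COST, REBUILT_COST)
vs_legacy = times_cheaper(LEGACY_COST, REBUILT_COST)
saving_vs_legacy = 1 - REBUILT_COST / LEGACY_COST

print(f"vs projected MHHS cost: {vs_projected:.1f}x cheaper")
print(f"vs legacy cost: {vs_legacy:.2f}x cheaper ({saving_vs_legacy:.0%} lower)")
```

On these inputs the rebuilt pipeline comes out about 49× cheaper than the projected MHHS cost and about 1.5× cheaper (roughly 32% lower) than the legacy cost.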
Key systems
- systems/octopus-margin-data-pipeline — the rebuilt three-stream margin data pipeline. Settlement (half-hourly for regulatory cost allocation) + Half-Hourly (smart-tariff customers — EVs, heat pumps, time-of-use products) + Monthly (standard tariffs), orchestrated by a "Job of Jobs" pattern, sitting on a unified multi-terabyte multi-grain source-of-truth layer that consolidates meter reads, smart meter data, and industry flows. Built on Delta Lake / Apache Spark / Databricks Serverless. First and only wiki disclosure as of 2026-05-23.
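The "Job of Jobs" idea described above can be sketched in plain Python: a parent job triggers one child job per stream, and each child keeps its own grain and tuning profile. This is an illustrative model only, not the Databricks implementation. `StreamJob`, `make_runner`, the `profile` values, sequential execution, and the failure-isolation behaviour are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StreamJob:
    """One child job: a stream with its own natural grain and tuning profile."""
    name: str
    grain: str                          # "half_hourly" or "monthly"
    run: Callable[[str], str]           # settlement date in, status string out
    tuning: Dict[str, str] = field(default_factory=dict)  # hypothetical knobs


def make_runner(stream: str) -> Callable[[str], str]:
    """Stand-in for real stream logic; it only reports what it was asked to do."""
    def _run(settlement_date: str) -> str:
        return f"{stream}:{settlement_date}:ok"
    return _run


# The three streams named in the source; tuning values are made up.
STREAMS: List[StreamJob] = [
    StreamJob("settlement", "half_hourly", make_runner("settlement"), {"profile": "wide-shuffle"}),
    StreamJob("half_hourly", "half_hourly", make_runner("half_hourly"), {"profile": "wide-shuffle"}),
    StreamJob("monthly", "monthly", make_runner("monthly"), {"profile": "small"}),
]


def job_of_jobs(settlement_date: str, jobs: List[StreamJob]) -> Dict[str, str]:
    """Parent job: run every child and record its status.

    A failure in one stream is captured rather than raised, so the other
    streams still run (an assumption of this sketch).
    """
    results: Dict[str, str] = {}
    for job in jobs:
        try:
            results[job.name] = job.run(settlement_date)
        except Exception as exc:  # sketch only; real code would catch narrower errors
            results[job.name] = f"failed: {exc}"
    return results


statuses = job_of_jobs("2026-01-01", STREAMS)
print(statuses)
```

Keeping the parent thin means each stream can be resized or rescheduled without touching the other two, which is the property the source attributes to the pattern.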
Key patterns / concepts
- concepts/grain-misalignment — when a single pipeline runs at a grain finer than the natural grain of some of its consumers, every consumer pays the finest-grain cost on every run, regardless of what changed. Canonicalised from this case.
- concepts/data-pipeline-grain — the per-stream natural grain framework: settlement / smart-tariff / standard-tariff each have a different natural grain, and stream-per-grain is the resolution.
- concepts/remove-before-add-optimization — "removing unjustified compute operations was as impactful as adding new optimisations". Z-ordering, ANALYZE, and custom shuffle logic that predate measurement-based justification are the most common removal targets.
- patterns/grain-aligned-stream-split — the architectural pattern of replacing one monolithic finest-grain pipeline with N streams, one per natural grain.
- patterns/cdf-incremental-replacing-full-rescan — the Delta-CDF-based incremental processing that delivered the single largest win (25 B → 300 M rows / run, 98.8% reduction, weekly → daily freshness).
- patterns/broadcast-join-for-small-reference-tables — Spark optimisation pattern, <500 MB threshold disclosed.
- patterns/job-of-jobs-orchestration — orchestrate the three streams while preserving each one's independent tuning profile.
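The difference between a full rescan and change-feed-driven incremental processing can be modelled in plain Python. The sketch below does not use Delta Lake or Spark. The change-feed tuple layout, the function names, and the per-meter kWh aggregate are assumptions chosen to keep the example self-contained. Only the 25 B and 300 M row counts come from the source.

```python
from typing import Dict, List, Tuple

# Hypothetical change-feed entry: (commit_version, meter_id, half_hour_slot, kwh).
Change = Tuple[int, str, int, float]
Table = Dict[Tuple[str, int], float]


def full_rescan(table: Table) -> Dict[str, float]:
    """Recompute every meter total from scratch; cost grows with table size."""
    totals: Dict[str, float] = {}
    for (meter_id, _slot), kwh in table.items():
        totals[meter_id] = totals.get(meter_id, 0.0) + kwh
    return totals


def apply_changes(table: Table, totals: Dict[str, float],
                  changes: List[Change], last_version: int) -> Tuple[int, int]:
    """Apply only entries committed after `last_version`.

    Updates `table` and `totals` in place and returns
    (new_checkpoint_version, rows_touched); cost grows with the change count.
    """
    newest, touched = last_version, 0
    for version, meter_id, slot, kwh in changes:
        if version <= last_version:
            continue  # already processed in an earlier run
        old = table.get((meter_id, slot), 0.0)
        table[(meter_id, slot)] = kwh
        totals[meter_id] = totals.get(meter_id, 0.0) + (kwh - old)
        newest = max(newest, version)
        touched += 1
    return newest, touched


table: Table = {("m1", 0): 1.0, ("m1", 1): 2.0, ("m2", 0): 4.0}
totals = full_rescan(table)
feed: List[Change] = [(1, "m1", 0, 1.0), (2, "m1", 1, 2.5), (3, "m3", 0, 0.5)]
checkpoint, touched = apply_changes(table, totals, feed, last_version=1)

# Row counts per run quoted in the source: 25 B before, 300 M after.
rows_before, rows_after = 25_000_000_000, 300_000_000
reduction = 1 - rows_after / rows_before
print(f"rows touched: {touched}, checkpoint: {checkpoint}, reduction: {reduction:.1%}")
```

The incremental path gives the same totals as a fresh rescan while touching only the rows that changed since the checkpoint. The same mechanism lies behind the quoted 98.8% reduction in rows per run.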
Recent articles
- 2026-05-23 — "Scaling for MHHS: how Octopus Energy achieved a 50x cost reduction in margin data engineering" (via Databricks Blog) — three-stream re-architecture by natural grain + Delta CDF for upstream incremental processing + Spark/AQE optimisation + serverless-as-development-velocity. $0.48 / settlement date, ~$1M annualised cost avoidance, 3 months, team of three. → sources/2026-05-23-databricks-scaling-for-mhhs-octopus-energy-50x-cost-reduction
Notable people quoted on the wiki
- Saad Ali — Lead of the Margin Data Team. "You can't just throw more compute at a problem like this. You have to rebuild and rethink your logic from the ground up." (Source: sources/2026-05-23-databricks-scaling-for-mhhs-octopus-energy-50x-cost-reduction)