CONCEPT Cited by 1 source

Remove before add optimization

Remove before add optimization is the architectural principle of auditing existing optimisation choices for measured benefit before adding new ones. Optimisations that pre-date measurement, or that have outlived the measurement that justified them, become technical debt: they have a maintenance cost, they obscure the planner's view of the workload, and they can actively cost more compute than they save. The Octopus Energy MHHS rebuild names the principle explicitly and treats deletion as a co-equal optimisation lever to addition. (Source: sources/2026-05-23-databricks-scaling-for-mhhs-octopus-energy-50x-cost-reduction)

The principle

"That last point bears emphasis: removing unjustified compute operations was as impactful as adding new optimisations. If you are running Z-ordering or ANALYZE without measuring their effect, they may be costing you more than they are saving."

Two operative claims:

  1. Removal is impactful. It is not a cleanup task to do after the real performance work — it is the performance work. The Octopus rebuild treated removal as one of four named optimisation categories alongside lineage simplification, join tuning, and trusting AQE.
  2. The threshold is measurement, not vintage. It does not matter how old or established an optimisation is. If it does not produce a measured benefit on the current workload, it is a candidate for removal.
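
The two claims reduce to a small decision rule. A minimal sketch, assuming each optimisation's own cost and its measured saving are expressed in the same unit per refresh; the `Optimisation` type, its fields, and the example figures are hypothetical illustrations, not taken from the Octopus source:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Optimisation:
    """Hypothetical audit entry; both figures share a unit (e.g. compute-seconds per refresh)."""
    name: str
    cost: float                       # compute spent running the optimisation itself
    measured_saving: Optional[float]  # saving observed on the current workload; None if never measured


def verdict(opt: Optimisation) -> str:
    """Classify an optimisation by measurement alone; its age is deliberately not an input."""
    if opt.measured_saving is None:
        return "measure first"        # no data justifies keeping it or removing it
    if opt.measured_saving > opt.cost:
        return "keep"
    return "candidate for removal"


# Illustrative numbers only.
audit = [
    Optimisation("z-order on events", cost=120.0, measured_saving=30.0),
    Optimisation("analyze on events", cost=45.0, measured_saving=None),
    Optimisation("broadcast hint on dim_date", cost=0.0, measured_saving=12.0),
]
for opt in audit:
    print(f"{opt.name}: {verdict(opt)}")
```

The record carries no "introduced on" field on purpose: under this rule, how long an optimisation has existed cannot change its verdict.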

Why each "always-on" optimisation is suspect

Three named removal targets in the Octopus source:

| Target | Why removal is plausible |
| --- | --- |
| Z-ordering | Was the right answer for many query patterns under traditional Delta partitioning; Liquid Clustering now subsumes its role for filter/join columns. Z-ordering compute keeps running even when the workload no longer benefits. |
| ANALYZE | Stats collection that may be redundant once AQE runtime stats are good enough; the cost of running ANALYZE on every table on every refresh adds up. |
| Custom shuffle logic / hand-tuned hints | Encoded the architect's guess about runtime distribution at design time. AQE has access to better, real-time signals. The hand-coded path becomes net-negative when AQE's path is shorter. |

Each of these was correct at some point. The principle is not that they were wrong, but that the workload, the engine, and the storage layer all evolve, and an optimisation justified at year zero may not be justified at year three.
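
The drift from justified to unjustified can be made concrete with break-even arithmetic. A sketch under stated assumptions: the optimisation has a fixed maintenance cost per refresh and saves a fixed amount on each query that benefits from it. The function name and all figures are hypothetical, chosen only to show the shape of the calculation:

```python
def net_benefit(maintenance_cost: float, saving_per_query: float, benefiting_queries: int) -> float:
    """Net compute saved per refresh; negative means the optimisation costs more than it saves."""
    return saving_per_query * benefiting_queries - maintenance_cost


# Hypothetical figures in compute-seconds per refresh.
MAINTENANCE_COST = 600.0   # e.g. re-clustering or stats collection on each refresh
SAVING_PER_QUERY = 20.0    # measured saving on each query that benefits

# Same optimisation, same cost; only the workload has changed.
year_zero = net_benefit(MAINTENANCE_COST, SAVING_PER_QUERY, benefiting_queries=50)
year_three = net_benefit(MAINTENANCE_COST, SAVING_PER_QUERY, benefiting_queries=10)

print(year_zero)   # 400.0: justified
print(year_three)  # -400.0: net-negative
```

With these numbers the break-even point is 30 benefiting queries per refresh (600 / 20). Nothing about the optimisation itself changed between the two cases; the workload moved underneath it, which is why the measurement has to be repeated rather than taken once.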

The mechanism: trust the optimiser

The companion principle is trust the optimiser: AQE (and equivalent runtime optimisers) has access to runtime statistics — post-shuffle row counts, observed skew, post-filter join input sizes — that the human writing optimisation hints at design time does not. When the runtime optimiser has more information, removing the hint is the higher-quality move.

The Octopus rebuild's verbatim framing:

"In several cases, Spark's Adaptive Query Execution (AQE) outperformed hand-tuned logic. The team removed custom optimisation code and let AQE do its job."

The action item is deletion, not addition. See systems/spark-aqe for the longer treatment.
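
One way to prepare such a deletion is to generate a hint-free variant of a query and compare it against the hinted original. The sketch below relies on the Spark SQL hint syntax (`/*+ ... */` after `SELECT`); the helper names are hypothetical, and the regex is deliberately naive: it assumes hint-like text does not appear inside string literals, and it collapses whitespace everywhere in the statement.

```python
import re
from typing import List

# Spark SQL hints are written as /*+ ... */ comments.
_HINT = re.compile(r"/\*\+.*?\*/", re.DOTALL)


def find_hints(sql: str) -> List[str]:
    """Return the body of every hint comment, e.g. ['BROADCAST(d)']."""
    return [hint[3:-2].strip() for hint in _HINT.findall(sql)]


def strip_hints(sql: str) -> str:
    """Return the query with all hint comments removed and whitespace normalised."""
    without_hints = _HINT.sub(" ", sql)
    return re.sub(r"\s+", " ", without_hints).strip()


hinted = """
SELECT /*+ BROADCAST(d) */ f.id, d.day
FROM facts f JOIN dim_date d ON f.date_id = d.id
"""

print(find_hints(hinted))
print(strip_hints(hinted))
```

Running both variants with AQE enabled (`spark.sql.adaptive.enabled`) and comparing them is what turns "trust the optimiser" from a belief into a measured result; the hint is deleted only if the hint-free plan is at least as good.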

Implementation: measurement is the gating function

The principle has a hard implementation requirement: without measurement, neither retention nor removal is justifiable. The Octopus rebuild explicitly frames this:

"If you are running Z-ordering or ANALYZE without measuring their effect, they may be costing you more than they are saving."

Two operational moves to make this practical:

  • Compare runs side by side. For the Octopus team, the Databricks Serverless UI enables this, "making it practical to isolate the effect of individual optimisations".
  • Document the measurement that justified each optimisation. Without a record, the next architect cannot tell whether the optimisation is still earning its place.
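
Both moves can be combined in a small harness that times a job with and without an optimisation and returns a dated record of the result. This is a minimal sketch, not the Octopus team's tooling: `MeasurementRecord`, `median_runtime`, and `compare` are hypothetical names, the jobs are stand-in callables, and wall-clock time is assumed to be an acceptable proxy for compute cost.

```python
import statistics
import time
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class MeasurementRecord:
    """The evidence a later reader needs to judge whether an optimisation still earns its place."""
    optimisation: str
    measured_on: date
    median_with_s: float
    median_without_s: float

    @property
    def saving_s(self) -> float:
        """Positive when the job is faster with the optimisation in place."""
        return self.median_without_s - self.median_with_s


def median_runtime(job: Callable[[], None], runs: int = 5) -> float:
    """Median wall-clock seconds over several runs, to damp run-to-run noise."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        job()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(name: str, with_opt: Callable[[], None],
            without_opt: Callable[[], None], runs: int = 5) -> MeasurementRecord:
    """Time both variants and capture the result as a dated record."""
    return MeasurementRecord(
        optimisation=name,
        measured_on=date.today(),
        median_with_s=median_runtime(with_opt, runs),
        median_without_s=median_runtime(without_opt, runs),
    )


# Stand-in jobs; in practice these would trigger the real pipeline variants.
record = compare(
    "example optimisation",
    with_opt=lambda: time.sleep(0.01),
    without_opt=lambda: time.sleep(0.05),
    runs=3,
)
print(record.optimisation, record.measured_on, record.saving_s > 0)
```

The saving reported here covers only the job's runtime. The cost of maintaining the optimisation itself (for example, a clustering or stats-collection run) has to be measured separately and subtracted before deciding to keep it.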

A counter-instinct stance

Most performance-engineering culture rewards adding optimisations: elaborate hand-tuning is the mark of an experienced operator; deleting hints feels like backsliding. The remove-before-add principle inverts this:

| Instinct | Reframing |
| --- | --- |
| "More optimisations = better" | More optimisations = more hypotheses, each with a maintenance cost |
| "This optimisation has been here for years; it must work" | Vintage is not evidence; measurement is |
| "The runtime optimiser is opaque, so we should hand-tune" | The runtime optimiser sees the workload; the hand-tuner sees the design |
| "Adding a hint is safe" | Adding a hint blocks the planner's better choice |

Composition with the other Octopus rebuild principles

Remove-before-add doesn't stand alone. The Octopus team named it together with three other takeaways and presented all four as transferable:

  1. "Grain misalignment is the hidden cost driver."
  2. "Incremental processing transforms pipeline economics."
  3. "Remove before you add."
  4. "Trust the optimiser."

Seen in

  • sources/2026-05-23-databricks-scaling-for-mhhs-octopus-energy-50x-cost-reduction: canonical disclosure. Octopus Energy's MHHS-driven rebuild treated removal as one of four optimisation categories, stating that "removing unjustified compute operations was as impactful as adding new optimisations". Z-ordering and ANALYZE are named as common removal targets when they run without measured justification; custom shuffle logic was deleted in favour of AQE.