PATTERN Cited by 2 sources
Reproducible benchmark publication¶
Intent¶
When publishing benchmark numbers — especially vendor-produced benchmarks that compare your product to competitors — make the methodology auditable and falsifiable by shipping three artifacts alongside the results:
- Exact configs: instance types, workload parameters, client shapes, Postgres / MySQL / engine-specific knobs, IOPS caps, RAM:CPU ratios.
- Reproduction instructions: a step-by-step guide (often at a URL path like `/benchmarks/instructions/...`) that lets a third party rerun the same benchmark.
- A feedback address: a dedicated email or issue tracker where other vendors and engineers can challenge the methodology. ("We invite other vendors to provide feedback. If you see anything wrong in our benchmarking methodology, let us know at benchmarks@planetscale.com.")
Without all three, a vendor benchmark is marketing with graphs. With all three, it becomes a bounded, auditable claim that competitors can refute on the record.
Context¶
Vendor benchmarks have an unavoidable credibility problem. The vendor publishes results that flatter its own product — the structural incentive is obvious. Audiences discount accordingly, and the whole category of benchmark comparison becomes noise.
The fix is not to stop publishing vendor benchmarks. The fix is to shift from "trust us" to "rerun this and refute us." Reproduction instructions + a feedback address change the game-theoretic structure:
- If competitors disagree with the methodology, they can respond on the record — not just mutter in sales calls.
- If independent engineers rerun the benchmark and get different numbers, the vendor is publicly accountable.
- The methodology itself becomes an artifact that outlasts the specific results — refined over time by adversarial review.
This is the same mechanism that makes academic experiments credible: replication is the thing that distinguishes evidence from assertion.
Mechanism¶
Publish the harness + configs, not just the results¶
Don't emit just a bar chart. Emit:
- Instance types + region: "AWS c6a.xlarge (4 vCPU, 8 GB Memory) in us-east-1" — verbatim, not paraphrased.
- Engine-specific parameters: "All Postgres configuration options are left at each platform's defaults. The one exception to this is modifications to connection limits and timeouts, which may be modified to facilitate benchmarking."
- Competitor-specific normalisations: where RAM:CPU ratios differ across vendors, spell out whether you matched RAM or CPU, and why — including explicit acknowledgment of which way the choice skews ("this is an unfair advantage to them").
- Availability-posture equalisation: if the reference product has replicas and the competitor doesn't, either (a) add replicas to the competitor cost model, or (b) disclose the asymmetry. Don't silently elide it.
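The "ship configs" discipline is easier to hold when the configuration is one machine-readable artifact rather than prose scattered through a blog post. A minimal sketch of such a manifest — the field names and values here are illustrative, not PlanetScale's actual schema:

```python
import json

# Hypothetical manifest of everything a reproducible benchmark post
# should publish alongside its bar charts. Values are illustrative only.
manifest = {
    "reference": {
        "instance_type": "AWS c6a.xlarge (4 vCPU, 8 GB Memory)",
        "region": "us-east-1",
    },
    "engine_config": {
        # Defaults everywhere except connection limits/timeouts,
        # mirroring the disclosure's stated exception.
        "postgres": "platform defaults",
        "exceptions": ["connection limits", "timeouts"],
    },
    "normalizations": [
        # Spell out which dimension was matched and which way it skews.
        {"vendor": "competitor-A", "matched": "RAM", "skews": "in their favor"},
    ],
    "availability_posture": {
        "reference_replicas": 2,          # primary + 2 replicas
        "asymmetries_disclosed": True,    # replicas priced in, or called out
    },
}

print(json.dumps(manifest, indent=2))
```

Checking a manifest like this into the harness repo makes "did we disclose the asymmetry?" a reviewable diff rather than a memory exercise.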
Ship reproduction instructions at a stable URL¶
PlanetScale's 2026-04-21 disclosure uses `/benchmarks/instructions/tpcc500g` and `/benchmarks/instructions/oltp300g` — permanent URL paths independent of the post date. Anyone can rerun the same Percona sysbench-tpcc scripts at the same TABLES=20, SCALE=250 scale on the same instance shapes.
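Pinned parameters like these let a third party reconstruct the exact invocation. A sketch of what the instructions pin down — the flag names follow Percona's sysbench-tpcc conventions, but treat the exact invocation as an assumption, not a copy of PlanetScale's instructions:

```python
# Reconstruct the benchmark invocation from the published parameters.
# TABLES and SCALE come from the disclosure; threads/time/host are
# placeholders that real instructions would also pin down.
params = {"tables": 20, "scale": 250, "threads": 64, "time": 300}

def tpcc_command(phase: str, host: str = "DB_HOST") -> str:
    """Build a Percona sysbench-tpcc command line for one phase
    (prepare / run / cleanup). Flag spelling is an assumption."""
    flags = " ".join(f"--{k}={v}" for k, v in params.items())
    return f"./tpcc.lua --db-driver=pgsql --pgsql-host={host} {flags} {phase}"

print(tpcc_command("prepare"))
print(tpcc_command("run"))
```

The point is that every value in `params` appears verbatim in the published instructions, so two independent reruns are comparing the same workload.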
Commit to a dedicated feedback channel¶
A dedicated email address — not a general support queue. PlanetScale's benchmarks@planetscale.com signals: "Methodology feedback is a first-class concern with a known responder." The channel itself is load-bearing — it's the mechanism by which refutations make it into subsequent iterations of the benchmark.
Acknowledge the caps the methodology can't remove¶
Not every bias can be engineered around. PlanetScale's disclosure explicitly names the caps it accepts:
Except for the Latency benchmarks, we do not provide guarantees down to the availability-zone level. Not all platforms allow you to specify which AZ the database node should reside in (nor do they expose this). Thus, it is impractical to make guarantees around this for all providers.
Making the limits of the methodology visible is part of the reproducibility claim. The reader can then decide whether the residual bias is acceptable for their decision.
Canonical instance — PlanetScale Telescope (2026-04-21)¶
From the 2026-04-21 Benchmarking Postgres post:
- Three benchmarks: latency (`SELECT 1;`), TPCC-like (Percona `sysbench-tpcc` at ~500 GB), OLTP read-only (`sysbench oltp_read_only` at 300 GB).
- Full specs published: `i8g M-320` reference (4 vCPU, 32 GB, 937 GB NVMe) with primary + 2 replicas across 3 AZs; client shapes (`c6a.xlarge`/`e2-standard-4`); competitor matches; IOPS policies.
- Reproduction instructions at `/benchmarks/instructions/tpcc500g` and `/benchmarks/instructions/oltp300g`.
- Feedback address: benchmarks@planetscale.com.
- Methodology-voice acknowledgment of limits: "benchmarking of any kind has its shortcomings… no single benchmark can capture the performance characteristics of all such databases."
The systems/telescope-planetscale harness is the artifact; the post is the disclosure. The pattern's novelty on the wiki is the combined shipping: harness + configs + instructions + feedback address + self-aware limits.
Second canonical instance — Meta DCPerf (2024-08-05)¶
DCPerf goes further: the entire benchmark suite is open-source on GitHub. This is the infrastructure-altitude analogue of the database-altitude PlanetScale instance. Same mechanism (reproduction + feedback), different consumer population (CPU vendors + hyperscale capacity planners, not managed-database customers).
Meta's explicit motivation: "an industry standard method to capture important workload characteristics of compute workloads that run in hyperscale datacenter deployments." Open-sourcing the suite turns a point-solution into a coordination tool between hyperscalers, hardware vendors, and academic researchers.
Anti-patterns¶
- Bar chart only. Publishing performance numbers without the configs that produced them is marketing, not benchmarking.
- "Contact sales for details." If the methodology is behind a paywall or a call, the claim isn't auditable.
- Unstable reproduction URLs. Instructions at `/blog/2026-benchmark-v3-final-rev-2` signal that the methodology isn't expected to be reused.
- No feedback channel. A claim with no way to be refuted rewards the original publisher asymmetrically.
- Silent methodology updates. Results changing without a corresponding changelog of why erodes trust.
- Competitor-specific advantages hidden. If you gave Supabase double the CPU to match RAM, say so and frame the decision — don't hide it in a footnote.
- Apples-to-oranges comparisons with the ratio dressed up. Comparing reserved-instance pricing on one vendor to on-demand pricing on another without calling it out is bias-by-accounting.
Anti-pattern specific to price-performance¶
- Stripping competitor replicas from the cost model. A 1-node competitor looks cheap next to a 3-replica reference — but the comparison is of different products. Either add replicas to the competitor's price, or reduce the reference to 1 node and acknowledge the availability degradation.
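The fix is mechanical once stated: price the competitor as if it matched the reference's replica count. A sketch with made-up prices — the numbers are illustrative; the point is the normalization step:

```python
# Hypothetical monthly node prices; a real comparison would use each
# vendor's published pricing.
REFERENCE_NODES = 3  # primary + 2 replicas, as in the reference posture

def normalized_monthly_cost(node_price: float, nodes_offered: int) -> float:
    """Cost of the competitor product brought up to the reference's
    replica count, instead of comparing a 1-node product to a 3-node one."""
    missing = max(0, REFERENCE_NODES - nodes_offered)
    return node_price * (nodes_offered + missing)

# A 1-node competitor at $100/node looks like $100/mo naively,
# but costs $300/mo once replicas are added to match availability posture.
print(normalized_monthly_cost(100.0, 1))  # -> 300.0
```

The alternative normalization — reducing the reference to 1 node — is equally valid, as long as the availability degradation is acknowledged in the same breath.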
Seen in¶
- sources/2026-04-21-planetscale-benchmarking-postgres — canonical wiki instance at the database-benchmarking altitude. Telescope harness, three-benchmark commit, full specs, reproduction URLs, benchmarks@planetscale.com feedback address, explicit limits acknowledgment.
- sources/2024-08-05-meta-dcperf-open-source-benchmark-suite — infrastructure-altitude instance. Entire benchmark suite open-sourced; IPC + frequency representativeness graphs published; industry-standard ambition explicit.
Related¶
- patterns/custom-benchmarking-harness — prerequisite pattern. Telescope is a custom harness; this pattern adds the "publish it auditably" layer on top.
- patterns/workload-representative-benchmark-from-production — prerequisite pattern. DCPerf is both workload-representative and reproducibly published.
- patterns/measurement-driven-micro-optimization — the broader discipline. Reproducibility closes the loop: the community can re-run your benchmark and surface counter-evidence that feeds your next iteration.
- concepts/benchmark-methodology-bias — the failure mode this pattern mitigates by exposing the methodology to adversarial review.
- concepts/benchmark-representativeness — the property whose validation depends on reproducibility.
- concepts/price-performance-ratio — the accounting dimension that especially benefits from published methodology (cost comparisons are even more vulnerable to framing than raw performance).
- systems/telescope-planetscale — the canonical instance's harness.
- systems/dcperf — the sibling infrastructure-altitude instance.