Sources
Per-article summaries of ingested engineering blog posts. Most recent first.
178 pages
- Visibility at Scale: How Figma Detects Sensitive Data Exposure — Figma describes Response Sampling, a two-phase security detection
- Figma — The Search for Speed in Figma (OpenSearch) — Figma's search team spent several months debugging and re-tuning the
- Figma — The Infrastructure Behind AI Search in Figma — Infrastructure companion to Figma's earlier product-narrative post on
- Supporting Faster File Load Times with Memory Optimizations in Rust — Figma's Multiplayer server loads
- Figma — Server-side sandboxing — Virtual machines — Part 2 of Figma's security-engineering 3-part series on server-side
- Figma — Server-side sandboxing — Containers and seccomp — Part 3 of Figma's security-engineering 3-part series on server-side
- Rolling Out Santa Without Freezing Productivity: Tips from Securing Figma's Fleet — Figma's Endpoint Security team rolled out Santa —
- Figma Rendering: Powered by WebGPU — Figma's canvas renderer — a C++ codebase compiled to
- Redefining Impact as a Data Scientist (Figma, 2026-04-21) — Figma Engineering post (Data Science author, writing on behalf of the team
- How We Rebuilt the Foundations of Component Instances — Year-long Figma client-architecture rewrite (2025, 15+ contributors) replacing
- A Tale of Two Parameter Architectures — and How We Unified Them — Figma retrospective on unifying the architectures behind its two
- Keeping It 100(x) With Real-time Data At Scale — Figma re-architected LiveGraph — its real-time
- Figma — How We Built AI-Powered Search in Figma — Figma built AI-powered search (shipped at Config 2024) combining
- Figma — How We Built a Custom Permissions DSL at Figma — Figma's engineering team rebuilt permissions enforcement from a
- Figma — How Figma's Databases Team Lived to Tell the Scale — Figma's Databases team retrospective on scaling RDS Postgres ~100× since 2020. 2020 baseline was a single Postgres on AWS's largest physical…
- How Figma Draws Inspiration From the Gaming World — A Figma engineering post (2026-04-21) by a former game-engine engineer
- Figma — Figma's Next-Generation Data Caching Platform — Figma's Storage Products team built FigCache — a stateless,
- Figma — Enforcing Device Trust on Code Changes — Figma's security team adds a cryptographic device-trust check on every
- Cloudflare — Moving past bots vs. humans — Cloudflare argues that the "bots vs. humans" frame is no longer
- Take Control: Customer-Managed Keys for Lakebase Postgres — Databricks launches Customer-Managed Keys (CMK) for Lakebase,
- Mercedes-Benz builds a cross-cloud data mesh with Delta Sharing and intelligent replication — Case study from Mercedes-Benz on building a cross-hyperscaler data-sharing
- Cloudflare — Orchestrating AI Code Review at scale — Cloudflare's 2026-04-20 post details a CI-native AI code-review orchestration system built around OpenCode (open-source…
- Cloudflare: The AI Engineering Stack We Built Internally — Cloudflare describes the internal AI engineering stack that reached 93% R&D
- Governing Coding Agent Sprawl with Unity AI Gateway — Databricks announces Coding Agent Support in
- Unweight: how we compressed an LLM 22% without sacrificing quality — Cloudflare introduces Unweight, a lossless compression system
- Shared Dictionaries: compression that keeps up with the agentic web — Cloudflare's 2026-04-17 post announces an open beta opening
- Redirects for AI Training enforces canonical content — Cloudflare's 2026-04-17 post is the dedicated launch of
- Introducing the Agent Readiness score. Is your site agent-ready? — Cloudflare introduces isitagentready.com,
- Introducing Flagship: feature flags built for the age of AI — Cloudflare's 2026-04-17 Agents-Week post launches
- Agents Week: network performance update — Cloudflare's Agents Week 2026 performance update reports that
- Cloudflare — Agents that remember: introducing Agent Memory — Cloudflare's 2026-04-17 post launches Agent Memory (private beta) — an opinionated managed service that extracts information from agent conv…
- GitHub Engineering — How GitHub uses eBPF to improve deployment safety — GitHub hosts its own source code on github.com (they are
- Cloudflare Email Service: now in public beta. Ready for your agents — Cloudflare's 2026-04-16 Agents-Week post moves Email Sending out of private beta into public beta, pair…
- Deploy Postgres and MySQL databases with PlanetScale + Workers — Cloudflare announced the next step of its September-2025
- Building the foundation for running extra-large language models — Cloudflare's 2026-04-16 deep-dive on how Workers AI serves extra-large LLMs like Kimi K2.5 (~1T…
- Artifacts: versioned storage that speaks Git — Cloudflare's 2026-04-16 post launches Artifacts (private beta, public beta by early May 2026) — a distribut…
- Cloudflare AI Search: the search primitive for your agents — Cloudflare's 2026-04-16 post launches AI Search (formerly AutoRAG) as a plug-and-play managed search primit…
- Cloudflare's AI Platform: an inference layer designed for agents — A 2026-04-16 Agents-Week post positioning Cloudflare as a unified
- Atlassian — Streaming Server-Side Rendering in Confluence — Atlassian's Confluence team adopted React 18 streaming SSR as the second
- Airbnb — Building a high-volume metrics pipeline with OpenTelemetry and vmagent — Companion piece to Airbnb's in-house metrics migration (see
- Project Think: building the next generation of AI agents on Cloudflare — Cloudflare announced Project Think (2026-04-15, published
- Introducing Agent Lee - a new interface to the Cloudflare stack — Cloudflare launched Agent Lee, an in-dashboard AI assistant that
- Airbnb: Privacy-first connections — Empowering social experiences
- Building a CLI for all of Cloudflare — Cloudflare announced a Technical Preview of the next-generation Wrangler CLI — installable today
- Build a multi-tenant configuration system with tagged storage patterns — AWS Architecture Blog walkthrough of a multi-tenant configuration
- MongoDB Predictive Auto-Scaling: An Experiment — MongoDB Engineering retrospective on the 2023 internal research
- How we built a real-world evaluation platform for autonomous SRE agents at scale — Datadog's retrospective on building the offline, replayable evaluation
- Cloudflare targets 2029 for full post-quantum security — Cloudflare publishes an updated Q-Day risk
- AWS News Blog: Launching S3 Files, making S3 buckets accessible as file systems — The AWS News Blog launch announcement for Amazon
- All Things Distributed: S3 Files and the changing face of S3 — Andy Warfield (VP/DE, S3) announces S3 Files, a new S3
- AWS Architecture Blog — Unlock efficient model deployment: Simplified Inference Operator setup on Amazon SageMaker HyperPod — Product-announcement post for the SageMaker HyperPod Inference
- The uphill climb of making diff lines performant — GitHub Engineering describes the year+ rewrite of the Files changed
- Dropbox — Improving storage efficiency in Magic Pocket, our immutable blob store — Dropbox's Magic Pocket storage team hit an
- AWS Architecture Blog — Automate safety monitoring with computer vision and generative AI — AWS Architecture Blog retrospective on a serverless, event-driven
- Google Research — Safeguarding cryptocurrency by disclosing quantum vulnerabilities responsibly — Google Research lays out its disclosure philosophy for the 2026
- AWS Architecture Blog — Streamlining access to powerful disaster recovery capabilities of AWS — Survey-style AWS Architecture Blog post positioning AWS's DR building
- Architecting for agentic AI development on AWS — AWS Architecture Blog prescriptive essay on how to architect AWS
- Dropbox — Reducing our monorepo size to improve developer velocity — Dropbox's server-side monorepo — a single
- Expedia — Operating Trino at Scale With Trino Gateway — Expedia Group's data-platform team writes up their production use of
- Datadog — When upserts don't update but still write: debugging Postgres performance at scale — Datadog's host-metadata team added an INSERT ... ON CONFLICT DO UPDATE upsert to track
- How Generali Malaysia optimizes operations with Amazon EKS — Generali Malaysia — one of Malaysia's largest general insurers, part
- AI-powered event response for Amazon EKS — AWS Architecture Blog product post on AWS DevOps Agent, a fully
- How we optimized Dash's relevance judge with DSPy — Dropbox Tech post on how the Dash
- Airbnb: From vendors to vanguard — hard-won lessons in observability ownership
- Airbnb: Recommending travel destinations to help users explore
- When an AI agent came knocking: Catching malicious contributions in Datadog's open source repos — Datadog Engineering retrospective (2026-03-09) on how Datadog's
- Designing MCP tools for agents: Lessons from building Datadog's MCP server — Datadog's retrospective on shipping its official
- Airbnb: It wasn't a culture problem — upleveling alert development at Airbnb
- How we rebuilt the search architecture for high availability in GitHub Enterprise Server — GitHub Engineering describes a year-long rewrite of the
- Towards Model-based Verification of a Key-Value Storage Engine — MongoDB's Part 2 follow-up to the 2026-02 distributed-transactions
- We deserve a better streams API for JavaScript — James Snell — Cloudflare Workers runtime engineer, Node.js TSC member, multi-runtime implementer of the
- Using LLMs to amplify human labeling and improve Dash search relevance — Dropbox Tech post on how Dash trains the
- AWS: Digital Transformation at Santander — How Platform Engineering is Revolutionizing Cloud Infrastructure — Joint AWS × Santander architecture-blog post on Catalyst,
- 6,000 AWS accounts, three people, one platform: Lessons learned (AWS Architecture Blog, 2026-02-25) — ProGlove (smart-wearable barcode scanners for frontline workers) runs its
- How we rebuilt Next.js with AI in one week (vinext) — Cloudflare's 2026-02-24 post announces vinext —
- How we reduced the size of our Agent Go binaries by up to 77% — Datadog Engineering retrospective (2026-02-18) on how the
- Airbnb Sitar: Safeguarding dynamic configuration changes at scale
- Expedia — Interleaving for Accelerated Testing (2026-02-17)
- A chat with Byron Cook on automated reasoning and trust in AI systems — Werner Vogels interviews Byron Cook (Amazon Distinguished Scientist + VP) three and a half years after their first conversation on automated…
- How low-bit inference enables efficient AI — Dropbox's ML team surveys the low-bit inference landscape —
- Google Research — Scheduling in a changing world: Maximizing throughput with time-varying capacity — Google Research post (2026-02-11) on online throughput-
- AWS: How Convera built fine-grained API authorization with Amazon Verified Permissions — AWS Architecture Blog post by the Amazon Verified Permissions and Convera
- AWS: Mastering millisecond latency and millions of events — the event-driven architecture behind the Amazon Key Suite — AWS Architecture Blog post by the Amazon Key team on modernizing their
- Sovereign failover — Design for digital sovereignty using the AWS European Sovereign Cloud — Architectural companion to the 2026-01-16 AWS European Sovereign Cloud
- Cloudflare — Moltworker: a self-hosted personal AI agent, minus the minis — Cloudflare ports Moltbot (formerly Clawdbot; later renamed OpenClaw
- Dropbox: VP Josh Clemm on how we use knowledge graphs, MCP, and DSPy in Dash — Edited + condensed version of a talk Josh Clemm (VP of Engineering for
- What came first: the CNAME or the A record? — Cloudflare post-mortem on the ~40-minute partial global outage of
- When protections outlive their purpose: A lesson on managing defense systems at scale — GitHub's Traffic team published a short engineering postmortem on a
- Open Sourcing Dicer: Databricks' Auto-Sharder — Databricks open-sourced Dicer, the auto-sharder that underlies "every major Databricks product". Dicer is an intelligent control plane that …
- How Salesforce migrated from Cluster Autoscaler to Karpenter across their fleet of 1,000 EKS clusters — AWS Architecture Blog case study (2026-01-12) documenting Salesforce's mid-2025→early-2026 migration of its Kubernete…
- A closer look at a BGP anomaly in Venezuela — Cloudflare forensic post responding to a cybersecurity newsletter that
- Hardening eBPF for runtime security: Lessons from Datadog Workload Protection — Datadog Workload Protection's 5-year retrospective on running eBPF
- Expedia — Powering Vector Embedding Capabilities — Expedia Group's ML Platform team describes the Embedding Store
- MongoDB Server Security Update, December 2025 — On 2025-12-12 at 19:00 ET, MongoDB's Security Engineering team
- MongoDB (Voyage AI) — Token-count-based Batching: Faster, Cheaper Embedding Inference for Queries — Voyage AI by MongoDB describes the production embedding-inference
- Inside the feature store powering real-time AI in Dropbox Dash — Dropbox built an internal feature store to power ranking in
- Architecting conversational observability for cloud applications — AWS Architecture Blog reference-architecture post (2025-12-11) for a
- Cloudflare outage on December 5, 2025 — On 2025-12-05 at 08:47 UTC, a portion of Cloudflare's network
- How We Debug 1000s of Databases with AI at Databricks — Databricks built an internal AI agent platform (Storex) that unifies database investigation across a fleet of thousands of database instance…
- The local-first rebellion: How Home Assistant became the most important project in your house (GitHub Blog, 2025-12-02) — GitHub Blog (Open Source / Maintainers column) profile of Franck "Frenck"
- Secure Amazon Elastic VMware Service (Amazon EVS) with AWS Network Firewall — AWS Architecture Blog reference-architecture post on how to deploy a
- Scaling real-time file monitoring with eBPF: How we filtered billions of kernel events per minute — Datadog's File Integrity Monitoring (FIM) team describes how they
- Dropbox: How Dash uses context engineering for smarter AI — Dropbox's ML team describes the context-engineering evolution of
- Expedia — Colocating Input Partitions with Kafka Streams When Consuming Multiple Topics: Sub-Topology Matters! — Expedia debugs an in-production Kafka Streams application that consumed
- Google Research — DS-STAR: A state-of-the-art versatile data science agent — Google Research introduces DS-STAR — a data-science
- Google Research — Exploring a space-based, scalable AI infrastructure system design — Google Research announces Project Suncatcher — a moonshot
- Replication redefined: How we built a low-latency, multi-tenant data replication platform — Datadog Engineering retrospective (2025-11-04) on building the
- Immutable releases are now generally available on GitHub — GitHub announced the general availability of immutable releases on
- Toward provably private insights into AI use (Google Research, 2025-10-30) — Google Research introduces Provably Private Insights (PPI): a
- What is USSD (and who cares)? — Werner Vogels writes a short thesis piece on USSD
- Google Research — Solving virtual machine puzzles: How AI is optimizing cloud computing — Google Research post (2025-10-17) introducing a trio of
- Google Research — Coral NPU: A full-stack platform for Edge AI — Google Research introduces Coral NPU as a
- Cloudflare — Unpacking Cloudflare Workers CPU Performance Benchmarks — Public-response post to Theo Browne's 2025-10-04
- Cars24 Improves Search For 300 Million Users With MongoDB Atlas — MongoDB-Blog case study of Cars24 — Indian multinational online
- MongoDB — The Cost of Not Knowing MongoDB, Part 3: appV6R0 to appV6R4 — Third and final installment of MongoDB's senior-developer-authored case
- Google Research — Speech-to-Retrieval (S2R): A new approach to voice search — Google Research introduces Speech-to-Retrieval
- Intelligent Kubernetes Load Balancing at Databricks — Databricks replaced Kubernetes' default L4 kube-proxy load balancing with an in-house proxyless, client-side L7 load-balancing system backed…
- MongoDB — Top Considerations When Choosing A Hybrid Search Solution — MongoDB 2025-09-30 technical-blog post (author implicit; MongoDB product-marketing flavour but with legitimate architectural content) survey…
- Expedia — Why You Should Prefer MERGE INTO Over INSERT OVERWRITE in Apache Iceberg — A short Expedia Group Tech post arguing that on Apache Iceberg
- MongoDB — From Niche NoSQL To Enterprise Powerhouse: The Story Of MongoDB's Evolution — A 2025-09-25 MongoDB Engineering blog post by Ashish Agrawal (joined MongoDB ~2023 via the Grainite acquisition; prior ~decade at Google on …
- MongoDB — Carrying Complexity, Delivering Agility — A 2025-09-25 MongoDB engineering-leadership manifesto co-authored by
- Build AI Agents Worth Keeping: The Canvas Framework — MongoDB-Blog thought-leadership post diagnosing why so many
- Cloudflare — Cap'n Web: a new RPC system for browsers and web servers — Kenton Varda (author of Cap'n Proto and Cloudflare's
- MongoDB Community Edition to Atlas: A Migration Masterclass with BharatPE — MongoDB-Blog case study of BharatPE — Indian fintech processing
- MongoDB — Modernizing Core Insurance Systems: Breaking The Batch Bottleneck — MongoDB authors a framework-level retrospective on post-migration batch-job
- Google Research — Making LLMs more accurate by using all of their layers (SLED) — Google Research introduces SLED (Self Logits
- Post-quantum security for SSH access on GitHub — GitHub announced (effective 2025-09-17) the addition of
- Google Research — Speculative cascades: A hybrid approach for smarter, faster LLM inference — Google Research frames speculative cascades as a unified
- Google Research — From massive models to mobile magic: The tech behind YouTube real-time generative AI effects — Google Research describes the training-to-serving pipeline behind
- Seventh-generation server hardware at Dropbox: our most efficient and capable architecture yet — Dropbox's seventh-generation in-house server hardware — replacing the 2020-era sixth-gen Cartman platform — rolled out across five named tie…
- All Things Distributed — Removing friction from Amazon SageMaker AI development — Werner Vogels surveys four recent SageMaker AI capabilities released to
- Google Research — Simulating large systems with Regression Language Models — Google Research post (2025-07-29) proposing text-to-text regression
- Google Research — Android Earthquake Alerts: A global system for early warning — Google Research post (2025-07-17) on the Android Earthquake Alerts
- Datadog — How we tracked down a Go 1.24 memory regression across hundreds of pods — Datadog rolled Go 1.24 to a data-processing service across hundreds of Kubernetes pods and observed a ~20% RSS increase that did not appear …
- Cloudflare 1.1.1.1 incident on July 14, 2025 — Cloudflare post-mortem on the 62-minute global outage of the
- AWS — Introducing Amazon S3 Vectors: First cloud storage with native vector support at scale (preview) — Channy Yun (AWS News Blog, 2025-07-16) announces the preview of
- Cloudflare: Introducing pay per crawl — Enabling content owners to charge AI crawlers for access — Cloudflare announces Pay Per Crawl (private beta, 2025-07-01), a framework
- Google Research — How we created HOV-specific ETAs in Google Maps — Google Research post (2025-06-30) announcing a Google Maps feature:
- Defending the Internet: how Cloudflare blocked a monumental 7.3 Tbps DDoS attack — Cloudflare recounts autonomously blocking a 7.3 Tbps / 4.8 Bpps
- MongoDB — Conformance Checking at MongoDB: Testing That Our Code Matches Our TLA+ Specs — A. Jesse Jiryu Davis's 2025 retrospective (from 2025's perspective) on the
- Just make it scale: An Aurora DSQL story (Werner Vogels, guest-authored by Niko Matsakis & Marc Bowes) — Werner Vogels hosts a guest post by Sr. Principal Engineers Niko Matsakis (a core Rust language designer) and Marc Bowes on the engineering …
- GitHub Issues search now supports nested queries and boolean operators: Here's how we (re)built it — GitHub rewrote Issues search to support logical AND/OR operators
- Understanding transaction visibility in PostgreSQL clusters with read replicas — AWS's response to Jepsen's 2025-04-29 report on transaction visibility in Amazo…
- Open-sourcing OpenPubkey SSH (OPKSSH): integrating single sign-on with SSH — Cloudflare announces the open-sourcing of OPKSSH (OpenPubkey SSH)
- Sign in as anyone: Bypassing SAML SSO authentication with parser differentials — Two GitHub Security Lab researchers (Peter Stöckli + an external bug-bounty
- All Things Distributed: In S3 simplicity is table stakes (S3 at 19) — On S3's 19th birthday (Pi Day 2025), Andy Warfield (VP / Distinguished
- Building and operating a pretty big storage system called S3 — Author: Andy Warfield (VP / Distinguished Engineer, S3), guest post hosted by Werner Vogels on All Things Distributed. Based on Warfield's U…
- We Were Wrong About GPUs — Retrospective / course-correction post by Thomas Ptacek on
- Fly.io — The Exit Interview: JP Phillips — Exit-interview blog post (2025-02-12) with JP Phillips, the
- Fly.io — VSCode's SSH Agent Is Bananas — Fly.io's 2025-02-07 opinion post on VSCode Remote-SSH's architecture
- Datadog — Husky: Efficient compaction at Datadog scale — Third post in Datadog's Husky series (after introducing-husky and the
- AWS — Migrating from AWS App Mesh to Amazon ECS Service Connect (App Mesh discontinuation announcement) — AWS announces the end-of-life of AWS App Mesh
- Scaling Large Language Models for e-Commerce: The Development of a Llama-Based Customized LLM — eBay's 2025-01-17 post describes the training-infrastructure and data-mix design behind e-Llama — 8-billion and 70-billion parameter LLMs ad…
- Google Research — Extra, Extra — Read All About It: Nearly All Binary Searches and Mergesorts are Broken (2006, republished 2025-01-11) — Joshua Bloch's 2006 Google Research blog post — republished on the
- Faster continuous integration builds at Canva — Canva's Developer Platform group cut the average PR-to-merge CI time from
- Canva: The science of routing print orders — Canva's Print team built a configurable rule-driven routing engine that
- Prevent factual errors from LLM hallucinations with mathematically sound Automated Reasoning checks (preview) — The preview-launch announcement for Automated Reasoning checks as a new safeguard in [[systems/bedrock-guardrails-automated-reasoning-checks…
- All Things Distributed: AWS Lambda turns 10 — a rare look at the PR/FAQ that started it — Werner Vogels publishes the (lightly edited) internal PR/FAQ that launched
- What's new with Robinhood, our in-house load balancing service — Robinhood is Dropbox's in-house internal-traffic load balancing service, deployed since 2020 and rebuilt in 2023 around PID-controller-drive…
- AI GPU Clusters, From Your Laptop, With Livebook — Fly.io's 2024-09-24 recap of Chris McCord's and Chris Grainger's
- Cloudflare — A good day to trie-hard: saving compute 1% at a time — Cloudflare's pingora-origin service — the last Rust-proxy hop before a
- Google Research — SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL — Google's research paper (published 2024-08-24, surfaced on Hacker News at
- Continuous reinvention: A brief history of block storage at AWS (Marc Olson, guest post on Werner Vogels' blog) — Marc Olson, a ~13-year veteran of the EBS team, narrates EBS's arc from a 2008 HDD-backed shared-disk service into a distributed SSD fleet d…
- We're Cutting L40S Prices In Half — Pricing-announcement post for NVIDIA L40S GPUs
- Figma: How We Migrated onto K8s in Less Than 12 Months — Figma migrated its core compute platform from AWS ECS on EC2 to
- Fly.io — Making Machines Move — Fly.io's 2024-07-30 engineering post on the year-long rebuild of
- Amazon's Exabyte-Scale Migration from Apache Spark to Ray on Amazon EC2 — Amazon Retail's Business Data Technologies (BDT) team is in the
- Fly.io — AWS without Access Keys — Fly.io's 2024-06-19 post (oidc-cloud-roles)
- Dropbox — Testing sync at Dropbox (2020) — Isaac Goldberg's (Dropbox) walkthrough of the testing strategy that allowed
- Dynamic loading of real-time content at Figma — Figma extended its per-page dynamic loading system — already used
- Google Research — VideoPrism: A foundational visual encoder for video understanding — Google Research introduces VideoPrism, a video foundation
- Figma's journey to TypeScript — compiling away our custom programming language — Figma migrated the entire Skew codebase underlying its prototype viewer
- Canva — Scaling to Count Billions — Canva's Creators-payment pipeline counts billions of content-usage events
- Figma — Speeding Up C++ Build Times — Figma's Core team retrospective on cutting C++ cold-build times ~50% after a
- Fly.io — JIT WireGuard — Fly.io's 2024-03-12 post on replacing push-based WireGuard peer
- Fly.io — Fly Kubernetes does more now (FKS beta) — Fly.io announces the beta of Fly Kubernetes (FKS) — their
- Fly.io — Globally Distributed Object Storage with Tigris — Fly.io's 2024-02-15 public-beta announcement for Tigris, a
- How Figma's multiplayer technology works — Figma's 2019 post (republished 2025-08-16 on HN, surfacing 4 years after