CONCEPT Cited by 1 source
Implicit concurrent data fetching¶
Definition¶
Implicit concurrent data fetching is an abstraction where the programmer writes code that looks like a series of sequential fetches against one or more data sources, and a framework + compiler together automatically batch same-source fetches into single requests and overlap fetches on independent sources — without any explicit concurrency constructs in the user's code.
The programmer expresses what to fetch and how to combine it. The runtime and compiler together decide when fetches can share a batch or run in parallel.
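A minimal sketch of the idea, using a toy applicative (illustrative names, not the real Haxl API) whose only job is to record which requests a computation would issue. Independent fetches written in sequential-looking do-notation end up sharing one batch:

```haskell
{-# LANGUAGE ApplicativeDo #-}
-- Toy stand-in for a Haxl-like framework (illustrative, not the real API).
-- A computation records the requests it would issue; <*> merges both sides'
-- requests into one batch instead of issuing them one after another.

data Fetch a = Fetch { requests :: [String], result :: a }

instance Functor Fetch where
  fmap f (Fetch rs a) = Fetch rs (f a)

instance Applicative Fetch where
  pure = Fetch []
  Fetch rs f <*> Fetch rs' a = Fetch (rs ++ rs') (f a)

-- A "data fetch" that names its request and returns a canned result.
fetch :: String -> a -> Fetch a
fetch name = Fetch [name]

-- Sequential-looking code; ApplicativeDo turns it into <*>, so both
-- requests land in the same batch.
policy :: Fetch Int
policy = do
  profile    <- fetch "user-profile" 1
  reputation <- fetch "link-reputation" 2
  pure (profile + reputation)

main :: IO ()
main = print (requests policy)  -- ["user-profile","link-reputation"]
```

The user code never mentions batching; the Applicative instance is where the framework gets to see both requests at once.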
Canonical wiki instance: Haxl¶
Haxl is the open-source Haskell framework Meta built to provide implicit concurrent data fetching for its Sigma anti-abuse rule engine. It is documented in the ICFP 2014 paper "There is no fork: an abstraction for efficient, concurrent, and concise data access".
The motivation Meta states plainly:
All the existing concurrency abstractions in Haskell are explicit, meaning that the user needs to say which things should happen concurrently. For data-fetching, which can be considered a purely functional operation, we wanted a programming model in which the system just exploits whatever concurrency is available, without the programmer having to use explicit concurrency constructs.
The point is separation of concerns: anti-abuse policy authors write spam-detection logic; scheduling happens elsewhere. A policy author whose policy fetches a user profile, link reputation, and graph features doesn't have to know that the first two can be batched and the third can run in parallel; Haxl works that out.
The compiler half: Applicative do-notation¶
For the framework to rearrange statements that look imperative, the compiler must distinguish statements that are genuinely sequential (a later one uses an earlier one's result) from statements that are independent (parallelisable).
Meta designed and implemented Applicative do-notation (the `ApplicativeDo` GHC extension) for this purpose. The compiler analyses the do-block and, where dependencies permit, rearranges the statements into `<*>` (applicative) combinators that Haxl can batch and overlap.
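A toy model of that dependency analysis (again illustrative, not Haxl's real types): count the "rounds" of fetching, where the Applicative overlaps its two sides and the Monad sequences them. As in Haxl itself, `<*>` here is deliberately not the same as monadic sequencing; that asymmetry is exactly what ApplicativeDo exploits.

```haskell
{-# LANGUAGE ApplicativeDo #-}
-- Toy model of the dependency analysis (illustrative, not Haxl's real types):
-- count "rounds" of fetching. The Applicative composes its two sides
-- concurrently (max of the rounds) while the Monad sequences them
-- (sum of the rounds).

data Fetch a = Fetch { rounds :: Int, value :: a }

instance Functor Fetch where
  fmap f (Fetch n a) = Fetch n (f a)

instance Applicative Fetch where
  pure = Fetch 0
  Fetch n f <*> Fetch m a = Fetch (max n m) (f a)

instance Monad Fetch where
  Fetch n a >>= k = let Fetch m b = k a in Fetch (n + m) b

-- One round-trip to a backend (result is canned).
fetch :: a -> Fetch a
fetch = Fetch 1

-- No dependency between the statements: GHC desugars to <*>, one round.
independent :: Fetch Int
independent = do
  x <- fetch 1
  y <- fetch 2
  pure (x + y)

-- y depends on x: GHC must fall back to >>=, two rounds.
dependent :: Fetch Int
dependent = do
  x <- fetch 1
  y <- fetch (x + 1)
  pure y

main :: IO ()
main = print (rounds independent, rounds dependent)  -- (1,2)
```

Without the `ApplicativeDo` pragma, both blocks would desugar with `>>=` and cost two rounds each; the extension is what recovers the concurrency from unchanged user code.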
Canonical wiki takeaway: implicit concurrency requires compiler co-design, not just a clever library. A framework alone cannot rearrange statements without changing the language — it can at best provide a different syntax. Meta chose to ship a compiler extension to GHC rather than ask every policy author to adopt non-do syntax.
Contrast with explicit concurrency¶
| Axis | Implicit (Haxl-style) | Explicit (forkIO-style) |
|---|---|---|
| Programmer writes concurrency constructs | No | Yes (forkIO, MVar, async) |
| Optimal batching / overlap | Framework decides | Programmer decides |
| Reasoning about races | Pure-functional, no races possible | Hard; requires discipline |
| Best when… | The task is data-flow (fetch, join, fetch) | The task is control-flow (pipelines, servers) |
Implicit concurrency is specifically for data-flow-shaped workloads. A rule engine's "evaluate this policy" is data-flow-shaped: fetch some facts, combine them, emit a decision. A server's "handle this connection" is control-flow-shaped — explicit concurrency is usually the right tool there.
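For contrast, the explicit style from the table, sketched with base's `forkIO` and `MVar` (the fetch actions are hypothetical placeholders): the programmer forks, communicates, and joins by hand.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Explicit concurrency: run two fetches at once by hand.
-- fa and fb stand in for real backend calls.
fetchBoth :: IO a -> IO b -> IO (a, b)
fetchBoth fa fb = do
  ma <- newEmptyMVar
  mb <- newEmptyMVar
  _ <- forkIO (fa >>= putMVar ma)      -- fork each fetch...
  _ <- forkIO (fb >>= putMVar mb)
  (,) <$> takeMVar ma <*> takeMVar mb  -- ...and join on both results

main :: IO ()
main = fetchBoth (pure (1 :: Int)) (pure "reputation") >>= print
```

In production code one would reach for the async package's `concurrently`, which also handles exceptions; this base-only sketch just shows the bookkeeping that the implicit style removes from policy authors.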
When this concept fits¶
- Data-flow-dominant workloads with multiple backend fetches.
- Domain experts writing the code who are not concurrency experts (anti-abuse engineers, data scientists, business-rule authors).
- The framework can recognise data-source identity well enough to batch same-source requests safely.
When it doesn't fit¶
- Control-flow-heavy code (server loops, pipelines with custom backpressure) — explicit concurrency is clearer and more correct.
- Side-effectful sequential code where the "batch these two" decision is not the framework's to make.
Seen in¶
- sources/2015-06-26-meta-fighting-spam-with-haskell — Haxl + Applicative do-notation underneath Meta's Sigma rule engine; canonical industrial instance.
Related¶
- systems/haxl — the framework.
- systems/ghc — the compiler with Applicative do-notation.
- systems/haskell — the language.
- systems/sigma-meta — the production consumer.
- concepts/purely-functional-policy-language — why implicit concurrency is safe here: pure-functional code has no hidden effects to serialise.