
Implicit concurrent data fetching

Definition

Implicit concurrent data fetching is an abstraction where the programmer writes code that looks like a series of sequential fetches against one or more data sources, and a framework + compiler together automatically batch same-source fetches into single requests and overlap fetches on independent sources — without any explicit concurrency constructs in the user's code.

The programmer expresses what to fetch and how to combine it. The runtime and compiler decide when fetches can share a batch or run in parallel.

Canonical wiki instance: Haxl

Haxl is the open-source Haskell framework Meta built to provide implicit concurrent data fetching for its Sigma anti-abuse rule engine. It is documented in the ICFP 2014 paper "There is no Fork: an Abstraction for Efficient, Concurrent, and Concise Data Access".

The motivation Meta states plainly:

All the existing concurrency abstractions in Haskell are explicit, meaning that the user needs to say which things should happen concurrently. For data-fetching, which can be considered a purely functional operation, we wanted a programming model in which the system just exploits whatever concurrency is available, without the programmer having to use explicit concurrency constructs.

The point is separation of concerns: anti-abuse policy authors write spam-detection logic; scheduling happens elsewhere. A policy author who fetches user profile, link reputation, and graph features in a policy doesn't have to know that the first two can batch and the third runs in parallel — Haxl does that.
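A minimal sketch of what such a policy looks like. The request names (getUserProfile and so on) are hypothetical, and Haxl itself is stubbed as IO so the sketch compiles with base alone — this is the shape of the user code, not Haxl's real API:

```haskell
-- Illustrative only: the fetch functions are hypothetical requests,
-- and "Haxl" is a stand-in type alias, not Haxl's real GenHaxl monad.
type Haxl = IO

getUserProfile, getLinkReputation, getGraphFeatures :: Int -> Haxl Int
getUserProfile    uid = pure (uid * 2)  -- pretend user-service fetch
getLinkReputation uid = pure (uid + 1)  -- same source: batchable with the above
getGraphFeatures  uid = pure (uid - 1)  -- different source: can overlap

-- The policy author writes plain sequential-looking code; under real
-- Haxl, the first two fetches would share a batch and the third would
-- run in parallel, with no concurrency constructs in sight.
spamScore :: Int -> Haxl Int
spamScore uid = do
  profile <- getUserProfile uid
  rep     <- getLinkReputation uid
  graph   <- getGraphFeatures uid
  return (profile + rep + graph)
```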

The compiler half: Applicative do-notation

For the framework to rearrange statements that look imperative, the compiler must distinguish statements that are genuinely sequential (a later one uses an earlier one's result) from statements that are independent (parallelisable).

Meta designed and implemented applicative do-notation (the ApplicativeDo extension, shipped in GHC 8.0) for this purpose. The compiler analyses the do-block and, where dependencies permit, rearranges the statements into <*> (applicative) combinators that Haxl can batch and overlap.
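The transformation can be seen with base alone. Maybe stands in for a Haxl fetch here — the shape of the desugaring is what matters, not the monad:

```haskell
{-# LANGUAGE ApplicativeDo #-}
-- With ApplicativeDo enabled, GHC notices that x and y below are
-- independent and desugars the do-block into <*> combinators.
sequentialStyle :: Maybe Int
sequentialStyle = do
  x <- Just 1        -- does not use y
  y <- Just 2        -- does not use x
  return (x + y)

-- What the compiler produces, written out by hand:
desugared :: Maybe Int
desugared = (+) <$> Just 1 <*> Just 2
```

Both evaluate to Just 3; under Haxl, the applicative form is what exposes the two fetches to the scheduler as batchable/overlappable. If a later statement used x, the compiler would keep that part sequential with >>=.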

Canonical wiki takeaway: implicit concurrency requires compiler co-design, not just a clever library. A framework alone cannot rearrange statements without changing the language — it can at best provide a different syntax. Meta chose to ship a compiler extension to GHC rather than ask every policy author to adopt non-do syntax.

Contrast with explicit concurrency

| Axis | Implicit (Haxl-style) | Explicit (forkIO-style) |
| --- | --- | --- |
| Programmer writes concurrency constructs | No | Yes (forkIO, MVar, async) |
| Optimal batching / overlap | Framework decides | Programmer decides |
| Reasoning about races | Pure-functional; no races possible | Hard; requires discipline |
| Best when… | The task is data-flow (fetch, join, fetch) | The task is control-flow (pipelines, servers) |

Implicit concurrency is specifically for data-flow-shaped workloads. A rule engine's "evaluate this policy" is data-flow-shaped: fetch some facts, combine them, emit a decision. A server's "handle this connection" is control-flow-shaped — explicit concurrency is usually the right tool there.
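For contrast, here is the explicit-concurrency version of "fetch two independent things", using only forkIO and MVar from base. The programmer writes the fork and the join by hand, and any same-source batching would also have to be done manually:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Explicit concurrency: the programmer, not a framework, decides
-- what overlaps. A minimal fork/join over two independent actions.
fetchBoth :: IO a -> IO b -> IO (a, b)
fetchBoth fa fb = do
  va <- newEmptyMVar
  vb <- newEmptyMVar
  _  <- forkIO (fa >>= putMVar va)    -- start both fetches...
  _  <- forkIO (fb >>= putMVar vb)
  (,) <$> takeMVar va <*> takeMVar vb -- ...then wait for both results
```

This is fine for control-flow code, but note what is lost relative to the implicit style: the structure (which actions overlap) is baked into the call site rather than discovered from data dependencies. (In production code, the async package's concurrently is the idiomatic form of this pattern.)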

When this concept fits

  • Data-flow-dominant workloads with multiple backend fetches.
  • Domain experts writing the code who are not concurrency experts (anti-abuse engineers, data scientists, business-rule authors).
  • The framework can recognise data-source identity well enough to batch same-source requests safely.
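The third point — recognising data-source identity — can be sketched as a toy round scheduler. This is not Haxl's implementation; it just shows the grouping step: collect the pending requests of one round and bucket them by source key, so each source sees one batched call:

```haskell
import Data.List (groupBy, sortOn)

-- Toy model of "batch same-source requests": a request is tagged with
-- the identity of its data source plus a key to fetch.
data Request = Request { source :: String, key :: Int }
  deriving (Eq, Show)

-- One round of batching: all requests against the same source
-- collapse into a single (source, [keys]) call.
batchBySource :: [Request] -> [(String, [Int])]
batchBySource reqs =
  [ (source (head grp), map key grp)
  | grp <- groupBy (\a b -> source a == source b)
                   (sortOn source reqs) ]
```

In Haxl proper, the same idea is driven by the applicative structure: each round, the pending requests are partitioned per data source and handed to that source's batched fetch function.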

When it doesn't fit

  • Control-flow-heavy code (server loops, pipelines with custom backpressure) — explicit concurrency is clearer and more correct.
  • Side-effectful sequential code where the "batch these two" decision is not the framework's to make.
