SYSTEM Cited by 1 source
Sigma (Meta anti-abuse rule engine)¶
What it is¶
Sigma is Meta's in-path rule engine for proactively identifying malicious actions on Facebook — spam, phishing, posting links to malware, and similar abuse — before the action takes effect. For every user interaction (status update, like, click, Messenger send, etc.) Sigma evaluates a set of policies specific to that interaction type, and "bad content detected by Sigma is removed automatically so that it doesn't show up in your News Feed" (Source: sources/2015-06-26-meta-fighting-spam-with-haskell).
Because Sigma sits in the request path, it must respond fast enough that the user's action is not perceptibly delayed. The source reports post-rewrite throughput of more than one million requests per second.
Architecture (post-2015 rewrite)¶
Sigma is Haskell sandwiched between two layers of C++ — see patterns/embedded-functional-runtime-in-cpp-service:
- C++ Thrift server on top. Chosen for maturity and performance over Haskell-native thrift servers.
- Haskell middle, running the policies. Uses Haxl for implicit concurrent data fetching; runs on GHC with Meta-contributed extensions (Applicative do-notation; per-thread allocation limits; GC changes for safe hot-code-swap unload).
- Existing C++ service-client libraries below. Wrapped as Haxl data sources via Haskell's FFI; not rewritten. A compile-time C++ name-demangler avoids intermediate C shims for most calls.
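The "implicit concurrent data fetching" in the middle layer can be illustrated with a rough analogy. The sketch below is Python, not Haskell, and does not use Haxl's API. The data sources `fetch_friend_count` and `fetch_account_age_days`, the thresholds, and the policy `looks_like_spammer` are all hypothetical. It only shows the effect Haxl aims for: independent fetches are issued together, so a policy waits for one round trip rather than one per fetch.

```python
import asyncio

# Hypothetical stand-ins for the C++ service clients that Sigma wraps as data sources.
async def fetch_friend_count(user_id: int) -> int:
    await asyncio.sleep(0.05)  # simulated network latency
    return 3 if user_id == 42 else 250

async def fetch_account_age_days(user_id: int) -> int:
    await asyncio.sleep(0.05)  # simulated network latency
    return 1 if user_id == 42 else 900

async def looks_like_spammer(user_id: int) -> bool:
    # The two fetches do not depend on each other, so they are started together
    # and awaited as a group instead of one after the other.
    friends, age_days = await asyncio.gather(
        fetch_friend_count(user_id),
        fetch_account_age_days(user_id),
    )
    # Made-up thresholds, for illustration only.
    return friends < 5 and age_days < 7

if __name__ == "__main__":
    print(asyncio.run(looks_like_spammer(42)))
```

One difference matters: here the author requests concurrency explicitly with `asyncio.gather`. In Sigma the policy author writes ordinary sequential-looking Haskell, and Haxl together with Applicative do-notation finds the independent fetches.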
Operational posture¶
- "Source code in the repository is the code running in Sigma." Policies are continuously deployed — minutes from commit to fleet. See patterns/rule-engine-with-continuous-policy-deploy.
- Type-correct or rejected at repo ingress: "we don't allow code to be checked into the repository unless it is type-correct" — pure-functional + strong-typing discipline as a first-line safety gate.
- Hot-code swapping of compiled policy code on a running process. New requests serve on new code; in-flight requests finish on the old code; GHC's garbage collector detects when the old code is no longer referenced and triggers safe unload (concepts/hot-code-swapping).
- Code associated with persistent state is never changed during a hot swap, so state-layer invariants hold across reloads.
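The hot-swap lifecycle described above (new requests on new code, in-flight requests on old code, unload once nothing references the old code) can be sketched as follows. This is a Python analogy under stated assumptions, not Sigma's mechanism. `PolicyVersion` and `Engine` are hypothetical names, and Python's reference counting stands in for the role GHC's garbage collector plays for compiled policy code.

```python
import gc
import weakref

class PolicyVersion:
    """One loaded set of policies (hypothetical stand-in for compiled policy code)."""
    def __init__(self, name, verdict):
        self.name = name
        self._verdict = verdict

    def evaluate(self, action):
        return f"{self.name}:{self._verdict(action)}"

class Engine:
    def __init__(self, version):
        self._current = version

    def swap(self, new_version):
        # Only the reference handed to *new* requests changes.
        self._current = new_version

    def begin_request(self):
        # A request pins whichever version was current when it started.
        return self._current

engine = Engine(PolicyVersion("v1", lambda action: "allow"))
old_ref = weakref.ref(engine.begin_request())  # observes v1 without keeping it alive

in_flight = engine.begin_request()             # this request started on v1
engine.swap(PolicyVersion("v2", lambda action: "block" if "malware" in action else "allow"))

new_result = engine.begin_request().evaluate("link to malware")  # served by v2
old_result = in_flight.evaluate("link to malware")               # finishes on v1
still_loaded = old_ref() is not None   # v1 is still held by the in-flight request

del in_flight                          # the last in-flight request completes
gc.collect()
unloaded = old_ref() is None           # nothing references v1 any more
```

The point of the design is that no one has to decide when the old code is safe to drop: it goes away as a consequence of the last request that uses it finishing.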
Predecessor¶
- FXL: an in-house Facebook DSL, since retired from Sigma. It was interpreted (and therefore slow), lacked user-defined types and modules, and forced performance-critical logic into C++ inside Sigma itself, which slowed policy roll-out. Canonical cautionary tale: policy complexity outgrew the DSL's expressivity, and interpreter performance capped hardware utilisation.
Measured performance (vs FXL, at rewrite completion)¶
- Haskell: up to 3× faster on individual request types.
- Haskell: 20–30% overall throughput improvement on a typical workload mix — "we can serve 20 percent to 30 percent more traffic with the same hardware".
- Measurement basis: the 25 most common request types, accounting for ≈95% of typical Sigma workload.
- Enabled by: per-request automatic memoization of top-level computations (via source-to-source translation), GHC heap-management changes that reduce GC frequency on multicore machines (Meta runs an allocation area of at least 64 MB per core), selective FFI marshaling, and bug fixes, including a long-latent GHC garbage-collector bug that caused crashes and an aeson JSON-parsing bug whose "one-in-a-million corner cases ... tend to crop up all the time" at Facebook scale.
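Per-request memoization, the first item in the list above, can be illustrated with a minimal sketch. This is a Python analogy, not Sigma's implementation. In Sigma the memoization is applied automatically by a source-to-source translation, whereas the decorator here is explicit. `Request`, `memo_per_request`, `num_recent_posts`, and both policies are hypothetical.

```python
import functools

class Request:
    """Per-request state; the memo table is discarded with the request."""
    def __init__(self, user_id):
        self.user_id = user_id
        self.memo = {}

def memo_per_request(fn):
    """Cache fn's result inside the request object, so several policies in the
    same request share one evaluation and nothing leaks across requests."""
    @functools.wraps(fn)
    def wrapper(request):
        if fn.__name__ not in request.memo:
            request.memo[fn.__name__] = fn(request)
        return request.memo[fn.__name__]
    return wrapper

calls = []  # records real evaluations, for demonstration only

@memo_per_request
def num_recent_posts(request):
    calls.append(request.user_id)  # stands in for an expensive computation
    return 17 if request.user_id == 1 else 2

def is_flooding(request):
    return num_recent_posts(request) > 10

def needs_review(request):
    return num_recent_posts(request) > 5

r1 = Request(user_id=1)
flags = (is_flooding(r1), needs_review(r1))  # num_recent_posts is evaluated once for r1
calls_after_r1 = list(calls)

r2 = Request(user_id=2)
other = is_flooding(r2)                      # a new request starts with an empty memo
```

Scoping the cache to the request keeps the semantics simple: values can never go stale across requests, and the memory is released when the request ends.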
Reference¶
The architecture and migration are described in Simon Marlow's 2015 post "Fighting spam with Haskell". The Haxl framework that drives Sigma's data-fetching concurrency model is open source at facebook/Haxl and documented in the ICFP 2014 paper "There is no fork: an abstraction for efficient, concurrent, and concise data access".
Seen in¶
- sources/2015-06-26-meta-fighting-spam-with-haskell — canonical Sigma architecture + FXL → Haskell migration post.
Related¶
- systems/haxl — Sigma's concurrency + batching substrate.
- systems/haskell / systems/ghc — language + runtime.
- systems/fxl-meta — predecessor.
- patterns/rule-engine-with-continuous-policy-deploy — Sigma's operational posture.
- patterns/embedded-functional-runtime-in-cpp-service — Sigma's C++ / Haskell / C++ sandwich integration pattern.
- concepts/hot-code-swapping — live policy reload primitive.
- concepts/implicit-concurrent-data-fetching — Haxl's model.
- concepts/allocation-limit — per-request resource isolation.
- concepts/purely-functional-policy-language — policy-author isolation + engine-safety properties.
- companies/meta