SYSTEM Cited by 1 source

Sigma (Meta anti-abuse rule engine)

What it is

Sigma is Meta's in-path rule engine for proactively identifying malicious actions on Facebook — spam, phishing, posting links to malware, and similar abuse — before the action takes effect. For every user interaction (status update, like, click, Messenger send, etc.) Sigma evaluates a set of policies specific to that interaction type, and "bad content detected by Sigma is removed automatically so that it doesn't show up in your News Feed" (Source: sources/2015-06-26-meta-fighting-spam-with-haskell).

Sigma is in the request path, so it must respond fast enough that the user's action is not perceptibly delayed. After the rewrite, the disclosed throughput is more than one million requests per second.

Architecture (post-2015 rewrite)

Sigma is Haskell sandwiched between two layers of C++ — see patterns/embedded-functional-runtime-in-cpp-service:

  1. C++ Thrift server on top. Chosen over Haskell-native Thrift servers for maturity and performance.
  2. Haskell middle, running the policies. Uses Haxl for implicit concurrent data fetching; runs on GHC with Meta-contributed extensions (Applicative do-notation; per-thread allocation limits; GC changes for safe hot-code-swap unload).
  3. Existing C++ service-client libraries below. Wrapped as Haxl data sources via Haskell's FFI; not rewritten. A compile-time C++ name-demangler avoids intermediate C shims for most calls.
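
The data-fetching idea behind the middle layer can be illustrated with a small Python sketch. It is not Sigma's or Haxl's code: in Haxl the grouping of independent fetches is derived from the structure of the policy code itself, whereas here the list of requests is written out by hand. The data sources (`friend_counts`, `link_reputations`), their lookup tables, and the thresholds in `looks_like_spam` are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict

CALLS = []  # log of batched back-end calls, kept only so the batching is visible


def friend_counts(user_ids):
    """Hypothetical batched data source standing in for a wrapped service client."""
    CALLS.append(("friend_counts", tuple(user_ids)))
    table = {1: 250, 2: 3}
    return [table.get(u, 0) for u in user_ids]


def link_reputations(urls):
    """Hypothetical batched data source returning a reputation score per URL."""
    CALLS.append(("link_reputations", tuple(urls)))
    table = {"http://good.example": 0.9, "http://bad.example": 0.05}
    return [table.get(u, 0.5) for u in urls]


SOURCES = {"friend_counts": friend_counts, "link_reputations": link_reputations}


def fetch_round(requests):
    """Perform one round of fetching.

    Requests are grouped by source and de-duplicated, then each source is
    called exactly once with its whole batch of keys.
    """
    by_source = defaultdict(list)
    for source, key in requests:
        if key not in by_source[source]:
            by_source[source].append(key)
    results = {}
    for source, keys in by_source.items():
        values = SOURCES[source](keys)
        for key, value in zip(keys, values):
            results[(source, key)] = value
    return results


def looks_like_spam(user_id, url):
    """Toy policy: it states which data it needs, not the order of the fetches."""
    data = fetch_round([("friend_counts", user_id), ("link_reputations", url)])
    few_friends = data[("friend_counts", user_id)] < 5       # assumed threshold
    bad_link = data[("link_reputations", url)] < 0.2         # assumed threshold
    return few_friends and bad_link
```

Because the two fetches in `looks_like_spam` do not depend on each other, they can be issued in the same round. A framework that sees this independence can overlap or batch them without the policy author writing any concurrency code.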

Operational posture

  • "Source code in the repository is the code running in Sigma." Policies are continuously deployed — minutes from commit to fleet. See patterns/rule-engine-with-continuous-policy-deploy.
  • Type-correct or rejected at repo ingress: "we don't allow code to be checked into the repository unless it is type-correct". Pure-functional, strongly typed policy code serves as a first-line safety gate.
  • Hot-code swapping of compiled policy code on a running process. New requests serve on new code; in-flight requests finish on the old code; GHC's garbage collector detects when the old code is no longer referenced and triggers safe unload (concepts/hot-code-swapping).
  • Code associated with persistent state is never changed by a hot swap, so state-layer invariants hold.
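
The hot-swap behaviour described in this list (new requests on new code, in-flight requests finishing on old code, old code unloaded once nothing uses it) can be sketched as follows. This is a simplified Python model under stated assumptions, not Sigma's mechanism: Sigma relies on GHC's garbage collector to detect that old code is unreferenced, while this sketch substitutes an explicit in-flight counter. The `Version` and `Host` classes and all their methods are hypothetical.

```python
class Version:
    """One compiled set of policies (hypothetical stand-in for loaded code)."""

    def __init__(self, name, policy):
        self.name = name
        self.policy = policy
        self.in_flight = 0  # requests that started on this version and have not finished


class Host:
    """Long-running process that serves requests and accepts new policy code."""

    def __init__(self, version):
        self.current = version
        self.retired = []   # old versions that still have in-flight requests
        self.unloaded = []  # names of versions that have been dropped

    def begin(self):
        """Start a request: pin it to whatever version is current right now."""
        version = self.current
        version.in_flight += 1
        return version

    def end(self, version):
        """Finish a request and unload any old version that is now unused."""
        version.in_flight -= 1
        self._collect()

    def swap(self, new_version):
        """Install new code; requests already running keep their old version."""
        self.retired.append(self.current)
        self.current = new_version
        self._collect()

    def _collect(self):
        # Explicit counting stands in for the garbage collector's reachability check.
        still_needed = []
        for version in self.retired:
            if version.in_flight == 0:
                self.unloaded.append(version.name)
            else:
                still_needed.append(version)
        self.retired = still_needed
```

A request that called `begin()` before `swap()` keeps evaluating the old policy until it calls `end()`. Only then does the old version appear in `unloaded`.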

Predecessor

  • FXL: an in-house Facebook DSL, since retired from Sigma. It was interpreted (and therefore slow), lacked user-defined types and modules, and forced performance-critical logic into C++ inside Sigma itself, which slowed policy roll-out. Canonical cautionary tale: complexity growth outran the DSL's expressivity budget, and interpreter performance capped hardware utilisation.

Measured performance (vs FXL, at rewrite completion)

  • Haskell: up to 3× faster on individual request types.
  • Haskell: 20–30% overall throughput improvement on a typical workload mix — "we can serve 20 percent to 30 percent more traffic with the same hardware".
  • Measurement basis: the 25 most common request types, accounting for ≈95% of typical Sigma workload.
  • Enabled by: per-request automatic memoization of top-level computations (source-to-source translation), GHC heap-management changes reducing GC frequency on multicore (Meta runs ≥ 64 MB allocation area per core), selective FFI marshaling, and bug fixes including a long-latency GHC GC crash and an aeson JSON-parsing bug whose "one-in-a-million corner cases ... tend to crop up all the time" at Facebook scale.
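
Per-request memoization, the first item in the list of enablers above, can be illustrated with a short Python sketch. The source describes Sigma's version as automatic, applied by a source-to-source translation; the sketch below instead makes the caching explicit through a hypothetical `RequestContext.once` helper, and the rules and request fields are invented for illustration.

```python
class RequestContext:
    """Holds one request plus a cache that lives only as long as the request."""

    def __init__(self, request):
        self.request = request
        self._memo = {}

    def once(self, computation):
        """Evaluate a top-level computation at most once for this request."""
        if computation not in self._memo:
            self._memo[computation] = computation(self)
        return self._memo[computation]


EVALUATIONS = []  # records which requests actually ran link_count


def link_count(ctx):
    """Hypothetical shared computation used by several rules."""
    EVALUATIONS.append(ctx.request["id"])
    return ctx.request["text"].count("http")


def rule_any_link(ctx):
    return ctx.once(link_count) > 0


def rule_many_links(ctx):
    return ctx.once(link_count) > 3  # assumed threshold
```

Both rules consult `link_count`, but within one `RequestContext` it runs only once. A fresh context for the next request starts with an empty cache, so results never leak between requests.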

Reference

The architecture and migration are described in Simon Marlow's 2015 post "Fighting spam with Haskell". The Haxl framework that drives Sigma's data-fetching concurrency model is open source at facebook/Haxl and documented in the ICFP 2014 paper "There is no fork: an abstraction for efficient, concurrent, and concise data access".
