
CONCEPT Cited by 1 source

Dewey Decimal Classification

Dewey Decimal Classification (DDC) is a general-purpose library classification system that organises knowledge into ten main classes (000–900), each refined through hierarchical decimal subdivisions. In the sysdesign-wiki corpus it appears as the output taxonomy that an LLM-driven document-classification pipeline assigns to each page or document.
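The hierarchy is encoded in the digits themselves: each extra digit narrows the subject. A minimal Python sketch of that property follows. The function names (`ddc_hierarchy`, `main_class_label`) are illustrative, not part of any DDC tooling, and the check is a format check only; it does not confirm that a number is actually assigned in the DDC schedules.

```python
import re

# The ten DDC main classes, keyed by their leading digit.
MAIN_CLASSES = {
    "0": "Computer science, information & general works",
    "1": "Philosophy & psychology",
    "2": "Religion",
    "3": "Social sciences",
    "4": "Language",
    "5": "Science",
    "6": "Technology",
    "7": "Arts & recreation",
    "8": "Literature",
    "9": "History & geography",
}

# Three digits, optionally followed by a decimal point and more digits.
DDC_PATTERN = re.compile(r"([0-9]{3})(?:\.([0-9]+))?")


def ddc_hierarchy(code):
    """Return the chain of progressively narrower codes, broadest first."""
    match = DDC_PATTERN.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not shaped like a DDC number: {code!r}")
    whole, fraction = match.group(1), match.group(2) or ""
    # Main class, division, section, then one level per decimal digit.
    levels = [whole[0] + "00", whole[:2] + "0", whole]
    levels += [f"{whole}.{fraction[:i]}" for i in range(1, len(fraction) + 1)]
    # Drop repeats (e.g. "500" is its own class, division, and section).
    return list(dict.fromkeys(levels))


def main_class_label(code):
    """Return the label of the main class that a DDC number falls under."""
    return MAIN_CLASSES[ddc_hierarchy(code)[0][0]]


print(ddc_hierarchy("551.49"))
print(main_class_label("551.49"))
```

Because every ancestor of a code is a string prefix of it (after the zero padding is dropped), a consumer can broaden or narrow a search by truncating or extending the number.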

Stub page. Single-source instance so far. Included on the wiki because DDC is a real architectural choice (a standard public taxonomy with existing tooling, search vocabulary, and cross-archive interoperability), not because the wiki cares about library science.

Why DDC, architecturally

When you build a document-classification pipeline against a domain archive, you have to pick the output vocabulary:

  1. Custom domain taxonomy. Define your own categories. Highest precision; zero portability; new partners must learn your tags.
  2. Domain-specific standard taxonomy (e.g. USGS classifications for hydrogeology, MeSH for biomedicine). Medium precision; portable within the field; not recognised outside it.
  3. Universal taxonomy (Dewey Decimal, Library of Congress Classification, Universal Decimal Classification). Lower precision per document; maximally portable; queryable by any researcher who already knows the system.

The MapAid groundwater pipeline picks option 3, Dewey Decimal codes, because the archive serves humanitarian researchers, university partners, and government agencies across multiple disciplines, and DDC is an established lingua franca for cross-discipline document discovery.

"The model examines each page image and returns: Dewey Decimal classification codes, the universal library classification system…" (Source: sources/2026-05-11-databricks-unlocking-the-archives)

The architectural lesson generalises: when the consumers of your classification span multiple organisations, prefer a public standard taxonomy over a custom one. The cost is per-document precision; the gain is low-friction discoverability.
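One way to picture the discoverability gain is the sketch below. It assumes two unrelated archives that both tag documents with DDC codes. The record shape, document ids, and function names are hypothetical and are not taken from the MapAid pipeline. Model-emitted codes are first filtered by format, then a single prefix query works across both archives without any tag mapping.

```python
import re

# Three digits, optionally followed by a decimal point and more digits.
DDC_PATTERN = re.compile(r"[0-9]{3}(?:\.[0-9]+)?")


def valid_ddc_codes(raw_codes):
    """Keep only strings shaped like DDC numbers (a format check only)."""
    stripped = (code.strip() for code in raw_codes)
    return [code for code in stripped if DDC_PATTERN.fullmatch(code)]


def find_by_prefix(records, prefix):
    """Return ids of records that carry at least one code under `prefix`."""
    return [
        record["id"]
        for record in records
        if any(code.startswith(prefix) for code in record["ddc_codes"])
    ]


# Hypothetical records from two archives that share only the taxonomy.
archive_a = [
    {"id": "a-001", "ddc_codes": valid_ddc_codes(["551.49", "not-a-code"])},
    {"id": "a-002", "ddc_codes": valid_ddc_codes(["338.1"])},
]
archive_b = [
    {"id": "b-017", "ddc_codes": valid_ddc_codes([" 551.4 "])},
    {"id": "b-018", "ddc_codes": valid_ddc_codes(["628.1"])},
]

# One query, no per-archive tag translation.
matches = find_by_prefix(archive_a + archive_b, "551")
print(matches)
```

With a custom taxonomy, the same query would need a mapping table per partner archive; with a shared public one, the prefix is the whole interface.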

Seen in
