SYSTEM Cited by 1 source

unweight-kernels

Definition

unweight-kernels (github.com/cloudflareresearch/unweight-kernels) is the open-source GPU kernel suite backing Unweight, Cloudflare's lossless LLM weight compression system. It ships the custom Hopper-WGMMA reconstructive matmul, plus the Huffman decode and palette transcode preprocessing kernels. These kernels load compressed weights from HBM, reconstruct BF16 in shared memory, and feed the tensor cores directly. (Source: sources/2026-04-17-cloudflare-unweight-how-we-compressed-an-llm-22-percent-without-sacrificing-quality)
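
To make the load, reconstruct, multiply flow concrete, here is a minimal CPU-side sketch in Python. It is not the repo's code or format: zlib stands in for the actual codec, a per-tile scratch buffer stands in for shared memory, and the names bf16_bits, bf16_value, compress_tiles, and reconstructive_matvec are hypothetical. The point it illustrates is that only one decompressed tile exists at a time, and that the reconstruction is bit-exact.

```python
import struct
import zlib


def bf16_bits(x: float) -> int:
    """Return the BF16 bit pattern of x by truncating its float32 encoding."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16


def bf16_value(bits: int) -> float:
    """Expand a BF16 bit pattern back to a Python float."""
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]


def compress_tiles(weight_bits, tile_rows):
    """Compress a matrix of BF16 bit patterns tile by tile.

    zlib is an assumption here, used only as a lossless stand-in codec.
    Returns a list of (row_count, compressed_bytes) tuples.
    """
    tiles = []
    for r0 in range(0, len(weight_bits), tile_rows):
        rows = weight_bits[r0:r0 + tile_rows]
        raw = b"".join(struct.pack(f">{len(row)}H", *row) for row in rows)
        tiles.append((len(rows), zlib.compress(raw)))
    return tiles


def reconstructive_matvec(tiles, x, n_cols):
    """Compute y = W @ x while decompressing one tile at a time."""
    y = []
    for n_rows, blob in tiles:
        # Scratch buffer: the only place decompressed weights ever live.
        raw = zlib.decompress(blob)
        bits = struct.unpack(f">{n_rows * n_cols}H", raw)
        for r in range(n_rows):
            row = bits[r * n_cols:(r + 1) * n_cols]
            y.append(sum(bf16_value(b) * xi for b, xi in zip(row, x)))
    return y
```

On the GPU the same structure would be fused into a single kernel so the reconstructed tile never round-trips through HBM; the sketch only mirrors the data flow, not the performance characteristics.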

Why open-sourced

Cloudflare frames the release as a contribution to "a growing corpus of research in compression and GPU efficiency" and published it alongside a technical paper. This is the same upstream-the-fix posture as the 2025-10 V8 / OpenNext / Node.js four-PR instance: ecosystem contribution is the default even when direct Cloudflare benefit is hard to measure.

Scope

  • Hopper (sm_90, H100) target at launch; uses wgmma + TMA + SMEM primitives directly.
  • Implements the full reconstructive matmul (patterns/fused-decompress-tensor-core-matmul) with the producer/consumer thread-group split and circular-buffer depth variants the Unweight runtime autotunes between.
  • Huffman decode kernel + palette transcode kernel ship in the repo for the preprocessing pipelines.
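
The producer/consumer split with a configurable circular-buffer depth can be sketched on the CPU as follows. This is an illustrative analogy only, assuming one producer and one consumer: Python threads and semaphores stand in for the GPU thread groups and their barriers, and pipelined_consume, decode, and consume are hypothetical names, not identifiers from the repo.

```python
import threading


def pipelined_consume(tiles, decode, consume, depth):
    """Decode tiles in a producer thread and consume them in order.

    `depth` is the number of ring-buffer slots: the producer may run at
    most `depth` tiles ahead of the consumer.
    """
    slots = [None] * depth
    empty = threading.Semaphore(depth)  # slots free for the producer
    full = threading.Semaphore(0)       # slots ready for the consumer

    def producer():
        for i, tile in enumerate(tiles):
            empty.acquire()                  # wait for a free slot
            slots[i % depth] = decode(tile)  # fill it with decoded data
            full.release()                   # signal the consumer

    worker = threading.Thread(target=producer)
    worker.start()

    results = []
    for i in range(len(tiles)):
        full.acquire()                  # wait for the next decoded tile
        decoded = slots[i % depth]
        results.append(consume(decoded))
        empty.release()                 # hand the slot back
    worker.join()
    return results
```

The result is the same for every depth; only the amount of overlap between decoding and consuming changes, which is why depth is a natural parameter to autotune per workload.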

Seen in

Last updated · 200 distilled / 1,178 read