unweight-kernels¶

Definition¶

unweight-kernels (github.com/cloudflareresearch/unweight-kernels) is the open-source GPU kernel suite backing Unweight, Cloudflare's lossless LLM weight compression system. It ships the custom Hopper WGMMA reconstructive matmul, plus Huffman decode and palette transcode preprocessing kernels. The matmul loads compressed weights from HBM, reconstructs BF16 in shared memory, and feeds the tensor cores directly. (Source: sources/2026-04-17-cloudflare-unweight-how-we-compressed-an-llm-22-percent-without-sacrificing-quality)
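
To make "lossless" and "reconstruct BF16" concrete, here is a minimal CPU-side sketch in Python. It is not taken from the repo. It assumes, for illustration only, that each BF16 value is split into an exponent byte and a sign-plus-mantissa byte, and that only the exponent stream is Huffman-coded. The actual Unweight format and kernel interfaces are defined by the repository and the paper. All function names below are hypothetical.

```python
import heapq
import struct
from collections import Counter


def float_to_bf16_bits(x: float) -> int:
    """Truncate a float32 to its top 16 bits, which is a BF16 bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16


def split_bf16(bits: int) -> tuple[int, int]:
    """Split BF16 (1 sign, 8 exponent, 7 mantissa bits) into two bytes."""
    exponent = (bits >> 7) & 0xFF
    sign_mantissa = ((bits >> 15) << 7) | (bits & 0x7F)
    return exponent, sign_mantissa


def join_bf16(exponent: int, sign_mantissa: int) -> int:
    """Inverse of split_bf16: rebuild the 16-bit pattern exactly."""
    return ((sign_mantissa >> 7) << 15) | (exponent << 7) | (sign_mantissa & 0x7F)


def build_huffman(symbols: list[int]) -> dict[int, str]:
    """Build a prefix-free code (symbol -> bit string) from symbol counts."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries carry a unique tiebreaker so the dicts are never compared.
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def compress(bf16_values: list[int]) -> tuple[dict[int, str], str, bytes]:
    """Huffman-code the exponent bytes; keep sign+mantissa bytes verbatim."""
    parts = [split_bf16(b) for b in bf16_values]
    codes = build_huffman([e for e, _ in parts])
    bitstring = "".join(codes[e] for e, _ in parts)
    return codes, bitstring, bytes(sm for _, sm in parts)


def decompress(blob: tuple[dict[int, str], str, bytes]) -> list[int]:
    """Decode the exponent stream and rejoin it with the raw bytes."""
    codes, bitstring, sign_mantissa = blob
    inverse = {code: sym for sym, code in codes.items()}
    exponents, current = [], ""
    for bit in bitstring:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the symbol
            exponents.append(inverse[current])
            current = ""
    return [join_bf16(e, sm) for e, sm in zip(exponents, sign_mantissa)]


weights = [0.5, 0.75, 0.625, 0.9, -0.55, -0.8, 0.25, 0.3]
bits = [float_to_bf16_bits(w) for w in weights]
assert decompress(compress(bits)) == bits  # bit-exact round trip
```

The round trip is bit-exact because only the encoding of the exponent byte changes and no value is rounded. The savings in this toy come from the exponents of similarly scaled weights clustering on a few values, which an entropy coder can represent in fewer than 8 bits each.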

Why open-sourced¶

Cloudflare frames the release as a contribution to "a growing corpus of research in compression and GPU efficiency" and published it alongside a technical paper. This is the same upstream-the-fix posture as the 2025-10 V8 / OpenNext / Node.js four-PR instance: ecosystem contribution is the default even when the direct benefit to Cloudflare is hard to measure.

Scope¶

Targets Hopper (sm_90, H100) at launch and uses the wgmma, TMA, and shared-memory (SMEM) primitives directly.
Implements the full reconstructive matmul (patterns/fused-decompress-tensor-core-matmul), including the producer/consumer thread-group split and the circular-buffer depth variants that the Unweight runtime autotunes between.
The Huffman decode and palette transcode kernels ship in the repo for the preprocessing pipelines.
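
The producer/consumer split with a circular buffer can be illustrated on the CPU with two threads and a bounded queue. This is a sketch of the general pattern only, not the repo's wgmma/TMA code. The names `decode_tile` and `pipelined_matvec`, the toy palette-indexed tile layout, and the use of Python threads in place of GPU thread groups are all assumptions made for illustration.

```python
import queue
import threading


def decode_tile(tile):
    """Stand-in decode step: expand palette indices into weight values.

    Hypothetical toy format, not Unweight's palette transcode.
    """
    palette, index_rows = tile
    return [[palette[i] for i in row] for row in index_rows]


def pipelined_matvec(compressed_tiles, x, tile_k, depth=2):
    """Compute y = W @ x, decoding K-tiles of W ahead of the multiply.

    `depth` bounds how many decoded tiles may be in flight at once,
    playing the role of the circular-buffer depth.
    """
    ring = queue.Queue(maxsize=depth)
    rows = len(compressed_tiles[0][1])
    y = [0.0] * rows

    def producer():
        for k, tile in enumerate(compressed_tiles):
            ring.put((k, decode_tile(tile)))  # blocks while the buffer is full
        ring.put(None)  # sentinel: no more tiles

    def consumer():
        while (item := ring.get()) is not None:
            k, tile_weights = item
            x_slice = x[k * tile_k:(k + 1) * tile_k]
            for r, w_row in enumerate(tile_weights):
                # Accumulate this tile's partial dot product into the output row.
                y[r] += sum(w * v for w, v in zip(w_row, x_slice))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return y


palette = [0.0, 1.0, -1.0, 0.5]
tiles = [
    (palette, [[1, 2], [3, 0]]),  # columns 0..1 of W
    (palette, [[0, 3], [1, 1]]),  # columns 2..3 of W
]
result = pipelined_matvec(tiles, [2.0, 4.0, 6.0, 8.0], tile_k=2, depth=2)
```

The result does not depend on `depth`; only the overlap between decoding and multiplying does. That is why a buffer depth can be treated as a tuning parameter rather than a correctness concern.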

Seen in¶

sources/2026-04-17-cloudflare-unweight-how-we-compressed-an-llm-22-percent-without-sacrificing-quality — announcement; published alongside the technical paper.

Related¶

systems/unweight — the runtime these kernels belong to.
systems/nvidia-tensor-core — hardware target.
patterns/fused-decompress-tensor-core-matmul, patterns/sm-partitioning-producer-consumer, patterns/upstream-the-fix.
