SYSTEM Cited by 1 source
unweight-kernels
Definition
unweight-kernels (github.com/cloudflareresearch/unweight-kernels) is the open-source GPU kernel suite backing Unweight, Cloudflare's lossless LLM weight compression system. It ships the custom Hopper-WGMMA reconstructive matmul, which loads compressed weights from HBM, reconstructs BF16 values in shared memory, and feeds the tensor cores directly, together with the Huffman decode and palette transcode preprocessing kernels. (Source: sources/2026-04-17-cloudflare-unweight-how-we-compressed-an-llm-22-percent-without-sacrificing-quality)
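The "reconstruct, then multiply" idea can be illustrated on the CPU. The following is a minimal Python sketch, not code from the repository: the compressed layout (an exponent palette, a per-weight palette index, and a sign/mantissa byte) and every function name are assumptions made for illustration, and a plain matrix-vector product stands in for the tensor-core matmul. It only shows that weights can be rebuilt bit-exactly one tile at a time, just before they are used.

```python
import struct

def bf16_bits(x: float) -> int:
    """Truncate a float to its top 16 float32 bits (a BF16 bit pattern)."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16

def bf16_value(bits: int) -> float:
    """Expand a BF16 bit pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

def compress(weight_bits):
    """Hypothetical layout: exponent palette + per-weight index + sign/mantissa byte."""
    exponents = [(b >> 7) & 0xFF for b in weight_bits]
    palette = sorted(set(exponents))
    index_of = {e: i for i, e in enumerate(palette)}
    indices = [index_of[e] for e in exponents]
    sign_mantissa = [((b >> 15) << 7) | (b & 0x7F) for b in weight_bits]
    return palette, indices, sign_mantissa

def reconstruct(palette, indices, sign_mantissa, start, stop):
    """Rebuild the BF16 bit patterns for weights [start, stop) only (one tile)."""
    out = []
    for i in range(start, stop):
        sm = sign_mantissa[i]
        out.append(((sm >> 7) << 15) | (palette[indices[i]] << 7) | (sm & 0x7F))
    return out

def fused_matvec(x, compressed, rows, cols, tile=2):
    """Compute y = W @ x, reconstructing `tile` rows of W right before use."""
    palette, indices, sign_mantissa = compressed
    y = []
    for r0 in range(0, rows, tile):
        r1 = min(r0 + tile, rows)
        tile_bits = reconstruct(palette, indices, sign_mantissa, r0 * cols, r1 * cols)
        for r in range(r1 - r0):
            row = tile_bits[r * cols:(r + 1) * cols]
            y.append(sum(bf16_value(b) * xi for b, xi in zip(row, x)))
    return y

# Usage: a 4x2 weight matrix stored row-major as BF16 bit patterns.
weights = [0.5, -1.25, 3.0, 0.0078125, -0.5, 2.0, 1.5, -3.0]
bits = [bf16_bits(w) for w in weights]
compressed = compress(bits)
y = fused_matvec([1.0, 2.0], compressed, rows=4, cols=2)
```

The uncompressed matrix is never materialized in full here, which mirrors (very loosely) the point of reconstructing in shared memory instead of writing decompressed weights back to HBM.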
Why open-sourced
Cloudflare frames the release as a contribution to "a growing corpus of research in compression and GPU efficiency"; the kernels were published alongside a technical paper. This is the same upstream-the-fix posture as the 2025-10 V8 / OpenNext / Node.js four-PR instance: ecosystem contribution is the default even when the direct benefit to Cloudflare is hard to measure.
Scope
- Hopper (sm_90, H100) target at launch; uses wgmma + TMA + SMEM primitives directly.
- Implements the full reconstructive matmul (patterns/fused-decompress-tensor-core-matmul) with the producer/consumer thread-group split and circular-buffer depth variants the Unweight runtime autotunes between.
- Huffman decode kernel + palette transcode kernel ship in the repo for the preprocessing pipelines.
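The producer/consumer split with a circular buffer of configurable depth can be sketched with ordinary threads. This is an illustrative Python analogy, not the repository's CUDA code: `run_pipeline`, its parameters, and the use of `zlib.decompress` as a stand-in for the decode step are all assumptions, and two semaphores play the role that barriers would play between thread groups on the GPU.

```python
import threading
import zlib

def run_pipeline(compressed_tiles, decode, consume, depth=2):
    """Overlap decode and consume through a ring of `depth` buffer slots."""
    slots = [None] * depth
    free = threading.Semaphore(depth)  # slots the producer may overwrite
    ready = threading.Semaphore(0)     # slots the consumer may read
    results = []

    def producer():
        for i, tile in enumerate(compressed_tiles):
            free.acquire()                   # wait until slot i % depth is reusable
            slots[i % depth] = decode(tile)  # stand-in for decompress-into-buffer
            ready.release()                  # signal that the slot holds valid data

    def consumer():
        for i in range(len(compressed_tiles)):
            ready.acquire()                  # wait for the producer to fill the slot
            results.append(consume(slots[i % depth]))
            free.release()                   # hand the slot back to the producer

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Usage: five toy "tiles", compressed with zlib purely as a placeholder codec.
raw_tiles = [bytes([n] * 64) for n in range(5)]
tiles = [zlib.compress(t) for t in raw_tiles]
expected = [sum(t) for t in raw_tiles]
outputs = {d: run_pipeline(tiles, zlib.decompress, sum, depth=d) for d in (1, 2, 4)}
```

Every depth produces the same results; only the amount of overlap between producer and consumer changes, which is why depth is a natural knob to tune per workload.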
Seen in
- sources/2026-04-17-cloudflare-unweight-how-we-compressed-an-llm-22-percent-without-sacrificing-quality: announcement; published alongside the technical paper.
Related
- systems/unweight: the runtime these kernels belong to.
- systems/nvidia-tensor-core: hardware target.
- patterns/fused-decompress-tensor-core-matmul, patterns/sm-partitioning-producer-consumer, patterns/upstream-the-fix.