๐Ÿ—œ๏ธ

Entropy Coding (rANS)

Compress quantized indices below their nominal bit-width by exploiting non-uniform Gaussian bin probabilities.

Where It Fits: The Index Term

In the quantization formulation, the dominant term in the storage budget is the index tensor at bits per weight:

Entropy coding targets the first term. Because Lloyd-Max quantization of produces non-uniform bin probabilities (inner levels are more probable than outer), the Shannon entropy is strictly less than .

Entropy Gap

b (bits)LevelsH (bits/sym)Saving
241.911โˆ’0.089
382.832โˆ’0.168
4163.764โˆ’0.236
5324.755โˆ’0.245

At 4 bits, entropy coding saves ~0.24 BPW โ€” bringing the index cost from 4.0 to ~3.76 bits per weight.

How rANS Works

Asymmetric Numeral Systems (Duda 2009) achieve near-entropy-optimal compression with a simple, GPU-friendly decode loop. Symbols are split into blocks of for independent parallel decoding.

Encode (sequential per block)

Process symbols in reverse. For symbol with frequency and cumulative :

โ€” frequencies are quantized to sum to .

Decode (GPU-parallel per block)

Each block starts from a known 4-byte state. Per symbol:

1.slot = state & (2P โˆ’ 1)
2.symbol = LUT[slot] // O(1) table lookup
3.state = fs ร— (state โ‰ซ P) + slot โˆ’ cs
4.renormalize: read bytes while state < 216

Decode Table Size

The entire decode table fits comfortably in GPU shared memory or registers:

Frequency table

bytes (uint16). At 4-bit: 32 bytes.

Cumulative table

bytes (uint32). At 4-bit: 68 bytes.

Total: ~100 bytes for 4-bit โ€” derived from the known Gaussian bin probabilities, no training data needed.

Relationship to Other Techniques

๐Ÿ“Š

Lloyd-Max

Entropy coding exploits the non-uniform bin probabilities from optimal Gaussian quantization. Uniform quantizers would have (no saving).

๐Ÿ“ฆ

4-bit Packing

Packing reduces storage by fitting two indices per byte. Entropy coding goes further by exploiting statistical redundancy within those indices.

๐ŸŽฏ

Residual Quantization

Each residual pass produces its own index tensor โ€” entropy coding applies independently to each pass.

๐Ÿ“

Norm Compression

Entropy coding compresses the index tensor; norm factorization compresses the norm tensor. Together they address both major storage components.

Implementation

entropy_codec.py โ†’ gaussian_bin_probs() (compute bin probabilities from Lloyd-Max)
entropy_codec.py โ†’ compute_entropy() (theoretical entropy lower bound)
entropy_codec.py โ†’ build_ans_table() (frequency + cumulative tables)
entropy_codec.py โ†’ rANSCodec.encode() / decode() (block-parallel rANS)