Entropy Coding (rANS)
Compress quantized indices below their nominal bit-width by exploiting non-uniform Gaussian bin probabilities.
Where It Fits: The Index Term
In the quantization formulation, the dominant term in the storage budget is the index tensor, stored at $b$ bits per weight; the norm tensor is the other major component.
Entropy coding targets the index term. Because Lloyd-Max quantization of a Gaussian source produces non-uniform bin probabilities $p_i$ (inner levels are more probable than outer ones), the Shannon entropy $H = -\sum_i p_i \log_2 p_i$ is strictly less than $b$.
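To make the entropy claim concrete, the sketch below runs the classic Lloyd iteration (midpoint boundaries, centroid levels) for a standard normal $\mathcal{N}(0, 1)$ and then computes $H$ from the resulting bin probabilities. It is an illustration only: the helper names (`lloyd_max_gaussian`, `entropy_bits`), the starting grid on $[-2, 2]$, and the fixed iteration count are assumptions, not part of the original pipeline. For $b = 2$ it reproduces $H \approx 1.911$ from the table below.

```python
import math

def gauss_pdf(x: float) -> float:
    """Standard normal density; returns 0.0 at +/-inf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gauss_cdf(x: float) -> float:
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _edges(levels):
    # Nearest-neighbour condition: decision boundaries sit halfway between levels.
    n = len(levels)
    return [-math.inf] + [(levels[i] + levels[i + 1]) / 2 for i in range(n - 1)] + [math.inf]

def lloyd_max_gaussian(b: int, iters: int = 2000):
    """Return (levels, bin probabilities) of a 2**b-level Lloyd-Max quantizer for N(0, 1)."""
    n = 2 ** b
    levels = [-2.0 + 4.0 * (i + 0.5) / n for i in range(n)]  # assumed initial guess
    for _ in range(iters):
        edges = _edges(levels)
        # Centroid condition: each level is E[X | X in its bin] for X ~ N(0, 1).
        levels = [
            (gauss_pdf(edges[i]) - gauss_pdf(edges[i + 1]))
            / (gauss_cdf(edges[i + 1]) - gauss_cdf(edges[i]))
            for i in range(n)
        ]
    edges = _edges(levels)
    probs = [gauss_cdf(edges[i + 1]) - gauss_cdf(edges[i]) for i in range(n)]
    return levels, probs

def entropy_bits(probs) -> float:
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

for b in (2, 3, 4, 5):
    _, p = lloyd_max_gaussian(b)
    h = entropy_bits(p)
    print(f"b={b}: H={h:.3f} bits/symbol, saving={b - h:.3f} bits/weight")
```

The inner bins come out noticeably more probable than the outer ones, which is exactly the skew an entropy coder exploits.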
Entropy Gap
| b (bits) | Levels | H (bits/symbol) | Saving (bits/weight) |
|---|---|---|---|
| 2 | 4 | 1.911 | 0.089 |
| 3 | 8 | 2.832 | 0.168 |
| 4 | 16 | 3.764 | 0.236 |
| 5 | 32 | 4.755 | 0.245 |
At 4 bits, entropy coding saves ~0.24 bits per weight (BPW), bringing the index cost from 4.0 to ~3.76 BPW.
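As a worked example of what that means in storage, take a hypothetical 4096 × 4096 weight matrix (the size is purely illustrative) and ignore the small per-block state and table overhead:

```latex
\frac{4 - 3.764}{4} = \frac{0.236}{4} \approx 5.9\%\ \text{fewer index bits}

4096^2 = 16{,}777{,}216\ \text{indices}: \quad
16{,}777{,}216 \times 4.0\ \text{bits} = 8\ \text{MiB}
\;\longrightarrow\;
16{,}777{,}216 \times 3.764\ \text{bits} \approx 7.53\ \text{MiB}
\quad (\approx 0.47\ \text{MiB saved})
```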
How rANS Works
Asymmetric Numeral Systems (Duda, 2009) achieve near-entropy-optimal compression with a simple, GPU-friendly decode loop. Symbols are split into fixed-size blocks that can be decoded independently and in parallel.
Encode (sequential per block)
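Process symbols in reverse order. For symbol $s$ with frequency $f_s$ and cumulative frequency $c_s$, the state $x$ is updated as

$$x \leftarrow \left\lfloor \frac{x}{f_s} \right\rfloor M + c_s + (x \bmod f_s)$$

where $M = \sum_s f_s$: the frequencies are quantized to sum to $M$ (typically a power of two, so $\bmod M$ and $\lfloor \cdot / M \rfloor$ become a mask and a shift).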
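The encode step and its inverse can be checked by hand on a toy table. This sketch deliberately omits renormalization (Python integers grow without bound), and the table `FREQ = [4, 8, 3, 1]` with $M = 16$ is a made-up example, not one of the Gaussian tables.

```python
# Toy frequency table (hypothetical): M = 2**PROB_BITS = 16.
PROB_BITS = 4
FREQ = [4, 8, 3, 1]
CUM = [0, 4, 12, 15, 16]  # CUM[s] = sum of FREQ[:s]; CUM[-1] == M

def rans_encode_step(x: int, s: int) -> int:
    """x <- floor(x / f_s) * M + c_s + (x mod f_s)."""
    f, c = FREQ[s], CUM[s]
    return ((x // f) << PROB_BITS) + c + (x % f)

def rans_decode_step(x: int) -> tuple[int, int]:
    """Recover (symbol, previous state): s is the bin holding x mod M."""
    slot = x & ((1 << PROB_BITS) - 1)
    s = next(i for i in range(len(FREQ)) if CUM[i] <= slot < CUM[i + 1])
    return s, FREQ[s] * (x >> PROB_BITS) + slot - CUM[s]

symbols = [2, 0, 1]
x = 100                          # arbitrary starting state
for s in reversed(symbols):      # encode back to front (rANS is last-in, first-out)
    x = rans_encode_step(x, s)

decoded = []
for _ in symbols:                # decode front to back
    s, x = rans_decode_step(x)
    decoded.append(s)
```

After decoding, `decoded == [2, 0, 1]` and the state is back at 100, which is why the encoder must run in reverse.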
Decode (GPU-parallel per block)
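Each block starts from a known 4-byte state $x$ (the encoder's final state for that block). Per symbol, find the symbol $s$ with $c_s \le (x \bmod M) < c_s + f_s$, then update

$$x \leftarrow f_s \left\lfloor \frac{x}{M} \right\rfloor + (x \bmod M) - c_s$$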
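A complete per-block codec needs renormalization so the state fits in 32 bits. The sketch below assumes the common byte-wise scheme: $M = 2^{12}$, state kept in $[2^{23}, 2^{31})$, bytes shifted out by the encoder and read back by the decoder, and a stream layout of "4-byte little-endian state, then bytes". All of these constants and the layout are assumptions for illustration; the Python loop over blocks stands in for what would be one GPU thread or warp per block.

```python
import bisect

PROB_BITS = 12               # assumed: frequencies sum to M = 2**12
M = 1 << PROB_BITS
RANS_L = 1 << 23             # lower bound of the normalized state interval [L, 256*L)

def cumulative(freq):
    cum = [0]
    for f in freq:
        cum.append(cum[-1] + f)
    assert cum[-1] == M, "frequencies must sum to M"
    return cum

def encode_block(symbols, freq, cum):
    """Encode one block; returns the 4-byte final state followed by renormalization bytes."""
    x = RANS_L
    emitted = bytearray()
    for s in reversed(symbols):                  # rANS is LIFO: encode back to front
        f, c = freq[s], cum[s]
        x_max = ((RANS_L >> PROB_BITS) << 8) * f
        while x >= x_max:                        # shift out low bytes so the new state stays < 2**31
            emitted.append(x & 0xFF)
            x >>= 8
        x = ((x // f) << PROB_BITS) + (x % f) + c
    # The decoder reads the state first, then the bytes in reverse emission order.
    return x.to_bytes(4, "little") + bytes(reversed(emitted))

def decode_block(data, count, freq, cum):
    x = int.from_bytes(data[:4], "little")       # the known 4-byte starting state
    pos = 4
    out = []
    for _ in range(count):
        slot = x & (M - 1)
        s = bisect.bisect_right(cum, slot) - 1   # symbol whose [cum[s], cum[s+1]) contains slot
        out.append(s)
        x = freq[s] * (x >> PROB_BITS) + slot - cum[s]
        while x < RANS_L:                        # refill until x is back in [L, 256*L)
            x = (x << 8) | data[pos]
            pos += 1
    return out

def encode_blocks(symbols, freq, cum, block_size):
    return [encode_block(symbols[i:i + block_size], freq, cum)
            for i in range(0, len(symbols), block_size)]

def decode_blocks(blocks, total, freq, cum, block_size):
    out = []
    for k, data in enumerate(blocks):            # independent per block: parallel on a GPU
        out.extend(decode_block(data, min(block_size, total - k * block_size), freq, cum))
    return out
```

For example, with `freq = [256, 1024, 2048, 768]`, 1000 symbols drawn from that distribution encode to roughly 1.7 bits per symbol plus 4 bytes of state per block, and `decode_blocks` returns the original list.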
Decode Table Size
The entire decode table fits comfortably in GPU shared memory or registers:
Frequency table
$2^b \times 2$ bytes (uint16). At 4-bit: 32 bytes.
Cumulative table
$(2^b + 1) \times 4$ bytes (uint32). At 4-bit: 68 bytes.
Total: 100 bytes (32 + 68) at 4-bit, derived from the known Gaussian bin probabilities, so no training data is needed.
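Building these tables from Gaussian bin probabilities means rounding real probabilities to integer frequencies that sum exactly to $M$. The sketch below uses largest-remainder rounding with a floor of 1 (so every level stays encodable); that rounding rule and the function names are assumptions, one simple option among several. The example probabilities are approximately the 2-bit Lloyd-Max Gaussian bins (about 0.163 outer, 0.337 inner).

```python
def quantize_freqs(probs, prob_bits=12):
    """Scale bin probabilities to integer frequencies that sum exactly to M = 2**prob_bits."""
    m = 1 << prob_bits
    n = len(probs)
    freqs = [max(1, int(p * m)) for p in probs]   # floor, but keep every symbol encodable
    diff = m - sum(freqs)
    # Hand leftover slots to the bins that lost the most to flooring.
    by_remainder = sorted(range(n), key=lambda i: probs[i] * m - freqs[i], reverse=True)
    i = 0
    while diff > 0:
        freqs[by_remainder[i % n]] += 1
        diff -= 1
        i += 1
    # If the max(1, ...) clamp overshot, take slots back from the largest bins.
    while diff < 0:
        j = max(range(n), key=lambda k: freqs[k])
        freqs[j] -= 1
        diff += 1
    return freqs

def cumulative(freqs):
    """Exclusive prefix sums with a trailing total: len(freqs) + 1 entries."""
    cum = [0]
    for f in freqs:
        cum.append(cum[-1] + f)
    return cum

def table_bytes(b):
    """(frequency table, cumulative table) sizes in bytes for 2**b levels."""
    n = 2 ** b
    return 2 * n, 4 * (n + 1)   # uint16 frequencies, uint32 cumulative entries

freqs = quantize_freqs([0.1632, 0.3368, 0.3368, 0.1632])
```

Here `freqs` comes out as `[668, 1380, 1380, 668]`, and `table_bytes(4)` gives `(32, 68)`, matching the 100-byte total above.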
Relationship to Other Techniques
Lloyd-Max
Entropy coding exploits the non-uniform bin probabilities of optimal Gaussian quantization. A quantizer whose bins are all equally probable would have $H = b$ (no saving).
4-bit Packing
Packing reduces storage by fitting two indices per byte. Entropy coding goes further by exploiting statistical redundancy within those indices.
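For contrast, fixed 4-bit packing always spends exactly 4 bits per index regardless of the distribution. A minimal sketch, assuming the first index of each pair goes in the low nibble and odd-length input is padded with a zero index (both choices are illustrative assumptions):

```python
def pack_nibbles(indices):
    """Pack 4-bit indices two per byte, first index in the low nibble (assumed order)."""
    if len(indices) % 2:
        indices = list(indices) + [0]          # pad odd-length input with a zero index
    return bytes((indices[i] & 0xF) | ((indices[i + 1] & 0xF) << 4)
                 for i in range(0, len(indices), 2))

def unpack_nibbles(packed, count):
    """Inverse of pack_nibbles; count drops the padding index if there was one."""
    out = []
    for byte in packed:
        out.append(byte & 0xF)
        out.append(byte >> 4)
    return out[:count]
```

`pack_nibbles([1, 2, 3, 4])` yields `b"\x21\x43"`: two bytes for four indices, whereas an entropy coder can spend fewer bits on the frequent inner levels.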
Residual Quantization
Each residual pass produces its own index tensor, so entropy coding applies independently to each pass.
Norm Compression
Entropy coding compresses the index tensor; norm factorization compresses the norm tensor. Together they address both major storage components.