๐ฆ
4-bit Index Packing
Two 4-bit indices packed per uint8 byte โ halving storage with near-free bitwise operations.
Byte Layout
Two 4-bit quantization indices are packed into each uint8 byte. The low nibble (bits 3-0) holds index[k] and the high nibble (bits 7-4) holds index[k+1].
Interactive: Pack & Unpack
Watch two 4-bit indices merge into a single byte, then split back apart:
idx[k] = 5
0
1
0
1
idx[k+1] = 11
1
0
1
1
Unpack Operations
Low nibble
High nibble
Storage Savings
An index matrix becomes uint8. When is odd, the last column is zero-padded before packing and the original is stored in metadata for correct unpacking.
Implementation
quantize.py โ pack_4bit(), unpack_4bit()