4-bit Index Packing

Two 4-bit indices packed per uint8 byte, halving storage with near-free bitwise operations.

Byte Layout

Two 4-bit quantization indices are packed into each uint8 byte. The low nibble (bits 3-0) holds index[k] and the high nibble (bits 7-4) holds index[k+1].
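The layout above can be sketched in a few lines of Python. This is an illustrative sketch only: `pack_pair` is a hypothetical helper name, not a function from quantize.py, and it assumes both indices are plain integers in the range 0..15.

```python
def pack_pair(lo: int, hi: int) -> int:
    """Pack two 4-bit indices into one byte.

    lo (index[k]) goes into bits 3-0, hi (index[k+1]) into bits 7-4.
    Hypothetical helper for illustration; not the quantize.py API.
    """
    if not (0 <= lo <= 15 and 0 <= hi <= 15):
        raise ValueError("both indices must fit in 4 bits (0..15)")
    # Shift the second index into the high nibble, then OR in the first.
    return (hi << 4) | lo


packed = pack_pair(5, 11)
print(f"{packed:08b}")  # 10110101
```

The range check matters: a value above 15 in the low position would spill into the high nibble and corrupt its neighbour.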

Interactive: Pack & Unpack

Two 4-bit indices merge into a single byte, then split back apart:

idx[k] = 5 → 0101
idx[k+1] = 11 → 1011
packed byte = 1011 0101 (0xB5)

Unpack Operations

Low nibble: index[k] = byte & 0x0F

High nibble: index[k+1] = (byte >> 4) & 0x0F
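As a minimal sketch of the two unpack operations, assuming the byte layout described earlier (low nibble holds index[k], high nibble holds index[k+1]): `unpack_pair` is a hypothetical name used for illustration and is not taken from quantize.py.

```python
from typing import Tuple


def unpack_pair(byte: int) -> Tuple[int, int]:
    """Split one packed byte back into (index[k], index[k+1]).

    Hypothetical helper for illustration; not the quantize.py API.
    """
    lo = byte & 0x0F          # mask keeps bits 3-0
    hi = (byte >> 4) & 0x0F   # shift bits 7-4 down, then mask
    return lo, hi


print(unpack_pair(0b10110101))  # (5, 11)
```

For a value that is already a uint8 the mask after the shift is redundant, but it keeps the result correct if the input is a wider or signed integer.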

Storage Savings

An index matrix packs into a uint8 matrix with half as many columns, rounded up. When the column count is odd, the last column is zero-padded before packing and the original column count is stored in metadata for correct unpacking.
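The odd-length case can be illustrated on a single row; a matrix would apply the same steps to every row. This is a sketch under assumptions: `pack_row` and `unpack_row` are hypothetical names, the row is a plain Python list, and the "metadata" is simply the original length returned alongside the packed bytes. The real pack_4bit() and unpack_4bit() may differ.

```python
from typing import List, Tuple


def pack_row(indices: List[int]) -> Tuple[bytes, int]:
    """Pack a row of 4-bit indices, two per byte.

    Returns the packed bytes plus the original length, which the
    caller must keep (as metadata) to unpack correctly.
    Hypothetical helper for illustration only.
    """
    if any(not 0 <= v <= 15 for v in indices):
        raise ValueError("indices must fit in 4 bits (0..15)")
    n = len(indices)
    padded = list(indices)
    if n % 2:
        padded.append(0)  # zero-pad an odd-length row to an even length
    packed = bytes(
        (padded[i + 1] << 4) | padded[i] for i in range(0, len(padded), 2)
    )
    return packed, n


def unpack_row(packed: bytes, n: int) -> List[int]:
    """Inverse of pack_row: expand each byte, then drop the padding."""
    out: List[int] = []
    for b in packed:
        out.append(b & 0x0F)  # low nibble: index[k]
        out.append(b >> 4)    # high nibble: index[k+1]
    return out[:n]            # trim the zero pad using the stored length


row = [5, 11, 3]
packed, n = pack_row(row)
print(packed.hex(), n)        # b503 3
print(unpack_row(packed, n))  # [5, 11, 3]
```

Without the stored length, the padded zero would be indistinguishable from a genuine index of 0 in the last position.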

Implementation

quantize.py → pack_4bit(), unpack_4bit()