Norm Compression

Rank-1 SVD factorization with an int8 residual reduces norm storage by ~4×, targeting the second-largest BPW term.

Where It Fits: The Norm Term

In the quantization formulation, the norm tensor is the second-largest storage component after the quantized indices.

At $d = 128$ (one norm per 128 weights) with float32 norms, each pass contributes $32 / 128 = 0.25$ BPW. With two residual passes, this becomes 0.50 BPW, a significant fraction of the total budget that norm compression directly reduces.

Rank-1 Factorization

The norm tensor has strong low-rank structure: rows of the same layer tend to have similar magnitude patterns across groups. This motivates a rank-1 SVD approximation with a small int8 correction:

$\beta_m$: row scale (float16), the per-row magnitude
$\gamma_g$: group scale (float16), the per-group pattern
$\varepsilon_{m,g}$: residual (int8), the fractional correction
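
One way to write the approximation that is consistent with the three components listed above. This is a sketch that assumes the correction is applied multiplicatively (as "fractional correction" suggests) and that the int8 code is rescaled by a single scalar. The symbols $n_{m,g}$ (norm of row $m$, group $g$), $q_{m,g}$ (int8 code), and $s$ (quantization scale) are notation introduced here for illustration.

```latex
n_{m,g} \approx \beta_m \, \gamma_g \, (1 + \varepsilon_{m,g}),
\qquad \varepsilon_{m,g} \approx s \, q_{m,g},
\qquad q_{m,g} \in \{-127, \dots, 127\}
```

Under this reading, the two float16 vectors carry the bulk of the magnitude and the int8 matrix only has to encode small relative deviations from their product.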

How It Works

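1. SVD of the norm matrix

Compute the SVD of the norm matrix and take the first singular triplet, which yields the row scales $\beta_m$ and the group scales $\gamma_g$.

2. Compute fractional residual

Compute the per-entry correction $\varepsilon_{m,g}$ that the rank-1 product $\beta_m \gamma_g$ leaves behind, expressed as a fraction rather than as an absolute difference.

3. Quantize residual to int8

Symmetric quantization: derive a scale from the largest residual magnitude, then round each scaled residual to an int8 code. The residual is typically a small fraction of the norm value.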
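
The three steps above can be sketched in NumPy as follows. This is an illustrative sketch, not the contents of `norm_compression.py`: the function name `factorize_norms_sketch`, the even split of the first singular value between the two scale vectors, the residual definition `norms / rank1 - 1`, and the max-abs scale rule are all assumptions.

```python
import numpy as np

def factorize_norms_sketch(norms):
    """Rank-1 SVD factorization with an int8 fractional residual (illustrative)."""
    norms = np.asarray(norms, dtype=np.float64)

    # Step 1: rank-1 SVD. The first singular value is split evenly between
    # the row and group scales (an assumption; only the product matters).
    u, s, vt = np.linalg.svd(norms, full_matrices=False)
    beta = u[:, 0] * np.sqrt(s[0])
    gamma = vt[0] * np.sqrt(s[0])
    # Singular vectors are defined up to a joint sign flip; norms are
    # positive, so pick the sign that makes both scale vectors positive.
    if beta.sum() < 0:
        beta, gamma = -beta, -gamma
    beta = beta.astype(np.float16)
    gamma = gamma.astype(np.float16)

    # Step 2: fractional residual, measured against the float16 scales that
    # will actually be stored, so their rounding error is absorbed here.
    rank1 = np.outer(beta.astype(np.float64), gamma.astype(np.float64))
    eps = norms / rank1 - 1.0

    # Step 3: symmetric int8 quantization with a single scalar scale.
    max_abs = float(np.abs(eps).max())
    scale = np.float32(max_abs / 127.0) if max_abs > 0 else np.float32(1.0)
    q = np.clip(np.rint(eps / float(scale)), -127, 127).astype(np.int8)
    return beta, gamma, q, scale

# Synthetic, nearly rank-1 norm matrix: 64 rows x 16 groups (made-up data).
rng = np.random.default_rng(0)
row_pattern = rng.uniform(0.5, 2.0, size=64)
group_pattern = rng.uniform(0.5, 2.0, size=16)
jitter = rng.uniform(-0.01, 0.01, size=(64, 16))
norms = np.outer(row_pattern, group_pattern) * (1.0 + jitter)

beta, gamma, q, scale = factorize_norms_sketch(norms)
```

Computing the residual against the already-rounded float16 scales is a design choice in this sketch: it lets the int8 correction compensate for float16 rounding instead of stacking a second error on top of it.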

Storage Comparison

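| Method | Components | BPW ($d = 128$) |
|---|---|---|
| float32 (baseline) | 32 bits per norm | 0.250 |
| float16 | 16 bits per norm | 0.125 |
| Factored int8 | int8 residual per norm, plus float16 row and group scales | ~0.063 |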

The factored representation achieves ~4× compression vs float32 norms, saving ~0.19 BPW per pass.
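
The figures in the table can be reproduced with a few lines of arithmetic. The sketch below is not the `norm_bpw()` function referenced elsewhere on this page; the name `norm_bpw_sketch`, the matrix shape (4096 rows, 32 groups per row), and the single float32 quantization scale are hypothetical choices made for illustration.

```python
def norm_bpw_sketch(method, d=128, rows=4096, groups=32, passes=1):
    """Norm storage in bits per weight (illustrative accounting).

    Each norm covers one group of d weights, so a rows x groups norm
    matrix accompanies rows * groups * d weights.
    """
    n_norms = rows * groups
    n_weights = n_norms * d
    if method == "float32":
        bits = 32 * n_norms
    elif method == "float16":
        bits = 16 * n_norms
    elif method == "factored_int8":
        # int8 residual per norm, float16 row scales, float16 group scales,
        # and one float32 quantization scale (assumed layout).
        bits = 8 * n_norms + 16 * rows + 16 * groups + 32
    else:
        raise ValueError(f"unknown method: {method}")
    return passes * bits / n_weights

baseline = norm_bpw_sketch("float32")        # 32 / 128
half = norm_bpw_sketch("float16")            # 16 / 128
factored = norm_bpw_sketch("factored_int8")  # slightly above 8 / 128
saving = baseline - factored
two_pass_baseline = norm_bpw_sketch("float32", passes=2)
```

The dominant factored term is $8 / d = 0.0625$ BPW, which matches the ~0.063 entry in the table. Under the assumed layout, the scale vectors add a small overhead on top of that which shrinks as the matrix gets larger.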

Reconstruction

The reconstruction error is bounded by the int8 quantization granularity, which is typically less than 0.5% of the norm value.
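
A minimal reconstruction sketch, assuming the multiplicative form $\beta_m \gamma_g (1 + s\,q_{m,g})$ with a single scalar scale $s$. The name `reconstruct_norms_sketch`, the hand-built factors, and the 2% deviation range are hypothetical and chosen only to make the error bound easy to check: under this form, rounding to the nearest int8 code leaves at most half a quantization step, $s/2$, of relative error with respect to the rank-1 product.

```python
import numpy as np

def reconstruct_norms_sketch(beta, gamma, q, scale):
    """Rebuild norms as beta_m * gamma_g * (1 + scale * q_mg) (assumed form)."""
    rank1 = np.outer(beta.astype(np.float32), gamma.astype(np.float32))
    return rank1 * (1.0 + np.float32(scale) * q.astype(np.float32))

# Hand-built factors for a 2 x 3 norm matrix (made-up values).
beta = np.array([1.0, 2.0], dtype=np.float16)
gamma = np.array([0.5, 1.5, 3.0], dtype=np.float16)
rank1 = np.outer(beta, gamma).astype(np.float64)

# Pretend the true norms deviate from the rank-1 product by up to 2%.
rng = np.random.default_rng(0)
eps_true = rng.uniform(-0.02, 0.02, size=rank1.shape)
norms = rank1 * (1.0 + eps_true)

# Symmetric int8 codes for the fractional residual.
scale = 0.02 / 127.0
q = np.rint(eps_true / scale).astype(np.int8)

recon = reconstruct_norms_sketch(beta, gamma, q, scale)
rel_err = np.abs(recon - norms) / norms
# Half a quantization step, expressed relative to the true norm.
half_step = 0.5 * scale * rank1 / norms
```

In this toy setup every entry of `rel_err` stays within `half_step` up to float32 rounding, and well under the 0.5% figure quoted above.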

Relationship to Other Techniques

Row Normalization (Step 1)

The norm tensor is produced during the normalization step of the quantization pipeline. This codec compresses that output.

Residual Quantization

Each residual pass produces its own norm tensor. Factorization is especially beneficial here since residual norms are highly structured.

Entropy Coding

Entropy coding compresses the index tensor; norm factorization compresses the norm tensor. Together they address both major storage components.

BPW Budget

At $d = 128$, switching from float32 to factored int8 saves ~0.19 BPW per pass, directly reducing the norm overhead in the quantization formulation.

Implementation

norm_compression.py → factorize_norms() (rank-1 SVD + int8 residual)
norm_compression.py → reconstruct_norms() (reconstruct from factored form)
norm_compression.py → norm_bpw() (compute BPW overhead for any method)