Norm Compression
Rank-1 SVD factorization with an int8 residual reduces norm storage by ~4×, targeting the second-largest BPW term.
Where It Fits: The Norm Term
In the quantization formulation, the norm tensor is the second-largest storage component after the quantized indices.
At d = 128 with float32 norms, each pass contributes 32/128 = 0.25 BPW. With two residual passes, this becomes 0.50 BPW, a significant fraction of the total budget that norm compression directly reduces.
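The arithmetic behind these figures can be checked with a short sketch. It assumes one norm is stored per group of d = 128 weights (the group size used in the storage table); the function name `norm_bpw` is illustrative and not part of any library.

```python
def norm_bpw(bits_per_norm: int, group_size: int, passes: int = 1) -> float:
    """Bits per weight spent on norms.

    Assumes one norm per group of `group_size` weights, stored once per pass.
    """
    return passes * bits_per_norm / group_size


single_pass = norm_bpw(32, 128)             # float32 norms, d = 128
two_passes = norm_bpw(32, 128, passes=2)    # two passes, each with its own norms
print(single_pass, two_passes)              # 0.25 0.5
```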
Rank-1 Factorization
The norm tensor has strong low-rank structure: rows of the same layer tend to have similar magnitude patterns across groups. This motivates a rank-1 SVD approximation with a small int8 correction.
How It Works
SVD of the norm matrix
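Compute the SVD of the norm matrix and keep only the leading singular triplet: the first singular value together with its left and right singular vectors.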
Compute fractional residual
Quantize residual to int8
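Symmetric quantization: derive a single scale from the largest absolute residual, then round each residual divided by that scale to the nearest int8 code. The resulting quantization step is typically a small fraction of the norm value.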
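The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the codec's actual implementation: the leading singular triplet is obtained by power iteration (standing in for a full SVD call so the example has no dependencies), the fractional residual is assumed to be `N / (a b^T) - 1`, and the scale is assumed to be `max|r| / 127`. The names `rank1_factor` and `encode_norms` are hypothetical.

```python
import math


def rank1_factor(N, iters=50):
    """Leading singular triplet of N by power iteration.

    Returns (a, b) with N ~= outer(a, b); the singular value is folded into a.
    Assumes N has strictly positive entries, as norms do.
    """
    rows, cols = len(N), len(N[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        u = [sum(N[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = N v
        u_norm = math.sqrt(sum(x * x for x in u))
        u = [x / u_norm for x in u]
        w = [sum(N[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # w = N^T u
        sigma = math.sqrt(sum(x * x for x in w))
        v = [x / sigma for x in w]
    a = [sigma * x for x in u]
    return a, v


def encode_norms(N):
    """Rank-1 factors plus an int8-coded fractional residual (assumed form)."""
    a, b = rank1_factor(N)
    # Fractional residual relative to the rank-1 approximation (assumption).
    resid = [[N[i][j] / (a[i] * b[j]) - 1.0 for j in range(len(b))]
             for i in range(len(a))]
    max_abs = max(abs(x) for row in resid for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Symmetric quantization: round to nearest, clamp to the int8 range.
    q = [[max(-127, min(127, round(x / scale))) for x in row] for row in resid]
    return a, b, q, scale


# Toy norm matrix (rows x groups) that is close to rank 1.
norms = [[1.0, 2.0, 3.1],
         [2.0, 4.1, 6.0],
         [0.5, 1.0, 1.6]]
a, b, q, scale = encode_norms(norms)
```

Only `a`, `b`, the int8 codes `q`, and the single `scale` need to be stored. A production version would more likely call a library SVD routine and operate on whole tensors.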
Storage Comparison
| Method | Components | BPW (d=128) |
|---|---|---|
| float32 (baseline) | 32 bits per norm | 0.250 |
| float16 | 16 bits per norm | 0.125 |
| Factored int8 | int8 residual + rank-1 factors | ~0.063 |
The factored representation achieves ~4× compression vs float32 norms, saving ~0.19 BPW per pass.
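The factored figure is dominated by the int8 residual (8/128 = 0.0625 BPW); the two rank-1 factor vectors add an overhead that is amortized over the whole matrix and so depends on its shape. The sketch below makes that explicit. The float32 factor precision, the single float32 scale, and the 4096 x 32 norm matrix are assumptions chosen for illustration, and `factored_norm_bpw` is a hypothetical helper.

```python
def factored_norm_bpw(rows: int, groups: int, d: int = 128,
                      resid_bits: int = 8, factor_bits: int = 32) -> float:
    """Norm bits per weight for an int8 residual plus two rank-1 factor vectors."""
    weights = rows * groups * d
    residual = resid_bits * rows * groups      # one int8 code per norm
    factors = factor_bits * (rows + groups)    # row factor + group factor
    scale = 32                                 # one float32 quantization scale
    return (residual + factors + scale) / weights


baseline = 32 / 128                                  # float32 norms, d = 128
factored = factored_norm_bpw(rows=4096, groups=32)   # hypothetical 4096 x 4096 layer
print(f"float32: {baseline:.3f}  factored: {factored:.3f}  "
      f"saved: {baseline - factored:.3f}")
```

As the number of groups per row grows, the factor overhead shrinks and the total approaches the 8/d floor.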
Reconstruction
The reconstruction error is bounded by the int8 quantization granularity, which is typically less than 0.5% of the norm value.
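A small sketch of decoding and of the error bound, assuming the reconstruction has the form `a_i * b_j * (1 + scale * q_ij)` (the rank-1 product corrected by the dequantized fractional residual). Under that assumption, round-to-nearest gives a per-entry error of at most `a_i * b_j * scale / 2`. The helper names and the toy values are hypothetical.

```python
def quantize_residual(resid):
    """Symmetric int8 quantization with one shared scale (assumed max|r| / 127)."""
    max_abs = max(abs(x) for row in resid for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [[round(x / scale) for x in row] for row in resid], scale


def decode_norms(a, b, q, scale):
    """Rebuild norms from rank-1 factors and int8 residual codes."""
    return [[a[i] * b[j] * (1.0 + scale * q[i][j]) for j in range(len(b))]
            for i in range(len(a))]


# Toy factors and fractional residuals of a few percent.
a = [1.0, 2.0]
b = [0.5, 1.5, 3.0]
resid = [[0.01, -0.02, 0.00],
         [0.03, -0.01, 0.02]]
exact = [[a[i] * b[j] * (1.0 + resid[i][j]) for j in range(3)] for i in range(2)]

q, scale = quantize_residual(resid)
approx = decode_norms(a, b, q, scale)

# Worst-case relative error across all entries.
max_rel_err = max(abs(approx[i][j] - exact[i][j]) / exact[i][j]
                  for i in range(2) for j in range(3))
print(max_rel_err < 0.005)
```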
Relationship to Other Techniques
Row Normalization (Step 1)
The norm tensor is produced during the normalization step of the quantization pipeline. This codec compresses that output.
Residual Quantization
Each residual pass produces its own norm tensor. Factorization is especially beneficial here since residual norms are highly structured.
Entropy Coding
Entropy coding compresses the index tensor; norm factorization compresses the norm tensor. Together they address both major storage components.
BPW Budget
At d = 128, switching from float32 to factored int8 saves ~0.19 BPW per pass, directly reducing the norm overhead in the formulation.