
Random Rotation for Decorrelation

Multiplying by a random orthogonal matrix Π (so Y = W · Π) spreads the information in W uniformly across all coordinates, making each one approximately Gaussian.

The Problem with Correlated Weights

Neural network weight matrices have structured correlations: some coordinates carry much more information than others. Scalar quantization applied directly to correlated weights wastes bits on low-variance coordinates and under-quantizes high-variance ones.
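A small NumPy sketch of this failure mode (the data and the 3-bit grid are hypothetical, chosen only to make the variance disparity visible): with one shared step size, quantization error is roughly constant per coordinate, so the relative error on low-variance coordinates approaches 100%.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 10_000
# Synthetic "correlated" weights: coordinate i has std 2^-i, so a few
# coordinates dominate the energy while the rest carry almost nothing.
stds = 2.0 ** -np.arange(d)
W = rng.normal(size=(n, d)) * stds

# 3-bit uniform scalar quantizer with a single step size for all coordinates.
levels = 2 ** 3
step = (W.max() - W.min()) / levels
Wq = np.round(W / step) * step

var = np.var(W, axis=0)
err = np.mean((W - Wq) ** 2, axis=0)
print("per-coordinate variance:", var.round(5))
print("per-coordinate MSE:     ", err.round(5))
# The MSE is similar across coordinates, so the low-variance coordinates
# lose essentially all of their signal: those bits are wasted.
```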

Solution: Random Orthogonal Rotation

Norm-preserving

‖Y‖₂ = ‖W‖₂ = 1: orthogonal matrices preserve norms

Decorrelating

Coordinates of Y become approximately independent

Gaussianizing

By the CLT on high-dimensional unit vectors, each coordinate ≈ 𝒩(0, 1/d)

Invertible

W = Y · Πᵀ, since the inverse of an orthogonal matrix is just its transpose
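These properties can be checked numerically. The sketch below (NumPy, with variable names following the text; the sampling step previews the QR construction described in the next section) verifies norm preservation, the 𝒩(0, 1/d) coordinate scale, and invertibility via the transpose:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256
W = rng.normal(size=d)
W /= np.linalg.norm(W)           # unit-norm weight group, as in the text

# Sample an orthogonal Π via QR of a Gaussian matrix (see next section)
A = rng.normal(size=(d, d))
Q, R = np.linalg.qr(A)
P = Q * np.sign(np.diag(R))      # Π, with column signs fixed

Y = W @ P

print(np.linalg.norm(Y))         # ≈ 1.0: norm-preserving
print(np.std(Y) * np.sqrt(d))    # ≈ 1: each coordinate ≈ N(0, 1/d)
print(np.allclose(W, Y @ P.T))   # True: inverting with the transpose
```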

Interactive: Before & After Rotation

Watch how rotation transforms a structured weight distribution into an isotropic, Gaussian-like distribution where every coordinate has equal variance.


Haar-Distributed Random Orthogonal Matrix (QR Method)

The "gold standard" for random rotations. Drawn from the Haar measure on O(d) โ€” the unique distribution invariant under left/right multiplication by any orthogonal matrix.

1. Draw A ∈ ℝ^{d×d} with i.i.d. 𝒩(0,1) entries
2. Compute the QR decomposition: A = QR
3. Adjust signs: Π = Q · diag(sign(diag(R)))
Trade-off: O(d²) storage and compute. For d = 128 (the default group size), the rotation matrix is 128 × 128 × 4 bytes = 64 KB, which is manageable.
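The three steps above can be sketched as follows. The function name matches the generate_rotation_matrix() reference below, but the body is an assumed reconstruction, not the repository's actual code:

```python
import numpy as np

def generate_rotation_matrix(d: int, seed: int = 0) -> np.ndarray:
    """Sample Π from the Haar measure on O(d) via the QR method."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(d, d))        # step 1: i.i.d. N(0,1) entries
    Q, R = np.linalg.qr(A)             # step 2: A = QR
    return Q * np.sign(np.diag(R))     # step 3: fix the column signs

P = generate_rotation_matrix(128)
print(P.shape, np.allclose(P @ P.T, np.eye(128)))  # (128, 128) True
```

The sign adjustment in step 3 matters: without it, the QR decomposition returned by most libraries is biased and the result is not Haar-distributed.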

Implementation

rotation.py → generate_rotation_matrix()