Random Rotation for Decorrelation
Multiplying by a random orthogonal matrix spreads information uniformly across all coordinates, making each one approximately Gaussian.
The Problem with Correlated Weights
Neural network weight matrices have structured correlations: some coordinates carry much more information than others. Scalar quantization applied directly to correlated weights wastes bits on low-variance coordinates and under-quantizes high-variance ones.
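To make the wasted bits concrete, here is a small numpy sketch (the quantizer, bit width, and toy weight vector are illustrative, not from the text). With one shared scale, a few high-variance coordinates set the quantization step, and every low-variance coordinate rounds to exactly zero:

```python
import numpy as np

def quantize_uniform(x, bits=4):
    """Symmetric uniform scalar quantizer with one shared scale (illustrative)."""
    levels = 2 ** (bits - 1) - 1       # e.g. 7 positive levels for 4 bits
    scale = np.abs(x).max() / levels   # step size is set by the largest coordinate
    return np.round(x / scale) * scale

# Two high-variance coordinates dominate; six low-variance ones carry detail.
w = np.array([5.0, -4.0, 0.05, -0.05, 0.05, 0.05, -0.05, 0.05])
q = quantize_uniform(w)

# The shared step (~0.71) is far coarser than the small coordinates,
# so all of them collapse to zero: the bits spent on them are wasted.
print(q)
```

This is exactly the failure mode rotation addresses: once the energy is spread evenly, no single coordinate dictates the step size for all the others.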
Solution: Random Orthogonal Rotation
Norm-preserving
‖Y‖₂ = ‖W‖₂ = 1: orthogonal matrices preserve norms
Decorrelating
Coordinates of Y become approximately independent
Gaussianizing
By the CLT on high-dimensional unit vectors, each coordinate is approximately 𝒩(0, 1/d)
Invertible
W = Y · Qᵀ: the inverse of an orthogonal matrix Q is just its transpose
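The four properties above can be checked numerically. A minimal sketch, assuming the rotation is applied as Y = W · Q for an orthogonal Q (variable names are ours, and plain QR is used here only because orthogonality is all that matters for these checks):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512

# A structured unit-norm weight vector: energy concentrated in 8 coordinates.
W = np.zeros(d)
W[:8] = rng.standard_normal(8)
W /= np.linalg.norm(W)

# A random orthogonal matrix Q (QR of a Gaussian matrix).
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
Y = W @ Q

# Norm-preserving: ‖Y‖₂ = ‖W‖₂ = 1.
assert abs(np.linalg.norm(Y) - 1.0) < 1e-9

# Gaussianizing: coordinates of Y have standard deviation ≈ 1/√d.
print(Y.std(), 1 / np.sqrt(d))

# Invertible: W = Y · Qᵀ recovers the original up to float round-off.
assert np.allclose(Y @ Q.T, W)
```

Note that before rotation 99%+ of the energy sits in 8 coordinates, while after rotation every coordinate carries roughly 1/d of it.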
Interactive: Before & After Rotation
Watch how rotation transforms a structured weight distribution into an isotropic, Gaussian-like distribution in which every coordinate has equal variance.
Haar-Distributed Random Orthogonal Matrix (QR Method)
The "gold standard" for random rotations. Drawn from the Haar measure on O(d) โ the unique distribution invariant under left/right multiplication by any orthogonal matrix.