
Random Rotation for Decorrelation

Multiplying by a random orthogonal matrix Π (so Y = W · Π) spreads the information in W uniformly across all coordinates, making each one approximately Gaussian.

The Problem with Correlated Weights

Neural network weight matrices have structured correlations: some coordinates carry much more information than others. Scalar quantization applied directly to correlated weights wastes bits on low-variance coordinates and under-quantizes high-variance ones.
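A small NumPy sketch of this failure mode (the data and the 3-bit grid are hypothetical, chosen only to make the variance disparity visible): with one shared step size, quantization error is roughly constant per coordinate, so the relative error on low-variance coordinates approaches 100%.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 10_000
# Synthetic "correlated" weights: coordinate i has std 2^-i, so a few
# coordinates dominate the energy while the rest carry almost nothing.
stds = 2.0 ** -np.arange(d)
W = rng.normal(size=(n, d)) * stds

# 3-bit uniform scalar quantizer with a single step size for all coordinates.
levels = 2 ** 3
step = (W.max() - W.min()) / levels
Wq = np.round(W / step) * step

var = np.var(W, axis=0)
err = np.mean((W - Wq) ** 2, axis=0)
print("per-coordinate variance:", var.round(5))
print("per-coordinate MSE:     ", err.round(5))
# The MSE is similar across coordinates, so the low-variance coordinates
# lose essentially all of their signal: those bits are wasted.
```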

Solution: Random Orthogonal Rotation

Norm-preserving

‖Y‖₂ = ‖W‖₂ = 1: orthogonal matrices preserve norms

Decorrelating

Coordinates of Y become approximately independent

Gaussianizing

By the CLT on high-dimensional unit vectors, each coordinate ≈ 𝒩(0, 1/d)

Invertible

W = Y · Πᵀ, since the inverse of an orthogonal matrix is just its transpose
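These properties can be checked numerically. The sketch below (NumPy, with variable names following the text; the sampling step previews the QR construction described in the next section) verifies norm preservation, the 𝒩(0, 1/d) coordinate scale, and invertibility via the transpose:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256
W = rng.normal(size=d)
W /= np.linalg.norm(W)           # unit-norm weight group, as in the text

# Sample an orthogonal Π via QR of a Gaussian matrix (see next section)
A = rng.normal(size=(d, d))
Q, R = np.linalg.qr(A)
P = Q * np.sign(np.diag(R))      # Π, with column signs fixed

Y = W @ P

print(np.linalg.norm(Y))         # ≈ 1.0: norm-preserving
print(np.std(Y) * np.sqrt(d))    # ≈ 1: each coordinate ≈ N(0, 1/d)
print(np.allclose(W, Y @ P.T))   # True: inverting with the transpose
```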

Interactive: Before & After Rotation

Watch how rotation transforms a structured weight distribution into an isotropic, Gaussian-like distribution where every coordinate has equal variance.


Haar-Distributed Random Orthogonal Matrix (QR Method)

The "gold standard" for random rotations. Drawn from the Haar measure on O(d) โ€” the unique distribution invariant under left/right multiplication by any orthogonal matrix.

1. Draw A ∈ ℝ^{d×d} with i.i.d. 𝒩(0,1) entries
2. Compute the QR decomposition: A = QR
3. Adjust signs: Π = Q · diag(sign(diag(R)))
Trade-off: O(d²) storage and compute. For d = 128 (the default group size), the rotation matrix is 128 × 128 × 4 bytes = 64 KB, which is manageable.
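The three steps above can be sketched as follows. The function name matches the generate_rotation_matrix() reference below, but the body is an assumed reconstruction, not the repository's actual code:

```python
import numpy as np

def generate_rotation_matrix(d: int, seed: int = 0) -> np.ndarray:
    """Sample Π from the Haar measure on O(d) via the QR method."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(d, d))        # step 1: i.i.d. N(0,1) entries
    Q, R = np.linalg.qr(A)             # step 2: A = QR
    return Q * np.sign(np.diag(R))     # step 3: fix the column signs

P = generate_rotation_matrix(128)
print(P.shape, np.allclose(P @ P.T, np.eye(128)))  # (128, 128) True
```

The sign adjustment in step 3 matters: without it, the QR decomposition returned by most libraries is biased and the result is not Haar-distributed.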

Implementation

rotation.py → generate_rotation_matrix()