Quantized Johnson-Lindenstrauss (QJL)
A 1-bit random projection technique for unbiased inner product estimation: elegant for KV-cache attention, but not the right tool for offline weight compression.
The Johnson-Lindenstrauss Lemma
The JL lemma (1984) states that any set of $n$ points in high-dimensional space can be embedded into $O(\log n / \varepsilon^2)$ dimensions while preserving all pairwise distances within a factor of $1 \pm \varepsilon$.
The projection is a random linear map: a matrix with i.i.d. Gaussian or sub-Gaussian entries, scaled appropriately. This is the theoretical foundation behind QJL.
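As a concrete illustration, here is a minimal numpy sketch of a JL-style Gaussian random projection and the distance distortion it introduces (the dimensions and tolerance are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 1024, 256                      # original and projected dimensions

x = rng.standard_normal(d)
y = rng.standard_normal(d)

# JL projection: i.i.d. Gaussian entries scaled by 1/sqrt(m), so that
# E[||Px - Py||^2] = ||x - y||^2 (squared distances preserved in expectation).
P = rng.standard_normal((m, d)) / np.sqrt(m)

orig = np.linalg.norm(x - y)
proj = np.linalg.norm(P @ x - P @ y)
print(f"distance ratio: {proj / orig:.3f}")   # close to 1 for large m
```

The ratio concentrates around 1 at rate roughly $1/\sqrt{m}$, which is exactly the trade-off the lemma formalizes.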
How QJL Works
QJL (Zandieh et al., 2024) takes the JL idea further: instead of storing the full projected coordinates, it keeps only the sign of each one, just 1 bit per projection. Given random Gaussian directions $s_1, \dots, s_m \sim \mathcal{N}(0, I_d)$, a key $k$ is stored as the sign bits $\operatorname{sign}(\langle s_i, k \rangle)$ together with its norm, and the inner product with a query $q$ is estimated as

$$\widehat{\langle q, k \rangle} = \sqrt{\frac{\pi}{2}} \,\frac{\|k\|_2}{m} \sum_{i=1}^{m} \operatorname{sign}(\langle s_i, k \rangle)\, \langle s_i, q \rangle,$$

which is unbiased because $\mathbb{E}\big[\operatorname{sign}(\langle s, k \rangle)\, \langle s, q \rangle\big] = \sqrt{2/\pi}\; \langle q, k \rangle / \|k\|_2$ for Gaussian $s$.
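A runnable numpy sketch of the 1-bit sign estimator (the $\sqrt{\pi/2}$ scaling for Gaussian sign projections is my reconstruction of the correction factor; the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 128, 20000                     # m random directions -> m stored bits

k = rng.standard_normal(d)            # key: quantized once, offline
q = rng.standard_normal(d)            # query: full precision at runtime

S = rng.standard_normal((m, d))       # shared random Gaussian directions
bits = np.sign(S @ k)                 # all we keep: m sign bits + ||k||_2

# Unbiased estimate of <q, k>, using
# E[sign(<s,k>) <s,q>] = sqrt(2/pi) * <q,k> / ||k||_2 for Gaussian s.
est = np.sqrt(np.pi / 2) * np.linalg.norm(k) / m * (S @ q) @ bits
true = float(q @ k)
print(f"true {true:.2f}  estimate {est:.2f}")
```

The estimate fluctuates around the true inner product with standard deviation shrinking as $1/\sqrt{m}$; unbiasedness, not low per-query variance, is the selling point.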
Key Properties
QJL's estimator is unbiased: its expectation equals the true inner product. Its storage is extreme: 1 bit per random direction, plus one stored norm per key. And it is asymmetric: only keys are quantized, while the query stays in full precision.
QJL in the TurboQuant Paper
The paper defines TurboQuant-prod, which combines standard TurboQuant with a QJL correction to obtain an unbiased inner product estimator.
This makes the overall estimator unbiased, which is critical for KV-cache attention, where you quantize keys once and query them with many different vectors over the sequence lifetime.
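To see why attaching a QJL correction to a biased base quantizer yields an unbiased total estimate, here is a sketch; the coarse rounding quantizer is a hypothetical stand-in for TurboQuant, not the paper's actual construction:

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 128, 20000

k = rng.standard_normal(d)
q = rng.standard_normal(d)

k_hat = np.round(k * 2) / 2           # hypothetical biased coarse quantizer
r = k - k_hat                         # residual the base quantizer dropped

# QJL sign correction on the residual: E[corr] = <q, r>, hence
# E[<q, k_hat> + corr] = <q, k_hat> + <q, r> = <q, k>  (unbiased overall).
S = rng.standard_normal((m, d))
bits = np.sign(S @ r)
corr = np.sqrt(np.pi / 2) * np.linalg.norm(r) / m * (S @ q) @ bits

est = float(q @ k_hat) + corr
true = float(q @ k)
print(f"true {true:.2f}  corrected estimate {est:.2f}")
```

The correction's variance scales with the residual norm, so a good base quantizer makes the 1-bit correction cheap and accurate.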
Why This Project Doesn't Use QJL
QJL is designed for a fundamentally different use case. Here are the four reasons we chose multi-pass residual quantization instead.
Different Problem: Online vs Offline
QJL is built for the online KV-cache setting, where keys must be quantized on the fly as they are generated. Weight compression happens offline, where we can afford multiple passes and codebook optimization over the full tensor.
Unbiasedness Is Unnecessary for Weights
Unbiasedness matters when one quantized vector will be paired with many different queries, as in attention. For weights quantized once offline, what matters is minimizing distortion directly, and a biased low-distortion quantizer wins.
Residual Quantization Strictly Dominates
QJL uses 1 bit (a random sign projection) for the residual correction. Our residual pass uses 4 bits with a full Lloyd-Max codebook plus an independent rotation, capturing far more residual information.
At 4+4 total bits, residual TurboQuant achieves KL divergence of only 0.002 nats (practically lossless). A 1-bit QJL correction cannot compete.
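A minimal numpy sketch of two-pass residual scalar quantization; the quantile-based codebook here is a crude stand-in for a true Lloyd-Max codebook, and no rotation is applied:

```python
import numpy as np

def scalar_quantize(x, bits=4):
    """Map each entry of x to the nearest of 2**bits codebook levels.
    Quantiles of x roughly approximate a Lloyd-Max codebook for its
    distribution (a true Lloyd-Max codebook would iterate to optimality)."""
    levels = np.quantile(x, (np.arange(2 ** bits) + 0.5) / 2 ** bits)
    idx = np.abs(x[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

rng = np.random.default_rng(3)
w = rng.standard_normal(4096)         # weights to compress

q1 = scalar_quantize(w)               # pass 1: 4 bits
q2 = scalar_quantize(w - q1)          # pass 2: 4 more bits, on the residual
w_hat = q1 + q2                       # dequantize: just a sum, no runtime math

mse1 = float(np.mean((w - q1) ** 2))
mse2 = float(np.mean((w - w_hat) ** 2))
print(f"1-pass MSE {mse1:.5f}  2-pass MSE {mse2:.5f}")
```

The second pass attacks exactly the error the first pass left behind, which is why residual passes compound so effectively compared with a 1-bit sign correction.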
QJL Requires the Query at Runtime
Visual Comparison
TurboQuant-prod (paper): coarse quantization of keys, plus a QJL sign correction applied against the full-precision query at attention time.
This Project (Residual TQ): coarse quantization of weights, plus a 4-bit residual codebook; the two passes are simply summed at dequantization, with no runtime correction.
Summary
QJL is an elegant technique rooted in the JL lemma: perfect for streaming / KV-cache inner product preservation with 1-bit signed projections. For offline weight compression, multi-pass residual quantization with optimal scalar codebooks is the natural and superior choice, achieving practically lossless results at 4+4 bits with no runtime overhead.
References
QJL: Zandieh et al., "QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead," 2024.
Johnson-Lindenstrauss: W. Johnson & J. Lindenstrauss, "Extensions of Lipschitz mappings into a Hilbert space," Contemporary Mathematics, 1984.
TurboQuant: Zandieh et al., "TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate," arXiv:2504.19874, 2025.