
Quantized Johnson-Lindenstrauss (QJL)

A 1-bit random projection technique for unbiased inner product estimation: elegant for KV-cache attention, but not the right tool for offline weight compression.

The Johnson-Lindenstrauss Lemma

The JL lemma (1984) states that any set of n points in high-dimensional space can be embedded into O(ε⁻² log n) dimensions while preserving all pairwise distances within a factor of (1 ± ε).

The projection is a random linear map: a matrix with i.i.d. Gaussian or sub-Gaussian entries, scaled by 1/√m (where m is the target dimension) so that squared norms are preserved in expectation. This is the theoretical foundation behind QJL.
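As a quick sanity check on the lemma, a plain Gaussian random projection (a minimal illustrative sketch, not code from any of the cited papers; the dimensions and seed are arbitrary) already preserves pairwise distances to within a few percent:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 1000, 400             # points, original dim, target dim

X = rng.normal(size=(n, d))         # n points in R^d

# JL map: i.i.d. Gaussian entries scaled by 1/sqrt(m),
# so E[||P x||^2] = ||x||^2 for every x.
P = rng.normal(size=(d, m)) / np.sqrt(m)
Y = X @ P

def pdist(A):
    """All pairwise Euclidean distances of the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

mask = ~np.eye(n, dtype=bool)       # ignore the zero diagonal
ratios = pdist(Y)[mask] / pdist(X)[mask]
print(ratios.min(), ratios.max())   # all ratios concentrate around 1
```

With m = 400 target dimensions, every one of the ~1,200 pairwise distances typically lands within roughly ±15% of its original value, exactly the (1 ± ε) behavior the lemma promises.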

How QJL Works

QJL (Zandieh et al., 2024) takes the JL idea further: instead of storing the full projected coordinates, it keeps only the sign, just 1 bit per projection. Given m random Gaussian directions r₁, …, r_m, the inner product estimator is:

⟨q, k⟩ ≈ √(π/2) · (‖k‖ / m) · Σᵢ sign(⟨rᵢ, k⟩) ⟨rᵢ, q⟩
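A minimal sketch of this estimator, assuming standard Gaussian directions and the √(π/2)·‖k‖/m scaling that makes the sign estimate unbiased (variable names and sizes are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 64, 100_000                  # dimension, number of random projections

q = rng.normal(size=d)              # query vector
k = rng.normal(size=d)              # key vector to be quantized

R = rng.normal(size=(m, d))         # rows r_i: i.i.d. standard Gaussian

# Quantized key: 1 bit per projection (kept as +/-1 here for clarity;
# in practice these would be packed into a bitstring).
bits = np.sign(R @ k)

# Unbiased estimate of <q, k>:
#   sqrt(pi/2) * (||k|| / m) * sum_i sign(<r_i, k>) * <r_i, q>
est = np.sqrt(np.pi / 2) * np.linalg.norm(k) / m * (bits @ (R @ q))

true = q @ k
print(true, est)
```

Only `bits` (m bits) and the scalar ‖k‖ need to be stored per key; the variance of the estimate shrinks as 1/m.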

Key Properties

- 🎯 Unbiased: the estimate equals ⟨q, k⟩ in expectation
- 💾 1 bit per projection: store only sign(⟨rᵢ, v⟩)
- ⚡ Zero decode overhead: sign comparisons via bitwise XOR + popcount
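The "zero decode overhead" point can be made concrete: when both operands are sign codes packed into bitstrings, Σᵢ sᵢtᵢ = m − 2·popcount(s XOR t), since each disagreeing sign pair flips exactly one bit in the XOR. A toy pure-Python illustration (not the paper's actual kernel):

```python
def pack_signs(signs):
    """Pack a +/-1 sign vector into an int, one bit per entry (bit set for +1)."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word

def sign_dot(a_bits, b_bits, m):
    """sum_i a[i]*b[i] for +/-1 vectors via XOR + popcount:
    each disagreement contributes -1, each agreement +1,
    so the dot product is m - 2 * (number of disagreements)."""
    return m - 2 * bin(a_bits ^ b_bits).count("1")

a = [+1, -1, +1, +1, -1, -1, +1, -1]
b = [+1, +1, +1, -1, -1, +1, +1, +1]

fast = sign_dot(pack_signs(a), pack_signs(b), len(a))
direct = sum(x * y for x, y in zip(a, b))
print(fast, direct)   # identical results
```

On real hardware the same idea runs over 64-bit words with a native popcount instruction, which is why the decode cost is effectively zero.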

QJL in the TurboQuant Paper

The paper defines TurboQuant_prod, which combines standard TurboQuant with a QJL correction for an unbiased inner product estimator:

1. Quantize the vector v using TurboQuant (rotation + Lloyd-Max) → v̂
2. Compute the residual e = v − v̂
3. Apply 1-bit QJL to e for an unbiased correction: ⟨x, v⟩ ≈ ⟨x, v̂⟩ + (QJL estimate of ⟨x, e⟩)

This makes the overall estimator unbiased, which is critical for KV-cache attention where you quantize keys once and query with many different vectors over the sequence lifetime.
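The steps above can be sketched end to end. This is a toy version of the two-pass scheme: a simple 3-bit uniform grid stands in for the paper's rotation + Lloyd-Max pass, and the projection count, seed, and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 64, 100_000

v = rng.normal(size=d)              # vector to quantize (e.g. a key)
x = rng.normal(size=d)              # query arriving later

# Pass 1: coarse quantization. A 3-bit uniform grid stands in for
# the paper's rotation + Lloyd-Max codebook.
scale = np.abs(v).max() / 4
v_hat = np.round(v / scale) * scale

# Pass 2: 1-bit QJL on the residual e = v - v_hat.
e = v - v_hat
R = rng.normal(size=(m, d))
bits = np.sign(R @ e)               # stored 1-bit code

# Query time: <x, v> ~= <x, v_hat> + unbiased QJL estimate of <x, e>.
qjl_term = np.sqrt(np.pi / 2) * np.linalg.norm(e) / m * (bits @ (R @ x))
est = x @ v_hat + qjl_term

print(x @ v, est)
```

The coarse pass carries most of the signal deterministically; the QJL term removes the remaining bias in expectation, at the cost of per-query variance.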

Why This Project Doesn't Use QJL

QJL is designed for a fundamentally different use case. Here are the four reasons we chose multi-pass residual quantization instead.

1. Different Problem: Online vs Offline

QJL is designed for online inner product estimation: quantize once, query many times with different vectors. Weight quantization is offline: we compress once and compute repeatedly. We want minimum reconstruction error ‖W − Ŵ‖², not an unbiased dot-product estimator.
2. Unbiasedness Is Unnecessary for Weights

A small deterministic bias from MSE-optimal quantization is absorbed by layer norms, residual connections, and softmax normalization. An unbiased but high-variance estimator (QJL at 1 bit) introduces stochastic noise that changes every forward pass, which is worse for stable inference.
3. Residual Quantization Strictly Dominates

QJL uses 1 bit (random sign projection) for the residual correction. Our residual pass uses 4 bits with a full Lloyd-Max codebook plus an independent rotation, capturing far more residual information.

QJL correction: 1 bit per weight, random sign only
Residual TQ: 4 bits per weight, full Lloyd-Max codebook

At 4+4 total bits, residual TurboQuant achieves KL divergence of only 0.002 nats (practically lossless). A 1-bit QJL correction cannot compete.
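The gap can be illustrated on a synthetic Gaussian residual: reconstruct it from a sign-only code versus a 16-level codebook fitted with a few Lloyd iterations (a simple stand-in for a trained Lloyd-Max quantizer, not this project's implementation):

```python
import numpy as np

rng = np.random.default_rng(4)
e = rng.normal(size=100_000)          # toy residual, roughly Gaussian

# 1 bit per weight: keep only the sign, rescaled by the MSE-optimal
# magnitude E[|e|] (for N(0,1) this is sqrt(2/pi) ~ 0.8).
e_sign = np.sign(e) * np.abs(e).mean()

# 4 bits per weight: 16-level codebook, refined by Lloyd iterations
# (assign each sample to its nearest level, then recenter each level).
levels = np.linspace(e.min(), e.max(), 16)
for _ in range(20):
    idx = np.abs(e[:, None] - levels[None, :]).argmin(axis=1)
    for j in range(16):
        if np.any(idx == j):
            levels[j] = e[idx == j].mean()
idx = np.abs(e[:, None] - levels[None, :]).argmin(axis=1)
e_16 = levels[idx]

mse_sign = np.mean((e - e_sign) ** 2)
mse_16 = np.mean((e - e_16) ** 2)
print(mse_sign, mse_16)               # 16 levels are far more accurate
```

For a unit Gaussian the sign code cannot do better than MSE = 1 − 2/π ≈ 0.36, while a 16-level Lloyd-Max codebook lands around 0.01: over an order of magnitude less residual error at the same decode simplicity.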

4. QJL Requires the Query at Runtime

The QJL correction term depends on the input activation x, making it incompatible with offline weight compression. You'd need to recompute corrections on every forward pass, defeating the purpose of weight-only quantization.

Visual Comparison

TurboQuant_prod (Paper)

Pass 1: Lloyd-Max quantize (b₁ bits)
+
Pass 2: QJL 1-bit sign projection on residual
↓
Unbiased inner product estimator. Needs query x at runtime.

This Project (Residual TQ)

Pass 1: Full TQ: rotate + Lloyd-Max (4 bits)
+
Pass 2: Full TQ on residual (4 bits, new codebook)
↓
Near-lossless weight compression. Offline, no runtime dependency.

Summary

QJL is an elegant technique rooted in the JL lemma, perfect for streaming / KV-cache inner product preservation with 1-bit sign projections. For offline weight compression, multi-pass residual quantization with optimal scalar codebooks is the natural and superior choice, achieving practically lossless results at 4+4 bits with no runtime overhead.

References

QJL: Zandieh et al., "QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead," 2024.

Johnson-Lindenstrauss: W. Johnson & J. Lindenstrauss, "Extensions of Lipschitz mappings into a Hilbert space," Contemporary Mathematics, 1984.

TurboQuant: Zandieh et al., "TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate," arXiv:2504.19874, 2025.