Using group theory to explore the space of positional encodings for attention


TLDR

  • A Jane Street ML researcher uses the theory of one-parameter matrix groups to show that only a few positional-encoding families satisfy natural constraints, and that all the sensible ones are already in use.

Key Takeaways

  • Formalizing linearity, translation invariance, and continuity forces any positional encoding into a one-parameter matrix group of the form exp(tM); the group law is checked numerically in the first sketch after this list.
  • Diagonalizable generators yield three cases: NoPE (zero eigenvalue), exponential decay (negative real eigenvalue, used in linear attention/RetNet/Mamba-3), and RoPE-style rotation (complex-conjugate eigenvalue pair); see the second sketch below.
  • RoPE is recovered directly from the 2D complex-eigenvalue subspace; exponentially damped RoPE generalizes it and appears in RetNet and Mamba-3 (third sketch below).
  • Defective (non-diagonalizable, Jordan-block) generators are technically legal and produce polynomial-in-time attention modulation, but they appear unexplored and are unlikely to be practical (final sketch below).
  • The analysis covers continuous and irregularly sampled time, making it applicable beyond integer sequence indices to time-series and Mamba-style learned-time models.
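
The exp(tM) form can be sanity-checked numerically. Below is a minimal sketch (not from the post) using numpy and scipy.linalg.expm: for any generator M, the maps R(t) = exp(tM) compose additively, and the relative map between two positions depends only on their offset.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
M = 0.5 * rng.standard_normal((4, 4))  # arbitrary generator

def R(t: float) -> np.ndarray:
    """Positional map: one-parameter matrix group generated by M."""
    return expm(t * M)

s, t = 0.7, 1.9
# Group law: two successive shifts compose into one.
assert np.allclose(R(s) @ R(t), R(s + t))
# Translation invariance: R(s)^{-1} R(t) = R(t - s) depends only on the offset.
assert np.allclose(R(-s) @ R(t), R(t - s))
```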
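
Each of the three diagonalizable cases has a closed form that the matrix exponential reproduces. A sketch with illustrative values of lam and theta (these parameters are assumptions, not from the post):

```python
import numpy as np
from scipy.linalg import expm

t, lam, theta = 2.0, 0.3, 0.5  # illustrative values

# Zero eigenvalue -> NoPE: exp(t * 0) is the identity, no positional signal.
assert np.allclose(expm(t * np.zeros((2, 2))), np.eye(2))

# Negative real eigenvalue -> exponential decay, the gating used in
# linear attention / RetNet / Mamba-style recurrences.
assert np.allclose(expm(t * (-lam) * np.eye(2)), np.exp(-lam * t) * np.eye(2))

# Complex-conjugate pair +/- i*theta -> a real 2x2 rotation block (RoPE).
M_rot = np.array([[0.0, -theta],
                  [theta, 0.0]])
rotation = np.array([[np.cos(theta * t), -np.sin(theta * t)],
                     [np.sin(theta * t),  np.cos(theta * t)]])
assert np.allclose(expm(t * M_rot), rotation)
```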
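
Applied to a query/key pair, the rotation case reproduces RoPE's defining property: attention scores depend only on relative position. Adding a negative real part to the eigenvalues gives the damped variant used in RetNet/Mamba-3-style gating. The rope_block helper and all values below are hypothetical illustrations, not the post's code:

```python
import numpy as np
from scipy.linalg import expm

theta, lam = 0.1, 0.05  # illustrative frequency and decay rate

def rope_block(pos: float) -> np.ndarray:
    """One 2x2 RoPE block: exp(pos * M) with skew-symmetric generator M."""
    M = np.array([[0.0, -theta],
                  [theta, 0.0]])
    return expm(pos * M)

q = np.array([1.0, 2.0])
k = np.array([0.5, -1.0])
m, n = 3.0, 7.0

# The score between q rotated to position m and k rotated to position n
# depends only on the offset n - m, because rotations are orthogonal.
score = (rope_block(m) @ q) @ (rope_block(n) @ k)
assert np.isclose(score, q @ (rope_block(n - m) @ k))

# Damped RoPE: shift the eigenvalues' real part to -lam, so
# exp(t * (M - lam*I)) = exp(-lam * t) * rotation(t).
M_damped = np.array([[-lam, -theta],
                     [theta, -lam]])
assert np.allclose(expm(m * M_damped), np.exp(-lam * m) * rope_block(m))
```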
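
Finally, the defective case: for a 2x2 Jordan block, the matrix exponential picks up a factor polynomial in t, which is the polynomial-in-time modulation mentioned above. Again a sketch with an arbitrary lam:

```python
import numpy as np
from scipy.linalg import expm

lam, t = -0.2, 3.0  # illustrative eigenvalue and position

# A defective generator: a 2x2 Jordan block (one eigenvalue, one eigenvector).
J = np.array([[lam, 1.0],
              [0.0, lam]])

# exp(tJ) = exp(lam*t) * [[1, t], [0, 1]]: the nilpotent part contributes a
# polynomial factor in t on top of the exponential envelope.
closed_form = np.exp(lam * t) * np.array([[1.0, t],
                                          [0.0, 1.0]])
assert np.allclose(expm(t * J), closed_form)
```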

Hacker News Comment Review

  • No substantive HN discussion yet.
