Jane Street ML researcher uses one-parameter group theory to prove that only a few positional-encoding families are valid, and that all sensible ones are already in use.
Key Takeaways
Formalizing linearity, translation invariance, and continuity constraints forces any positional encoding into a one-parameter matrix group of the form exp(tM).
Diagonalizable generators yield three cases: NoPE (zero eigenvalue), exponential decay (negative real eigenvalue, used in linear attention/RetNet/Mamba-3), and RoPE-style rotation (complex eigenvalue).
RoPE is recovered directly from the 2D complex-eigenvalue subspace; exponentially damped RoPE generalizes it and appears in RetNet and Mamba-3.
Defective (non-diagonalizable, Jordan block) generators are technically legal and produce polynomial-in-time attention modulation, but appear unexplored and unlikely to be practical.
The analysis covers continuous and irregularly sampled time, making it applicable beyond integer sequence indices to time-series and Mamba-style learned-time models.
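The eigenvalue cases above can be checked numerically. The sketch below (an illustration, not code from the article) computes exp(tM) for a 2x2 generator via its Taylor series and evaluates the four cases: zero eigenvalue (NoPE), negative real eigenvalue (decay), complex eigenvalues (a RoPE rotation block), and a defective Jordan block (polynomial-in-t modulation). It also verifies the one-parameter group law exp((s+t)M) = exp(sM)exp(tM). All variable names and the choice of 2x2 examples are this sketch's own.

```python
import math

I2 = [[1.0, 0.0], [0.0, 1.0]]

def matmul(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(M, t, terms=40):
    """exp(t*M) for a 2x2 generator M, via the Taylor series sum_k (tM)^k / k!."""
    result = [row[:] for row in I2]
    term = [row[:] for row in I2]
    for k in range(1, terms):
        tM_over_k = [[t * M[i][j] / k for j in range(2)] for i in range(2)]
        term = matmul(term, tM_over_k)  # term is now (tM)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

w, lam, t = 0.5, 0.3, 2.0

# Zero eigenvalues: exp(t*0) = I for every t, i.e. NoPE.
nope = expm([[0.0, 0.0], [0.0, 0.0]], t)

# Negative real eigenvalue: exp(-lam*t) * I, the exponential decay used in
# linear attention / RetNet / Mamba-3.
decay = expm([[-lam, 0.0], [0.0, -lam]], t)

# Complex eigenvalues +/- i*w: a rotation by angle w*t, one RoPE frequency block.
rope = expm([[0.0, -w], [w, 0.0]], t)

# Defective (Jordan block) generator: exp(t*M) = exp(lam*t) * [[1, t], [0, 1]],
# so the modulation picks up a polynomial-in-t factor.
jordan = expm([[lam, 1.0], [0.0, lam]], t)

# One-parameter group law: exp((s+t)M) == exp(sM) @ exp(tM).
M = [[0.0, -w], [w, 0.0]]
lhs = expm(M, 1.2 + 0.8)
rhs = matmul(expm(M, 1.2), expm(M, 0.8))
```

Combining cases, a generator with eigenvalues -lam +/- i*w yields a rotation scaled by exp(-lam*t), i.e. the exponentially damped RoPE that the takeaways attribute to RetNet and Mamba-3.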