Learning Pseudorandom Numbers with Transformers

· ai security · Source ↗

TLDR

  • The paper shows that Transformers can predict PCG pseudorandom sequences in-context, outperforming published classical attacks, with the required context length scaling as sqrt(modulus).
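
For orientation, PCG composes a 64-bit LCG state update with an output permutation built from xorshifts, rotations, and truncation. Below is a minimal sketch of the standard PCG32 (XSH-RR 64/32) variant as a generic reference; the exact variants, moduli, and output truncations studied in the paper may differ.

```python
# Minimal PCG32 (XSH-RR 64/32) sketch, for reference only; the paper's exact
# variants, moduli, and truncations may differ.
MULT = 6364136223846793005          # standard PCG 64-bit LCG multiplier
MASK64 = (1 << 64) - 1

def pcg32_stream(seed, seq, n):
    """Yield n 32-bit outputs from a PCG32 (XSH-RR) generator."""
    inc = ((seq << 1) | 1) & MASK64                   # stream increment, must be odd
    state = ((inc + seed) * MULT + inc) & MASK64      # simple seeding of the LCG state
    for _ in range(n):
        old = state
        state = (old * MULT + inc) & MASK64           # LCG state update mod 2^64
        xsh = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF  # xorshift, keep the high 32 bits
        rot = old >> 59                               # top 5 bits choose the rotation
        yield ((xsh >> rot) | (xsh << ((-rot) & 31))) & 0xFFFFFFFF  # rotate right

# The in-context task amounts to continuing a stream like list(pcg32_stream(42, 54, 512)).
```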

Key Takeaways

  • Transformers (up to 50M parameters, 5B tokens) successfully predict unseen PCG sequences, including variants with bit-shifts, XORs, rotations, and truncations.
  • Even single-bit truncated output remains reliably predictable, a result the authors flag as surprising.
  • The context length required for near-perfect prediction scales as sqrt(m), where m is the modulus, a concrete scaling law (a back-of-the-envelope illustration follows this list).
  • Moduli >= 2^20 require curriculum learning from smaller moduli; without it, optimization stalls in extended stagnation phases (a hypothetical staged schedule is sketched after this list).
  • Embedding layers spontaneously form clusters in their top PCA components that are invariant under bitwise rotations, which explains how representations transfer across moduli (a sketch of such a PCA probe follows this list).
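
As a back-of-the-envelope illustration of the sqrt(m) context-length scaling: the snippet below assumes a proportionality constant of roughly 1, which is an assumption for illustration rather than a number taken from the paper.

```python
import math

# Assumed: required context ~ sqrt(m); the constant factor is illustrative only.
for bits in (16, 20, 24, 28, 32):
    m = 1 << bits
    print(f"modulus 2^{bits}: ~{math.isqrt(m)} in-context tokens for near-perfect prediction")
```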
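The curriculum point can be read as a staged data schedule over moduli. Below is a hypothetical sketch; the stage moduli, step counts, and pairing of context length with sqrt(m) are assumptions, not the paper's recipe.

```python
import math

# Hypothetical curriculum: begin with small moduli, then introduce larger ones.
# Stage moduli and step counts are illustrative; the paper's schedule may differ.
CURRICULUM = [
    {"modulus": 1 << 12, "steps": 10_000},
    {"modulus": 1 << 16, "steps": 20_000},
    {"modulus": 1 << 20, "steps": 40_000},  # the regime reported to need warm-up
]

def schedule():
    """Yield (modulus, context_length) for each training step."""
    for stage in CURRICULUM:
        ctx = math.isqrt(stage["modulus"])  # pair context with the sqrt(m) rule of thumb
        for _ in range(stage["steps"]):
            yield stage["modulus"], ctx
```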
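The embedding finding suggests a simple probe: run PCA on the learned token-embedding matrix and check whether tokens related by bitwise rotation land in the same cluster. A hypothetical sketch follows; the embedding matrix, bit width, and grouping are illustrative, not the paper's exact analysis.

```python
import numpy as np

def bit_rotations(x, bits):
    """All cyclic bit-rotations of x within a fixed bit width."""
    mask = (1 << bits) - 1
    return {((x << r) | (x >> (bits - r))) & mask for r in range(bits)}

def top_pca_projection(embeddings, k=2):
    """Project token embeddings onto their top-k principal components."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

# Hypothetical usage: `emb` would be the (vocab, d_model) embedding matrix of a
# trained model; tokens in the same bit-rotation class should fall into the same
# cluster of the projection if the representation is rotation-invariant.
```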

Hacker News Comment Review

  • No substantive HN discussion yet.

Original | Discuss on HN