Neural networks and symmetric ciphers independently converged on the same architecture: alternating linear/nonlinear layers, parallel chunk processing, and row/column mixing.
Key Takeaways
RNNs and SHA-3’s sponge construction share the same sequential structure: absorb one chunk at a time into a running state. Transformers and fast parallelizable MACs share the same parallel structure: process chunks independently, then add the results, with position encodings preserving order (sketched below).
The core primitive in both fields is the same: a linear mix followed by a nonlinear transform, repeated for many rounds (sketched below) – which lets analysis and hardware optimization target one layer type instead of many.
AES alternates ShiftRows and MixColumns; Transformers alternate attention (row mixing) and feed-forward layers (column mixing) – factored mixing like this is asymptotically cheaper and more cache-friendly than applying one dense matrix to the whole state (sketched below).
Three shared constraints drive the convergence: weak correctness requirements (a cipher only has to be invertible, a network only has to be differentiable), quality defined as how thoroughly the state is mixed and how complex the resulting function is, and extreme pressure for hardware performance.
RevNets have already imported Feistel networks from crypto into neural nets as reversible, memory-saving layers (sketched below); the author proposes exploring Column Parity Mixers and unaligned mixers as the next candidates.
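A minimal sketch of the two shapes from the first takeaway, assuming a toy mixing function f and a toy position encoding (neither is the real SHA-3 permutation nor a real attention layer): the sponge/RNN shape folds chunks into a running state one at a time, while the MAC/Transformer shape processes chunks independently and sums the results, with position encodings making chunk order matter.

```python
import numpy as np

rng = np.random.default_rng(0)
WIDTH = 16                                      # toy state / embedding width
W = rng.standard_normal((WIDTH, WIDTH)) / np.sqrt(WIDTH)

def f(state, chunk):
    # toy round: linear mix of (state + chunk), then an elementwise nonlinearity
    return np.tanh(W @ (state + chunk))

def position_encoding(i):
    # toy sinusoidal encoding so that reordering chunks changes the output
    t = np.arange(WIDTH)
    return np.sin(i / (10000 ** (t / WIDTH)))

def sequential_absorb(chunks):
    # Sponge / RNN shape: absorb one chunk at a time into a running state
    state = np.zeros(WIDTH)
    for c in chunks:
        state = f(state, c)
    return state

def parallel_chunk_then_add(chunks):
    # Fast-MAC / Transformer shape: process chunks independently, then add
    outs = [f(np.zeros(WIDTH), c + position_encoding(i)) for i, c in enumerate(chunks)]
    return np.sum(outs, axis=0)

chunks = [rng.standard_normal(WIDTH) for _ in range(8)]
print(sequential_absorb(chunks)[:3])
print(parallel_chunk_then_add(chunks)[:3])
```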
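The "linear mix, nonlinear transform, repeat" primitive, as a hedged sketch: one fixed mixing matrix and one elementwise nonlinearity iterated for a handful of rounds. The matrix, the tanh nonlinearity, and the round count are illustrative placeholders, not the components of any particular cipher or network.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
M = rng.standard_normal((n, n)) / np.sqrt(n)    # the one linear mixing layer

def round_fn(x):
    # a single round: linear mix, then nonlinear transform
    return np.tanh(M @ x)

def construction(x, rounds=6):
    # "repeat": the whole construction is the same round applied again and again
    for _ in range(rounds):
        x = round_fn(x)
    return x

print(construction(rng.standard_normal(n)))
```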
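The cost argument behind the factored-mixing takeaway, on a toy n-by-n state with random placeholder matrices (standing in for ShiftRows/MixColumns or attention/feed-forward weights): mixing rows and then columns costs on the order of n cubed multiply-adds, whereas one dense matrix over all n squared entries costs on the order of n to the fourth.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 32
state = rng.standard_normal((n, n))           # n x n state (e.g. tokens x features)

# Factored mixing: one matrix acts along rows, another along columns.
R = rng.standard_normal((n, n)) / np.sqrt(n)  # "row mixer" placeholder
C = rng.standard_normal((n, n)) / np.sqrt(n)  # "column mixer" placeholder
factored = R @ state @ C                      # about 2 * n**3 multiply-adds

# Full-matrix mixing: one dense matrix over the flattened n**2-element state.
dense = rng.standard_normal((n * n, n * n)) / n
full = (dense @ state.reshape(-1)).reshape(n, n)  # about n**4 multiply-adds

print("factored multiply-adds ~", 2 * n**3)   # 65_536
print("full     multiply-adds ~", n**4)       # 1_048_576
```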
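And a hedged sketch of the Feistel idea that RevNets borrowed: split the state in half and update each half from the other. The step is invertible by construction, so activations can be recomputed on the backward pass instead of stored. The inner functions F and G are arbitrary placeholders and do not need to be invertible themselves.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
W1 = rng.standard_normal((d, d)) / np.sqrt(d)
W2 = rng.standard_normal((d, d)) / np.sqrt(d)

def F(x): return np.tanh(W1 @ x)   # placeholder inner function
def G(x): return np.tanh(W2 @ x)   # placeholder inner function

def feistel_forward(x1, x2):
    # RevNet-style additive coupling: each half is updated from the other half
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def feistel_inverse(y1, y2):
    # Exact inverse: recover the inputs from the outputs, no stored activations
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
y1, y2 = feistel_forward(x1, x2)
r1, r2 = feistel_inverse(y1, y2)
print(np.allclose(x1, r1), np.allclose(x2, r2))   # True True
```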