δ-Mem: Efficient Online Memory for Large Language Models

May 16, 2026 · ai history

TLDR

Paper proposes δ-mem, a memory add-on that uses an 8x8 delta-rule state matrix to boost long-context LLM performance while the backbone stays frozen.

δ-mem adds a fixed-size associative memory state that generates low-rank corrections to attention computation during inference, leaving the backbone frozen.
An 8x8 state matrix raises the average score to 1.10x that of the base model and 1.15x that of the strongest competing memory baseline.
Memory-heavy benchmarks see larger gains: 1.31x on MemoryAgentBench, 1.20x on LoCoMo.
No full fine-tuning, backbone replacement, or explicit context window extension required; general capabilities are largely preserved.
Targets long-term assistants and agent systems where repeated context injection is costly.
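
To make the "fixed-size associative memory" in the points above concrete, here is a minimal sketch of a generic delta-rule update on an 8x8 state. It is not the paper's implementation: the function names, the `beta` write strength, and the plain matrix read-out are assumptions for illustration, and the summary does not say how δ-mem turns the read-out into a low-rank attention correction.

```python
# Sketch of a generic delta-rule associative memory (not the paper's code).
# STATE_DIM = 8 mirrors the 8x8 state in the summary; all names are hypothetical.

STATE_DIM = 8


def new_state(dim=STATE_DIM):
    """Return a dim x dim zero matrix as nested lists."""
    return [[0.0] * dim for _ in range(dim)]


def read(state, query):
    """Retrieve a value from the memory: state @ query."""
    return [sum(s * q for s, q in zip(row, query)) for row in state]


def delta_update(state, key, value, beta=1.0):
    """Delta rule: S <- S + beta * (value - S @ key) key^T.

    The state is corrected by the prediction error for `key`, so writing a
    new value under an existing key replaces the old one instead of adding to it.
    """
    error = [v - p for v, p in zip(value, read(state, key))]
    return [
        [s + beta * e * k for s, k in zip(row, key)]
        for row, e in zip(state, error)
    ]


if __name__ == "__main__":
    key = [1.0] + [0.0] * (STATE_DIM - 1)  # unit-norm key
    state = delta_update(new_state(), key, [float(i) for i in range(STATE_DIM)])
    state = delta_update(state, key, [9.0] * STATE_DIM)  # overwrite the same key
    print(read(state, key))  # returns the second value; the state is still 8x8
```

The state never grows with the number of writes, which is the property the summary highlights; the price is that keys which are not orthogonal interfere with each other once more associations are written than the state can separate.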

Commenters are skeptical about practical capacity limits: a fixed-size state compresses history but still faces retrieval degradation when input variations produce divergent activations.
No compute cost, RAM footprint, or inference latency numbers are reported in the paper, which commenters flagged as a significant omission for real deployment decisions.
The novelty is seen as modest – applying DeltaNet hypernetworks to existing LLMs – with interest conditional on real-world agent benchmarks beyond academic memory tasks.

@djoldman: argues memory size in bytes, time-to-first-token, and throughput should be standard reported metrics alongside parameter count.
@in-silico: notes a 300M-parameter Hebbian state (comparable to Llama 3 8B KV cache at 10K context) could theoretically store substantial information, pushing back on capacity pessimism.
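
The two comments above both turn on memory size, so a back-of-the-envelope calculation may help. The sketch below counts KV-cache elements for a grouped-query-attention model; the defaults (32 layers, 8 KV heads, head dimension 128) are the published Llama 3 8B configuration values and are an assumption brought in here, not something stated in the paper or the thread. The helper names are hypothetical.

```python
# Back-of-the-envelope KV-cache size for a grouped-query-attention model.
# Defaults are the published Llama 3 8B config values (assumed, not from the paper).

def kv_cache_elements(tokens, layers=32, kv_heads=8, head_dim=128):
    """Number of cached scalars: keys and values for every layer and KV head."""
    return 2 * layers * kv_heads * head_dim * tokens


def to_bytes(elements, bytes_per_element=2):
    """Memory footprint, assuming 16-bit storage by default."""
    return elements * bytes_per_element


if __name__ == "__main__":
    cache = kv_cache_elements(10_000)
    print(cache)            # 655360000 elements
    print(to_bytes(cache))  # 1310720000 bytes at 16 bits per element
    print(to_bytes(8 * 8))  # 128 bytes for a single 8x8 state at 16 bits
```

Under these assumptions the cache at 10,000 tokens holds about 655M elements, roughly twice the 300M figure in the comment, so "comparable" holds at the order-of-magnitude level. The 128-byte figure covers one 8x8 state only; the summary does not say how many such states δ-mem keeps (for example per layer or per head), so the total footprint remains an open question.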