δ-Mem: Efficient Online Memory for Large Language Models


TLDR

  • Paper proposes δ-mem, a memory add-on for a frozen backbone that uses an 8x8 delta-rule state matrix to boost long-context LLM performance without full fine-tuning.

Key Takeaways

  • δ-mem adds a fixed-size associative memory state that generates low-rank corrections to attention computation during inference, leaving the backbone frozen.
  • With an 8x8 state matrix, the average score rises to 1.10x that of the base model and 1.15x that of the strongest competing memory baseline.
  • Memory-heavy benchmarks see larger gains: 1.31x on MemoryAgentBench, 1.20x on LoCoMo.
  • No full fine-tuning, backbone replacement, or explicit context window extension required; general capabilities are largely preserved.
  • Targets long-term assistants and agent systems where repeated context injection is costly.
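To make the "fixed-size associative memory" idea concrete, here is a minimal sketch of a generic delta-rule update on an 8x8 state. It is not the paper's implementation: the function names (`new_state`, `read`, `write`, `one_hot`), the one-hot keys, and the `beta` write strength are all assumptions chosen for illustration, and the sketch does not show how δ-mem turns the state into low-rank attention corrections.

```python
# Generic delta-rule associative memory (illustrative only; not the paper's code).
D = 8  # state is D x D, matching the 8x8 size quoted in the summary


def new_state(d=D):
    """Return a d x d zero matrix as a list of rows."""
    return [[0.0] * d for _ in range(d)]


def read(state, key):
    """Retrieve the value currently associated with `key`: S @ k."""
    return [sum(s * k for s, k in zip(row, key)) for row in state]


def write(state, key, value, beta=1.0):
    """Delta-rule update: S <- S + beta * (v - S k) k^T.

    Only the prediction error (v - S k) is written, so re-writing a
    unit-norm key with beta=1 replaces its old value instead of
    accumulating on top of it.
    """
    pred = read(state, key)
    err = [v - p for v, p in zip(value, pred)]
    return [[s + beta * e * k for s, k in zip(row, key)]
            for row, e in zip(state, err)]


def one_hot(i, d=D):
    """Hypothetical unit-norm key used only for this demonstration."""
    return [1.0 if j == i else 0.0 for j in range(d)]


S = new_state()
k0, k1 = one_hot(0), one_hot(1)
S = write(S, k0, [1.0] * D)
S = write(S, k1, [2.0] * D)
S = write(S, k0, [3.0] * D)  # overwrite the value stored under k0

print(read(S, k0))  # value now stored under k0
print(read(S, k1))  # value stored under k1 is untouched
```

The state stays 8x8 no matter how many writes occur, which is the property the takeaways emphasize. With orthogonal keys the reads are exact; with correlated keys they interfere, which is the capacity concern raised in the discussion.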

Hacker News Comment Review

  • Commenters are skeptical about practical capacity limits: a fixed-size state compresses history but still faces retrieval degradation when input variations produce divergent activations.
  • No compute cost, RAM footprint, or inference latency numbers are reported in the paper, which commenters flagged as a significant omission for real deployment decisions.
  • The novelty is seen as modest (applying DeltaNet hypernetworks to existing LLMs), and commenters' interest is conditional on results from real-world agent benchmarks beyond academic memory tasks.

Notable Comments

  • @djoldman: argues memory size in bytes, time-to-first-token, and throughput should be standard reported metrics alongside parameter count.
  • @in-silico: notes a 300M-parameter Hebbian state (comparable to Llama 3 8B KV cache at 10K context) could theoretically store substantial information, pushing back on capacity pessimism.
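As a rough check on the scale behind the KV-cache comparison, the sketch below counts cache elements for Llama 3 8B. It assumes the model's published configuration (32 layers, 8 key/value heads under grouped-query attention, head dimension 128), reads "10K context" as 10,000 tokens, and assumes 2 bytes per element (fp16/bf16). The helper name `kv_cache_elements` is hypothetical.

```python
def kv_cache_elements(layers, kv_heads, head_dim, tokens, keys_and_values=True):
    """Number of scalars held in a KV cache for `tokens` positions."""
    per_token = layers * kv_heads * head_dim * (2 if keys_and_values else 1)
    return per_token * tokens


# Assumed Llama 3 8B attention configuration (not stated in the summary above).
LLAMA3_8B = dict(layers=32, kv_heads=8, head_dim=128)

both = kv_cache_elements(tokens=10_000, **LLAMA3_8B)
keys_only = kv_cache_elements(tokens=10_000, keys_and_values=False, **LLAMA3_8B)
bytes_fp16 = both * 2  # assumes 2 bytes per element

print(both)        # 655360000 elements (keys + values)
print(keys_only)   # 327680000 elements (keys only)
print(bytes_fp16)  # 1310720000 bytes, about 1.3 GB
```

Under these assumptions the full cache holds roughly 655M elements and the keys alone roughly 328M, so a 300M-parameter state is in the same range. Reporting such sizes in bytes is the kind of metric @djoldman asks for.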
