Recurrent Memory Transformer

by MIPT / DeepPavlov (Bulatov, Kuratov, Burtsev)

System Card

OrganizationMIPT / DeepPavlov (Bulatov, Kuratov, Burtsev)

Released2022-07

Architectureexternal-memory-network / Memory tokens passed between segments recurrently

DetailsAdds special memory tokens to each segment that pass information recurrently across segments of a long sequence, with no architectural changes beyond token-level memory slots.

Parameters—

Domainlong-context

Open SourceYes

PaperView Paper

CodeRepository

neurips-2022aaai-2024recurrentmemory-tokens1m-tokens

Capability Profile

Benchmark Scores

6 of 14 benchmarks

Long-Context Retrieval

5/5

79.594p

75.962p

7977p

603p

77.841p

Multi-Turn Recall

0/2

LoCoMo

no data

MemoryBank

no data

Cross-Session Memory

0/1

LongMemEval

no data

Multi-Hop QA

1/3

BABILong

80.390p

MultiHop-RAG

no data

HotpotQA

no data

Agent Task Memory

0/1

AgentBench-Mem

no data

Personalization

0/1

PerLTQA

no data

Factuality / Grounding

0/1

RAGAS

no data

Sources:Recurrent Memory Transformer paper (arXiv:2207.06881); evaluated on BABILong: Testing the Limits of LLMs with Long-Context Reasoning-in-a-Haystack (AIRI, 2406)Recurrent Memory Transformer paper (arXiv:2207.06881); evaluated on InfiniteBench: Extending Long Context Evaluation Beyond 100K Tokens (Tsinghua / OpenBMB, 2402)Recurrent Memory Transformer paper (arXiv:2207.06881); evaluated on LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding (Tsinghua KEG, 2308)Recurrent Memory Transformer paper (arXiv:2207.06881); evaluated on LooGLE: Can Long-Context Language Models Understand Long Contexts? (Peking University, 2311)Recurrent Memory Transformer paper (arXiv:2207.06881); evaluated on Needle in a Haystack (Greg Kamradt, 2024)Recurrent Memory Transformer paper (arXiv:2207.06881); evaluated on RULER: What's the Real Context Size of Your Long-Context Language Models (NVIDIA, 2404)