RETRO
by DeepMind (Borgeaud et al.)
System Card
Organization: DeepMind (Borgeaud et al.)
Released: 2021-12
Architecture: vector-rag / chunked cross-attention over a 2T-token, BERT-indexed datastore
Details: Conditions an autoregressive LM on document chunks retrieved by local similarity to the preceding tokens. Uses a frozen BERT retriever, a differentiable encoder, and chunked cross-attention to attend over 2T tokens (see the sketch after the tag list below).
Parameters: —
Domain: rag-retrieval
Open Source: No
Paper: View Paper
Website: Visit
Code: Repository
Tags: icml-2022, 2-trillion-tokens, cross-attention, retrofit
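Since chunked cross-attention is the core mechanism named in the Details field, a short illustration may help. The PyTorch sketch below renders the two pieces described above, a frozen-embedding nearest-neighbour lookup and per-chunk cross-attention, under simplifying assumptions: the names (retrieve_neighbours, db_keys, db_values, chunk_len, k) and the dense dot-product scan are illustrative stand-ins rather than DeepMind's implementation (RETRO searches its 2T-token datastore with an approximate nearest-neighbour index), and RETRO's exact causal alignment, where tokens attend to neighbours retrieved for the preceding chunk, is simplified away.

```python
import torch
import torch.nn as nn


def retrieve_neighbours(chunk_emb: torch.Tensor,
                        db_keys: torch.Tensor,
                        db_values: torch.Tensor,
                        k: int = 2) -> torch.Tensor:
    """Nearest-neighbour lookup over a precomputed chunk database.

    chunk_emb: (d_emb,) frozen-BERT embedding of the current input chunk.
    db_keys:   (n_db, d_emb) frozen-BERT embeddings of database chunks.
    db_values: (n_db, neigh_len, d_model) encoded tokens of those chunks.
    A dense dot-product scan stands in for RETRO's approximate index here.
    """
    scores = db_keys @ chunk_emb        # (n_db,) similarity to each chunk
    idx = scores.topk(k).indices        # indices of the k closest chunks
    return db_values[idx]               # (k, neigh_len, d_model)


class ChunkedCrossAttention(nn.Module):
    """Each fixed-length chunk of the decoder sequence cross-attends to the
    encodings of its retrieved neighbours (causal offset omitted)."""

    def __init__(self, d_model: int, n_heads: int, chunk_len: int):
        super().__init__()
        self.chunk_len = chunk_len
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, neighbours: torch.Tensor) -> torch.Tensor:
        # x:          (batch, n_chunks * chunk_len, d_model) decoder states
        # neighbours: (batch, n_chunks, k * neigh_len, d_model) per-chunk
        #             neighbour encodings, e.g. from retrieve_neighbours
        b, seq, d = x.shape
        assert seq % self.chunk_len == 0
        n_chunks = seq // self.chunk_len
        q = x.view(b * n_chunks, self.chunk_len, d)   # fold chunks into batch
        kv = neighbours.reshape(b * n_chunks, -1, d)
        out, _ = self.attn(q, kv, kv)                 # cross-attend per chunk
        return x + out.view(b, seq, d)                # residual connection


# Toy usage: one sequence of 2 chunks x 4 tokens, d_model = 8,
# with 2 retrieved neighbours of length 3 per chunk.
cca = ChunkedCrossAttention(d_model=8, n_heads=2, chunk_len=4)
x = torch.randn(1, 8, 8)
neigh = torch.randn(1, 2, 6, 8)
y = cca(x, neigh)  # (1, 8, 8)
```

Folding the chunks into the batch dimension keeps attention strictly local to each chunk and its own neighbours, which is what lets the decoder condition on a very large datastore without full-sequence attention over all retrieved text.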
Capability Profile
Benchmark Scores
5 of 14 benchmarks

Multi-Turn Recall: 0/2
  LoCoMo: no data
  MemoryBank: no data
Cross-Session Memory: 0/1
  LongMemEval: no data
Multi-Hop QA: 2/3
Agent Task Memory: 0/1
  AgentBench-Mem: no data
Personalization: 0/1
  PerLTQA: no data
Factuality / Grounding: 1/1
Sources:
- RETRO paper (arXiv:2112.04426); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 1809)
- RETRO paper (arXiv:2112.04426); evaluated on LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding (Tsinghua KEG, 2308)
- RETRO paper (arXiv:2112.04426); evaluated on MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries (HKUST, 2401)
- RETRO paper (arXiv:2112.04426); evaluated on RAGAS: Automated Evaluation of Retrieval-Augmented Generation (Exploding Gradients, 2309)
- RETRO paper (arXiv:2112.04426); evaluated on RULER: What's the Real Context Size of Your Long-Context Language Models (NVIDIA, 2404)