RETRO
by DeepMind (Borgeaud et al.)
System Card
Organization: DeepMind (Borgeaud et al.)
Released: 2021-12
Architecture: vector-rag / chunked cross-attention over a 2T-token, BERT-indexed datastore
Details: Conditions an autoregressive LM on document chunks retrieved by local similarity to the preceding tokens. Uses a frozen BERT retriever, a differentiable encoder, and chunked cross-attention to attend over 2T tokens (see the sketch after the tag list below).
Parameters: —
Domain: rag-retrieval
Open Source: No
Paper: View Paper
Website: Visit
Code: Repository
Tags: icml-2022, 2-trillion-tokens, cross-attention, retrofit
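Since chunked cross-attention is the core mechanism named in the Details field, a short illustration may help. The PyTorch sketch below renders the two pieces described above, a frozen-embedding nearest-neighbour lookup and per-chunk cross-attention, under simplifying assumptions: the names (retrieve_neighbours, db_keys, db_values, chunk_len, k) and the dense dot-product scan are illustrative stand-ins rather than DeepMind's implementation (RETRO searches its 2T-token datastore with an approximate nearest-neighbour index), and RETRO's exact causal alignment, where tokens attend to neighbours retrieved for the preceding chunk, is simplified away.

```python
import torch
import torch.nn as nn


def retrieve_neighbours(chunk_emb: torch.Tensor,
                        db_keys: torch.Tensor,
                        db_values: torch.Tensor,
                        k: int = 2) -> torch.Tensor:
    """Nearest-neighbour lookup over a precomputed chunk database.

    chunk_emb: (d_emb,) frozen-BERT embedding of the current input chunk.
    db_keys:   (n_db, d_emb) frozen-BERT embeddings of database chunks.
    db_values: (n_db, neigh_len, d_model) encoded tokens of those chunks.
    A dense dot-product scan stands in for RETRO's approximate index here.
    """
    scores = db_keys @ chunk_emb        # (n_db,) similarity to each chunk
    idx = scores.topk(k).indices        # indices of the k closest chunks
    return db_values[idx]               # (k, neigh_len, d_model)


class ChunkedCrossAttention(nn.Module):
    """Each fixed-length chunk of the decoder sequence cross-attends to the
    encodings of its retrieved neighbours (causal offset omitted)."""

    def __init__(self, d_model: int, n_heads: int, chunk_len: int):
        super().__init__()
        self.chunk_len = chunk_len
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, neighbours: torch.Tensor) -> torch.Tensor:
        # x:          (batch, n_chunks * chunk_len, d_model) decoder states
        # neighbours: (batch, n_chunks, k * neigh_len, d_model) per-chunk
        #             neighbour encodings, e.g. from retrieve_neighbours
        b, seq, d = x.shape
        assert seq % self.chunk_len == 0
        n_chunks = seq // self.chunk_len
        q = x.view(b * n_chunks, self.chunk_len, d)   # fold chunks into batch
        kv = neighbours.reshape(b * n_chunks, -1, d)
        out, _ = self.attn(q, kv, kv)                 # cross-attend per chunk
        return x + out.view(b, seq, d)                # residual connection


# Toy usage: one sequence of 2 chunks x 4 tokens, d_model = 8,
# with 2 retrieved neighbours of length 3 per chunk.
cca = ChunkedCrossAttention(d_model=8, n_heads=2, chunk_len=4)
x = torch.randn(1, 8, 8)
neigh = torch.randn(1, 2, 6, 8)
y = cca(x, neigh)  # (1, 8, 8)
```

Folding the chunks into the batch dimension keeps attention strictly local to each chunk and its own neighbours, which is what lets the decoder condition on a very large datastore without full-sequence attention over all retrieved text.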
Capability Profile
Benchmark Scores
5 of 14 benchmarks

Multi-Turn Recall: 0/2
  LoCoMo: no data
  MemoryBank: no data
Cross-Session Memory: 0/1
  LongMemEval: no data
Multi-Hop QA: 2/3
Agent Task Memory: 0/1
  AgentBench-Mem: no data
Personalization: 0/1
  PerLTQA: no data
Factuality / Grounding: 1/1
Sources:
- RETRO paper (arXiv:2112.04426); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 1809)
- RETRO paper (arXiv:2112.04426); evaluated on LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding (Tsinghua KEG, 2308)
- RETRO paper (arXiv:2112.04426); evaluated on MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries (HKUST, 2401)
- RETRO paper (arXiv:2112.04426); evaluated on RAGAS: Automated Evaluation of Retrieval-Augmented Generation (Exploding Gradients, 2309)
- RETRO paper (arXiv:2112.04426); evaluated on RULER: What's the Real Context Size of Your Long-Context Language Models (NVIDIA, 2404)