REALM
by Google Research (Guu et al.)
System Card
Organization: Google Research (Guu et al.)
Released: 2020-02
Architecture: vector-rag / latent retriever pretrained with MLM backprop
Details: Augments LM pretraining with a latent knowledge retriever that attends over millions of documents. The retriever is trained without supervision: the masked-LM loss backpropagates through the retrieval step.
Parameters: —
Domain: rag-retrieval, knowledge-graph
Open Source: Yes
Paper: View Paper
Code: Repository
Tags: icml-2020, pretraining, latent-retriever, wikipedia
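The training setup described above can be sketched numerically: REALM models p(y|x) = Σ_z p(y|x, z) p(z|x), where p(z|x) is a softmax over inner products between the query embedding and document embeddings, so the MLM loss -log p(y|x) is differentiable with respect to the retriever. The following is a minimal toy sketch with random embeddings and placeholder per-document MLM likelihoods, not the paper's implementation; all names and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(scores):
    # Numerically stable softmax over retrieval scores
    e = np.exp(scores - scores.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 8                              # toy embedding dimension (assumption)
q = rng.normal(size=d)             # embedding of the masked input x
docs = rng.normal(size=(4, d))     # embeddings of top-k retrieved documents z

# Retrieval distribution p(z|x): softmax over inner products,
# as in the paper's relevance score f(x, z) = Embed(x) . Embed(z)
p_z = softmax(docs @ q)

# Placeholder per-document MLM likelihoods p(y|x, z) for illustration only
p_y_given_z = np.array([0.1, 0.6, 0.2, 0.3])

# Marginal likelihood p(y|x) = sum_z p(y|x, z) p(z|x)
p_y = float(p_z @ p_y_given_z)

# Training loss -log p(y|x); because p_z depends smoothly on the
# embeddings, this gradient reaches the retriever parameters
loss = -np.log(p_y)
```

The key design point is the marginalization: documents that make the masked token more predictable receive larger gradient weight, which is what lets the retriever learn with no retrieval labels.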
Capability Profile
Benchmark Scores
6 of 14 benchmarks
- Multi-Turn Recall: 0/2
  - LoCoMo: no data
  - MemoryBank: no data
- Cross-Session Memory: 1/1
- Multi-Hop QA: 2/3
- Agent Task Memory: 0/1
  - AgentBench-Mem: no data
- Personalization: 0/1
  - PerLTQA: no data
- Factuality / Grounding: 1/1
Sources: REALM paper (arXiv:2002.08909), evaluated on:
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 1809)
- MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries (HKUST, 2401)
- RAGAS: Automated Evaluation of Retrieval-Augmented Generation (Exploding Gradients, 2309)
- LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding (Tsinghua KEG, 2308)
- LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory (Salesforce AI Research, 2410)
- RULER: What's the Real Context Size of Your Long-Context Language Models? (NVIDIA, 2404)