Landmark Attention
by EPFL (Mohtashami, Jaggi)
System Card
Organization: EPFL (Mohtashami, Jaggi)
Released: 2023-05
Architecture: kv-cache-extension / Block-level landmark tokens with direct attention retrieval
Details: Inserts landmark tokens representing each input block and trains attention to use them for selecting relevant blocks. Retrieval flows through the model's own attention mechanism, preserving random access to the full context. (A minimal code sketch of this block-retrieval idea appears after the field list below.)
Parameters: —
Domain: long-context
Open Source: Yes
Paper: arXiv:2305.16300
Code: Repository
Tags: neurips-2023, random-access, block, retrieval-by-attention
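To make the Details entry concrete, the sketch below illustrates the landmark idea at a toy scale. It is not the authors' implementation: block_size, top_k, the landmark token id, and the random key/query vectors are all illustrative assumptions. It shows the two basic steps described above: appending one landmark token per input block, and scoring cached landmark keys with the current query to pick which blocks to retrieve.

# Illustrative sketch only (assumed names and shapes), not the paper's code.
import torch

def insert_landmarks(tokens: torch.Tensor, block_size: int, landmark_id: int) -> torch.Tensor:
    """Append a landmark token after every block of `block_size` tokens."""
    blocks = tokens.split(block_size)
    landmark = torch.tensor([landmark_id], dtype=tokens.dtype)
    return torch.cat([torch.cat([b, landmark]) for b in blocks])

def retrieve_blocks(q: torch.Tensor, landmark_keys: torch.Tensor, top_k: int) -> torch.Tensor:
    """Score each cached block via its landmark key and return the top-k block indices."""
    scores = (q @ landmark_keys.T) / landmark_keys.shape[-1] ** 0.5  # one score per block
    return torch.topk(scores, k=min(top_k, landmark_keys.shape[0])).indices

# Toy usage: two blocks of four tokens each, landmark id 0 marks block boundaries.
tokens = torch.arange(1, 9)
print(insert_landmarks(tokens, block_size=4, landmark_id=0))

d = 16
q = torch.randn(d)                 # current query vector (assumed)
landmark_keys = torch.randn(8, d)  # one key per cached block's landmark (assumed)
print(retrieve_blocks(q, landmark_keys, top_k=2))

In the actual method the block scores come from the model's own attention heads rather than an external retriever, which is what preserves random access to the full context.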
Capability Profile
Benchmark Scores
6 of 14 benchmarks
Multi-Turn Recall: 0/2
  LoCoMo: no data
  MemoryBank: no data
Cross-Session Memory: 0/1
  LongMemEval: no data
Multi-Hop QA: 1/3
Agent Task Memory: 0/1
  AgentBench-Mem: no data
Personalization: 0/1
  PerLTQA: no data
Factuality / Grounding: 0/1
  RAGAS: no data

Sources:
Landmark Attention paper (arXiv:2305.16300); evaluated on BABILong: Testing the Limits of LLMs with Long-Context Reasoning-in-a-Haystack (AIRI, 2406)
Landmark Attention paper (arXiv:2305.16300); evaluated on InfiniteBench: Extending Long Context Evaluation Beyond 100K Tokens (Tsinghua / OpenBMB, 2402)
Landmark Attention paper (arXiv:2305.16300); evaluated on LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding (Tsinghua KEG, 2308)
Landmark Attention paper (arXiv:2305.16300); evaluated on LooGLE: Can Long-Context Language Models Understand Long Contexts? (Peking University, 2311)
Landmark Attention paper (arXiv:2305.16300); evaluated on Needle in a Haystack (Greg Kamradt, 2024)
Landmark Attention paper (arXiv:2305.16300); evaluated on RULER: What's the Real Context Size of Your Long-Context Language Models (NVIDIA, 2404)