
StreamingLLM

by MIT Han Lab / Meta AI (Xiao et al.)

System Card

Organization: MIT Han Lab / Meta AI (Xiao et al.)
Released: 2023-09
Architecture: kv-cache-extension / Attention sinks + sliding window KV cache
Details: Discovers the "attention sink" phenomenon: retaining the KV entries of the initial tokens recovers window-attention performance. Combines sink tokens with a sliding window for stable infinite-length streaming at no training cost.
Parameters:
Domain: long-context
Open Source: Yes
Website:
Tags: iclr-2024, attention-sink, streaming, infinite
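The cache policy described in the details above can be sketched as follows: the KV entries of the first few "sink" tokens are never evicted, while all later entries live in a fixed-size sliding window. This is a minimal illustrative sketch, not the authors' implementation; the class name, the default sizes, and the treatment of a KV entry as an opaque object are all assumptions (the real method operates on per-layer, per-head key/value tensors).

```python
from collections import deque

class SinkWindowKVCache:
    """Illustrative StreamingLLM-style cache: keep the first `n_sink`
    tokens' KV entries ("attention sinks") plus a sliding window of
    the `window` most recent entries."""

    def __init__(self, n_sink=4, window=1020):
        self.n_sink = n_sink
        self.sinks = []                      # never evicted
        self.recent = deque(maxlen=window)   # oldest entry drops automatically

    def append(self, kv):
        # The first n_sink tokens become permanent sinks; everything
        # after that flows through the sliding window.
        if len(self.sinks) < self.n_sink:
            self.sinks.append(kv)
        else:
            self.recent.append(kv)

    def entries(self):
        # Attention is computed only over sinks + recent window, so the
        # cache stays bounded no matter how long the stream runs.
        return self.sinks + list(self.recent)
```

For example, streaming 100 tokens through a cache with 4 sinks and a window of 8 leaves exactly 12 entries: tokens 0-3 (the sinks) and tokens 92-99 (the window).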

Capability Profile

Benchmark Scores

Scores reported for 7 of 14 benchmarks
Long-Context Retrieval (5/5)
  RULER: 57.20p
  NIAH: 600p
  LooGLE: 66.99p
  LongBench: 24.50p
  ∞Bench: 70.16p

Multi-Turn Recall (0/2)
  LoCoMo: no data
  MemoryBank: no data

Cross-Session Memory (0/1)
  LongMemEval: no data

Multi-Hop QA (2/3)
  BABILong: 59.31p
  MultiHop-RAG: no data
  HotpotQA: 24.90p

Agent Task Memory (0/1)
  AgentBench-Mem: no data

Personalization (0/1)
  PerLTQA: no data

Factuality / Grounding (0/1)
  RAGAS: no data