∞Bench
InfiniteBench: Extending Long Context Evaluation Beyond 100K Tokens
Benchmark Metadata
Publisher: Tsinghua / OpenBMB
Venue: ACL 2024
Evaluation Type: automatic
Dimensions: 12
Test Prompts: 3,946
Scoring: Higher is better
Update Frequency: annual
What It Measures
- Retrieval at 100k+ tokens
- Math and code over long contexts
- Novel and dialogue QA
- Key-value retrieval
- Summarization over book-length input
What It Does Not Measure
- Multi-session memory
- Personalization
- Real-time latency
All Systems Evaluated (32 systems)
| Rank | System | Organization | Score |
|---|---|---|---|
| #1 | EM-LLM | em-llm (academic consortium) | 96.7 |
| #2 | Titans | lucidrains (community) / paper by Google Research | 87.4 |
| #3 | LM-Infinite | Illinois / Meta (Han et al.) | 85 |
| #4 | Mamba | CMU / Princeton (Gu, Dao) | 85 |
| #5 | Scissorhands | Rice / Stanford / Meta (Liu et al.) | 84.5 |
| #6 | Jina AI Embeddings | Jina AI GmbH | 83.1 |
| #7 | Compressive Transformer | DeepMind (Rae et al.) | 82.8 |
| #8 | R3Mem | HKUST (2025) | 81.9 |
| #9 | Memorizing Transformer | Google Research (Wu, Rabe, Hutchins, Szegedy) | 80.3 |
| #10 | MemoryLLM | UCSD / Apple (Wang et al.) | 80 |
| #11 | Landmark Attention | EPFL (Mohtashami, Jaggi) | 79.5 |
| #12 | TRIME | Princeton NLP (Zhong, Lei, Chen) | 79.5 |
| #13 | GAM | VectorSpaceLab (BAAI-related) | 79 |
| #14 | RAPTOR | Stanford (Sarthi, Abdullah et al.) | 79 |
| #15 | LongMem | UCSB / Microsoft Research | 78.7 |
| #16 | Memformer | UC Santa Barbara / Amazon (Wu, Lan, Liu, et al.) | 78.7 |
| #17 | ∞ Former | Instituto de Telecomunicações / DeepMind / IST (Martins, Marinho, Martins) | 78.2 |
| #18 | ICAE | Microsoft Research (Ge et al.) | 77.8 |
| #19 | Recurrent Memory Transformer | MIPT / DeepPavlov (Bulatov, Kuratov, Burtsev) | 77.8 |
| #20 | Activation Beacon | BAAI / Renmin University (Zhang et al.) | 77.5 |
| #21 | HEMA | independent (Ahn et al.) | 77.2 |
| #22 | Memory³ | Institute for Advanced Algorithms Research Shanghai / Peking University | 77.2 |
| #23 | H2O | UT Austin / Rice / CMU / Stanford / Meta (Zhang et al.) | 76.2 |
| #24 | SCM | Beihang / NLPR (Wang et al.) | 76.2 |
| #25 | ReMe | ModelScope (Alibaba) | 75 |
| #26 | Adept AI | Adept AI Labs (acquired by Amazon 2024) | 73.5 |
| #27 | RWKV | RWKV Foundation / BlinkDL community | 72.6 |
| #28 | AgentScope | ModelScope (Alibaba) | 70.3 |
| #29 | LanceDB | LanceDB Inc. (YC S22) | 70.3 |
| #30 | StreamingLLM | MIT Han Lab / Meta AI (Xiao et al.) | 70.1 |
| #31 | Activeloop Deep Lake | Activeloop Inc. | 69.3 |
| #32 | MemoRAG | BAAI / Qhjqhj00 | 24.5 |