LooGLE

LooGLE: Can Long-Context Language Models Understand Long Contexts?

Benchmark Metadata

Publisher: Peking University
Venue: ACL 2024
Evaluation Type: Automatic
Dimensions: 7
Test Prompts: 776
Scoring: Higher is better
Update Frequency: Annual

What It Measures

  • Short and long-dependency QA
  • Summarization over long documents
  • Computation across documents
  • Timeline reorder
  • Multi-information retrieval

What It Does Not Measure

  • Multi-session memory
  • Latency
  • Personalization

All Systems Evaluated (22 systems)

Rank  System                        Score  Affiliation
#1    Titans                        81.3   lucidrains (community) / paper by Google Research
#2    Mamba                         81.1   CMU / Princeton (Gu, Dao)
#3    Landmark Attention            80.4   EPFL (Mohtashami, Jaggi)
#4    MemoryLLM                     79.2   UCSD / Apple (Wang et al.)
#5    Recurrent Memory Transformer  79.0   MIPT / DeepPavlov (Bulatov, Kuratov, Burtsev)
#6    Jina AI Embeddings            78.9   Jina AI GmbH
#7    ICAE                          78.7   Microsoft Research (Ge et al.)
#8    ∞ Former                      77.9   Instituto de Telecomunicações / DeepMind / IST (Martins, Marinho, Martins)
#9    LM-Infinite                   77.7   Illinois / Meta (Han et al.)
#10   Compressive Transformer       77.6   DeepMind (Rae et al.)
#11   Memorizing Transformer        77.5   Google Research (Wu, Rabe, Hutchins, Szegedy)
#12   RAPTOR                        77.1   Stanford (Sarthi, Abdullah et al.)
#13   H2O                           75.9   UT Austin / Rice / CMU / Stanford / Meta (Zhang et al.)
#14   TRIME                         75.4   Princeton NLP (Zhong, Lei, Chen)
#15   Memory³                       74.5   Institute for Advanced Algorithms Research Shanghai / Peking University
#16   RWKV                          73.9   RWKV Foundation / BlinkDL community
#17   LongMem                       73.1   UCSB / Microsoft Research
#18   Scissorhands                  72.9   Rice / Stanford / Meta (Liu et al.)
#19   Activation Beacon             68.6   BAAI / Renmin University (Zhang et al.)
#20   StreamingLLM                  66.9   MIT Han Lab / Meta AI (Xiao et al.)
#21   Activeloop Deep Lake          61.3   Activeloop Inc.
#22   LanceDB                       59.9   LanceDB Inc. (YC S22)