LongBench
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Benchmark Metadata
- Publisher: Tsinghua KEG
- Venue: ACL 2024
- Evaluation type: automatic
- Dimensions: 21
- Test prompts: 4,750
- Scoring: higher is better
- Update frequency: annual
What It Measures
- Single- and multi-document QA
- Summarization
- Few-shot in-context learning
- Synthetic retrieval
- Code completion over long contexts
What It Does Not Measure
- Cross-session memory
- Personalization
- Latency
All Systems Evaluated (121 systems)
Scores: 4 self-reported, 117 estimated
| Rank | System | Organization | Score |
|---|---|---|---|
| #1 | txtai | NeuML | 60 |
| #2 | LlamaIndex Memory | LlamaIndex | 60 |
| #3 | Haystack Memory | deepset | 60 |
| #4 | Claude Projects | Anthropic | 60 |
| #5 | Pinecone | Pinecone Systems | 60 |
| #6 | Weaviate | Weaviate | 60 |
| #7 | Qdrant | Qdrant | 60 |
| #8 | Chroma | Chroma | 60 |
| #9 | Milvus | Zilliz | 60 |
| #10 | Activeloop Deep Lake | Activeloop Inc. | 60 |
| #11 | Adept AI | Adept AI Labs (acquired by Amazon 2024) | 60 |
| #12 | AgentScope | ModelScope (Alibaba) | 60 |
| #13 | AllegroGraph | Franz Inc. | 60 |
| #14 | AnythingLLM | Mintplex Labs | 60 |
| #15 | Astra DB | DataStax | 60 |
| #16 | Athina AI | Athina AI (YC W23) | 60 |
| #17 | Atlas | Meta AI FAIR (Izacard et al.) | 60 |
| #18 | Bisheng | dataelement | 60 |
| #19 | Carbon AI | Carbon (acquired by Perplexity, Dec 2024) | 60 |
| #20 | Cognita | TrueFoundry | 60 |
| #21 | Cohere Embed | Cohere Inc. | 60 |
| #22 | ColPali | illuin-tech | 60 |
| #23 | Compressive Transformer | DeepMind (Rae et al.) | 60 |
| #24 | Couchbase Vector | Couchbase Inc. | 60 |
| #25 | Diffbot | Diffbot Inc. | 60 |
| #26 | Dify | LangGenius | 60 |
| #27 | Dust tt | Dust (formerly XP1) | 60 |
| #28 | Elasticsearch Vector | Elastic N.V. | 60 |
| #29 | Epsilla | Epsilla Inc. (YC S23) | 60 |
| #30 | FastGPT | labring | 60 |
| #31 | Flowise | FlowiseAI | 60 |
| #32 | Galileo AI | Galileo Technologies Inc. | 60 |
| #33 | Granola AI | Granola | 60 |
| #34 | GraphRAG | Microsoft | 60 |
| #35 | GraphRAG-SDK | FalkorDB | 60 |
| #36 | H2O | UT Austin / Rice / CMU / Stanford / Meta (Zhang et al.) | 60 |
| #37 | Hebbia | Hebbia, Inc. | 60 |
| #38 | HoneyHive | HoneyHive Inc. | 60 |
| #39 | ICAE | Microsoft Research (Ge et al.) | 60 |
| #40 | ∞-former | Instituto de Telecomunicações / DeepMind / IST (Martins, Marinho, Martins) | 60 |
| #41 | Jina AI Embeddings | Jina AI GmbH | 60 |
| #42 | KAG | OpenSPG / Ant Group | 60 |
| #43 | KDB AI | KX Systems | 60 |
| #44 | kNN-LM | Stanford / Facebook AI Research (Khandelwal et al.) | 60 |
| #45 | LanceDB | LanceDB Inc. (YC S22) | 60 |
| #46 | Landmark Attention | EPFL (Mohtashami, Jaggi) | 60 |
| #47 | Langflow | Langflow-ai (DataStax) | 60 |
| #48 | LangSmith LangGraph Cloud | LangChain Inc. | 60 |
| #49 | LightRAG | HKUDS (HKU Data Intelligence Lab) | 60 |
| #50 | LlamaCloud | LlamaIndex Inc. | 60 |
| #51 | LM-Infinite | Illinois / Meta (Han et al.) | 60 |
| #52 | LongMem | UCSB / Microsoft Research | 60 |
| #53 | Mamba | CMU / Princeton (Gu, Dao) | 60 |
| #54 | Manticore Search | Manticore Software Ltd. | 60 |
| #55 | Marker | Datalab (datalab-to) | 60 |
| #56 | Marqo | Marqo Pty Ltd | 60 |
| #57 | Maxim AI | Maxim AI Inc. | 60 |
| #58 | Mem AI | Mem Labs | 60 |
| #59 | Memformer | UC Santa Barbara / Amazon (Wu, Lan, Liu, et al.) | 60 |
| #60 | Memorizing Transformer | Google Research (Wu, Rabe, Hutchins, Szegedy) | 60 |
| #61 | Memory³ | Institute for Advanced Algorithms Research Shanghai / Peking University | 60 |
| #62 | MemoryLLM | UCSD / Apple (Wang et al.) | 60 |
| #63 | MemR3 | 2025 (December submission) | 60 |
| #64 | Mendable | Mendable (YC-backed) | 60 |
| #65 | MiniRAG | HKUDS | 60 |
| #66 | Mixedbread AI | Mixedbread AI | 60 |
| #67 | MongoDB Atlas Vector | MongoDB Inc. | 60 |
| #68 | MyScale | MyScale Inc. | 60 |
| #69 | Nano GraphRAG | gusye1234 | 60 |
| #70 | Neo4j LLM Graph Builder | Neo4j Labs | 60 |
| #71 | Neon Vector | Neon Inc. | 60 |
| #72 | Nomic Atlas | Nomic AI Inc. | 60 |
| #73 | Notion AI | Notion Labs | 60 |
| #74 | Ontotext GraphDB | Ontotext / Graphwise (merged with Semantic Web Company, 2025) | 60 |
| #75 | Onyx | onyx-dot-app | 60 |
| #76 | OpenSearch Vector | OpenSearch Project (AWS-led) | 60 |
| #77 | PaperQA2 | FutureHouse | 60 |
| #78 | ParadeDB | ParadeDB Inc. (YC S23) | 60 |
| #79 | PathRAG | BUPT-GAMMA | 60 |
| #80 | pgvector Supabase Neon | pgvector OSS / Supabase Inc. / Neon Inc. | 60 |
| #81 | PrivateGPT | Zylon AI | 60 |
| #82 | Quivr | QuivrHQ | 60 |
| #83 | Qwen-Agent | QwenLM (Alibaba) | 60 |
| #84 | R2R | SciPhi-AI | 60 |
| #85 | R3Mem | HKUST (2025) | 60 |
| #86 | RAGFlow | InfiniFlow | 60 |
| #87 | Ragie | Ragie Inc. | 60 |
| #88 | RAPTOR | Stanford (Sarthi, Abdullah et al.) | 60 |
| #89 | REALM | Google Research (Guu et al.) | 60 |
| #90 | Recurrent Memory Transformer | MIPT / DeepPavlov (Bulatov, Kuratov, Burtsev) | 60 |
| #91 | Redis Vector | Redis Ltd. | 60 |
| #92 | ReMe | ModelScope (Alibaba) | 60 |
| #93 | RETRO | DeepMind (Borgeaud et al.) | 60 |
| #94 | RWKV | RWKV Foundation / BlinkDL community | 60 |
| #95 | Sana AI | Sana Labs | 60 |
| #96 | Saner AI | Saner.AI | 60 |
| #97 | Scissorhands | Rice / Stanford / Meta (Liu et al.) | 60 |
| #98 | Self-RAG | University of Washington / Allen AI (Asai et al.) | 60 |
| #99 | Selfmem | Tsinghua / Microsoft (Cheng et al.) | 60 |
| #100 | SID AI | SID (YC) | 60 |
| #101 | SingleStore Vector | SingleStore Inc. | 60 |
| #102 | Stack AI | Stack AI Inc. (YC W23) | 60 |
| #103 | Stardog | Stardog Union Inc. | 60 |
| #104 | Supabase Vector | Supabase Inc. | 60 |
| #105 | Titans | lucidrains (community) / paper by Google Research | 60 |
| #106 | TRIME | Princeton NLP (Zhong, Lei, Chen) | 60 |
| #107 | TrustRAG | GoMate Community | 60 |
| #108 | TurboPuffer | TurboPuffer Inc. | 60 |
| #109 | Unstructured IO | Unstructured Technologies Inc. | 60 |
| #110 | Vald | Yahoo Japan | 60 |
| #111 | Vectara | Vectara Inc. | 60 |
| #112 | vectorize | Vectorize Inc. | 60 |
| #113 | VectorShift | VectorShift Inc. (YC S23) | 60 |
| #114 | Vellum AI | Vellum AI Inc. (YC W23) | 60 |
| #115 | Verba | Weaviate | 60 |
| #116 | Vespa AI | Yahoo / Vespa.ai (independent OSS project) | 60 |
| #117 | Voyage AI | Voyage AI (acquired by MongoDB, Feb 2025) | 60 |
| #118 | EM-LLM | em-llm (academic consortium) | 51.3 |
| #119 | MemoRAG | BAAI / Qhjqhj00 | 44.4 |
| #120 | Activation Beacon | BAAI / Renmin University (Zhang et al.) | 39.8 |
| #121 | StreamingLLM | MIT Han Lab / Meta AI (Xiao et al.) | 24.5 |