LongBench

Name: LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Creator: Tsinghua KEG
Keywords: long-context-retrieval, long-context

Benchmark Metadata

Publisher: Tsinghua KEG
Venue: ACL 2024
Evaluation Type: automatic
Dimensions: 21
Test Prompts: 4,750
Scoring: Higher is better
Update Frequency: annual

What It Measures

Single- and multi-document QA
Summarization
Few-shot in-context learning
Synthetic retrieval
Code completion over long contexts

What It Does Not Measure

Cross-session memory
Personalization
Latency

All Systems Evaluated (121 systems)

4 self-reported, 117 estimated

Rank	System	Creator	Score	Provenance	Source
#1	txtai	NeuML	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#2	LlamaIndex Memory	LlamaIndex	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#3	Haystack Memory	deepset	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#4	Claude Projects	Anthropic	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#5	Pinecone	Pinecone Systems	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#6	Weaviate	Weaviate	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#7	Qdrant	Qdrant	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#8	Chroma	Chroma	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#9	Milvus	Zilliz	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#10	Activeloop Deep Lake	Activeloop Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#11	Adept AI	Adept AI Labs (acquired by Amazon 2024)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#12	AgentScope	ModelScope (Alibaba)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#13	AllegroGraph	Franz Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#14	AnythingLLM	Mintplex Labs	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#15	Astra DB	DataStax	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#16	Athina AI	Athina AI (YC W23)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#17	Atlas	Meta AI FAIR (Izacard et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#18	Bisheng	dataelement	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#19	Carbon AI	Carbon (acquired by Perplexity, Dec 2024)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#20	Cognita	TrueFoundry	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#21	Cohere Embed	Cohere Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#22	ColPali	illuin-tech	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#23	Compressive Transformer	DeepMind (Rae et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#24	Couchbase Vector	Couchbase Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#25	Diffbot	Diffbot Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#26	Dify	LangGenius	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#27	Dust tt	Dust (formerly XP1)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#28	Elasticsearch Vector	Elastic N.V.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#29	Epsilla	Epsilla Inc. (YC S23)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#30	FastGPT	labring	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#31	Flowise	FlowiseAI	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#32	Galileo AI	Galileo Technologies Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#33	Granola AI	Granola	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#34	GraphRAG	Microsoft	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#35	GraphRAG-SDK	FalkorDB	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#36	H2O	UT Austin / Rice / CMU / Stanford / Meta (Zhang et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#37	Hebbia	Hebbia, Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#38	HoneyHive	HoneyHive Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#39	ICAE	Microsoft Research (Ge et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#40	∞ Former	Instituto de Telecomunicações / DeepMind / IST (Martins, Marinho, Martins)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#41	Jina AI Embeddings	Jina AI GmbH	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#42	KAG	OpenSPG / Ant Group	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#43	KDB AI	KX Systems	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#44	kNN-LM	Stanford / Facebook AI Research (Khandelwal et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#45	LanceDB	LanceDB Inc. (YC S22)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#46	Landmark Attention	EPFL (Mohtashami, Jaggi)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#47	Langflow	Langflow-ai (DataStax)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#48	LangSmith LangGraph Cloud	LangChain Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#49	LightRAG	HKUDS (HKU Data Intelligence Lab)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#50	LlamaCloud	LlamaIndex Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#51	LM-Infinite	Illinois / Meta (Han et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#52	LongMem	UCSB / Microsoft Research	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#53	Mamba	CMU / Princeton (Gu, Dao)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#54	Manticore Search	Manticore Software Ltd.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#55	Marker	Datalab (datalab-to)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#56	Marqo	Marqo Pty Ltd	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#57	Maxim AI	Maxim AI Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#58	Mem AI	Mem Labs	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#59	Memformer	UC Santa Barbara / Amazon (Wu, Lan, Liu, et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#60	Memorizing Transformer	Google Research (Wu, Rabe, Hutchins, Szegedy)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#61	Memory³	Institute for Advanced Algorithms Research Shanghai / Peking University	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#62	MemoryLLM	UCSD / Apple (Wang et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#63	MemR3	2025 (December submission)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#64	Mendable	Mendable (YC-backed)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#65	MiniRAG	HKUDS	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#66	Mixedbread AI	Mixedbread AI	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#67	MongoDB Atlas Vector	MongoDB Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#68	MyScale	MyScale Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#69	Nano GraphRAG	gusye1234	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#70	Neo4j LLM Graph Builder	Neo4j Labs	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#71	Neon Vector	Neon Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#72	Nomic Atlas	Nomic AI Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#73	Notion AI	Notion Labs	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#74	Ontotext GraphDB	Ontotext / Graphwise (merged with Semantic Web Company, 2025)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#75	Onyx	onyx-dot-app	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#76	OpenSearch Vector	OpenSearch Project (AWS-led)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#77	PaperQA2	FutureHouse	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#78	ParadeDB	ParadeDB Inc. (YC S23)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#79	PathRAG	BUPT-GAMMA	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#80	pgvector Supabase Neon	pgvector OSS / Supabase Inc. / Neon Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#81	PrivateGPT	Zylon AI	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#82	Quivr	QuivrHQ	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#83	Qwen-Agent	QwenLM (Alibaba)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#84	R2R	SciPhi-AI	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#85	R3Mem	HKUST (2025)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#86	RAGFlow	InfiniFlow	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#87	Ragie	Ragie Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#88	RAPTOR	Stanford (Sarthi, Abdullah et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#89	REALM	Google Research (Guu et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#90	Recurrent Memory Transformer	MIPT / DeepPavlov (Bulatov, Kuratov, Burtsev)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#91	Redis Vector	Redis Ltd.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#92	ReMe	ModelScope (Alibaba)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#93	RETRO	DeepMind (Borgeaud et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#94	RWKV	RWKV Foundation / BlinkDL community	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#95	Sana AI	Sana Labs	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#96	Saner AI	Saner.AI	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#97	Scissorhands	Rice / Stanford / Meta (Liu et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#98	Self-RAG	University of Washington / Allen AI (Asai et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#99	Selfmem	Tsinghua / Microsoft (Cheng et al.)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#100	SID AI	SID (YC)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#101	SingleStore Vector	SingleStore Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#102	Stack AI	Stack AI Inc. (YC W23)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#103	Stardog	Stardog Union Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#104	Supabase Vector	Supabase Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#105	Titans	lucidrains (community) / paper by Google Research	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#106	TRIME	Princeton NLP (Zhong, Lei, Chen)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#107	TrustRAG	GoMate Community	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#108	TurboPuffer	TurboPuffer Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#109	Unstructured IO	Unstructured Technologies Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#110	Vald	Yahoo Japan	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#111	Vectara	Vectara Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#112	vectorize	Vectorize Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#113	VectorShift	VectorShift Inc. (YC S23)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#114	Vellum AI	Vellum AI Inc. (YC W23)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#115	Verba	Weaviate	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#116	Vespa AI	Yahoo / Vespa.ai (independent OSS project)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#117	Voyage AI	Voyage AI (acquired by MongoDB, Feb 2025)	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#118	EM-LLM	em-llm (academic consortium)	51.3	Self-Reported	arXiv:2407.09450, Table 1: EM-LLM (SM) on LLaMA 3.1-8B; avg of SQA 41.2, MQA 41.3, Sum 29.2, FSL 69.1, Ret 98.5, Code 64.1
#119	MemoRAG	BAAI / Qhjqhj00	44.4	Self-Reported	arXiv:2409.05591, Table 1: Mistral-7B-v0.2-32K memory + Phi-3-mini-128K generator; avg of NarrativeQA 27.5, Qasper 43.9, MultiFieldQA 52.2, MuSiQue 33.9, 2WikiMQA 54.1, HotpotQA 54.8
#120	Activation Beacon	BAAI / Renmin University (Zhang et al.)	39.8	Self-Reported	arXiv:2401.03462, Table 3: on Llama-2-7B-chat; avg of SQA 27.14, MQA 28.28, Sum 25.15, FSL 60.72, Code 57.83
#121	StreamingLLM	MIT Han Lab / Meta AI (Xiao et al.)	24.5	Self-Reported	arXiv:2309.17453, Table 8: StreamingLLM 1750+1750 on Llama2-7B-chat; avg of NarrativeQA 18.2, Qasper 19.7, HotpotQA 24.9, 2WikiMQA 32.0, GovReport 26.3, MultiNews 25.9
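
The four self-reported scores are described in the Source column as averages of listed sub-scores. A minimal Python sketch for re-deriving them follows. It assumes a plain unweighted mean rounded to one decimal, which the page does not state explicitly; the `reported` dictionary and the `check` helper are illustrative names, not part of any leaderboard tooling.

```python
from statistics import mean

# Listed score and sub-scores, copied from the Source column of the
# four self-reported rows. Assumption: "avg of" means an unweighted mean.
reported = {
    "EM-LLM": (51.3, [41.2, 41.3, 29.2, 69.1, 98.5, 64.1]),
    "MemoRAG": (44.4, [27.5, 43.9, 52.2, 33.9, 54.1, 54.8]),
    "Activation Beacon": (39.8, [27.14, 28.28, 25.15, 60.72, 57.83]),
    "StreamingLLM": (24.5, [18.2, 19.7, 24.9, 32.0, 26.3, 25.9]),
}


def check(listed, parts, tol=0.05):
    """Return (mean rounded to 1 decimal, whether it agrees with the listed score)."""
    m = mean(parts)
    return round(m, 1), abs(m - listed) <= tol


if __name__ == "__main__":
    for system, (listed, parts) in reported.items():
        recomputed, agrees = check(listed, parts)
        status = "matches" if agrees else "differs"
        print(f"{system}: listed {listed}, recomputed {recomputed} ({status})")
```

Under this assumption the MemoRAG, Activation Beacon, and StreamingLLM scores are reproduced. The six category values listed for EM-LLM have an unweighted mean of about 57.2, not the listed 51.3, so that figure may have been averaged over a different breakdown (for example per task instead of per category); the page does not say which.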