MultiHop-RAG

Name: MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Creator: HKUST
Keywords: multi-hop-qa, rag-retrieval, knowledge-graph

Benchmark Metadata

Publisher: HKUST

Venue: COLM 2024

Evaluation Type: Automatic

Dimensions: 4

Test Prompts: 2,556

Scoring: Higher is better

Update Frequency: Annual

What It Measures

Inference, comparison, temporal, and null multi-hop queries (null queries are those whose answer cannot be derived from the knowledge base)
Retrieval recall@k for evidence chunks
Final answer accuracy
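As a rough illustration of the two automatic metrics listed above, the sketch below computes per-query retrieval recall@k (the fraction of gold evidence chunks that appear among the top-k retrieved chunks) and exact-match answer accuracy. The function names, the use of string chunk IDs, and the lowercase/whitespace normalization are assumptions made for illustration. The benchmark's official scripts may define these metrics differently, for example by matching on chunk text instead of IDs, or by checking whether the gold answer appears anywhere in the generated response.

```python
from typing import Sequence, Tuple


def recall_at_k(retrieved: Sequence[str], gold: Sequence[str], k: int) -> float:
    """Fraction of gold evidence chunk IDs found in the top-k retrieved IDs.

    Hypothetical helper: chunks are identified by string IDs here. Queries with
    no gold evidence score 0.0; a real evaluation would likely exclude them.
    """
    gold_set = set(gold)
    if not gold_set:
        return 0.0
    top_k = set(retrieved[:k])
    return len(gold_set & top_k) / len(gold_set)


def mean_recall_at_k(runs: Sequence[Tuple[Sequence[str], Sequence[str]]], k: int) -> float:
    """Average recall@k over (retrieved, gold) pairs, one pair per query."""
    scores = [recall_at_k(retrieved, gold, k) for retrieved, gold in runs]
    return sum(scores) / len(scores) if scores else 0.0


def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def answer_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Exact-match accuracy after light normalization (an assumption)."""
    if not answers:
        return 0.0
    correct = sum(normalize(p) == normalize(a) for p, a in zip(predictions, answers))
    return correct / len(answers)


# Toy data (hypothetical chunk IDs and answers, not taken from the benchmark).
runs = [
    (["c1", "c7", "c3", "c9"], ["c1", "c3"]),  # both gold chunks in the top 4
    (["c2", "c5", "c8", "c4"], ["c6", "c2"]),  # one of two gold chunks in the top 4
]
print(mean_recall_at_k(runs, k=4))  # 0.75
print(answer_accuracy(["Yes", "insufficient information"], ["yes", "No"]))  # 0.5
```

Keeping the retrieval and answer metrics separate mirrors the page's split between evidence recall and final answer accuracy: a system can retrieve all gold chunks and still answer incorrectly, or vice versa.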

What It Does Not Measure

Personalization
Cross-session memory
Long-context window stress

All Systems Evaluated (149 systems)

5 self-reported, 144 estimated

Rank	System	Creator	Score	Provenance	Source
#1	Qdrant	Qdrant	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#2	AllegroGraph	Franz Inc.	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#3	AppAgent	Tencent / mnotgod96	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#4	Athina AI	Athina AI (YC W23)	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#5	AutoWebGLM	THUDM	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#6	Backboard IO	Backboard.io	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#7	Botpress	Botpress Inc.	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#8	BrowserGym	ServiceNow Research	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#9	D-Mem	You et al. (2025)	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#10	DB-GPT	eosphoros-ai	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#11	Diffbot	Diffbot Inc.	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#12	Dify	LangGenius	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#13	GraphRAG-SDK	FalkorDB	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#14	HiMem	Zhu et al. (JD.com, 2026)	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#15	HippoRAG	OSU NLP Group (Ohio State University)	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#16	HippoRAG 2	OSU NLP Group	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#17	HuggingGPT / JARVIS	Microsoft Research	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#18	KnowAgent	zjunlp (Zhejiang University)	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#19	MCP Memory Server	Anthropic / Model Context Protocol	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#20	MemR3	2025 (December submission)	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#21	MIRIX	MIRIX AI (Wang, Chen)	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#22	MultiOn	MultiOn (now AGI Inc.)	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#23	Nano GraphRAG	gusye1234	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#24	Ontotext GraphDB	Ontotext / Graphwise (merged with Semantic Web Company, 2025)	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#25	PathRAG	BUPT-GAMMA	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#26	RMM	Google / UCSB (2025)	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#27	Stardog	Stardog Union Inc.	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#28	SuperAGI	TransformerOptimus	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#29	TrustRAG	GoMate Community	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#30	Vellum AI	Vellum AI Inc. (YC W23)	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#31	xmemory	xmemory Inc.	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#32	Cradle	BAAI-Agents	74.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#33	Hebbia	Hebbia, Inc.	74.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#34	R2R	SciPhi-AI	74.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#35	LangGraph	LangChain	74.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#36	Onyx	onyx-dot-app	74.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#37	Voiceflow	Voiceflow Inc.	74.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#38	AutoGen Core Memory	Microsoft	74.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#39	KAG	OpenSPG / Ant Group	74.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#40	Lagent	InternLM (Shanghai AI Lab)	74.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#41	Cognigy	Cognigy GmbH (acquired by NICE, July 2025)	74.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#42	Neo4j AuraDB	Neo4j Inc.	74.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#43	Neo4j LLM Graph Builder	Neo4j Labs	74.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#44	WebVoyager	MinorJerry et al.	74.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#45	AutoGen Studio	Microsoft Research	74.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#46	VectorShift	VectorShift Inc. (YC S23)	74.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#47	GraphRAG	Microsoft	74	Estimated	Arena estimate: derived from capability profile, not independently verified
#48	LangSmith LangGraph Cloud	LangChain Inc.	73.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#49	Self-RAG	University of Washington / Allen AI (Asai et al.)	73.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#50	AutoGPT Platform	Significant Gravitas	73.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#51	Galileo AI	Galileo Technologies Inc.	73.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#52	RAGFlow	InfiniFlow	73.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#53	MemoryScope	Alibaba ModelScope	73.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#54	CrewAI	CrewAI Inc. (Joao Moura)	73.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#55	Swarms	kyegomez / Swarms Corp	73.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#56	Maxim AI	Maxim AI Inc.	73.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#57	Open Interpreter	OpenInterpreter	72.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#58	Milvus	Zilliz	72.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#59	AgentVerse	OpenBMB (Tsinghua)	72.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#60	Dust tt	Dust (formerly XP1)	72.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#61	Lindy AI	Lindy AI	72.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#62	ChatDB	Tsinghua University (Hu et al.)	72.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#63	AriGraph	AIRI Institute / Moscow	72.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#64	Mixedbread AI	Mixedbread AI	72.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#65	PaperQA2	FutureHouse	72.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#66	Bisheng	dataelement	72.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#67	SID AI	SID (YC)	72.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#68	Supermemory	Supermemory	72.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#69	MoT	Fudan (Li, Qiu)	72.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#70	Haystack Memory	deepset	72.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#71	Langflow	Langflow-ai (DataStax)	72	Estimated	Arena estimate: derived from capability profile, not independently verified
#72	Nemori	Nemori AI (independent)	72	Estimated	Arena estimate: derived from capability profile, not independently verified
#73	Qwen-Agent	QwenLM (Alibaba)	72	Estimated	Arena estimate: derived from capability profile, not independently verified
#74	Synapse	NTU / Salesforce (Zheng et al.)	72	Estimated	Arena estimate: derived from capability profile, not independently verified
#75	Pinecone	Pinecone Systems	71.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#76	LightRAG	HKUDS (HKU Data Intelligence Lab)	71.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#77	HoneyHive	HoneyHive Inc.	71.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#78	Flowise	FlowiseAI	71.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#79	HybridAGI	SynaLinks	71.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#80	Larimar	IBM Research	71.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#81	Marker	Datalab (datalab-to)	71.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#82	CAMEL	CAMEL-AI.org	71.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#83	Glean	Glean Technologies	71.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#84	LangMem	LangChain	71	Estimated	Arena estimate: derived from capability profile, not independently verified
#85	Cohere Embed	Cohere Inc.	70.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#86	Kore AI	Kore.ai Inc.	70.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#87	Weaviate	Weaviate	70.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#88	ChatDev 2.0	OpenBMB	70.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#89	AGiXT	Josh-XT	70.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#90	Think-in-Memory	Ant Group / Alibaba (Liu et al.)	70.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#91	Memoripy	caspianmoon	70.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#92	Synapse	Nanyang Technological University (Zheng et al.)	70.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#93	LlamaIndex Memory	LlamaIndex	70.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#94	MetaGPT	DeepWisdom / geekan	70.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#95	Stack AI	Stack AI Inc. (YC W23)	70.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#96	GPTeam	101dotxyz	69.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#97	Nomic Atlas	Nomic AI Inc.	69.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#98	Mobile-Agent	Alibaba Tongyi Lab (X-PLUG)	69.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#99	Chroma	Chroma	69.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#100	FastGPT	labring	69.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#101	Selfmem	Tsinghua / Microsoft (Cheng et al.)	69	Estimated	Arena estimate: derived from capability profile, not independently verified
#102	Voyage AI	Voyage AI (acquired by MongoDB, Feb 2025)	69	Estimated	Arena estimate: derived from capability profile, not independently verified
#103	MiniRAG	HKUDS	68.4	Self-Reported	arXiv:2501.06713, Table 1 (accuracy with gpt-4o-mini; SLM backbones: Phi-3.5-mini 49.96, GLM-Edge 51.41, Qwen2.5-3B 48.55)
#104	MemoChat	University of Warwick / Alibaba	67.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#105	Memori	GibsonAI	67.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#106	Graphiti	Zep AI	66.4	Self-Reported	Zep / Graphiti paper
#107	Mnemosyne	independent	66.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#108	ParadeDB	ParadeDB Inc. (YC S23)	65.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#109	REALM	Google Research (Guu et al.)	65.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#110	Couchbase Vector	Couchbase Inc.	65.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#111	txtai	NeuML	65.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#112	Elasticsearch Vector	Elastic N.V.	65.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#113	PrivateGPT	Zylon AI	65.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#114	Mem AI	Mem Labs	64.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#115	Vectara	Vectara Inc.	64.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#116	Zep	Zep AI	64.8	Self-Reported	Zep evals
#117	Astra DB	DataStax	64.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#118	Saner AI	Saner.AI	64.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#119	Marqo	Marqo Pty Ltd	64.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#120	MyScale	MyScale Inc.	64.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#121	ColPali	illuin-tech	63.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#122	RETRO	DeepMind (Borgeaud et al.)	63.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#123	MemoryBank	Harbin Institute of Technology / SenseTime	63.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#124	OpenSearch Vector	OpenSearch Project (AWS-led)	63.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#125	Neon Vector	Neon Inc.	63.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#126	Redis Vector	Redis Ltd.	63	Estimated	Arena estimate: derived from capability profile, not independently verified
#127	Atlas	Meta AI FAIR (Izacard et al.)	62.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#128	Carbon AI	Carbon (acquired by Perplexity, Dec 2024)	62.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#129	Cognita	TrueFoundry	62.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#130	LlamaCloud	LlamaIndex Inc.	62.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#131	Unstructured IO	Unstructured Technologies Inc.	62.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#132	Mendable	Mendable (YC-backed)	62.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#133	Manticore Search	Manticore Software Ltd.	62.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#134	Verba	Weaviate	62.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#135	Epsilla	Epsilla Inc. (YC S23)	61.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#136	Ragie	Ragie Inc.	61.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#137	pgvector Supabase Neon	pgvector OSS / Supabase Inc. / Neon Inc.	61.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#138	TurboPuffer	TurboPuffer Inc.	60.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#139	Supabase Vector	Supabase Inc.	60.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#140	Vespa AI	Yahoo / Vespa.ai (independent OSS project)	60.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#141	Quivr	QuivrHQ	60.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#142	SingleStore Vector	SingleStore Inc.	60.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#143	AnythingLLM	Mintplex Labs	60.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#144	MongoDB Atlas Vector	MongoDB Inc.	60.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#145	Memoro	MIT Media Lab	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#146	Vald	Yahoo Japan	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#147	vectorize	Vectorize Inc.	59.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#148	Cognee	Cognee	57.2	Self-Reported	Cognee benchmark
#149	Memary	Kingjulio8238	52.7	Self-Reported	Memary repo evals
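Because 144 of the 149 scores above are capability-profile estimates rather than measured results, a reader may want to look at the self-reported rows on their own. The sketch below assumes the table has been saved as tab-separated text with a header row that includes `System`, `Score`, and `Provenance` columns. The helper name `rows_by_provenance` is hypothetical, and the inline sample keeps only a few columns from the table (the five self-reported entries plus one estimated entry).

```python
import csv
import io

# Sample rows based on the table above (Creator and Source columns omitted).
# In practice you would read the full tab-separated table from a file.
TABLE = """Rank\tSystem\tScore\tProvenance
#1\tQdrant\t75\tEstimated
#103\tMiniRAG\t68.4\tSelf-Reported
#106\tGraphiti\t66.4\tSelf-Reported
#116\tZep\t64.8\tSelf-Reported
#148\tCognee\t57.2\tSelf-Reported
#149\tMemary\t52.7\tSelf-Reported
"""


def rows_by_provenance(tsv_text: str, provenance: str):
    """Return (system, score) pairs with the given provenance, best first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    selected = [
        (row["System"], float(row["Score"]))
        for row in reader
        if row["Provenance"] == provenance
    ]
    # Scoring is "higher is better", so sort by score in descending order.
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


self_reported = rows_by_provenance(TABLE, "Self-Reported")
for system, score in self_reported:
    print(f"{system}\t{score}")
```

Even the self-reported scores come from different sources and setups (the MiniRAG figure, for instance, is reported with gpt-4o-mini), so the filtered list is a reading aid rather than a controlled comparison.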