
HotpotQA

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Benchmark Metadata

Publisher: Stanford / CMU
Venue: EMNLP 2018
Evaluation Type: automatic
Dimensions: 2
Test Prompts: 7,405
Scoring: Higher is better
Update Frequency: annual

What It Measures

  • Multi-hop answer exact match and F1
  • Supporting-fact prediction F1
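The sketch below shows how these metrics are commonly computed. It assumes the SQuAD-style answer normalization: lowercase, strip punctuation, drop the articles "a/an/the", and collapse whitespace. Yes/no answers are scored all-or-nothing. Supporting facts are treated as a set of (paragraph title, sentence index) pairs, and F1 is computed over that set. The function names are hypothetical and the code is a simplified illustration, not the official HotpotQA evaluation script. It also does not reproduce the single "Score" column in the leaderboard, whose aggregation is not documented here.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)
_YES_NO = {"yes", "no", "noanswer"}  # answers scored all-or-nothing

def normalize_answer(text: str) -> str:
    """Lowercase, remove punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def answer_em(prediction: str, gold: str) -> float:
    """Exact match after normalization: 1.0 or 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))

def answer_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between the normalized prediction and gold answer."""
    pred_norm, gold_norm = normalize_answer(prediction), normalize_answer(gold)
    # A yes/no mismatch gets no partial credit from token overlap.
    if (pred_norm in _YES_NO or gold_norm in _YES_NO) and pred_norm != gold_norm:
        return 0.0
    pred_tokens, gold_tokens = pred_norm.split(), gold_norm.split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def supporting_fact_f1(predicted, gold) -> float:
    """Set-based F1 over (paragraph_title, sentence_index) pairs."""
    pred_set = set(map(tuple, predicted))
    gold_set = set(map(tuple, gold))
    true_pos = len(pred_set & gold_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

For example, `answer_f1("Paris, France", "Paris")` yields a precision of 0.5 and a recall of 1.0, so the F1 is about 0.667, while `answer_em` for the same pair is 0.0.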

What It Does Not Measure

  • Cross-session memory
  • Personalization
  • Long-context retrieval

All Systems Evaluated (188 systems)

Rank | System | Organization | Score
#1 | Backboard IO | Backboard.io | 82
#2 | MemR3 | 2025 (December submission) | 82
#3 | xmemory | xmemory Inc. | 82
#4 | Voyager | NVIDIA / Caltech / UT Austin / Stanford / ASU / UW (Wang et al.) | 81.5
#5 | HuggingGPT / JARVIS | Microsoft Research | 80.3
#6 | MCP Memory Server | Anthropic / Model Context Protocol | 80.2
#7 | Reflexion | Northeastern / MIT / Princeton (Shinn et al.) | 80
#8 | MIRIX | MIRIX AI (Wang, Chen) | 79.9
#9 | Generative Agents | Stanford University / Google Research | 79.7
#10 | AutoGPT Platform | Significant Gravitas | 79.5
#11 | D-Mem | You et al. (2025) | 79.4
#12 | Onyx | onyx-dot-app | 79
#13 | Lyzr Cognis | Lyzr AI | 78.6
#14 | Supermemory | Supermemory | 78.6
#15 | SCM | Beihang / NLPR (Wang et al.) | 78.4
#16 | HiMem | Zhu et al. (JD.com, 2026) | 78.2
#17 | Dify | LangGenius | 78
#18 | ChatDB | Tsinghua University (Hu et al.) | 77.9
#19 | Nemori | Nemori AI (independent) | 77.8
#20 | Self-RAG | University of Washington / Allen AI (Asai et al.) | 77.8
#21 | Athina AI | Athina AI (YC W23) | 77.7
#22 | WebVoyager | MinorJerry et al. | 77.7
#23 | AgentVerse | OpenBMB (Tsinghua) | 77.6
#24 | Cradle | BAAI-Agents | 77.5
#25 | CrewAI | CrewAI Inc. (Joao Moura) | 77.5
#26 | Mobile-Agent | Alibaba Tongyi Lab (X-PLUG) | 77.4
#27 | RecallM | Cisco Research / independent (Kynoch & Latapie) | 77.4
#28 | SuperAGI | TransformerOptimus | 77.3
#29 | Synapse | Nanyang Technological University (Zheng et al.) | 77.2
#30 | SID AI | SID (YC) | 77.1
#31 | A-MEM | AGI Research / Rutgers | 77
#32 | Diffbot | Diffbot Inc. | 76.9
#33 | Neo4j LLM Graph Builder | Neo4j Labs | 76.8
#34 | AutoGen Studio | Microsoft Research | 76.7
#35 | HybridAGI | SynaLinks | 76.7
#36 | Adept AI | Adept AI Labs (acquired by Amazon 2024) | 76.4
#37 | AppAgent | Tencent / mnotgod96 | 76.4
#38 | AutoGen Core Memory | Microsoft | 76.4
#39 | Galileo AI | Galileo Technologies Inc. | 76.2
#40 | KAG | OpenSPG / Ant Group | 76.2
#41 | RAGFlow | InfiniFlow | 76.2
#42 | ReMe | ModelScope (Alibaba) | 76.2
#43 | Marker | Datalab (datalab-to) | 76.1
#44 | Botpress | Botpress Inc. | 76
#45 | MultiOn | MultiOn (now AGI Inc.) | 75.9
#46 | Cognigy | Cognigy GmbH (acquired by NICE, July 2025) | 75.7
#47 | OS-Copilot / FRIDAY | Shanghai AI Lab / MMLab (Wu et al.) | 75.7
#48 | CAMEL | CAMEL-AI.org | 75.6
#49 | Hebbia | Hebbia, Inc. | 75.6
#50 | HippoRAG 2 | OSU NLP Group | 75.5
#51 | AgentScope | ModelScope (Alibaba) | 75.3
#52 | BrowserGym | ServiceNow Research | 75.3
#53 | Glean | Glean Technologies | 75.3
#54 | TrustRAG | GoMate Community | 75.3
#55 | AllegroGraph | Franz Inc. | 75.2
#56 | Vellum AI | Vellum AI Inc. (YC W23) | 75.1
#57 | Bisheng | dataelement | 74.9
#58 | MoT | Fudan University (Li & Qiu) | 74.9
#59 | Voiceflow | Voiceflow Inc. | 74.9
#60 | ArcMemo | UC Berkeley / Stanford (Ho et al.) | 74.8
#61 | HoneyHive | HoneyHive Inc. | 74.8
#62 | MetaGPT | DeepWisdom / geekan | 74.7
#63 | AutoWebGLM | THUDM | 74.4
#64 | Swarms | kyegomez / Swarms Corp | 74.4
#65 | FastGPT | labring | 74.3
#66 | Maxim AI | Maxim AI Inc. | 74.3
#67 | Stack AI | Stack AI Inc. (YC W23) | 74
#68 | RMM | Google / UCSB (2025) | 73.9
#69 | Nano GraphRAG | gusye1234 | 73.7
#70 | Agent Workflow Memory | CMU (Wang, Mao, Fried, Neubig) | 73.6
#71 | MiniRAG | HKUDS | 73.4
#72 | Kore AI | Kore.ai Inc. | 73
#73 | CrewAI Enterprise | CrewAI Inc. | 72.9
#74 | PaperQA2 | FutureHouse | 72.9
#75 | Lagent | InternLM (Shanghai AI Lab) | 72.8
#76 | LangSmith LangGraph Cloud | LangChain Inc. | 72.8
#77 | Lindy AI | Lindy AI | 72.8
#78 | Open Interpreter | OpenInterpreter | 72.7
#79 | BabyAGI | Yohei Nakajima | 72.6
#80 | DB-GPT | eosphoros-ai | 72.6
#81 | Dust tt | Dust (formerly XP1) | 72.5
#82 | AGiXT | Josh-XT | 72.4
#83 | Memp | Zhejiang University (Fang et al.) | 72.4
#84 | VectorShift | VectorShift Inc. (YC S23) | 72.4
#85 | Flowise | FlowiseAI | 72.3
#86 | Generative Agents | Stanford / Google | 72
#87 | Langflow | Langflow-ai (DataStax) | 71.9
#88 | LightRAG | HKUDS (HKU Data Intelligence Lab) | 71.9
#89 | Ontotext GraphDB | Ontotext / Graphwise (merged with Semantic Web Company, 2025) | 71.9
#90 | ChatDev 2.0 | OpenBMB | 71.8
#91 | MemOS | MemTensor (Li, Zhang, et al.) | 71.8
#92 | LangGraph | LangChain | 71.6
#93 | GraphRAG | Microsoft | 71.5
#94 | MoT | Fudan (Li, Qiu) | 71.3
#95 | MemoryScope | Alibaba ModelScope | 71.2
#96 | Qwen-Agent | QwenLM (Alibaba) | 70.8
#97 | JARVIS-1 | CraftJarvis | 70.6
#98 | Claude Projects | Anthropic | 70.4
#99 | Stardog | Stardog Union Inc. | 70.4
#100 | Neo4j AuraDB | Neo4j Inc. | 70.2
#101 | GAM | VectorSpaceLab (BAAI-related) | 70.1
#102 | Memoripy | caspianmoon | 70.1
#103 | Think-in-Memory | Ant Group / Alibaba (Liu et al.) | 70.1
#104 | Cohere Embed | Cohere Inc. | 70
#105 | PathRAG | BUPT-GAMMA | 70
#106 | AriGraph | AIRI Institute / Moscow | 69.9
#107 | Larimar | IBM Research | 69.9
#108 | GraphRAG-SDK | FalkorDB | 69.5
#109 | kNN-LM | Stanford / Facebook AI Research (Khandelwal et al.) | 69.5
#110 | R2R | SciPhi-AI | 69.4
#111 | Voyage AI | Voyage AI (acquired by MongoDB, Feb 2025) | 69.2
#112 | GPTeam | 101dotxyz | 68.9
#113 | Mixedbread AI | Mixedbread AI | 68.4
#114 | Granola AI | Granola | 68.2
#115 | Memformer | UC Santa Barbara / Amazon (Wu, Lan, Liu, et al.) | 68
#116 | LangMem | LangChain | 67.8
#117 | MemoChat | University of Warwick / Alibaba | 67.6
#118 | LlamaIndex Memory | LlamaIndex | 67.5
#119 | Synapse | NTU / Salesforce (Zheng et al.) | 67.4
#120 | TRIME | Princeton NLP (Zhong, Lei, Chen) | 67.4
#121 | Heyday AI | Heyday (shut down 2025) | 67.2
#122 | HEMA | independent (Ahn et al.) | 66.7
#123 | Astra DB | DataStax | 66.6
#124 | Sana AI | Sana Labs | 66.5
#125 | Jina AI Embeddings | Jina AI GmbH | 66.1
#126 | Nomic Atlas | Nomic AI Inc. | 66
#127 | Neon Vector | Neon Inc. | 65.8
#128 | Notion AI | Notion Labs | 65.5
#129 | Epsilla | Epsilla Inc. (YC S23) | 65.4
#130 | Memori | GibsonAI | 65
#131 | RAPTOR | Stanford (Sarthi, Abdullah et al.) | 65
#132 | Mnemosyne | independent | 64.8
#133 | Quivr | QuivrHQ | 64.7
#134 | Mendable | Mendable (YC-backed) | 64.6
#135 | pgvector Supabase Neon | pgvector OSS / Supabase Inc. / Neon Inc. | 64.3
#136 | REALM | Google Research (Guu et al.) | 64.3
#137 | Supabase Vector | Supabase Inc. | 64
#138 | Ragie | Ragie Inc. | 63.9
#139 | Cognita | TrueFoundry | 63.6
#140 | KDB AI | KX Systems | 63.6
#141 | MongoDB Atlas Vector | MongoDB Inc. | 63.5
#142 | ColPali | illuin-tech | 63.3
#143 | RETRO | DeepMind (Borgeaud et al.) | 62.9
#144 | Elasticsearch Vector | Elastic N.V. | 62.8
#145 | Manticore Search | Manticore Software Ltd. | 62.8
#146 | MemoryBank | Harbin Institute of Technology / SenseTime | 62.4
#147 | Selfmem | Tsinghua / Microsoft (Cheng et al.) | 62.4
#148 | Couchbase Vector | Couchbase Inc. | 62.2
#149 | OpenSearch Vector | OpenSearch Project (AWS-led) | 62.2
#150 | vectorize | Vectorize Inc. | 62.2
#151 | Graphiti | Zep AI | 62.1
#152 | Vespa AI | Yahoo / Vespa.ai (independent OSS project) | 62.1
#153 | MyScale | MyScale Inc. | 62
#154 | SingleStore Vector | SingleStore Inc. | 61.9
#155 | Memoro | MIT Media Lab | 61.8
#156 | LanceDB | LanceDB Inc. (YC S22) | 61.4
#157 | Verba | Weaviate | 61.4
#158 | AnythingLLM | Mintplex Labs | 61.1
#159 | PrivateGPT | Zylon AI | 60.8
#160 | Activeloop Deep Lake | Activeloop Inc. | 60.6
#161 | LlamaCloud | LlamaIndex Inc. | 60.5
#162 | Marqo | Marqo Pty Ltd | 60.4
#163 | Saner AI | Saner.AI | 60.4
#164 | R3Mem | HKUST (2025) | 60.3
#165 | ParadeDB | ParadeDB Inc. (YC S23) | 60.2
#166 | Redis Vector | Redis Ltd. | 60.2
#167 | Carbon AI | Carbon (acquired by Perplexity, Dec 2024) | 60
#168 | Vald | Yahoo Japan | 60
#169 | Unstructured IO | Unstructured Technologies Inc. | 59.7
#170 | HippoRAG | OSU NLP Group (Ohio State University) | 59.2
#171 | Mem AI | Mem Labs | 59.1
#172 | TurboPuffer | TurboPuffer Inc. | 59.1
#173 | Vectara | Vectara Inc. | 58.8
#174 | Cognee | Cognee | 58.6
#175 | Memory³ | Institute for Advanced Algorithms Research Shanghai / Peking University | 57.6
#176 | MemoRAG | BAAI / Qhjqhj00 | 54.8
#177 | Memary | Kingjulio8238 | 54.2
#178 | Weaviate | Weaviate | 53.1
#179 | Milvus | Zilliz | 52.6
#180 | Pinecone | Pinecone Systems | 52.4
#181 | Qdrant | Qdrant | 51.8
#182 | Haystack Memory | deepset | 51.2
#183 | Atlas | Meta AI FAIR (Izacard et al.) | 50.6
#184 | Chroma | Chroma | 49.7
#185 | txtai | NeuML | 49.5
#186 | KnowAgent | zjunlp (Zhejiang University) | 48.1
#187 | ExpeL | Tsinghua University (Zhao et al.) | 39
#188 | StreamingLLM | MIT Han Lab / Meta AI (Xiao et al.) | 24.9