HotpotQA
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Benchmark Metadata
Publisher: Stanford / CMU
Venue: EMNLP 2018
Evaluation Type: automatic
Dimensions: 2
Test Prompts: 7,405
Scoring: Higher is better
Update Frequency: annual
What It Measures
- Multi-hop answer exact match and F1
- Supporting-fact prediction F1
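
The two dimensions listed above can be illustrated with a minimal Python sketch. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles) and treats supporting facts as a set of `(title, sentence_index)` pairs. The function names are hypothetical, and this is a simplified illustration rather than the official evaluation script, which may handle special cases differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def answer_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def supporting_fact_f1(predicted, gold) -> float:
    """Set-level F1 over (title, sentence_index) pairs (assumed representation)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, `exact_match("The Eiffel Tower", "eiffel tower")` returns `1.0`, and a system that predicts one correct and one incorrect supporting sentence against two gold sentences gets a supporting-fact F1 of `0.5`.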
What It Does Not Measure
- Cross-session memory
- Personalization
- Long-context retrieval
All Systems Evaluated (187 systems)
20 self-reported, 167 estimated
| Rank | System | Organization / Authors | Score |
|---|---|---|---|
| #1 | Backboard IO | Backboard.io | 82 |
| #2 | MemR3 | 2025 (December submission) | 82 |
| #3 | xmemory | xmemory Inc. | 82 |
| #4 | Voyager | NVIDIA / Caltech / UT Austin / Stanford / ASU / UW (Wang et al.) | 81.5 |
| #5 | HuggingGPT / JARVIS | Microsoft Research | 80.3 |
| #6 | MCP Memory Server | Anthropic / Model Context Protocol | 80.2 |
| #7 | Reflexion | Northeastern / MIT / Princeton (Shinn et al.) | 80 |
| #8 | MIRIX | MIRIX AI (Wang, Chen) | 79.9 |
| #9 | Generative Agents | Stanford University / Google Research | 79.7 |
| #10 | AutoGPT Platform | Significant Gravitas | 79.5 |
| #11 | D-Mem | You et al. (2025) | 79.4 |
| #12 | Onyx | onyx-dot-app | 79 |
| #13 | Supermemory | Supermemory | 78.6 |
| #14 | SCM | Beihang / NLPR (Wang et al.) | 78.4 |
| #15 | HiMem | Zhu et al. (JD.com, 2026) | 78.2 |
| #16 | Dify | LangGenius | 78 |
| #17 | ChatDB | Tsinghua University (Hu et al.) | 77.9 |
| #18 | Nemori | Nemori AI (independent) | 77.8 |
| #19 | Self-RAG | University of Washington / Allen AI (Asai et al.) | 77.8 |
| #20 | Athina AI | Athina AI (YC W23) | 77.7 |
| #21 | WebVoyager | MinorJerry et al. | 77.7 |
| #22 | AgentVerse | OpenBMB (Tsinghua) | 77.6 |
| #23 | Cradle | BAAI-Agents | 77.5 |
| #24 | CrewAI | CrewAI Inc. (Joao Moura) | 77.5 |
| #25 | Mobile-Agent | Alibaba Tongyi Lab (X-PLUG) | 77.4 |
| #26 | RecallM | Cisco Research / independent (Kynoch & Latapie) | 77.4 |
| #27 | SuperAGI | TransformerOptimus | 77.3 |
| #28 | Synapse | Nanyang Technological University (Zheng et al.) | 77.2 |
| #29 | SID AI | SID (YC) | 77.1 |
| #30 | A-MEM | AGI Research / Rutgers | 77 |
| #31 | Diffbot | Diffbot Inc. | 76.9 |
| #32 | Neo4j LLM Graph Builder | Neo4j Labs | 76.8 |
| #33 | AutoGen Studio | Microsoft Research | 76.7 |
| #34 | HybridAGI | SynaLinks | 76.7 |
| #35 | Adept AI | Adept AI Labs (acquired by Amazon 2024) | 76.4 |
| #36 | AppAgent | Tencent / mnotgod96 | 76.4 |
| #37 | AutoGen Core Memory | Microsoft | 76.4 |
| #38 | Galileo AI | Galileo Technologies Inc. | 76.2 |
| #39 | KAG | OpenSPG / Ant Group | 76.2 |
| #40 | RAGFlow | InfiniFlow | 76.2 |
| #41 | ReMe | ModelScope (Alibaba) | 76.2 |
| #42 | Marker | Datalab (datalab-to) | 76.1 |
| #43 | Botpress | Botpress Inc. | 76 |
| #44 | MultiOn | MultiOn (now AGI Inc.) | 75.9 |
| #45 | Cognigy | Cognigy GmbH (acquired by NICE, July 2025) | 75.7 |
| #46 | OS-Copilot / FRIDAY | Shanghai AI Lab / MMLab (Wu et al.) | 75.7 |
| #47 | CAMEL | CAMEL-AI.org | 75.6 |
| #48 | Hebbia | Hebbia, Inc. | 75.6 |
| #49 | HippoRAG 2 | OSU NLP Group | 75.5 |
| #50 | AgentScope | ModelScope (Alibaba) | 75.3 |
| #51 | BrowserGym | ServiceNow Research | 75.3 |
| #52 | Glean | Glean Technologies | 75.3 |
| #53 | TrustRAG | GoMate Community | 75.3 |
| #54 | AllegroGraph | Franz Inc. | 75.2 |
| #55 | Vellum AI | Vellum AI Inc. (YC W23) | 75.1 |
| #56 | Bisheng | dataelement | 74.9 |
| #57 | MoT | Fudan University (Li & Qiu) | 74.9 |
| #58 | Voiceflow | Voiceflow Inc. | 74.9 |
| #59 | ArcMemo | UC Berkeley / Stanford (Ho et al.) | 74.8 |
| #60 | HoneyHive | HoneyHive Inc. | 74.8 |
| #61 | MetaGPT | DeepWisdom / geekan | 74.7 |
| #62 | AutoWebGLM | THUDM | 74.4 |
| #63 | Swarms | kyegomez / Swarms Corp | 74.4 |
| #64 | FastGPT | labring | 74.3 |
| #65 | Maxim AI | Maxim AI Inc. | 74.3 |
| #66 | Stack AI | Stack AI Inc. (YC W23) | 74 |
| #67 | RMM | Google / UCSB (2025) | 73.9 |
| #68 | Nano GraphRAG | gusye1234 | 73.7 |
| #69 | Agent Workflow Memory | CMU (Wang, Mao, Fried, Neubig) | 73.6 |
| #70 | MiniRAG | HKUDS | 73.4 |
| #71 | Kore AI | Kore.ai Inc. | 73 |
| #72 | CrewAI Enterprise | CrewAI Inc. | 72.9 |
| #73 | PaperQA2 | FutureHouse | 72.9 |
| #74 | Lagent | InternLM (Shanghai AI Lab) | 72.8 |
| #75 | LangSmith LangGraph Cloud | LangChain Inc. | 72.8 |
| #76 | Lindy AI | Lindy AI | 72.8 |
| #77 | Open Interpreter | OpenInterpreter | 72.7 |
| #78 | BabyAGI | Yohei Nakajima | 72.6 |
| #79 | DB-GPT | eosphoros-ai | 72.6 |
| #80 | Dust tt | Dust (formerly XP1) | 72.5 |
| #81 | AGiXT | Josh-XT | 72.4 |
| #82 | Memp | Zhejiang University (Fang et al.) | 72.4 |
| #83 | VectorShift | VectorShift Inc. (YC S23) | 72.4 |
| #84 | Flowise | FlowiseAI | 72.3 |
| #85 | Generative Agents | Stanford / Google | 72 |
| #86 | Langflow | Langflow-ai (DataStax) | 71.9 |
| #87 | LightRAG | HKUDS (HKU Data Intelligence Lab) | 71.9 |
| #88 | Ontotext GraphDB | Ontotext / Graphwise (merged with Semantic Web Company, 2025) | 71.9 |
| #89 | ChatDev 2.0 | OpenBMB | 71.8 |
| #90 | MemOS | MemTensor (Li, Zhang, et al.) | 71.8 |
| #91 | LangGraph | LangChain | 71.6 |
| #92 | Cognee | Cognee | 71.5 |
| #93 | GraphRAG | Microsoft | 71.5 |
| #94 | MoT | Fudan (Li, Qiu) | 71.3 |
| #95 | MemoryScope | Alibaba ModelScope | 71.2 |
| #96 | Qwen-Agent | QwenLM (Alibaba) | 70.8 |
| #97 | JARVIS-1 | CraftJarvis | 70.6 |
| #98 | Claude Projects | Anthropic | 70.4 |
| #99 | Stardog | Stardog Union Inc. | 70.4 |
| #100 | Neo4j AuraDB | Neo4j Inc. | 70.2 |
| #101 | GAM | VectorSpaceLab (BAAI-related) | 70.1 |
| #102 | Memoripy | caspianmoon | 70.1 |
| #103 | Think-in-Memory | Ant Group / Alibaba (Liu et al.) | 70.1 |
| #104 | Cohere Embed | Cohere Inc. | 70 |
| #105 | PathRAG | BUPT-GAMMA | 70 |
| #106 | AriGraph | AIRI Institute / Moscow | 69.9 |
| #107 | Larimar | IBM Research | 69.9 |
| #108 | GraphRAG-SDK | FalkorDB | 69.5 |
| #109 | kNN-LM | Stanford / Facebook AI Research (Khandelwal et al.) | 69.5 |
| #110 | R2R | SciPhi-AI | 69.4 |
| #111 | Voyage AI | Voyage AI (acquired by MongoDB, Feb 2025) | 69.2 |
| #112 | GPTeam | 101dotxyz | 68.9 |
| #113 | Mixedbread AI | Mixedbread AI | 68.4 |
| #114 | Granola AI | Granola | 68.2 |
| #115 | Memformer | UC Santa Barbara / Amazon (Wu, Lan, Liu, et al.) | 68 |
| #116 | LangMem | LangChain | 67.8 |
| #117 | MemoChat | University of Warwick / Alibaba | 67.6 |
| #118 | LlamaIndex Memory | LlamaIndex | 67.5 |
| #119 | Synapse | NTU / Salesforce (Zheng et al.) | 67.4 |
| #120 | TRIME | Princeton NLP (Zhong, Lei, Chen) | 67.4 |
| #121 | Heyday AI | Heyday (shut down 2025) | 67.2 |
| #122 | HEMA | independent (Ahn et al.) | 66.7 |
| #123 | Astra DB | DataStax | 66.6 |
| #124 | Sana AI | Sana Labs | 66.5 |
| #125 | Jina AI Embeddings | Jina AI GmbH | 66.1 |
| #126 | Nomic Atlas | Nomic AI Inc. | 66 |
| #127 | Neon Vector | Neon Inc. | 65.8 |
| #128 | Notion AI | Notion Labs | 65.5 |
| #129 | Epsilla | Epsilla Inc. (YC S23) | 65.4 |
| #130 | Memori | GibsonAI | 65 |
| #131 | RAPTOR | Stanford (Sarthi, Abdullah et al.) | 65 |
| #132 | Mnemosyne | independent | 64.8 |
| #133 | Quivr | QuivrHQ | 64.7 |
| #134 | Mendable | Mendable (YC-backed) | 64.6 |
| #135 | pgvector Supabase Neon | pgvector OSS / Supabase Inc. / Neon Inc. | 64.3 |
| #136 | REALM | Google Research (Guu et al.) | 64.3 |
| #137 | Supabase Vector | Supabase Inc. | 64 |
| #138 | Ragie | Ragie Inc. | 63.9 |
| #139 | Cognita | TrueFoundry | 63.6 |
| #140 | KDB AI | KX Systems | 63.6 |
| #141 | MongoDB Atlas Vector | MongoDB Inc. | 63.5 |
| #142 | ColPali | illuin-tech | 63.3 |
| #143 | RETRO | DeepMind (Borgeaud et al.) | 62.9 |
| #144 | Elasticsearch Vector | Elastic N.V. | 62.8 |
| #145 | Manticore Search | Manticore Software Ltd. | 62.8 |
| #146 | MemoryBank | Harbin Institute of Technology / SenseTime | 62.4 |
| #147 | Selfmem | Tsinghua / Microsoft (Cheng et al.) | 62.4 |
| #148 | Couchbase Vector | Couchbase Inc. | 62.2 |
| #149 | OpenSearch Vector | OpenSearch Project (AWS-led) | 62.2 |
| #150 | vectorize | Vectorize Inc. | 62.2 |
| #151 | Graphiti | Zep AI | 62.1 |
| #152 | Vespa AI | Yahoo / Vespa.ai (independent OSS project) | 62.1 |
| #153 | MyScale | MyScale Inc. | 62 |
| #154 | SingleStore Vector | SingleStore Inc. | 61.9 |
| #155 | Memoro | MIT Media Lab | 61.8 |
| #156 | LanceDB | LanceDB Inc. (YC S22) | 61.4 |
| #157 | Verba | Weaviate | 61.4 |
| #158 | AnythingLLM | Mintplex Labs | 61.1 |
| #159 | PrivateGPT | Zylon AI | 60.8 |
| #160 | Activeloop Deep Lake | Activeloop Inc. | 60.6 |
| #161 | LlamaCloud | LlamaIndex Inc. | 60.5 |
| #162 | Marqo | Marqo Pty Ltd | 60.4 |
| #163 | Saner AI | Saner.AI | 60.4 |
| #164 | R3Mem | HKUST (2025) | 60.3 |
| #165 | ParadeDB | ParadeDB Inc. (YC S23) | 60.2 |
| #166 | Redis Vector | Redis Ltd. | 60.2 |
| #167 | Carbon AI | Carbon (acquired by Perplexity, Dec 2024) | 60 |
| #168 | Vald | Yahoo Japan | 60 |
| #169 | Unstructured IO | Unstructured Technologies Inc. | 59.7 |
| #170 | HippoRAG | OSU NLP Group (Ohio State University) | 59.2 |
| #171 | Mem AI | Mem Labs | 59.1 |
| #172 | TurboPuffer | TurboPuffer Inc. | 59.1 |
| #173 | Vectara | Vectara Inc. | 58.8 |
| #174 | Memory³ | Institute for Advanced Algorithms Research Shanghai / Peking University | 57.6 |
| #175 | MemoRAG | BAAI / Qhjqhj00 | 54.8 |
| #176 | Memary | Kingjulio8238 | 54.2 |
| #177 | Weaviate | Weaviate | 53.1 |
| #178 | Milvus | Zilliz | 52.6 |
| #179 | Pinecone | Pinecone Systems | 52.4 |
| #180 | Qdrant | Qdrant | 51.8 |
| #181 | Haystack Memory | deepset | 51.2 |
| #182 | Atlas | Meta AI FAIR (Izacard et al.) | 50.6 |
| #183 | Chroma | Chroma | 49.7 |
| #184 | txtai | NeuML | 49.5 |
| #185 | KnowAgent | zjunlp (Zhejiang University) | 48.1 |
| #186 | ExpeL | Tsinghua University (Zhao et al.) | 39 |
| #187 | StreamingLLM | MIT Han Lab / Meta AI (Xiao et al.) | 24.9 |