| #1 | Backboard IO · Backboard.io | 82 | Backboard IO vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #2 | MemR3 · 2025 (December submission) | 82 | MemR3 paper (arXiv:2512.20237); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #3 | xmemory · xmemory Inc. | 82 | xmemory vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #4 | Voyager · NVIDIA / Caltech / UT Austin / Stanford / ASU / UW (Wang et al.) | 81.5 | Voyager paper (arXiv:2305.16291); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #5 | HuggingGPT / JARVIS · Microsoft Research | 80.3 | HuggingGPT / JARVIS paper (arXiv:2303.17580); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #6 | MCP Memory Server · Anthropic / Model Context Protocol | 80.2 | MCP Memory Server (modelcontextprotocol/servers); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #7 | Reflexion · Northeastern / MIT / Princeton (Shinn et al.) | 80 | arXiv:2303.11366, Figure 4c / Table 5: CoT+Reflexion with GPT-4 and gold context (not retrieval); reading-comprehension setting |
| #8 | MIRIX · MIRIX AI (Wang, Chen) | 79.9 | MIRIX paper (arXiv:2507.07957); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #9 | Generative Agents · Stanford University / Google Research | 79.7 | Generative Agents paper (arXiv:2304.03442); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #10 | AutoGPT Platform · Significant Gravitas | 79.5 | AutoGPT Platform (Significant-Gravitas/AutoGPT); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #11 | D-Mem · You et al. (2025) | 79.4 | D-Mem paper (arXiv:2603.18631); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #12 | Onyx · onyx-dot-app | 79 | Onyx (onyx-dot-app/onyx); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #13 | Lyzr Cognis · Lyzr AI | 78.6 | Lyzr Cognis vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #14 | Supermemory · Supermemory | 78.6 | Supermemory (supermemoryai/supermemory); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #15 | SCM · Beihang / NLPR (Wang et al.) | 78.4 | SCM paper (arXiv:2304.13343); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #16 | HiMem · Zhu et al. (JD.com, 2026) | 78.2 | HiMem paper (arXiv:2601.06377); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #17 | Dify · LangGenius | 78 | Dify (langgenius/dify); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #18 | ChatDB · Tsinghua University (Hu et al.) | 77.9 | ChatDB paper (arXiv:2306.03901); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #19 | Nemori · Nemori AI (independent) | 77.8 | Nemori paper (arXiv:2508.03341); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #20 | Self-RAG · University of Washington / Allen AI (Asai et al.) | 77.8 | Self-RAG paper (arXiv:2310.11511); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #21 | Athina AI · Athina AI (YC W23) | 77.7 | Athina AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #22 | WebVoyager · MinorJerry et al. | 77.7 | WebVoyager paper (arXiv:2401.13919); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #23 | AgentVerse · OpenBMB (Tsinghua) | 77.6 | AgentVerse paper (arXiv:2308.10848); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #24 | Cradle · BAAI-Agents | 77.5 | Cradle paper (arXiv:2403.03186); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #25 | CrewAI · CrewAI Inc. (Joao Moura) | 77.5 | CrewAI (joaomdmoura/crewAI); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #26 | Mobile-Agent · Alibaba Tongyi Lab (X-PLUG) | 77.4 | Mobile-Agent paper (arXiv:2508.15144); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #27 | RecallM · Cisco Research / independent (Kynoch & Latapie) | 77.4 | RecallM paper (arXiv:2307.02738); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #28 | SuperAGI · TransformerOptimus | 77.3 | SuperAGI (TransformerOptimus/SuperAGI); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #29 | Synapse · Nanyang Technological University (Zheng et al.) | 77.2 | Synapse paper (arXiv:2306.07863); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #30 | SID AI · SID (YC) | 77.1 | SID AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #31 | A-MEM · AGI Research / Rutgers | 77 | A-MEM paper (arXiv:2502.12110); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #32 | Diffbot · Diffbot Inc. | 76.9 | Diffbot vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #33 | Neo4j LLM Graph Builder · Neo4j Labs | 76.8 | Neo4j LLM Graph Builder (neo4j-labs/llm-graph-builder); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #34 | AutoGen Studio · Microsoft Research | 76.7 | AutoGen Studio vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #35 | HybridAGI · SynaLinks | 76.7 | HybridAGI (SynaLinks/HybridAGI); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #36 | Adept AI · Adept AI Labs (acquired by Amazon 2024) | 76.4 | Adept AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #37 | AppAgent · Tencent / mnotgod96 | 76.4 | AppAgent paper (arXiv:2312.13771); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #38 | AutoGen Core Memory · Microsoft | 76.4 | AutoGen Core Memory paper (arXiv:2308.08155); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #39 | Galileo AI · Galileo Technologies Inc. | 76.2 | Galileo AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #40 | KAG · OpenSPG / Ant Group | 76.2 | arXiv:2409.13731, Table 8: F1 with LFSH_ref3 + DeepSeek-V2 backbone |
| #41 | RAGFlow · InfiniFlow | 76.2 | RAGFlow (infiniflow/ragflow); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #42 | ReMe · ModelScope (Alibaba) | 76.2 | ReMe (modelscope/ReMe); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #43 | Marker · Datalab (datalab-to) | 76.1 | Marker (datalab-to/marker); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #44 | Botpress · Botpress Inc. | 76 | Botpress vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #45 | MultiOn · MultiOn (now AGI Inc.) | 75.9 | MultiOn vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #46 | Cognigy · Cognigy GmbH (acquired by NICE, July 2025) | 75.7 | Cognigy vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #47 | OS-Copilot / FRIDAY · Shanghai AI Lab / MMLab (Wu et al.) | 75.7 | OS-Copilot / FRIDAY paper (arXiv:2402.07456); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #48 | CAMEL · CAMEL-AI.org | 75.6 | CAMEL paper (arXiv:2303.17760); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #49 | Hebbia · Hebbia, Inc. | 75.6 | Hebbia vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #50 | HippoRAG 2 · OSU NLP Group | 75.5 | arXiv:2502.14802, Table 2: F1 with Llama-3.3-70B-Instruct backbone |
| #51 | AgentScope · ModelScope (Alibaba) | 75.3 | AgentScope paper (arXiv:2402.14034); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #52 | BrowserGym · ServiceNow Research | 75.3 | BrowserGym paper (arXiv:2412.05467); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #53 | Glean · Glean Technologies | 75.3 | Glean vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #54 | TrustRAG · GoMate Community | 75.3 | TrustRAG (gomate-community/TrustRAG); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #55 | AllegroGraph · Franz Inc. | 75.2 | AllegroGraph vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #56 | Vellum AI · Vellum AI Inc. (YC W23) | 75.1 | Vellum AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #57 | Bisheng · dataelement | 74.9 | Bisheng (dataelement/bisheng); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #58 | MoT · Fudan University (Li & Qiu) | 74.9 | MoT paper (arXiv:2305.05181); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #59 | Voiceflow · Voiceflow Inc. | 74.9 | Voiceflow vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #60 | ArcMemo · UC Berkeley / Stanford (Ho et al.) | 74.8 | ArcMemo paper (arXiv:2509.04439); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #61 | HoneyHive · HoneyHive Inc. | 74.8 | HoneyHive vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #62 | MetaGPT · DeepWisdom / geekan | 74.7 | MetaGPT paper (arXiv:2308.00352); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #63 | AutoWebGLM · THUDM | 74.4 | AutoWebGLM paper (arXiv:2404.03648); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #64 | Swarms · kyegomez / Swarms Corp | 74.4 | Swarms (kyegomez/swarms); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #65 | FastGPT · labring | 74.3 | FastGPT (labring/FastGPT); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #66 | Maxim AI · Maxim AI Inc. | 74.3 | Maxim AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #67 | Stack AI · Stack AI Inc. (YC W23) | 74 | Stack AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #68 | RMM · Google / UCSB (2025) | 73.9 | RMM paper (arXiv:2503.08026); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #69 | Nano GraphRAG · gusye1234 | 73.7 | Nano GraphRAG (gusye1234/nano-graphrag); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #70 | Agent Workflow Memory · CMU (Wang, Mao, Fried, Neubig) | 73.6 | Agent Workflow Memory paper (arXiv:2409.07429); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #71 | MiniRAG · HKUDS | 73.4 | MiniRAG paper (arXiv:2501.06713); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #72 | Kore AI · Kore.ai Inc. | 73 | Kore AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #73 | CrewAI Enterprise · CrewAI Inc. | 72.9 | CrewAI Enterprise vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #74 | PaperQA2 · FutureHouse | 72.9 | PaperQA2 paper (arXiv:2409.13740); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #75 | Lagent · InternLM (Shanghai AI Lab) | 72.8 | Lagent (InternLM/lagent); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #76 | LangSmith LangGraph Cloud · LangChain Inc. | 72.8 | LangSmith LangGraph Cloud vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #77 | Lindy AI · Lindy AI | 72.8 | Lindy AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #78 | Open Interpreter · OpenInterpreter | 72.7 | Open Interpreter (OpenInterpreter/open-interpreter); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #79 | BabyAGI · Yohei Nakajima | 72.6 | BabyAGI (yoheinakajima/babyagi); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #80 | DB-GPT · eosphoros-ai | 72.6 | DB-GPT (eosphoros-ai/DB-GPT); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #81 | Dust tt · Dust (formerly XP1) | 72.5 | Dust tt vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #82 | AGiXT · Josh-XT | 72.4 | AGiXT (Josh-XT/AGiXT); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #83 | Memp · Zhejiang University (Fang et al.) | 72.4 | Memp paper (arXiv:2508.06433); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #84 | VectorShift · VectorShift Inc. (YC S23) | 72.4 | VectorShift vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #85 | Flowise · FlowiseAI | 72.3 | Flowise (FlowiseAI/Flowise); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #86 | Generative Agents · Stanford / Google | 72 | Generative Agents paper (arXiv:2304.03442); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #87 | Langflow · Langflow-ai (DataStax) | 71.9 | Langflow (langflow-ai/langflow); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #88 | LightRAG · HKUDS (HKU Data Intelligence Lab) | 71.9 | LightRAG paper (arXiv:2410.05779); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #89 | Ontotext GraphDB · Ontotext / Graphwise (merged with Semantic Web Company, 2025) | 71.9 | Ontotext GraphDB vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #90 | ChatDev 2.0 · OpenBMB | 71.8 | ChatDev 2.0 paper (arXiv:2307.07924); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #91 | MemOS · MemTensor (Li, Zhang, et al.) | 71.8 | MemOS paper (arXiv:2505.22101); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #92 | LangGraph · LangChain | 71.6 | LangGraph (langchain-ai/langgraph); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #93 | GraphRAG · Microsoft | 71.5 | GraphRAG paper (arXiv:2404.16130); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #94 | MoT · Fudan (Li, Qiu) | 71.3 | MoT paper (arXiv:2305.05181); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #95 | MemoryScope · Alibaba ModelScope | 71.2 | MemoryScope (modelscope/MemoryScope); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #96 | Qwen-Agent · QwenLM (Alibaba) | 70.8 | Qwen-Agent (QwenLM/Qwen-Agent); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #97 | JARVIS-1 · CraftJarvis | 70.6 | JARVIS-1 paper (arXiv:2311.05997); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #98 | Claude Projects · Anthropic | 70.4 | Claude Projects vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #99 | Stardog · Stardog Union Inc. | 70.4 | Stardog vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #100 | Neo4j AuraDB · Neo4j Inc. | 70.2 | Neo4j AuraDB vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #101 | GAM · VectorSpaceLab (BAAI-related) | 70.1 | GAM (VectorSpaceLab/general-agentic-memory); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #102 | Memoripy · caspianmoon | 70.1 | Memoripy (caspianmoon/memoripy); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #103 | Think-in-Memory · Ant Group / Alibaba (Liu et al.) | 70.1 | Think-in-Memory paper (arXiv:2311.08719); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #104 | Cohere Embed · Cohere Inc. | 70 | Cohere Embed vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #105 | PathRAG · BUPT-GAMMA | 70 | PathRAG paper (arXiv:2502.14902); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #106 | AriGraph · AIRI Institute / Moscow | 69.9 | github.com/AIRI-Institute/AriGraph README (arXiv:2407.04363 transfer): F1 with GPT-4; EM 59.5; 200 test samples |
| #107 | Larimar · IBM Research | 69.9 | Larimar paper (arXiv:2403.11901); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #108 | GraphRAG-SDK · FalkorDB | 69.5 | GraphRAG-SDK (FalkorDB/GraphRAG-SDK); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #109 | kNN-LM · Stanford / Facebook AI Research (Khandelwal et al.) | 69.5 | kNN-LM paper (arXiv:1911.00172); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #110 | R2R · SciPhi-AI | 69.4 | R2R (SciPhi-AI/R2R); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #111 | Voyage AI · Voyage AI (acquired by MongoDB, Feb 2025) | 69.2 | Voyage AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #112 | GPTeam · 101dotxyz | 68.9 | GPTeam (101dotxyz/GPTeam); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #113 | Mixedbread AI · Mixedbread AI | 68.4 | Mixedbread AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #114 | Granola AI · Granola | 68.2 | Granola AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #115 | Memformer · UC Santa Barbara / Amazon (Wu, Lan, Liu, et al.) | 68 | Memformer paper (arXiv:2010.06891); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #116 | LangMem · LangChain | 67.8 | LangMem (langchain-ai/langmem); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #117 | MemoChat · University of Warwick / Alibaba | 67.6 | MemoChat paper (arXiv:2308.08239); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #118 | LlamaIndex Memory · LlamaIndex | 67.5 | LlamaIndex Memory (run-llama/llama_index); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #119 | Synapse · NTU / Salesforce (Zheng et al.) | 67.4 | Synapse paper (arXiv:2306.07863); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #120 | TRIME · Princeton NLP (Zhong, Lei, Chen) | 67.4 | TRIME paper (arXiv:2205.12674); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #121 | Heyday AI · Heyday (shut down 2025) | 67.2 | Heyday AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #122 | HEMA · independent (Ahn et al.) | 66.7 | HEMA paper (arXiv:2504.16754); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #123 | Astra DB · DataStax | 66.6 | Astra DB vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #124 | Sana AI · Sana Labs | 66.5 | Sana AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #125 | Jina AI Embeddings · Jina AI GmbH | 66.1 | Jina AI Embeddings vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #126 | Nomic Atlas · Nomic AI Inc. | 66 | Nomic Atlas vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #127 | Neon Vector · Neon Inc. | 65.8 | Neon Vector vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #128 | Notion AI · Notion Labs | 65.5 | Notion AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #129 | Epsilla · Epsilla Inc. (YC S23) | 65.4 | Epsilla vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #130 | Memori · GibsonAI | 65 | Memori (GibsonAI/memori); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #131 | RAPTOR · Stanford (Sarthi, Abdullah et al.) | 65 | RAPTOR paper (arXiv:2401.18059); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #132 | Mnemosyne · independent | 64.8 | Mnemosyne paper (arXiv:2510.08601); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #133 | Quivr · QuivrHQ | 64.7 | Quivr (QuivrHQ/quivr); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #134 | Mendable · Mendable (YC-backed) | 64.6 | Mendable vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #135 | pgvector Supabase Neon · pgvector OSS / Supabase Inc. / Neon Inc. | 64.3 | pgvector Supabase Neon vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #136 | REALM · Google Research (Guu et al.) | 64.3 | REALM paper (arXiv:2002.08909); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #137 | Supabase Vector · Supabase Inc. | 64 | Supabase Vector vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #138 | Ragie · Ragie Inc. | 63.9 | Ragie vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #139 | Cognita · TrueFoundry | 63.6 | Cognita (truefoundry/cognita); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #140 | KDB AI · KX Systems | 63.6 | KDB AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #141 | MongoDB Atlas Vector · MongoDB Inc. | 63.5 | MongoDB Atlas Vector vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #142 | ColPali · illuin-tech | 63.3 | ColPali paper (arXiv:2407.01449); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #143 | RETRO · DeepMind (Borgeaud et al.) | 62.9 | RETRO paper (arXiv:2112.04426); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #144 | Elasticsearch Vector · Elastic N.V. | 62.8 | Elasticsearch Vector vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #145 | Manticore Search · Manticore Software Ltd. | 62.8 | Manticore Search vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #146 | MemoryBank · Harbin Institute of Technology / SenseTime | 62.4 | MemoryBank paper (arXiv:2305.10250); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #147 | Selfmem · Tsinghua / Microsoft (Cheng et al.) | 62.4 | Selfmem paper (arXiv:2305.02437); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #148 | Couchbase Vector · Couchbase Inc. | 62.2 | Couchbase Vector vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #149 | OpenSearch Vector · OpenSearch Project (AWS-led) | 62.2 | OpenSearch Vector vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #150 | vectorize · Vectorize Inc. | 62.2 | vectorize vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #151 | Graphiti · Zep AI | 62.1 | Zep / Graphiti paper |
| #152 | Vespa AI · Yahoo / Vespa.ai (independent OSS project) | 62.1 | Vespa AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #153 | MyScale · MyScale Inc. | 62 | MyScale vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #154 | SingleStore Vector · SingleStore Inc. | 61.9 | SingleStore Vector vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #155 | Memoro · MIT Media Lab | 61.8 | Memoro paper (arXiv:2403.02135); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #156 | LanceDB · LanceDB Inc. (YC S22) | 61.4 | LanceDB vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #157 | Verba · Weaviate | 61.4 | Verba (weaviate/Verba); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #158 | AnythingLLM · Mintplex Labs | 61.1 | AnythingLLM (Mintplex-Labs/anything-llm); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #159 | PrivateGPT · Zylon AI | 60.8 | PrivateGPT (zylon-ai/private-gpt); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #160 | Activeloop Deep Lake · Activeloop Inc. | 60.6 | Activeloop Deep Lake vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #161 | LlamaCloud · LlamaIndex Inc. | 60.5 | LlamaCloud vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #162 | Marqo · Marqo Pty Ltd | 60.4 | Marqo vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #163 | Saner AI · Saner.AI | 60.4 | Saner AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #164 | R3Mem · HKUST (2025) | 60.3 | R3Mem paper (arXiv:2502.15957); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #165 | ParadeDB · ParadeDB Inc. (YC S23) | 60.2 | ParadeDB vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #166 | Redis Vector · Redis Ltd. | 60.2 | Redis Vector vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #167 | Carbon AI · Carbon (acquired by Perplexity, Dec 2024) | 60 | Carbon AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #168 | Vald · Yahoo Japan | 60 | Vald vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #169 | Unstructured IO · Unstructured Technologies Inc. | 59.7 | Unstructured IO vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #170 | HippoRAG · OSU NLP Group (Ohio State University) | 59.2 | arXiv:2405.14831, Table 4: F1 with IRCoT+HippoRAG (ColBERTv2); EM 45.7. HippoRAG alone: F1 55.0 |
| #171 | Mem AI · Mem Labs | 59.1 | Mem AI vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #172 | TurboPuffer · TurboPuffer Inc. | 59.1 | TurboPuffer vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #173 | Vectara · Vectara Inc. | 58.8 | Vectara vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #174 | Cognee · Cognee | 58.6 | Cognee benchmark |
| #175 | Memory³ · Institute for Advanced Algorithms Research Shanghai / Peking University | 57.6 | Memory³ paper (arXiv:2407.01178); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018) |
| #176 | MemoRAG · BAAI / Qhjqhj00 | 54.8 | arXiv:2409.05591, Table 1: MemoRAG on HotpotQA (via LongBench) |
| #177 | Memary · Kingjulio8238 | 54.2 | Memary repo evals |
| #178 | Weaviate · Weaviate | 53.1 | Weaviate evals |
| #179 | Milvus · Zilliz | 52.6 | Milvus benchmark |
| #180 | Pinecone · Pinecone Systems | 52.4 | Pinecone evals |
| #181 | Qdrant · Qdrant | 51.8 | Qdrant evals |
| #182 | Haystack Memory · deepset | 51.2 | Haystack benchmark |
| #183 | Atlas · Meta AI FAIR (Izacard et al.) | 50.6 | arXiv:2208.03299, Table 10: KILT-filtered HotpotQA EM, full-train; 64-shot EM=34.7 |
| #184 | Chroma · Chroma | 49.7 | Chroma benchmark |
| #185 | txtai · NeuML | 49.5 | txtai benchmark |
| #186 | KnowAgent · zjunlp (Zhejiang University) | 48.1 | arXiv:2403.03101, Table 1: KnowAgent-70b (Llama-2-70b-chat) F1 averaged across Easy/Medium/Hard |
| #187 | ExpeL · Tsinghua University (Zhao et al.) | 39 | arXiv:2308.10144, Figure 5: success rate read from the figure, not a precise table cell |
| #188 | StreamingLLM · MIT Han Lab / Meta AI (Xiao et al.) | 24.9 | arXiv:2309.17453, Table 8: StreamingLLM 1750+1750 on HotpotQA subset of LongBench |