AgentBench-Mem
AgentBench Memory Track
Benchmark Metadata
- Publisher: Tsinghua KEG
- Venue: ICLR 2024
- Evaluation Type: automatic
- Dimensions: 8
- Test Prompts: 1,360
- Scoring: Higher is better
- Update Frequency: annual
What It Measures
- Task-state retention across multi-step plans
- Tool-call history consistency
- Recovery from intermediate failures
- Sub-goal tracking
What It Does Not Measure
- Single-turn QA
- Long-document retrieval
- Personalization
All Systems Evaluated (144 systems)
| Rank | System | Developer | Score |
|---|---|---|---|
| #1 | Memary | Kingjulio8238 | 72 |
| #2 | A-MEM | AGI Research / Rutgers | 72 |
| #3 | Abridge | Abridge | 72 |
| #4 | Adept AI | Adept AI Labs (acquired by Amazon 2024) | 72 |
| #5 | Agent Workflow Memory | CMU (Wang, Mao, Fried, Neubig) | 72 |
| #6 | AgentScope | ModelScope (Alibaba) | 72 |
| #7 | AgentVerse | OpenBMB (Tsinghua) | 72 |
| #8 | AGiXT | Josh-XT | 72 |
| #9 | AppAgent | Tencent / mnotgod96 | 72 |
| #10 | ArcMemo | UC Berkeley / Stanford (Ho et al.) | 72 |
| #11 | Athina AI | Athina AI (YC W23) | 72 |
| #12 | AutoGen Core Memory | Microsoft | 72 |
| #13 | AutoGen Studio | Microsoft Research | 72 |
| #14 | AutoGPT Platform | Significant Gravitas | 72 |
| #15 | AutoWebGLM | THUDM | 72 |
| #16 | BabyAGI | Yohei Nakajima | 72 |
| #17 | Backboard IO | Backboard.io | 72 |
| #18 | Bee Computer | Bee (acquired by Amazon 2026) | 72 |
| #19 | Bisheng | dataelement | 72 |
| #20 | Botpress | Botpress Inc. | 72 |
| #21 | BrowserGym | ServiceNow Research | 72 |
| #22 | CAMEL | CAMEL-AI.org | 72 |
| #23 | Character AI | Character.AI (Google investment) | 72 |
| #24 | Charlie Mnemonic | GoodAI | 72 |
| #25 | ChatDB | Tsinghua University (Hu et al.) | 72 |
| #26 | ChatDev 2.0 | OpenBMB | 72 |
| #27 | Cognigy | Cognigy GmbH (acquired by NICE, July 2025) | 72 |
| #28 | Cradle | BAAI-Agents | 72 |
| #29 | CrewAI Enterprise | CrewAI Inc. | 72 |
| #30 | CrewAI | CrewAI Inc. (Joao Moura) | 72 |
| #31 | D-Mem | You et al. (2025) | 72 |
| #32 | DB-GPT | eosphoros-ai | 72 |
| #33 | Dify | LangGenius | 72 |
| #34 | Dust tt | Dust (formerly XP1) | 72 |
| #35 | ExpeL | Tsinghua University (Zhao et al.) | 72 |
| #36 | FastGPT | labring | 72 |
| #37 | Flowise | FlowiseAI | 72 |
| #38 | Friend AI | Friend | 72 |
| #39 | Galileo AI | Galileo Technologies Inc. | 72 |
| #40 | GAM | VectorSpaceLab (BAAI-related) | 72 |
| #41 | Generative Agents | Stanford / Google | 72 |
| #42 | Generative Agents | Stanford University / Google Research | 72 |
| #43 | Granola AI | Granola | 72 |
| #44 | Hebbia | Hebbia, Inc. | 72 |
| #45 | HiMem | Zhu et al. (JD.com, 2026) | 72 |
| #46 | HoneyHive | HoneyHive Inc. | 72 |
| #47 | HuggingGPT / JARVIS | Microsoft Research | 72 |
| #48 | HybridAGI | SynaLinks | 72 |
| #49 | JARVIS-1 | CraftJarvis | 72 |
| #50 | KnowAgent | zjunlp (Zhejiang University) | 72 |
| #51 | Kore AI | Kore.ai Inc. | 72 |
| #52 | Lagent | InternLM (Shanghai AI Lab) | 72 |
| #53 | Langflow | Langflow-ai (DataStax) | 72 |
| #54 | LangGraph | LangChain | 72 |
| #55 | LangSmith LangGraph Cloud | LangChain Inc. | 72 |
| #56 | Limitless Pendant | Limitless AI (acquired by Meta Dec 2025) | 72 |
| #57 | Lindy AI | Lindy AI | 72 |
| #58 | Lyzr Cognis | Lyzr AI | 72 |
| #59 | Maxim AI | Maxim AI Inc. | 72 |
| #60 | MCP Memory Server | Anthropic / Model Context Protocol | 72 |
| #61 | Memoripy | caspianmoon | 72 |
| #62 | MemOS | MemTensor (Li, Zhang, et al.) | 72 |
| #63 | Memp | Zhejiang University (Fang et al.) | 72 |
| #64 | MemR3 | 2025 (December submission) | 72 |
| #65 | memU | NevaMind-AI | 72 |
| #66 | MetaGPT | DeepWisdom / geekan | 72 |
| #67 | MIRIX | MIRIX AI (Wang, Chen) | 72 |
| #68 | Mobile-Agent | Alibaba Tongyi Lab (X-PLUG) | 72 |
| #69 | MoT | Fudan University (Li & Qiu) | 72 |
| #70 | MoT | Fudan (Li, Qiu) | 72 |
| #71 | MultiOn | MultiOn (now AGI Inc.) | 72 |
| #72 | Nabla Copilot | Nabla | 72 |
| #73 | Nemori | Nemori AI (independent) | 72 |
| #74 | Neo4j AuraDB | Neo4j Inc. | 72 |
| #75 | Nomi AI | Glimpse AI, Inc. | 72 |
| #76 | Nuance DAX | Nuance Communications (Microsoft) | 72 |
| #77 | Onyx | onyx-dot-app | 72 |
| #78 | Open Interpreter | OpenInterpreter | 72 |
| #79 | OS-Copilot / FRIDAY | Shanghai AI Lab / MMLab (Wu et al.) | 72 |
| #80 | Paradot | WithFeeling.AI | 72 |
| #81 | Pi Inflection | Inflection AI | 72 |
| #82 | Pickle AI | Soul Computer (YC-backed) | 72 |
| #83 | Qwen-Agent | QwenLM (Alibaba) | 72 |
| #84 | RecallM | Cisco Research / independent (Kynoch & Latapie) | 72 |
| #85 | Reflexion | Northeastern / MIT / Princeton (Shinn et al.) | 72 |
| #86 | ReMe | ModelScope (Alibaba) | 72 |
| #87 | Replika | Luka, Inc. | 72 |
| #88 | RMM | Google / UCSB (2025) | 72 |
| #89 | SCM | Beihang / NLPR (Wang et al.) | 72 |
| #90 | Second Me | Mindverse (Shang, Li, et al.) | 72 |
| #91 | Self-RAG | University of Washington / Allen AI (Asai et al.) | 72 |
| #92 | SID AI | SID (YC) | 72 |
| #93 | Stack AI | Stack AI Inc. (YC W23) | 72 |
| #94 | Suki AI | Suki (formerly Robin AI) | 72 |
| #95 | SuperAGI | TransformerOptimus | 72 |
| #96 | Swarms | kyegomez / Swarms Corp | 72 |
| #97 | Synapse | Nanyang Technological University (Zheng et al.) | 72 |
| #98 | Synapse | NTU / Salesforce (Zheng et al.) | 72 |
| #99 | Tab AI | Tab (Avi Schiffmann) | 72 |
| #100 | Talkie AI | MiniMax | 72 |
| #101 | Think-in-Memory | Ant Group / Alibaba (Liu et al.) | 72 |
| #102 | VectorShift | VectorShift Inc. (YC S23) | 72 |
| #103 | Vellum AI | Vellum AI Inc. (YC W23) | 72 |
| #104 | Voiceflow | Voiceflow Inc. | 72 |
| #105 | Voyager | NVIDIA / Caltech / UT Austin / Stanford / ASU / UW (Wang et al.) | 72 |
| #106 | WebVoyager | MinorJerry et al. | 72 |
| #107 | xmemory | xmemory Inc. | 72 |
| #108 | Glean | Glean Technologies | 71.8 |
| #109 | Kindroid | Kindroid | 71.5 |
| #110 | GPTeam | 101dotxyz | 71.4 |
| #111 | AriGraph | AIRI Institute / Moscow | 70.9 |
| #112 | RAGFlow | InfiniFlow | 70.5 |
| #113 | MemoChat | University of Warwick / Alibaba | 70.4 |
| #114 | Haystack Memory | deepset | 69.7 |
| #115 | Plaud Note | PLAUD | 69.4 |
| #116 | MemoryScope | Alibaba ModelScope | 68.9 |
| #117 | Personal AI | Personal AI | 68.6 |
| #118 | Supermemory | Supermemory | 68.5 |
| #119 | Gemini Memory | Google | 68.4 |
| #120 | Claude Projects | Anthropic | 68 |
| #121 | HEMA | independent (Ahn et al.) | 68 |
| #122 | MemoryBank | Institute of Software, Chinese Academy of Sciences | 66.8 |
| #123 | Memformer | UC Santa Barbara / Amazon (Wu, Lan, Liu, et al.) | 65.8 |
| #124 | MemoryBank | Harbin Institute of Technology / SenseTime | 65 |
| #125 | ChatGPT Memory | OpenAI | 64.8 |
| #126 | Epsilla | Epsilla Inc. (YC S23) | 64 |
| #127 | MongoDB Atlas Vector | MongoDB Inc. | 63.8 |
| #128 | Redis Vector | Redis Ltd. | 63.7 |
| #129 | Notion AI | Notion Labs | 63.6 |
| #130 | Memoro | MIT Media Lab | 63.2 |
| #131 | LangMem | LangChain | 62.7 |
| #132 | KDB AI | KX Systems | 62.5 |
| #133 | Mnemosyne | Johns Hopkins / independent (2025) | 61.4 |
| #134 | Mnemosyne | independent | 61 |
| #135 | Couchbase Vector | Couchbase Inc. | 60.2 |
| #136 | Sana AI | Sana Labs | 60.1 |
| #137 | AnythingLLM | Mintplex Labs | 59.9 |
| #138 | LlamaIndex Memory | LlamaIndex | 59.7 |
| #139 | Ragie | Ragie Inc. | 59.5 |
| #140 | R3Mem | HKUST (2025) | 59.2 |
| #141 | EM-LLM | em-llm (academic consortium) | 59.1 |
| #142 | Memori | GibsonAI | 56.9 |
| #143 | Letta | Letta (formerly MemGPT) | 52.7 |
| #144 | MemGPT Classic | Berkeley / Letta | 47.3 |