LoCoMo

Name: LoCoMo: Long-Term Conversational Memory Benchmark
Creator: Snap Research
Keywords: multi-turn-recall, agent-memory, episodic-session

Benchmark Metadata

Publisher: Snap Research
Venue: ACL 2024
Evaluation Type: hybrid
Dimensions: 4
Test Prompts: 300
Scoring: Higher is better
Update Frequency: annual


What It Measures

Single-hop conversational QA
Multi-hop conversational QA
Temporal reasoning over dialogue
Open-domain knowledge updates
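
Several leaderboard entries list one score per dimension next to an overall score. The sketch below, which assumes an unweighted (macro) average and is not the benchmark's official aggregation, shows how the per-category figures from the Backboard IO entry combine. The function name `macro_average` is hypothetical. The result differs from the reported overall of 90, which suggests the overall figure is not a plain mean of the four categories (for example, it may be weighted by question count, though the page does not say).

```python
from statistics import mean

# Per-category scores copied from the Backboard IO leaderboard entry.
backboard = {
    "SingleHop": 89.4,
    "MultiHop": 75.0,
    "OpenDomain": 91.2,
    "Temporal": 91.9,
}

# Overall score shown for the same entry on the leaderboard.
reported_overall = 90.0


def macro_average(per_category: dict[str, float]) -> float:
    """Unweighted mean over categories (an assumption, not the official metric)."""
    return mean(per_category.values())


macro = macro_average(backboard)
gap = reported_overall - macro
print(f"macro average: {macro:.2f}, gap to reported overall: {gap:.2f}")
```

Comparing the two numbers is a quick sanity check when reading self-reported rows: a large gap means the overall and per-category figures should not be treated as interchangeable.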

What It Does Not Measure

Document QA
Code understanding
Numerical reasoning beyond dialogue

All Systems Evaluated (161 systems)

16 self-reported, 145 estimated

Rank	System	Creator	Score	Provenance	Source
#1	memU	NevaMind-AI	92.1	Self-Reported	memu.pro/benchmark + github.com/NevaMind-AI/memU README; self-reported hybrid retrieval (semantic + keyword + contextual)
#2	Backboard IO	Backboard.io	90	Self-Reported	github.com/Backboard-io/Backboard-Locomo-Benchmark; GPT-4.1 judge, temp=0.1; per-category: SingleHop 89.4, MultiHop 75.0, OpenDomain 91.2, Temporal 91.9
#3	MemPalace	Ben Sigman / Milla Jovovich (independent open-source)	88.9	Self-Reported	Self-reported R@10 Hybrid v5 top-10 (no LLM). Best config (Hybrid v5 + Sonnet rerank top-50) achieves 100% R@5/R@10. Baseline session-no-rerank: 60.3%.
#4	MemR3	2025 (December submission)	86.8	Self-Reported	arXiv:2512.20237 Table 1; GPT-4.1-mini + RAG backbone, LLM-as-Judge overall
#5	A-MEM	AGI Research / Rutgers	86.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#6	MIRIX	MIRIX AI (Wang, Chen)	85.4	Self-Reported	arXiv:2507.07957 results table; overall accuracy; SingleHop 85.11, MultiHop 83.70, OpenDomain 65.62, Temporal 88.39
#7	xmemory	xmemory Inc.	85.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#8	Voyager	NVIDIA / Caltech / UT Austin / Stanford / ASU / UW (Wang et al.)	83.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#9	Supermemory	Supermemory	83.5	Estimated	Arena estimate: no published LoCoMo score from Supermemory
#10	ExpeL	Tsinghua University (Zhao et al.)	83.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#11	SCM	Beihang / NLPR (Wang et al.)	82.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#12	CrewAI Enterprise	CrewAI Inc.	81.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#13	Talkie AI	MiniMax	81.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#14	Reflexion	Northeastern / MIT / Princeton (Shinn et al.)	81.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#15	Agent Workflow Memory	CMU (Wang, Mao, Fried, Neubig)	81	Estimated	Arena estimate: derived from capability profile, not independently verified
#16	Cognigy	Cognigy GmbH (acquired by NICE, July 2025)	80.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#17	HiMem	Zhu et al. (JD.com, 2026)	80.7	Self-Reported	arXiv:2601.06377 Table 1; overall GPT-Score; F1 34.95. Per-category: SingleHop 89.22, MultiHop 70.92, Temporal 74.77, OpenDomain 54.86
#18	Nabla Copilot	Nabla	80.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#19	DB-GPT	eosphoros-ai	80.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#20	BrowserGym	ServiceNow Research	80.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#21	FastGPT	labring	80.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#22	OS-Copilot / FRIDAY	Shanghai AI Lab / MMLab (Wu et al.)	80	Estimated	Arena estimate: derived from capability profile, not independently verified
#23	Nomi AI	Glimpse AI, Inc.	79.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#24	AutoGPT Platform	Significant Gravitas	79.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#25	AutoGen Studio	Microsoft Research	79.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#26	Qwen-Agent	QwenLM (Alibaba)	79.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#27	Abridge	Abridge	79.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#28	Nemori	Nemori AI (independent)	79.4	Self-Reported	arXiv:2508.03341 results table; LLM-judge score with gpt-4.1-mini backbone; gpt-4o-mini variant scored 74.4
#29	Swarms	kyegomez / Swarms Corp	79.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#30	CrewAI	CrewAI Inc. (Joao Moura)	79.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#31	Bisheng	dataelement	78.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#32	Dify	LangGenius	78.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#33	Adept AI	Adept AI Labs (acquired by Amazon 2024)	78.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#34	SID AI	SID (YC)	78.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#35	D-Mem	You et al. (2025)	78.4	Self-Reported	arXiv:2603.18631 Table 1; LLM-judge with GPT-4o-mini Full Deliberation; F1 55.3, BLEU 44.2. LLM-judge used as the primary metric for comparability with Mem0/Backboard/MemR3
#36	Plaud Note	PLAUD	78.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#37	RecallM	Cisco Research / independent (Kynoch & Latapie)	78.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#38	Voiceflow	Voiceflow Inc.	78.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#39	Botpress	Botpress Inc.	78.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#40	HoneyHive	HoneyHive Inc.	77.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#41	MetaGPT	DeepWisdom / geekan	77.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#42	GAM	VectorSpaceLab (BAAI-related)	77.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#43	Kindroid	Kindroid	77.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#44	Self-RAG	University of Washington / Allen AI (Asai et al.)	77.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#45	Dust tt	Dust (formerly XP1)	77.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#46	Generative Agents	Stanford University / Google Research	77.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#47	Synapse	Nanyang Technological University (Zheng et al.)	77.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#48	AutoGen Core Memory	Microsoft	77.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#49	MultiOn	MultiOn (now AGI Inc.)	77.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#50	Athina AI	Athina AI (YC W23)	77.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#51	CAMEL	CAMEL-AI.org	77.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#52	AGiXT	Josh-XT	76.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#53	Galileo AI	Galileo Technologies Inc.	76.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#54	Suki AI	Suki (formerly Robin AI)	76.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#55	HEMA	independent (Ahn et al.)	76.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#56	Memp	Zhejiang University (Fang et al.)	76.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#57	WebVoyager	MinorJerry et al.	76.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#58	Langflow	Langflow-ai (DataStax)	76.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#59	Paradot	WithFeeling.AI	76.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#60	MemoryLLM	UCSD / Apple (Wang et al.)	76.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#61	Nuance DAX	Nuance Communications (Microsoft)	76.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#62	SuperAGI	TransformerOptimus	76.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#63	ChatDev 2.0	OpenBMB	76.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#64	Hebbia	Hebbia, Inc.	76.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#65	LangGraph	LangChain	76	Estimated	Arena estimate: derived from capability profile, not independently verified
#66	Onyx	onyx-dot-app	76	Estimated	Arena estimate: derived from capability profile, not independently verified
#67	MemOS	MemTensor (Li, Zhang, et al.)	75.8	Self-Reported	arXiv:2507.03724; MemOS-1031 overall LLM Judge Score; headline LoCoMo LLM-as-Judge score, also shown on MemTensor GitHub README
#68	Pickle AI	Soul Computer (YC-backed)	75.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#69	Stack AI	Stack AI Inc. (YC W23)	75.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#70	LangSmith LangGraph Cloud	LangChain Inc.	75.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#71	Pi Inflection	Inflection AI	75.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#72	BabyAGI	Yohei Nakajima	75.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#73	ArcMemo	UC Berkeley / Stanford (Ho et al.)	75.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#74	Kore AI	Kore.ai Inc.	75.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#75	Limitless Pendant	Limitless AI (acquired by Meta Dec 2025)	75.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#76	Replika	Luka, Inc.	75.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#77	Think-in-Memory	Ant Group / Alibaba (Liu et al.)	75.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#78	GPTeam	101dotxyz	75.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#79	MCP Memory Server	Anthropic / Model Context Protocol	75.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#80	MoT	Fudan University (Li & Qiu)	75.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#81	Open Interpreter	OpenInterpreter	75.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#82	AgentVerse	OpenBMB (Tsinghua)	75.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#83	Memoripy	caspianmoon	75.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#84	AgentScope	ModelScope (Alibaba)	75	Estimated	Arena estimate: derived from capability profile, not independently verified
#85	Granola AI	Granola	74.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#86	VectorShift	VectorShift Inc. (YC S23)	74.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#87	HippoRAG 2	OSU NLP Group	74.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#88	Lagent	InternLM (Shanghai AI Lab)	74.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#89	RAGFlow	InfiniFlow	74.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#90	Vellum AI	Vellum AI Inc. (YC W23)	74.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#91	Cradle	BAAI-Agents	74.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#92	AutoWebGLM	THUDM	74.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#93	Maxim AI	Maxim AI Inc.	74.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#94	HippoRAG	OSU NLP Group (Ohio State University)	74.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#95	HuggingGPT / JARVIS	Microsoft Research	74	Estimated	Arena estimate: derived from capability profile, not independently verified
#96	Lindy AI	Lindy AI	74	Estimated	Arena estimate: derived from capability profile, not independently verified
#97	Mobile-Agent	Alibaba Tongyi Lab (X-PLUG)	74	Estimated	Arena estimate: derived from capability profile, not independently verified
#98	Flowise	FlowiseAI	73.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#99	KnowAgent	zjunlp (Zhejiang University)	73.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#100	MoT	Fudan (Li, Qiu)	73.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#101	AppAgent	Tencent / mnotgod96	73.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#102	Bee Computer	Bee (acquired by Amazon 2026)	73	Estimated	Arena estimate: derived from capability profile, not independently verified
#103	ReMe	ModelScope (Alibaba)	73	Estimated	Arena estimate: derived from capability profile, not independently verified
#104	RMM	Google / UCSB (2025)	72.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#105	ChatDB	Tsinghua University (Hu et al.)	72.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#106	MemoChat	University of Warwick / Alibaba	72.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#107	AriGraph	AIRI Institute / Moscow	72.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#108	Personal AI	Personal AI	72.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#109	Generative Agents	Stanford / Google	72	Estimated	Arena estimate: derived from capability profile, not independently verified
#110	Second Me	Mindverse (Shang, Li, et al.)	71.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#111	Synapse	NTU / Salesforce (Zheng et al.)	71.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#112	Memary	Kingjulio8238	71.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#113	Character AI	Character.AI (Google investment)	71.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#114	JARVIS-1	CraftJarvis	71.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#115	Tab AI	Tab (Avi Schiffmann)	71.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#116	EM-LLM	em-llm (academic consortium)	71.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#117	Lyzr Cognis	Lyzr AI	71.5	Self-Reported	Lyzr internal evaluation; avg of 4 LoCoMo tasks vs Zep, OpenAI, Mem0
#118	Zep	Zep AI	71.3	Self-Reported	Disputed: Zep claims 75.14, Mem0 independent reeval reports 65.99. Shown value from original Arena entry
#119	Neo4j AuraDB	Neo4j Inc.	70.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#120	Friend AI	Friend	70	Estimated	Arena estimate: derived from capability profile, not independently verified
#121	Charlie Mnemonic	GoodAI	69.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#122	HybridAGI	SynaLinks	69.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#123	Nomic Atlas	Nomic AI Inc.	69.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#124	Memorizing Transformer	Google Research (Wu, Rabe, Hutchins, Szegedy)	68.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#125	Titans	lucidrains (community) / paper by Google Research	68.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#126	Glean	Glean Technologies	68	Estimated	Arena estimate: derived from capability profile, not independently verified
#127	Mem0	Mem0	66.9	Self-Reported	Mem0 paper Table 2; base Mem0 LoCoMo J-score (66.88 rounded)
#128	MemoryScope	Alibaba ModelScope	66.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#129	Claude Projects	Anthropic	66.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#130	Larimar	IBM Research	66.4	Estimated	Arena estimate: derived from capability profile, not independently verified
#131	Gemini Memory	Google	65.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#132	LongMem	UCSB / Microsoft Research	65.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#133	Memformer	UC Santa Barbara / Amazon (Wu, Lan, Liu, et al.)	65.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#134	kNN-LM	Stanford / Facebook AI Research (Khandelwal et al.)	64.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#135	Letta	Letta (formerly MemGPT)	64.2	Estimated	Arena estimate: no published benchmark data in Letta repo or paper
#136	Haystack Memory	deepset	62.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#137	MemoryBank	Institute of Software, Chinese Academy of Sciences	62.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#138	R3Mem	HKUST (2025)	62.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#139	LangMem	LangChain	61	Self-Reported	LangMem launch post
#140	Copilot Memory	Microsoft	60.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#141	ChatGPT Memory	OpenAI	60.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#142	Couchbase Vector	Couchbase Inc.	60	Estimated	Arena estimate: derived from capability profile, not independently verified
#143	Redis Vector	Redis Ltd.	59.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#144	MemoryBank	Harbin Institute of Technology / SenseTime	59.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#145	Notion AI	Notion Labs	58.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#146	KDB AI	KX Systems	58.2	Estimated	Arena estimate: derived from capability profile, not independently verified
#147	Ragie	Ragie Inc.	57.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#148	Saner AI	Saner.AI	57.7	Estimated	Arena estimate: derived from capability profile, not independently verified
#149	AnythingLLM	Mintplex Labs	57.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#150	MemGPT Classic	Berkeley / Letta	56.8	Self-Reported	Reproduction
#151	LlamaIndex Memory	LlamaIndex	56.5	Estimated	Arena estimate: derived from capability profile, not independently verified
#152	Mem AI	Mem Labs	56.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#153	MongoDB Atlas Vector	MongoDB Inc.	55.6	Estimated	Arena estimate: derived from capability profile, not independently verified
#154	Quivr	QuivrHQ	55.3	Estimated	Arena estimate: derived from capability profile, not independently verified
#155	Memoro	MIT Media Lab	55.1	Estimated	Arena estimate: derived from capability profile, not independently verified
#156	Heyday AI	Heyday (shut down 2025)	54.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#157	Sana AI	Sana Labs	54.9	Estimated	Arena estimate: derived from capability profile, not independently verified
#158	Memori	GibsonAI	54.8	Estimated	Arena estimate: derived from capability profile, not independently verified
#159	Mnemosyne	independent	54.5	Self-Reported	arXiv:2510.08601 Table 2; overall J-score, Llama3.1-8B-Instruct; SingleHop 62.78, MultiHop 49.53, OpenDomain 60.42, Temporal 53.03
#160	Mnemosyne	Johns Hopkins / independent (2025)	54.5	Self-Reported	arXiv:2510.08601 Table 2; same paper as the other Mnemosyne entry; paper is titled 'for Edge-Based LLMs'; edge is the main design target
#161	Epsilla	Epsilla Inc. (YC S23)	53.3	Estimated	Arena estimate: derived from capability profile, not independently verified
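
Because most rows are Arena estimates rather than published results, it can help to separate the two provenance classes before comparing systems. The following is a minimal sketch, assuming the table has been saved as tab-separated text with a header row. The helper `load_rows` and the `SAMPLE` excerpt are hypothetical, and only a handful of rows are reproduced here (creator and source columns omitted for brevity).

```python
import csv
import io
from collections import Counter

# A few rows excerpted from the table above, as tab-separated text.
SAMPLE = (
    "Rank\tSystem\tScore\tProvenance\n"
    "#118\tZep\t71.3\tSelf-Reported\n"
    "#127\tMem0\t66.9\tSelf-Reported\n"
    "#135\tLetta\t64.2\tEstimated\n"
    "#139\tLangMem\t61\tSelf-Reported\n"
)


def load_rows(text: str) -> list[dict]:
    """Parse tab-separated leaderboard text into typed row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "rank": int(r["Rank"].lstrip("#")),  # "#118" -> 118
            "system": r["System"],
            "score": float(r["Score"]),
            "provenance": r["Provenance"],
        }
        for r in reader
    ]


rows = load_rows(SAMPLE)

# Count rows per provenance class.
counts = Counter(r["provenance"] for r in rows)

# Keep only systems whose score comes from a published or vendor source.
self_reported = [r["system"] for r in rows if r["provenance"] == "Self-Reported"]

print(counts)
print(self_reported)
```

Run against the full table, the same count should reproduce the 16 self-reported and 145 estimated entries stated in the summary line.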