LongMemEval

Name: LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Creator: Salesforce AI Research
Keywords: cross-session-memory, agent-memory, episodic-session

Benchmark Metadata

Publisher: Salesforce AI Research
Venue: arXiv preprint
Evaluation Type: automatic
Dimensions: 5
Test Prompts: 500
Scoring: Higher is better
Update Frequency: annual

What It Measures

Information extraction across sessions
Multi-session reasoning
Knowledge update tracking
Temporal reasoning
Abstention on missing facts

What It Does Not Measure

Single-turn factual recall
Latency
Token-cost efficiency
Open-ended generation quality

All Systems Evaluated (173 systems)

19 self-reported, 154 estimated

Rank	System	Creator	Score	Provenance	Source
#1	MemPalace	Ben Sigman / Milla Jovovich (independent open-source)	96.6	Self-Reported	Self-reported R@5 retrieval recall, raw ChromaDB mode (no LLM). Best config (Hybrid v4 + Haiku rerank) achieves 100% R@5. Held-out 450q test: 98.4%.
#2	Backboard IO	Backboard.io	93.4	Self-Reported	github.com/Backboard-io/Backboard-longmemEval-results: 467/500 on LongMemEval s_cleaned (~115k tokens), GPT-4.1, independent eval by NewMathData
#3	Lyzr Cognis	Lyzr AI	90.6	Self-Reported	Lyzr internal evaluation with GPT-4.1; average of 6 LongMemEval dimensions
#4	Voyager	NVIDIA / Caltech / UT Austin / Stanford / ASU / UW (Wang et al.)	87.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#5	Pickle AI	Soul Computer (YC-backed)	86.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#6	xmemory	xmemory Inc.	86.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#7	ArcMemo	UC Berkeley / Stanford (Ho et al.)	85.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#8	SuperAGI	TransformerOptimus	85.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#9	Replika	Luka, Inc.	84.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#10	Swarms	kyegomez / Swarms Corp	84	Estimated	Arena estimate; derived from capability profile, not independently verified
#11	MemR3	2025 (December submission)	83.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#12	OS-Copilot / FRIDAY	Shanghai AI Lab / MMLab (Wu et al.)	83.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#13	A-MEM	AGI Research / Rutgers	83.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#14	HippoRAG 2	OSU NLP Group	83	Estimated	Arena estimate; derived from capability profile, not independently verified
#15	Talkie AI	MiniMax	82.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#16	Cognigy	Cognigy GmbH (acquired by NICE, July 2025)	82.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#17	CrewAI Enterprise	CrewAI Inc.	82.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#18	Memp	Zhejiang University (Fang et al.)	82.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#19	memU	NevaMind-AI	82	Self-Reported	Third-party launch coverage (X/Twitter), LongMemEval-S; weaker sourcing, not on official page
#20	Bee Computer	Bee (acquired by Amazon 2026)	81.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#21	Supermemory	Supermemory	81.6	Self-Reported	supermemory.ai/research: 81.6% overall accuracy (GPT-4o configuration)
#22	MoT	Fudan University (Li & Qiu)	81.5	Estimated	Arena estimate; derived from capability profile, not independently verified
#23	MIRIX	MIRIX AI (Wang, Chen)	81.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#24	AutoWebGLM	THUDM	81.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#25	ExpeL	Tsinghua University (Zhao et al.)	81	Estimated	Arena estimate; derived from capability profile, not independently verified
#26	BabyAGI	Yohei Nakajima	80.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#27	BrowserGym	ServiceNow Research	80.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#28	Suki AI	Suki (formerly Robin AI)	80.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#29	AgentVerse	OpenBMB (Tsinghua)	80.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#30	Lindy AI	Lindy AI	80.5	Estimated	Arena estimate; derived from capability profile, not independently verified
#31	Lagent	InternLM (Shanghai AI Lab)	80.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#32	Nabla Copilot	Nabla	80.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#33	Langflow	Langflow-ai (DataStax)	80.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#34	Mobile-Agent	Alibaba Tongyi Lab (X-PLUG)	80.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#35	Cradle	BAAI-Agents	80	Estimated	Arena estimate; derived from capability profile, not independently verified
#36	Hebbia	Hebbia, Inc.	79.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#37	JARVIS-1	CraftJarvis	79.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#38	Voiceflow	Voiceflow Inc.	79.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#39	Kore AI	Kore.ai Inc.	79.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#40	Generative Agents	Stanford University / Google Research	79.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#41	Onyx	onyx-dot-app	79.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#42	WebVoyager	MinorJerry et al.	79.5	Estimated	Arena estimate; derived from capability profile, not independently verified
#43	AGiXT	Josh-XT	79.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#44	Bisheng	dataelement	79.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#45	CAMEL	CAMEL-AI.org	79.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#46	Nuance DAX	Nuance Communications (Microsoft)	79.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#47	Synapse	Nanyang Technological University (Zheng et al.)	79.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#48	ChatDev 2.0	OpenBMB	79.2	Estimated	Arena estimate; derived from capability profile, not independently verified
#49	Reflexion	Northeastern / MIT / Princeton (Shinn et al.)	79.2	Estimated	Arena estimate; derived from capability profile, not independently verified
#50	Tab AI	Tab (Avi Schiffmann)	79.2	Estimated	Arena estimate; derived from capability profile, not independently verified
#51	Self-RAG	University of Washington / Allen AI (Asai et al.)	79.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#52	AutoGPT Platform	Significant Gravitas	79	Estimated	Arena estimate; derived from capability profile, not independently verified
#53	Stack AI	Stack AI Inc. (YC W23)	79	Estimated	Arena estimate; derived from capability profile, not independently verified
#54	Galileo AI	Galileo Technologies Inc.	78.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#55	VectorShift	VectorShift Inc. (YC S23)	78.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#56	Agent Workflow Memory	CMU (Wang, Mao, Fried, Neubig)	78.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#57	HiMem	Zhu et al. (JD.com, 2026)	78.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#58	Limitless Pendant	Limitless AI (acquired by Meta Dec 2025)	78.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#59	Athina AI	Athina AI (YC W23)	78.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#60	HoneyHive	HoneyHive Inc.	78.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#61	SCM	Beihang / NLPR (Wang et al.)	78.2	Estimated	Arena estimate; derived from capability profile, not independently verified
#62	GPTeam	101dotxyz	77.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#63	AutoGen Core Memory	Microsoft	77.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#64	MemOS	MemTensor (Li, Zhang, et al.)	77.8	Self-Reported	arXiv:2507.03724, Table 3: MemOS-1031 average across LongMemEval categories; outperforms Memobase (72.4%)
#65	DB-GPT	eosphoros-ai	77.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#66	MetaGPT	DeepWisdom / geekan	77.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#67	Qwen-Agent	QwenLM (Alibaba)	77.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#68	Character AI	Character.AI (Google investment)	77.5	Estimated	Arena estimate; derived from capability profile, not independently verified
#69	FastGPT	labring	77.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#70	Vellum AI	Vellum AI Inc. (YC W23)	77.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#71	D-Mem	You et al. (2025)	77.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#72	SID AI	SID (YC)	77.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#73	Friend AI	Friend	77	Estimated	Arena estimate; derived from capability profile, not independently verified
#74	Dify	LangGenius	76.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#75	LangGraph	LangChain	76.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#76	CrewAI	CrewAI Inc. (Joao Moura)	76.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#77	Think-in-Memory	Ant Group / Alibaba (Liu et al.)	76.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#78	RecallM	Cisco Research / independent (Kynoch & Latapie)	76.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#79	Nomi AI	Glimpse AI, Inc.	76.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#80	GAM	VectorSpaceLab (BAAI-related)	76.5	Estimated	Arena estimate; derived from capability profile, not independently verified
#81	LangSmith LangGraph Cloud	LangChain Inc.	76.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#82	Open Interpreter	OpenInterpreter	76.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#83	MultiOn	MultiOn (now AGI Inc.)	76.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#84	Personal AI	Personal AI	75.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#85	Pi Inflection	Inflection AI	75.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#86	Memary	Kingjulio8238	75.5	Estimated	Arena estimate; derived from capability profile, not independently verified
#87	AutoGen Studio	Microsoft Research	75.5	Estimated	Arena estimate; derived from capability profile, not independently verified
#88	Botpress	Botpress Inc.	75.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#89	Flowise	FlowiseAI	75.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#90	MemoryLLM	UCSD / Apple (Wang et al.)	75.2	Estimated	Arena estimate; derived from capability profile, not independently verified
#91	Abridge	Abridge	75.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#92	HybridAGI	SynaLinks	75.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#93	Ontotext GraphDB	Ontotext / Graphwise (merged with Semantic Web Company, 2025)	75.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#94	Dust tt	Dust (formerly XP1)	75	Estimated	Arena estimate; derived from capability profile, not independently verified
#95	Titans	lucidrains (community) / paper by Google Research	75	Estimated	Arena estimate; derived from capability profile, not independently verified
#96	Maxim AI	Maxim AI Inc.	74.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#97	Second Me	Mindverse (Shang, Li, et al.)	74.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#98	Paradot	WithFeeling.AI	74.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#99	KnowAgent	zjunlp (Zhejiang University)	74.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#100	Nemori	Nemori AI (independent)	74.6	Self-Reported	arXiv:2508.03341 results table: LongMemEval-S accuracy with gpt-4.1-mini; uses 95-96% less context than full-context baseline
#101	MoT	Fudan (Li, Qiu)	74.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#102	Plaud Note	PLAUD	74.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#103	AriGraph	AIRI Institute / Moscow	74.2	Estimated	Arena estimate; derived from capability profile, not independently verified
#104	Granola AI	Granola	74.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#105	Synapse	NTU / Salesforce (Zheng et al.)	74	Estimated	Arena estimate; derived from capability profile, not independently verified
#106	Charlie Mnemonic	GoodAI	73.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#107	LightRAG	HKUDS (HKU Data Intelligence Lab)	73.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#108	Nano GraphRAG	gusye1234	73.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#109	PathRAG	BUPT-GAMMA	73.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#110	Glean	Glean Technologies	73.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#111	Memoripy	caspianmoon	73.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#112	HippoRAG	OSU NLP Group (Ohio State University)	73.5	Estimated	Arena estimate; derived from capability profile, not independently verified
#113	AppAgent	Tencent / mnotgod96	73.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#114	Stardog	Stardog Union Inc.	73.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#115	Neo4j LLM Graph Builder	Neo4j Labs	73.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#116	Graphiti	Zep AI	73.2	Self-Reported	Zep / Graphiti paper
#117	MCP Memory Server	Anthropic / Model Context Protocol	73.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#118	MiniRAG	HKUDS	73.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#119	ChatDB	Tsinghua University (Hu et al.)	72.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#120	R2R	SciPhi-AI	72.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#121	Neo4j AuraDB	Neo4j Inc.	72.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#122	GraphRAG	Microsoft	71.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#123	KAG	OpenSPG / Ant Group	71.5	Estimated	Arena estimate; derived from capability profile, not independently verified
#124	Diffbot	Diffbot Inc.	71.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#125	RAGFlow	InfiniFlow	71.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#126	Zep	Zep AI	71.2	Self-Reported	Zep paper (arXiv:2501.13956): 71.2% on LongMemEval
#127	HuggingGPT / JARVIS	Microsoft Research	71.2	Estimated	Arena estimate; derived from capability profile, not independently verified
#128	Generative Agents	Stanford / Google	71.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#129	MemoChat	University of Warwick / Alibaba	71.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#130	GraphRAG-SDK	FalkorDB	70.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#131	AllegroGraph	Franz Inc.	70.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#132	Memorizing Transformer	Google Research (Wu, Rabe, Hutchins, Szegedy)	70.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#133	Kindroid	Kindroid	70.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#134	RMM	Google / UCSB (2025)	70.4	Self-Reported	arXiv:2503.08026 Table 1 (ACL 2025): RMM with GTE retriever; baseline GTE RAG 63.6%. >10% improvement over no-memory baseline
#135	Larimar	IBM Research	68.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#136	kNN-LM	Stanford / Facebook AI Research (Khandelwal et al.)	68.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#137	HEMA	independent (Ahn et al.)	66.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#138	LongMem	UCSB / Microsoft Research	65.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#139	Nomic Atlas	Nomic AI Inc.	64.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#140	EM-LLM	em-llm (academic consortium)	64.2	Estimated	Arena estimate; derived from capability profile, not independently verified
#141	Claude Projects	Anthropic	64	Self-Reported	Third-party reproduction
#142	MemoryScope	Alibaba ModelScope	63.7	Self-Reported	MemoryScope evals
#143	Haystack Memory	deepset	63.2	Estimated	Arena estimate; derived from capability profile, not independently verified
#144	Weaviate	Weaviate	63.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#145	Gemini Memory	Google	62.3	Self-Reported	Third-party reproduction
#146	ChatGPT Memory	OpenAI	61.5	Self-Reported	Third-party reproduction
#147	Mnemosyne	Johns Hopkins / independent (2025)	61.4	Estimated	Arena estimate; derived from capability profile, not independently verified
#148	Memoro	MIT Media Lab	60.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#149	Quivr	QuivrHQ	60.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#150	Letta	Letta (formerly MemGPT)	60.4	Estimated	Arena estimate; no published benchmark data in Letta repo or paper
#151	MemoryBank	Institute of Software, Chinese Academy of Sciences	60.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#152	Mnemosyne	independent	59.9	Estimated	Arena estimate; derived from capability profile, not independently verified
#153	Copilot Memory	Microsoft	59.7	Self-Reported	Third-party reproduction
#154	Sana AI	Sana Labs	59.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#155	Heyday AI	Heyday (shut down 2025)	59.2	Estimated	Arena estimate; derived from capability profile, not independently verified
#156	AnythingLLM	Mintplex Labs	59.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#157	REALM	Google Research (Guu et al.)	59.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#158	Saner AI	Saner.AI	59.1	Estimated	Arena estimate; derived from capability profile, not independently verified
#159	Redis Vector	Redis Ltd.	58.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#160	LangMem	LangChain	58.3	Self-Reported	LangMem launch post
#161	Epsilla	Epsilla Inc. (YC S23)	58.3	Estimated	Arena estimate; derived from capability profile, not independently verified
#162	Couchbase Vector	Couchbase Inc.	58.2	Estimated	Arena estimate; derived from capability profile, not independently verified
#163	Ragie	Ragie Inc.	57.7	Estimated	Arena estimate; derived from capability profile, not independently verified
#164	MemoryBank	Harbin Institute of Technology / SenseTime	57.5	Estimated	Arena estimate; derived from capability profile, not independently verified
#165	MongoDB Atlas Vector	MongoDB Inc.	57	Estimated	Arena estimate; derived from capability profile, not independently verified
#166	LlamaIndex Memory	LlamaIndex	56.8	Self-Reported	LlamaIndex evals
#167	KDB AI	KX Systems	56.8	Estimated	Arena estimate; derived from capability profile, not independently verified
#168	Atlas	Meta AI FAIR (Izacard et al.)	56.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#169	Notion AI	Notion Labs	55.6	Estimated	Arena estimate; derived from capability profile, not independently verified
#170	Mem AI	Mem Labs	55.5	Estimated	Arena estimate; derived from capability profile, not independently verified
#171	Memori	GibsonAI	54.8	Self-Reported	Memori internal eval
#172	MemGPT Classic	Berkeley / Letta	52.4	Self-Reported	MemGPT paper
#173	Memory³	Institute for Advanced Algorithms Research Shanghai / Peking University	51.3	Estimated	Arena estimate; derived from capability profile, not independently verified