MoT (Memory-of-Thought)
by Fudan (Li, Qiu)
System Card
Organization: Fudan (Li, Qiu)
Released: 2023-05
Architecture: episodic-buffer / pre-thought high-confidence thoughts as memory
Details: Two-stage self-improvement: the model first pre-thinks on unlabeled data and stores high-confidence chains of thought as memory, then recalls them at test time to guide reasoning. No parameter updates, no labeled data. (A sketch of this loop follows the tag list below.)
Parameters: —
Domain: agent-memory, episodic-session
Open Source: Yes
Paper: arXiv:2305.05181
Code: Repository
Tags: emnlp-2023, self-improvement, cot, unlabeled
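A minimal sketch of the two-stage loop summarized in Details, assuming a hypothetical `llm.generate_cot(prompt)` sampler that returns a (chain-of-thought, answer) pair and a hypothetical `embedder.embed(text)` encoder; self-consistency majority voting stands in here for the high-confidence filter, and the paper's actual filtering and retrieval details may differ (see arXiv:2305.05181).

```python
import numpy as np
from collections import Counter

def cosine(u, v):
    # Plain cosine similarity between two embedding vectors.
    u, v = np.asarray(u), np.asarray(v)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def build_memory(llm, unlabeled_questions, n_samples=8, threshold=0.7):
    """Stage 1 (pre-thinking): sample several chains of thought per unlabeled
    question and keep only high-confidence ones as memory entries."""
    memory = []
    for q in unlabeled_questions:
        # `llm.generate_cot` is a hypothetical stand-in returning (cot, answer).
        samples = [llm.generate_cot(q) for _ in range(n_samples)]
        votes = Counter(ans for _, ans in samples)
        answer, count = votes.most_common(1)[0]
        if count / n_samples >= threshold:  # majority vote as the confidence gate
            cot = next(c for c, a in samples if a == answer)
            memory.append({"question": q, "cot": cot, "answer": answer})
    return memory

def answer_with_memory(llm, embedder, memory, question, k=4):
    """Stage 2 (recall): retrieve the k memory entries most similar to the test
    question and prepend them as demonstrations; no weights are updated."""
    q_vec = embedder.embed(question)
    ranked = sorted(memory,
                    key=lambda m: -cosine(q_vec, embedder.embed(m["question"])))
    demos = "\n\n".join(
        f"Q: {m['question']}\nA: {m['cot']} So the answer is {m['answer']}."
        for m in ranked[:k]
    )
    return llm.generate_cot(f"{demos}\n\nQ: {question}\nA:")
```

Note the design point the card emphasizes: all improvement comes from the stored memory and retrieval prompt, so the base model's parameters never change and no labeled data is consumed.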
Capability Profile
Benchmark Scores
Coverage: 6 of 14 benchmarks

Long-Context Retrieval: 0/5
- RULER: no data
- NIAH: no data
- LooGLE: no data
- LongBench: no data
- ∞Bench: no data

Multi-Turn Recall: 2/2
Cross-Session Memory: 1/1
Multi-Hop QA: 2/3
Agent Task Memory: 1/1

Personalization: 0/1
- PerLTQA: no data

Factuality / Grounding: 0/1
- RAGAS: no data

Sources:
- MoT paper (arXiv:2305.05181); evaluated on AgentBench Memory Track (Tsinghua KEG, 2308)
- MoT paper (arXiv:2305.05181); evaluated on LoCoMo: Long-Term Conversational Memory Benchmark (Snap Research, 2402)
- MoT paper (arXiv:2305.05181); evaluated on LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory (Salesforce AI Research, 2410)
- MoT paper (arXiv:2305.05181); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 1809)
- MoT paper (arXiv:2305.05181); evaluated on MemoryBank: Enhancing LLMs with Long-Term Memory (Sun Yat-sen University, 2305)
- MoT paper (arXiv:2305.05181); evaluated on MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries (HKUST, 2401)