Reflexion
by Northeastern / MIT / Princeton (Shinn et al.)
System Card
Organization: Northeastern / MIT / Princeton (Shinn et al.)
Released: 2023-03
Architecture: agentic-workflow / verbal reinforcement via episodic reflection buffer
Details: Agents verbally reflect on task feedback signals and maintain their own reflective text in an episodic memory buffer to induce better decisions in subsequent trials. Avoids weight updates by using language as the policy encoding.
Parameters: —
Domain: agent-memory · episodic-session · lifelong-learning
Open Source: Yes
Paper: arXiv:2303.11366
Code: Repository
Tags: verbal-rl · self-reflection · episodic-buffer · neurips-2023
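The trial-reflect-retry loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `actor`, `evaluator`, and `self_reflect` are hypothetical stubs standing in for LLM calls and task feedback.

```python
# Minimal sketch of the Reflexion loop (arXiv:2303.11366). The three
# helpers below are stubs standing in for LLM calls, not a real API.

def actor(task, memory):
    # A real actor LLM would condition on the task plus prior
    # reflections. Stub: succeed once any reflection is in memory.
    return "correct" if memory else "wrong"

def evaluator(attempt):
    # Binary task feedback signal (e.g. a unit-test pass/fail).
    return attempt == "correct"

def self_reflect(task, attempt):
    # A real reflector LLM would verbalize what went wrong.
    return f"Attempt '{attempt}' failed on '{task}'; try another approach."

def reflexion(task, max_trials=3):
    memory = []  # episodic buffer of verbal reflections (the "policy")
    for trial in range(1, max_trials + 1):
        attempt = actor(task, memory)
        if evaluator(attempt):
            return attempt, trial
        # No weight updates: learning is appending reflective text.
        memory.append(self_reflect(task, attempt))
    return attempt, max_trials
```

With these stubs, the first trial fails, a reflection is stored, and the second trial succeeds — mirroring how the buffer carries learning across trials.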
Capability Profile
Benchmark Scores
Scored on 6 of 14 benchmarks.

Long-Context Retrieval: 0/5
- RULER: no data
- NIAH: no data
- LooGLE: no data
- LongBench: no data
- ∞Bench: no data
Multi-Turn Recall: 2/2
Cross-Session Memory: 1/1
Agent Task Memory: 1/1
Personalization: 0/1
- PerLTQA: no data
Factuality / Grounding: 0/1
- RAGAS: no data
Sources:
- arXiv:2303.11366, Figure 4c / Table 5 — CoT + Reflexion with GPT-4 and gold context (not retrieval); reading-comprehension setting.
- Reflexion paper (arXiv:2303.11366); evaluated on LoCoMo: Long-Term Conversational Memory Benchmark (Snap Research, 2402).
- Reflexion paper (arXiv:2303.11366); evaluated on LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory (Salesforce AI Research, 2410).
- Reflexion paper (arXiv:2303.11366); evaluated on AgentBench Memory Track (Tsinghua KEG, 2308).
- Reflexion paper (arXiv:2303.11366); evaluated on MemoryBank: Enhancing LLMs with Long-Term Memory (Sun Yat-sen University, 2305).
- Reflexion paper (arXiv:2303.11366); evaluated on BABILong: Testing the Limits of LLMs with Long-Context Reasoning-in-a-Haystack (AIRI, 2406).