Reflexion

by Northeastern / MIT / Princeton (Shinn et al.)

System Card

Organization: Northeastern / MIT / Princeton (Shinn et al.)
Released: 2023-03
Architecture: agentic-workflow / Verbal reinforcement via episodic reflection buffer
Details: Agents verbally reflect on task feedback signals and keep the resulting reflective text in an episodic memory buffer, which informs better decisions in subsequent trials. The approach avoids weight updates by using language as the policy encoding.
Parameters
Domain: agent-memory, episodic-session, lifelong-learning
Open Source: Yes
Tags: verbal-rl, self-reflection, episodic-buffer, neurips-2023
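
The mechanism described in the Details entry above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`reflexion_loop`, `toy_act`, `toy_evaluate`, `toy_reflect`) and the `max_trials` and `buffer_size` parameters are hypothetical, and the toy functions stand in for what would be LLM calls in a real system. The sketch only shows the control flow: act, evaluate, reflect verbally on failure, store the reflection, and retry with the stored text as extra context.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def reflexion_loop(
    act: Callable[[str, List[str]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    reflect: Callable[[str, str], str],
    task: str,
    max_trials: int = 3,
    buffer_size: int = 3,
) -> Tuple[str, bool, List[str]]:
    """Run trials until the evaluator accepts an attempt or trials run out."""
    # Episodic buffer of verbal reflections; the oldest entry is dropped first.
    memory: Deque[str] = deque(maxlen=buffer_size)
    attempt = ""
    for _ in range(max_trials):
        # The actor sees the task plus stored reflections (text only, no weight updates).
        attempt = act(task, list(memory))
        success, feedback = evaluate(attempt)
        if success:
            return attempt, True, list(memory)
        # Turn the feedback signal into a verbal lesson and store it for the next trial.
        memory.append(reflect(attempt, feedback))
    return attempt, False, list(memory)


# Toy stand-ins for the LLM calls (hypothetical, for illustration only).
def toy_act(task: str, reflections: List[str]) -> str:
    # Omits the unit until a stored reflection mentions it.
    return "42 km" if any("unit" in r for r in reflections) else "42"


def toy_evaluate(attempt: str) -> Tuple[bool, str]:
    ok = attempt.endswith("km")
    return ok, ("" if ok else "answer has no unit")


def toy_reflect(attempt: str, feedback: str) -> str:
    return f"Previous attempt '{attempt}' failed: {feedback}. Include the unit next time."


answer, solved, notes = reflexion_loop(toy_act, toy_evaluate, toy_reflect, "How far?")
print(answer, solved, len(notes))
```

In this toy run the first trial fails, one reflection is stored, and the second trial succeeds because the actor reads that reflection. Bounding the buffer with `deque(maxlen=...)` is one simple way to keep the added context short; the bound used here is an assumption, not a value taken from the card.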

Benchmark Scores

6 of 14 benchmarks
Data Transparency: 1 self-reported, 5 estimated

Long-Context Retrieval (0/5)
  RULER: no data
  NIAH: no data
  LooGLE: no data
  LongBench: no data
  ∞Bench: no data

Multi-Turn Recall (2/2)
  LoCoMo: 81.2 (91p, Estimated)
  MemoryBank: 78.4 (70p, Estimated)

Cross-Session Memory (1/1)
  LongMemEval: 79.2 (71p, Estimated)

Multi-Hop QA (2/3)
  BABILong: 72.7 (25p, Estimated)
  MultiHop-RAG: no data
  HotpotQA: 80 (96p, Self-Reported)

Agent Task Memory (1/1)
  AgentBench-Mem: 72 (26p, Estimated)

Personalization (0/1)
  PerLTQA: no data

Factuality / Grounding (0/1)
  RAGAS: no data