Benchmark Catalog

All benchmarks used to evaluate AI memory systems, grouped by category. Click any benchmark to see detailed information and system rankings.

Long-Context Retrieval

5 benchmarks

Multi-Turn Recall

2 benchmarks

Cross-Session Memory

1 benchmark

Multi-Hop QA

3 benchmarks

Agent Task Memory

1 benchmark

Personalization

1 benchmark

Factuality / Grounding

1 benchmark