PerLTQA
PerLTQA: A Personal Long-Term Memory Question Answering Dataset
Benchmark Metadata
Publisher: PolyU
Venue: arXiv preprint
Evaluation Type: automatic
Dimensions: 3
Test Prompts: 8,593
Scoring: Higher is better
Update Frequency: annual
What It Measures
- Personal semantic-memory recall
- Personal episodic-memory recall
- Memory-grounded answer accuracy
What It Does Not Measure
- Generic factual QA
- Code or math reasoning
- Latency
All Systems Evaluated (22 systems)
| Rank | System | Developer | Score |
|---|---|---|---|
| #1 | Tab AI | Tab (Avi Schiffmann) | 87.8 |
| #2 | Replika | Luka, Inc. | 86.8 |
| #3 | Pi Inflection | Inflection AI | 85.7 |
| #4 | Limitless Pendant | Limitless AI (acquired by Meta Dec 2025) | 85.5 |
| #5 | Talkie AI | MiniMax | 84.8 |
| #6 | Second Me | Mindverse (Shang, Li, et al.) | 84.6 |
| #7 | Friend AI | Friend | 84 |
| #8 | Character AI | Character.AI (Google investment) | 82.8 |
| #9 | Bee Computer | Bee (acquired by Amazon 2026) | 82.5 |
| #10 | Charlie Mnemonic | GoodAI | 81.9 |
| #11 | Pickle AI | Soul Computer (YC-backed) | 80.4 |
| #12 | Paradot | WithFeeling.AI | 76.6 |
| #13 | Nomi AI | Glimpse AI, Inc. | 75.7 |
| #14 | Kindroid | Kindroid | 74.6 |
| #15 | Personal AI | Personal AI | 74 |
| #16 | memU | NevaMind-AI | 73.2 |
| #17 | MemoryBank | Institute of Software, Chinese Academy of Sciences | 72.6 |
| #18 | Copilot Memory | Microsoft | 67.9 |
| #19 | ChatGPT Memory | OpenAI | 67.7 |
| #20 | Mnemosyne | Johns Hopkins / independent (2025) | 67.7 |
| #21 | Gemini Memory | Google | 67.4 |
| #22 | Heyday AI | Heyday (shut down 2025) | 62.9 |