PaperQA2

by FutureHouse

System Card

Organization: FutureHouse
Released: 2024-01
Architecture: agentic-workflow / Agentic RAG for scientific papers (3-phase)
Details: Three-phase agent: search, then evidence gathering with embedding plus LLM re-scoring, then answer generation. It uses metadata-aware embeddings, fetches paper metadata automatically with citation and retraction checks, and handles multimodal content (tables, figures, equations).
Parameters: (not listed)
Domain: rag-retrieval
Open Source: Yes
Tags: science, agentic-rag, citations, multimodal
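
The three phases named in the details (search, evidence gathering with embedding plus re-scoring, answer) can be pictured with a small sketch. This is not PaperQA2's API: every name below (`Chunk`, `search`, `gather_evidence`, `rescore`, `answer`) is hypothetical, the "embedding" is a toy bag-of-words vector, and the LLM re-scoring step is replaced by a simple term-coverage heuristic so that the example runs without any model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    paper: str  # citation key of the source paper (hypothetical field)
    text: str   # passage text


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: bag-of-words term counts.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, corpus: list[Chunk]) -> list[Chunk]:
    # Phase 1: keep chunks that share at least one term with the query.
    terms = set(tokenize(query))
    return [c for c in corpus if terms & set(tokenize(c.text))]


def rescore(query: str, chunk: Chunk) -> float:
    # Placeholder for an LLM relevance score: fraction of query terms
    # that appear in the chunk.
    terms = set(tokenize(query))
    if not terms:
        return 0.0
    return len(terms & set(tokenize(chunk.text))) / len(terms)


def gather_evidence(query: str, candidates: list[Chunk], top_k: int = 2):
    # Phase 2: shortlist by embedding similarity, then re-score the shortlist.
    q = embed(query)
    shortlist = sorted(
        candidates, key=lambda c: cosine(q, embed(c.text)), reverse=True
    )[:top_k]
    scored = [(rescore(query, c), c) for c in shortlist]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def answer(query: str, evidence) -> str:
    # Phase 3: a real system would prompt an LLM with the evidence;
    # here the top passage is quoted with its citation key.
    if not evidence:
        return "No evidence found."
    _, best = evidence[0]
    return f"{best.text} [{best.paper}]"


corpus = [
    Chunk("smith2020", "Retraction checks flag papers that were withdrawn."),
    Chunk("lee2021", "Embedding models map text to vectors."),
    Chunk("kim2022", "Protein folding depends on temperature."),
]
query = "How do retraction checks work?"
print(answer(query, gather_evidence(query, search(query, corpus))))
```

The split into a cheap similarity shortlist followed by a more expensive re-scoring pass is the design point the sketch tries to show; the corpus and citation keys are made up for illustration.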

Capability Profile

Benchmark Scores

5 of 14 benchmarks
Data Transparency: 5 estimated
Long-Context Retrieval (2/5)
RULER: 69.131p (estimated)
NIAH: no data
LooGLE: no data
LongBench: 603p (estimated)
∞Bench: no data

Multi-Turn Recall (0/2)
LoCoMo: no data
MemoryBank: no data

Cross-Session Memory (0/1)
LongMemEval: no data

Multi-Hop QA (2/3)
BABILong: no data
MultiHop-RAG: 72.556p (estimated)
HotpotQA: 72.961p (estimated)

Agent Task Memory (0/1)
AgentBench-Mem: no data

Personalization (0/1)
PerLTQA: no data

Factuality / Grounding (1/1)
RAGAS: 69.148p (estimated)