PaperQA2
by FutureHouse
System Card
Organization: FutureHouse
Released: 2024-01
Architecture: agentic-workflow / Agentic RAG for scientific papers (3-phase)
Details: Three-phase agent: search, then evidence gathering (embedding retrieval followed by LLM re-scoring), then answer generation. Uses metadata-aware embeddings, automatic paper-metadata lookup with citation and retraction checks, and multimodal handling of tables, figures, and equations.
Parameters: not listed
Domain: rag-retrieval
Open Source: Yes
Tags: science, agentic-rag, citations, multimodal
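The three-phase flow named in the Details field (search, evidence gathering with embedding plus re-scoring, answer) can be illustrated with a toy sketch. This is not the PaperQA2 code or its API: every name below (`Chunk`, `embed`, `rescore`, `run_pipeline`, and so on) is hypothetical, the "embedding" is a plain bag-of-words vector, and the LLM re-scoring step is replaced by a simple term-overlap score so the example runs without any model.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    citation: str  # hypothetical citation key, e.g. "smith2020"
    text: str


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a learned model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, corpus: list[Chunk]) -> list[Chunk]:
    # Phase 1: keep chunks that share at least one term with the query.
    terms = set(query.lower().split())
    return [c for c in corpus if terms & set(c.text.lower().split())]


def rescore(query: str, chunk: Chunk) -> float:
    # Stand-in for LLM re-scoring (assumption): fraction of query terms present.
    terms = set(query.lower().split())
    return len(terms & set(chunk.text.lower().split())) / max(len(terms), 1)


def gather_evidence(
    query: str, candidates: list[Chunk], top_k: int = 2, min_score: float = 0.5
) -> list[tuple[float, Chunk]]:
    # Phase 2: rank by embedding similarity, then re-score the top_k chunks
    # and drop any whose re-score falls below min_score.
    q = embed(query)
    ranked = sorted(candidates, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    scored = [(rescore(query, c), c) for c in ranked[:top_k]]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, c) for s, c in scored if s >= min_score]


def answer(evidence: list[tuple[float, Chunk]]) -> str:
    # Phase 3: compose an answer in which each statement carries its citation.
    if not evidence:
        return "Insufficient evidence."
    return " ".join(f"{c.text} [{c.citation}]" for _, c in evidence)


def run_pipeline(query: str, corpus: list[Chunk]) -> str:
    return answer(gather_evidence(query, search(query, corpus)))


if __name__ == "__main__":
    corpus = [
        Chunk("smith2020", "protein folding depends on temperature"),
        Chunk("lee2021", "graph theory has many applications"),
        Chunk("wong2022", "temperature affects protein stability"),
    ]
    print(run_pipeline("protein temperature", corpus))
```

Splitting retrieval into a cheap similarity ranking followed by a more expensive re-scoring pass over only the top candidates is the design point the sketch is meant to show; the thresholds `top_k` and `min_score` are arbitrary illustrative values.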
Capability Profile
Benchmark Scores
5 of 14 benchmarks

Multi-Turn Recall: 0/2
  LoCoMo: no data
  MemoryBank: no data
Cross-Session Memory: 0/1
  LongMemEval: no data
Multi-Hop QA: 2/3
Agent Task Memory: 0/1
  AgentBench-Mem: no data
Personalization: 0/1
  PerLTQA: no data
Factuality / Grounding: 1/1
Sources:
PaperQA2 paper (arXiv:2409.13740); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 1809)
PaperQA2 paper (arXiv:2409.13740); evaluated on LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding (Tsinghua KEG, 2308)
PaperQA2 paper (arXiv:2409.13740); evaluated on MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries (HKUST, 2401)
PaperQA2 paper (arXiv:2409.13740); evaluated on RAGAS: Automated Evaluation of Retrieval-Augmented Generation (Exploding Gradients, 2309)
PaperQA2 paper (arXiv:2409.13740); evaluated on RULER: What's the Real Context Size of Your Long-Context Language Models (NVIDIA, 2404)