HuggingGPT / JARVIS
by Microsoft Research
System Card
Organization: Microsoft Research
Released: 2023-03
Architecture: knowledge-base / LLM controller + expert-model registry
Details: Four-stage controller (task planning, model selection, execution, response generation) using an LLM plus a registry of Hugging Face expert models. EasyTool (2024) and TaskBench are used for evaluation.
Parameters: —
Domain: agent-memory, knowledge-graph
Open Source: Yes
Paper: View Paper (arXiv:2303.17580)
Code: Repository
Tags: jarvis, huggingface, task-planning, canonical
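The four-stage controller described above can be sketched as a simple pipeline. This is a minimal illustration, not the paper's implementation: the LLM calls for planning and response generation are stubbed with placeholder logic, and the registry entries are illustrative Hugging Face model IDs.

```python
# Hypothetical sketch of a HuggingGPT-style four-stage controller.
# The LLM and expert-model calls are stubbed; names are illustrative.

def plan_tasks(request):
    # Stage 1: task planning. The paper uses an LLM to decompose the
    # request into subtasks; a trivial keyword rule stands in here.
    tasks = []
    if "caption" in request:
        tasks.append({"task": "image-to-text", "args": {"image": "photo.png"}})
    if "translate" in request:
        tasks.append({"task": "translation", "args": {"text": "<caption>"}})
    return tasks

# Stage 2: model selection. A registry maps task types to expert models;
# these model IDs are placeholders for whatever the controller selects.
REGISTRY = {
    "image-to-text": "nlpconnect/vit-gpt2-image-captioning",
    "translation": "Helsinki-NLP/opus-mt-en-de",
}

def execute(task, model_id):
    # Stage 3: execution. A real controller would invoke the expert
    # model (e.g. via the Hugging Face inference API); stubbed here.
    return f"[{model_id} output for {task['task']}]"

def respond(request, results):
    # Stage 4: response generation. The paper has the LLM summarize
    # all intermediate results into a final answer; stubbed here.
    return f"Request: {request}\n" + "\n".join(results)

def controller(request):
    results = []
    for task in plan_tasks(request):
        model_id = REGISTRY[task["task"]]
        results.append(execute(task, model_id))
    return respond(request, results)

print(controller("caption this image and translate the caption"))
```

The separation of planning from execution is what lets a single LLM orchestrate many specialized models: the registry can grow without retraining the controller.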
Capability Profile
Benchmark Scores (6 of 14 benchmarks)
- Long-Context Retrieval: 0/5 (RULER, NIAH, LooGLE, LongBench, ∞Bench: no data)
- Multi-Turn Recall: 1/2 (MemoryBank: no data)
- Cross-Session Memory: 1/1
- Multi-Hop QA: 2/3
- Agent Task Memory: 1/1
- Personalization: 0/1 (PerLTQA: no data)
- Factuality / Grounding: 1/1
Sources: HuggingGPT / JARVIS paper (arXiv:2303.17580), evaluated on:
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 2018-09)
- LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory (Salesforce AI Research, 2024-10)
- MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries (HKUST, 2024-01)
- AgentBench Memory Track (Tsinghua KEG, 2023-08)
- LoCoMo: Long-Term Conversational Memory Benchmark (Snap Research, 2024-02)
- RAGAS: Automated Evaluation of Retrieval-Augmented Generation (Exploding Gradients, 2023-09)