
HuggingGPT / JARVIS

by Microsoft Research

System Card

Organization: Microsoft Research
Released: 2023-03
Architecture: knowledge-base / LLM controller + expert-model registry
Details: Four-stage controller (task planning, model selection, execution, response) that uses an LLM plus a registry of Hugging Face expert models. Also includes EasyTool (2024) and TaskBench for evaluation.
Parameters:
Domain: agent-memory, knowledge-graph
Open Source: Yes
jarvis, huggingface, task-planning, canonical
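
The four-stage controller named under Details (task planning, model selection, execution, response) can be illustrated with a minimal sketch. This is not the project's actual code: `Task`, `REGISTRY`, `plan_tasks`, `select_model`, `execute`, `respond`, and `run` are hypothetical names, the planner is a keyword stub standing in for an LLM call, and the "expert models" are plain functions so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    kind: str     # task type, e.g. "image-captioning" (hypothetical label)
    payload: str  # input handed to the expert model


# Hypothetical registry: task kind -> callable standing in for an expert model.
REGISTRY: Dict[str, Callable[[str], str]] = {
    "image-captioning": lambda x: f"caption({x})",
    "translation": lambda x: f"translated({x})",
}


def plan_tasks(request: str) -> List[Task]:
    """Stage 1, task planning. A real controller would prompt an LLM here;
    this stub uses keyword matching (an assumption for illustration)."""
    tasks: List[Task] = []
    if "caption" in request:
        tasks.append(Task("image-captioning", "photo.jpg"))
    if "translate" in request:
        tasks.append(Task("translation", request))
    return tasks


def select_model(task: Task) -> Callable[[str], str]:
    """Stage 2, model selection: look up an expert for the task kind."""
    return REGISTRY[task.kind]


def execute(task: Task, model: Callable[[str], str]) -> str:
    """Stage 3, execution: run the chosen expert on the task payload."""
    return model(task.payload)


def respond(request: str, results: List[str]) -> str:
    """Stage 4, response: merge expert outputs into one answer."""
    return f"{request} -> " + "; ".join(results)


def run(request: str) -> str:
    """Chain the four stages for a single user request."""
    tasks = plan_tasks(request)
    results = [execute(t, select_model(t)) for t in tasks]
    return respond(request, results)
```

Calling `run("caption this")` walks one task through all four stages. Keeping the registry as a plain mapping reflects the "LLM controller + expert-model registry" split: the controller decides what to do, the registry decides who does it.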

Capability Profile

Benchmark Scores

6 of 14 benchmarks

Long-Context Retrieval (0/5)
  RULER: no data
  NIAH: no data
  LooGLE: no data
  LongBench: no data
  ∞Bench: no data
Multi-Turn Recall (1/2)
  LoCoMo: 7439p
  MemoryBank: no data
Cross-Session Memory (1/1)
Multi-Hop QA (2/3)
  BABILong: no data
  HotpotQA: 80.397p
Agent Task Memory (1/1)
Personalization (0/1)
  PerLTQA: no data
Factuality / Grounding (1/1)
  RAGAS: 72.375p