OS-Copilot / FRIDAY
by Shanghai AI Lab / MMLab (Wu et al.)
System Card
Organization: Shanghai AI Lab / MMLab (Wu et al.)
Released: 2024-02
Architecture: agentic-workflow / Skill library + long-term memory + tool discovery
Details: Generalist OS agent with a Configuration Tracker using dense retrieval over long-term memory to recall tools, user profiles, and working directory state. Self-improves via accumulated skills across Excel, PowerPoint, web, code, and multimedia applications.
Parameters: n/a
Domain: agent-memory, lifelong-learning
Open Source: Yes
Tags: gaia-benchmark, os-agent, skill-library, self-improve
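The dense-retrieval recall mentioned in the Details entry can be illustrated with a small sketch. This is not the FRIDAY implementation: the `MemoryStore` class, the `embed` function, and the sample entries are hypothetical, and the hashed bag-of-words vector is only a stand-in for a learned dense encoder. It shows the general pattern of storing tools, profile facts, and directory state as vectors and recalling the closest ones by cosine similarity.

```python
import math
import re
import zlib
from dataclasses import dataclass, field

DIM = 256  # size of the toy embedding space (arbitrary choice for this sketch)


def embed(text: str) -> list[float]:
    """Toy stand-in for a dense encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class MemoryStore:
    """Hypothetical long-term memory holding (kind, text) entries and vectors."""
    entries: list[tuple[str, str]] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add(self, kind: str, text: str) -> None:
        self.entries.append((kind, text))
        self.vectors.append(embed(text))

    def recall(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        """Return the k entries with the highest cosine similarity to the query."""
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, v)), i)
            for i, v in enumerate(self.vectors)
        ]
        scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, stable ties
        return [self.entries[i] for _, i in scored[:k]]


# Illustrative entries only; none of these come from the OS-Copilot codebase.
memory = MemoryStore()
memory.add("tool", "read_excel_sheet: load a worksheet from an Excel workbook")
memory.add("tool", "make_slide: add a title slide to a PowerPoint deck")
memory.add("profile", "user prefers charts saved as PNG files")
memory.add("state", "working directory is the quarterly reports folder")

top = memory.recall("open the Excel workbook and read a worksheet", k=1)
print(top)
```

A skill library that grows over time fits the same pattern: each newly learned skill would be added with `add("tool", ...)` so that later tasks can retrieve it. A real system would swap `embed` for a trained sentence encoder and the linear scan for a vector index.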
Capability Profile
Benchmark Scores
6 of 14 benchmarks

Long-Context Retrieval: 0/5
  RULER: no data
  NIAH: no data
  LooGLE: no data
  LongBench: no data
  ∞Bench: no data
Multi-Turn Recall: 2/2
Cross-Session Memory: 1/1
Agent Task Memory: 1/1
Personalization: 0/1
  PerLTQA: no data
Factuality / Grounding: 0/1
  RAGAS: no data

Sources: OS-Copilot / FRIDAY paper (arXiv:2402.07456); evaluated on:
  LoCoMo: Long-Term Conversational Memory Benchmark (Snap Research, 2402)
  LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory (Salesforce AI Research, 2410)
  AgentBench Memory Track (Tsinghua KEG, 2308)
  BABILong: Testing the Limits of LLMs with Long-Context Reasoning-in-a-Haystack (AIRI, 2406)
  HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 1809)
  MemoryBank: Enhancing LLMs with Long-Term Memory (Sun Yat-sen University, 2305)