AppAgent
by Tencent / mnotgod96
System Card
Organization: Tencent / mnotgod96
Released: 2023-12
Architecture: knowledge-base / UI documentation knowledge base from exploration
Details: Multimodal LLM agent that explores mobile apps (or learns from demos), generating UI documentation stored as a knowledge base used for later task execution via touchscreen actions. Works with GPT-4V / Qwen-VL.
Parameters: n/a
Domain: agent-memory, knowledge-graph
Open Source: Yes
Paper: arXiv:2312.13771
Code: Repository
Tags: mobile-agent, ui-doc, gpt-4v, exploration
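The Details entry describes a two-phase flow: documentation is written during exploration and read back during task execution. A minimal sketch of that idea follows. It is not AppAgent's actual code; the class `UIKnowledgeBase`, the function `build_prompt`, and the element IDs are all hypothetical names chosen for illustration, and the real system works on screenshots with a multimodal model rather than on plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class UIKnowledgeBase:
    """Hypothetical store mapping a UI element id to its generated documentation."""

    docs: dict[str, str] = field(default_factory=dict)

    def record(self, element_id: str, description: str) -> None:
        # Exploration phase: save (or overwrite) what was learned about an element.
        self.docs[element_id] = description

    def lookup(self, visible_elements: list[str]) -> dict[str, str]:
        # Execution phase: return docs only for elements on the current screen.
        return {e: self.docs[e] for e in visible_elements if e in self.docs}


def build_prompt(task: str, kb: UIKnowledgeBase, visible_elements: list[str]) -> str:
    """Assemble a text prompt containing the task and any known element docs."""
    known = kb.lookup(visible_elements)
    lines = [f"Task: {task}", "Known UI elements:"]
    lines += [f"- {eid}: {doc}" for eid, doc in known.items()]
    return "\n".join(lines)


# Illustrative usage with made-up element ids (assumptions, not real app data).
kb = UIKnowledgeBase()
kb.record("btn_compose", "Tapping opens a new message draft.")
kb.record("btn_search", "Tapping opens the search bar.")
prompt = build_prompt("Send an email", kb, ["btn_compose", "btn_settings"])
```

Only documentation for elements visible on the current screen is added to the prompt, so undocumented elements such as `btn_settings` are simply omitted.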
Capability Profile
Benchmark Scores
6 of 14 benchmarks
Long-Context Retrieval: 0/5
  RULER: no data
  NIAH: no data
  LooGLE: no data
  LongBench: no data
  ∞Bench: no data
Multi-Turn Recall: 1/2
  MemoryBank: no data
Cross-Session Memory: 1/1
Multi-Hop QA: 2/3
Agent Task Memory: 1/1
Personalization: 0/1
  PerLTQA: no data
Factuality / Grounding: 1/1
Sources:
AppAgent paper (arXiv:2312.13771); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 1809)
AppAgent paper (arXiv:2312.13771); evaluated on LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory (Salesforce AI Research, 2410)
AppAgent paper (arXiv:2312.13771); evaluated on MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries (HKUST, 2401)
AppAgent paper (arXiv:2312.13771); evaluated on AgentBench Memory Track (Tsinghua KEG, 2308)
AppAgent paper (arXiv:2312.13771); evaluated on LoCoMo: Long-Term Conversational Memory Benchmark (Snap Research, 2402)
AppAgent paper (arXiv:2312.13771); evaluated on RAGAS: Automated Evaluation of Retrieval-Augmented Generation (Exploding Gradients, 2309)