BrowserGym
by ServiceNow Research
System Card
Organization: ServiceNow Research
Released: 2024-02
Architecture: agentic-workflow / Standardized Gym env for web agents (memory evaluation)
Details: Gym-style framework for web-agent research supporting MiniWoB, WebArena, WorkArena, and VisualWebArena. It provides a standard API for state, action, and reward, memory across steps, and multi-benchmark integration.
Parameters: n/a
Domain: agent-memory
Open Source: Yes
Tags: servicenow, gym, webarena, benchmark-harness
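
The "standard API for state/action/reward" and "memory across steps" described in Details can be illustrated with a minimal sketch. It assumes the Gymnasium-style convention where `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`. `ToyWebEnv`, `MemoryAgent`, and `run_episode` are hypothetical names invented for this illustration. They are not BrowserGym's actual classes, and no real browser is involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWebEnv:
    """Hypothetical stand-in for a web task: click the target button."""
    target: str = "submit"
    max_steps: int = 5
    steps: int = 0

    def _obs(self, last_action=None):
        # The observation plays the role of the page state shown to the agent.
        return {
            "goal": f"click {self.target}",
            "buttons": ["cancel", "submit"],
            "last_action": last_action,
        }

    def reset(self):
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        self.steps += 1
        success = action == f"click({self.target})"
        reward = 1.0 if success else 0.0
        terminated = success
        truncated = (not success) and self.steps >= self.max_steps
        return self._obs(action), reward, terminated, truncated, {}


@dataclass
class MemoryAgent:
    """Hypothetical agent that remembers (action, reward) pairs across steps."""
    memory: list = field(default_factory=list)

    def act(self, obs):
        tried = {action for action, _ in self.memory}
        # Pick the first button that has not been tried yet.
        for button in obs["buttons"]:
            action = f"click({button})"
            if action not in tried:
                return action
        return "noop()"

    def observe(self, action, reward):
        self.memory.append((action, reward))


def run_episode(env, agent):
    """Run one episode and return the total reward and the agent's memory."""
    obs, info = env.reset()
    total = 0.0
    done = False
    while not done:
        action = agent.act(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        agent.observe(action, reward)
        total += reward
        done = terminated or truncated
    return total, agent.memory


if __name__ == "__main__":
    total, memory = run_episode(ToyWebEnv(), MemoryAgent())
    print(total, memory)
```

Because the loop only depends on `reset()` and `step()`, the same agent code could in principle be pointed at any benchmark that exposes this interface, which is the property the card attributes to the framework.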
Capability Profile
Benchmark Scores
5 of 14 benchmarks

Long-Context Retrieval: 0/5
  RULER: no data
  NIAH: no data
  LooGLE: no data
  LongBench: no data
  ∞Bench: no data
Multi-Turn Recall: 1/2
  MemoryBank: no data
Cross-Session Memory: 1/1
Multi-Hop QA: 2/3
Agent Task Memory: 1/1
Personalization: 0/1
  PerLTQA: no data
Factuality / Grounding: 0/1
  RAGAS: no data

Sources:
BrowserGym paper (arXiv:2412.05467); evaluated on AgentBench Memory Track (Tsinghua KEG, 2308)
BrowserGym paper (arXiv:2412.05467); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 1809)
BrowserGym paper (arXiv:2412.05467); evaluated on LoCoMo: Long-Term Conversational Memory Benchmark (Snap Research, 2402)
BrowserGym paper (arXiv:2412.05467); evaluated on LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory (Salesforce AI Research, 2410)
BrowserGym paper (arXiv:2412.05467); evaluated on MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries (HKUST, 2401)