BrowserGym

by ServiceNow Research

System Card

Organization: ServiceNow Research

Released: 2024-02

Architecture: agentic-workflow / standardized Gym environment for web agents (memory evaluation)

Details: A Gym-style framework for web-agent research that supports the MiniWoB, WebArena, WorkArena, and VisualWebArena benchmarks. It offers a standard API for state, actions, and rewards, supports memory across steps, and integrates multiple benchmarks behind one interface.

Parameters: none listed

Domain: agent-memory

Open Source: Yes

Tags: servicenow, gym, webarena, benchmark-harness
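To make the state/action/reward API described under Details more concrete, here is a minimal sketch of a Gym-style interaction loop in which the agent keeps memory across steps. It is not BrowserGym code. `ToyWebEnv`, `MemoryAgent`, and `run_episode` are hypothetical names, and the toy environment only assumes the Gymnasium convention that `reset()` returns `(obs, info)` and `step(action)` returns `(obs, reward, terminated, truncated, info)`. Writing actions as strings such as `click("submit")` is also an assumption, loosely modeled on a text-based action space.

```python
from dataclasses import dataclass, field


class ToyWebEnv:
    """Hypothetical stand-in for a web task: click the element labelled 'submit'."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def _observe(self) -> dict:
        # The observation exposes the goal and the clickable elements on the "page".
        return {"goal": "Click the submit button", "elements": ["cancel", "submit"]}

    def reset(self, seed=None):
        # Gymnasium-style reset: returns (observation, info).
        self.steps = 0
        return self._observe(), {}

    def step(self, action: str):
        # Gymnasium-style step: returns (obs, reward, terminated, truncated, info).
        self.steps += 1
        success = action == 'click("submit")'
        reward = 1.0 if success else 0.0
        terminated = success
        truncated = (not success) and self.steps >= self.max_steps
        return self._observe(), reward, terminated, truncated, {}


@dataclass
class MemoryAgent:
    # Memory across steps: every (observation, action) pair taken so far.
    history: list = field(default_factory=list)

    def act(self, obs: dict) -> str:
        # Skip any click already tried in this episode.
        tried = {action for _, action in self.history}
        for element in obs["elements"]:
            candidate = f'click("{element}")'
            if candidate not in tried:
                return candidate
        return "noop()"


def run_episode(env: ToyWebEnv, agent: MemoryAgent):
    """Run one episode; return (total reward, number of steps taken)."""
    obs, info = env.reset(seed=0)
    total = 0.0
    while True:
        action = agent.act(obs)
        agent.history.append((obs, action))
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        if terminated or truncated:
            return total, len(agent.history)


if __name__ == "__main__":
    print(run_episode(ToyWebEnv(), MemoryAgent()))
```

Because the agent remembers that `click("cancel")` already failed, it succeeds on its second step. Without the `history` list, the same selection logic would pick `cancel` every time and run until truncation.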

Capability Profile

Benchmark Scores

5 of 14 benchmarks

Data Transparency: 5 estimated

Long-Context Retrieval

0/5

RULER

no data

NIAH

no data

LooGLE

no data

LongBench

no data

∞Bench

no data

Multi-Turn Recall

1/2

LoCoMo

80.3 · 88p · Estimated

MemoryBank

no data

Cross-Session Memory

1/1

LongMemEval

80.7 · 84p · Estimated

Multi-Hop QA

2/3

BABILong

no data

MultiHop-RAG

75 · 79p · Estimated

HotpotQA

75.3 · 72p · Estimated

Agent Task Memory

1/1

AgentBench-Mem

72 · 26p · Estimated

Personalization

0/1

PerLTQA

no data

Factuality / Grounding

0/1

RAGAS

no data

Sources: Arena estimate, derived from the capability profile and not independently verified.
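The coverage figures in the capability profile ("5 of 14 benchmarks" and the per-category "k/n" counts) are simple tallies of which benchmarks have a score. The sketch below reproduces them from the listing above as a consistency check. The `PROFILE` dictionary layout and the `coverage` helper are hypothetical; only presence or absence of data is encoded, not the estimated scores themselves.

```python
# True = a (estimated) score is listed; False = "no data".
PROFILE = {
    "Long-Context Retrieval": {"RULER": False, "NIAH": False, "LooGLE": False,
                               "LongBench": False, "∞Bench": False},
    "Multi-Turn Recall": {"LoCoMo": True, "MemoryBank": False},
    "Cross-Session Memory": {"LongMemEval": True},
    "Multi-Hop QA": {"BABILong": False, "MultiHop-RAG": True, "HotpotQA": True},
    "Agent Task Memory": {"AgentBench-Mem": True},
    "Personalization": {"PerLTQA": False},
    "Factuality / Grounding": {"RAGAS": False},
}


def coverage(profile: dict) -> tuple[dict, str]:
    """Return per-category 'covered/total' strings and an overall summary."""
    # sum() over booleans counts the True entries.
    per_category = {cat: f"{sum(b.values())}/{len(b)}" for cat, b in profile.items()}
    covered = sum(sum(b.values()) for b in profile.values())
    total = sum(len(b) for b in profile.values())
    return per_category, f"{covered} of {total} benchmarks"


if __name__ == "__main__":
    per_category, summary = coverage(PROFILE)
    print(summary)
    for category, ratio in per_category.items():
        print(f"{category}: {ratio}")
```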