Backboard IO
by Backboard.io
System Card
Organization: Backboard.io
Released: 2025-01
Architecture: hybrid / Stateful Portable Memory Stack + Unified LLM API
Details: Backboard provides a single API that unifies 17,000+ LLMs with a built-in memory stack, RAG, and web search, all sharing the same stateful context. The adaptive memory engine automatically captures facts, preferences, and relationships and surfaces relevant context during conversations, maintaining continuity across thousands of interactions without retraining or prompt-stuffing. It achieved 93.4% on LongMemEval and 90.1% on LoCoMo, reported as the first platform to lead both major memory benchmarks.
Parameters: not listed
Domain: agent-memory, rag-retrieval, lifelong-learning
Open Source: No
memory-api, unified-llm-api, benchmark-sota, stateful, portable-memory, agentic
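The Details field describes a memory layer that captures facts as they come up and later surfaces the ones relevant to a query. The toy sketch below illustrates that capture-then-recall pattern only. `ToyMemory`, `capture`, and `recall` are hypothetical names invented for illustration, and the word-overlap ranking is an assumption; none of this reflects Backboard's actual API or retrieval method, which this card does not document.

```python
from dataclasses import dataclass, field
import re


def _tokens(text: str) -> set[str]:
    """Lowercase word tokens, used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ToyMemory:
    """Hypothetical stand-in for a stateful memory layer (not Backboard's API)."""

    facts: list[str] = field(default_factory=list)

    def capture(self, fact: str) -> None:
        # Store each distinct fact once, in arrival order.
        if fact not in self.facts:
            self.facts.append(fact)

    def recall(self, query: str, k: int = 2) -> list[str]:
        # Score each fact by how many tokens it shares with the query,
        # drop facts with no overlap, and return the top k (ties keep
        # arrival order).
        q = _tokens(query)
        scored = [(len(q & _tokens(f)), i, f) for i, f in enumerate(self.facts)]
        hits = [item for item in scored if item[0] > 0]
        hits.sort(key=lambda item: (-item[0], item[1]))
        return [f for _, _, f in hits[:k]]


memory = ToyMemory()
memory.capture("User prefers vegetarian food")
memory.capture("User lives in Lisbon")
memory.capture("User's sister is named Ana")

# Only the fact sharing words with the query is surfaced as context.
context = memory.recall("Which vegetarian food should I suggest?")
print(context)
```

A production memory engine would presumably use embeddings or a knowledge graph rather than word overlap. The point here is only the shape of the loop: write facts during one interaction, read the relevant subset back in a later one instead of stuffing the whole history into the prompt.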
Capability Profile
Benchmark Scores
6 of 14 benchmarks

Long-Context Retrieval: 0/5
  RULER: no data
  NIAH: no data
  LooGLE: no data
  LongBench: no data
  ∞Bench: no data
Multi-Turn Recall: 1/2
  MemoryBank: no data
Cross-Session Memory: 1/1
Multi-Hop QA: 3/3
Agent Task Memory: 1/1
Personalization: 0/1
  PerLTQA: no data
Factuality / Grounding: 0/1
  RAGAS: no data

Sources:
github.com/Backboard-io/Backboard-longmemEval-results: 467/500 on LongMemEval s_cleaned (~115k tokens), GPT-4.1, independent eval by NewMathData
github.com/Backboard-io/Backboard-Locomo-Benchmark: GPT-4.1 judge, temp=0.1; per-category: SingleHop 89.4, MultiHop 75.0, OpenDomain 91.2, Temporal 91.9
Backboard IO vendor documentation; evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 1809)
Backboard IO vendor documentation; evaluated on MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries (HKUST, 2401)
Backboard IO vendor documentation; evaluated on AgentBench Memory Track (Tsinghua KEG, 2308)
Backboard IO vendor documentation; evaluated on BABILong: Testing the Limits of LLMs with Long-Context Reasoning-in-a-Haystack (AIRI, 2406)
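The figures quoted on this card can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers that appear on the card (the variable names are illustrative). It confirms that 467/500 matches the 93.4% headline and that the category coverage adds up to 6 of 14. It also shows that the plain average of the four LoCoMo category scores is lower than the reported 90.1 overall.

```python
# Figures copied from this card; nothing here is measured independently.
longmemeval_correct = 467
longmemeval_total = 500
locomo_by_category = {
    "SingleHop": 89.4,
    "MultiHop": 75.0,
    "OpenDomain": 91.2,
    "Temporal": 91.9,
}
locomo_reported_overall = 90.1

# (benchmarks with data, benchmarks in category) per capability area.
coverage = {
    "Long-Context Retrieval": (0, 5),
    "Multi-Turn Recall": (1, 2),
    "Cross-Session Memory": (1, 1),
    "Multi-Hop QA": (3, 3),
    "Agent Task Memory": (1, 1),
    "Personalization": (0, 1),
    "Factuality / Grounding": (0, 1),
}

# LongMemEval accuracy as a percentage, to one decimal place.
longmemeval_pct = round(100 * longmemeval_correct / longmemeval_total, 1)

# Unweighted mean of the four LoCoMo category scores.
unweighted_mean = round(
    sum(locomo_by_category.values()) / len(locomo_by_category), 1
)
gap = round(locomo_reported_overall - unweighted_mean, 1)

# Totals behind the "6 of 14 benchmarks" summary line.
with_data = sum(done for done, _ in coverage.values())
total_benchmarks = sum(total for _, total in coverage.values())

print(longmemeval_pct, unweighted_mean, gap, with_data, total_benchmarks)
```

The gap between the reported overall and the unweighted mean is consistent with an overall score weighted by the number of questions per category, but the card does not give per-category question counts, so that cannot be confirmed from the data shown here.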