Marker
by Datalab (datalab-to)
System Card
Organization: Datalab (datalab-to)
Released: 2023-10
Architecture: knowledge-base / Document-to-markdown/JSON pipeline
Details: Converts PDF/image/DOCX/XLSX/PPTX/HTML/EPUB to markdown/JSON/chunks/HTML with table/equation/image extraction. Optional hybrid LLM boost for accuracy; GPU/CPU/MPS support.
Parameters: n/a
Domain: rag-retrieval
Open Source: Yes
Tags: pdf, markdown, structured-extraction, datalab
Capability Profile
Benchmark Scores
5 of 14 benchmarks
Multi-Turn Recall: 0/2
  LoCoMo: no data
  MemoryBank: no data
Cross-Session Memory: 0/1
  LongMemEval: no data
Multi-Hop QA: 2/3
Agent Task Memory: 0/1
  AgentBench-Mem: no data
Personalization: 0/1
  PerLTQA: no data
Factuality / Grounding: 1/1
Sources:
Marker (datalab-to/marker); evaluated on HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (Stanford / CMU, 1809)
Marker (datalab-to/marker); evaluated on LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding (Tsinghua KEG, 2308)
Marker (datalab-to/marker); evaluated on MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries (HKUST, 2401)
Marker (datalab-to/marker); evaluated on RAGAS: Automated Evaluation of Retrieval-Augmented Generation (Exploding Gradients, 2309)
Marker (datalab-to/marker); evaluated on RULER: What's the Real Context Size of Your Long-Context Language Models (NVIDIA, 2404)