Unstructured IO

by Unstructured Technologies Inc.

System Card

Organization: Unstructured Technologies Inc.
Released: 2022-09
Architecture: vector-rag / document preprocessing and ETL for RAG
Details: Unstructured provides open-source and managed APIs for extracting, partitioning, and transforming unstructured content (PDFs, DOCX, HTML, images) into clean JSON chunks suitable for downstream embedding and vector storage. It uses computer vision (layout detection) and NLP for high-fidelity extraction from complex document layouts, and it integrates natively with LangChain, LlamaIndex, and the major vector stores.
Parameters: not listed
Domain: rag-retrieval
Open Source: Partial
Tags: document-parsing, ETL, PDF-extraction, layout-detection, pre-processing
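The Details field describes a two-stage pipeline: documents are partitioned into typed elements, and those elements are grouped into JSON chunks for embedding. The sketch below illustrates only the chunking stage, in plain Python, under stated assumptions: `Element`, `chunk_by_section`, and `max_chars` are hypothetical names invented here, and the hand-written element list stands in for what a layout-aware partitioner would output. It is not Unstructured's API; the open-source `unstructured` library ships its own partitioning and chunking functions (for example `partition` in `unstructured.partition.auto`), whose exact behavior should be checked in its documentation.

```python
import json
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for one partitioned document element."""
    category: str  # e.g. "Title", "NarrativeText", "ListItem"
    text: str


def chunk_by_section(elements, max_chars=500):
    """Group elements into JSON-ready chunks (hypothetical helper).

    A new chunk starts at every Title element, and a chunk is also closed
    early when adding the next element would push it past max_chars.
    """
    chunks = []
    current = []  # texts collected for the chunk being built
    size = 0      # character count of current, including "\n" joiners
    section = None

    def flush():
        nonlocal current, size
        if current:
            chunks.append({"text": "\n".join(current),
                           "metadata": {"section": section}})
        current = []
        size = 0

    for el in elements:
        if el.category == "Title":
            flush()  # close the previous section's chunk
            section = el.text
        extra = len(el.text) + (1 if current else 0)  # +1 for the joiner
        if current and size + extra > max_chars:
            flush()  # size cap reached; continue the section in a new chunk
            extra = len(el.text)
        current.append(el.text)
        size += extra
    flush()
    return chunks


if __name__ == "__main__":
    doc = [
        Element("Title", "Introduction"),
        Element("NarrativeText", "Unstructured data is everywhere."),
        Element("Title", "Methods"),
        Element("ListItem", "Detect layout."),
    ]
    print(json.dumps(chunk_by_section(doc), indent=2))
```

Starting a new chunk at each title keeps a section's text together, which tends to make retrieved chunks more self-contained; the character cap keeps any single chunk within an embedding model's input budget.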

Benchmark Scores

Scores available for 5 of 14 benchmarks
Long-Context Retrieval (2/5)
  RULER: 69.941p
  NIAH: no data
  LooGLE: no data
  ∞Bench: no data

Multi-Turn Recall (0/2)
  LoCoMo: no data
  MemoryBank: no data

Cross-Session Memory (0/1)
  LongMemEval: no data

Multi-Hop QA (2/3)
  BABILong: no data
  HotpotQA: 59.710p

Agent Task Memory (0/1)
  AgentBench-Mem: no data

Personalization (0/1)
  PerLTQA: no data

Factuality / Grounding (1/1)
  RAGAS: 64.516p
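The headline coverage figure appears to be the sum of the per-category counts. A quick tally in Python, using the category fractions exactly as listed on this card (the `coverage` dictionary is just an illustrative transcription, not data from any API):

```python
# Per-category coverage as listed on this card:
# (benchmarks with scores, benchmarks tracked in that category).
coverage = {
    "Long-Context Retrieval": (2, 5),
    "Multi-Turn Recall": (0, 2),
    "Cross-Session Memory": (0, 1),
    "Multi-Hop QA": (2, 3),
    "Agent Task Memory": (0, 1),
    "Personalization": (0, 1),
    "Factuality / Grounding": (1, 1),
}

# Sum numerators and denominators separately across categories.
scored = sum(s for s, _ in coverage.values())
tracked = sum(t for _, t in coverage.values())
print(f"{scored} of {tracked} benchmarks")  # prints "5 of 14 benchmarks"
```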