Back to Arena

Athina AI

by Athina AI (YC W23)

System Card

OrganizationAthina AI (YC W23)
Released2023-01
Architectureagentic-workflow / LLM evaluation and production monitoring
DetailsAthina AI is an evaluation and monitoring platform offering 50+ preset evaluations (hallucination, context relevance, groundedness, toxicity) from multiple providers (OpenAI, Ragas, Guardrails). It supports SOC-2-compliant VPC deployment for enterprise customers. Provides both pre-deployment testing and production monitoring with real-time alerting on regression. Raised $4.1M total.
Parameters
Domainrag-retrievalagent-memory
Open SourceNo
WebsiteVisit
eval-frameworkhallucination-detectionSOC2production-monitoring50-evals

Capability Profile

Benchmark Scores

6 of 14 benchmarks
Long-Context Retrieval
1/5
RULER
no data
NIAH
no data
LooGLE
no data
∞Bench
no data
Multi-Turn Recall
1/2
LoCoMo
77.268p
MemoryBank
no data
Cross-Session Memory
1/1
Multi-Hop QA
2/3
BABILong
no data
HotpotQA
77.788p
Agent Task Memory
1/1
Personalization
0/1
PerLTQA
no data
Factuality / Grounding
0/1
RAGAS
no data