Athina AI

by Athina AI (YC W23)

System Card

OrganizationAthina AI (YC W23)

Released2023-01

Architectureagentic-workflow / LLM evaluation and production monitoring

DetailsAthina AI is an evaluation and monitoring platform offering 50+ preset evaluations (hallucination, context relevance, groundedness, toxicity) from multiple providers (OpenAI, Ragas, Guardrails). It supports SOC-2-compliant VPC deployment for enterprise customers. Provides both pre-deployment testing and production monitoring with real-time alerting on regression. Raised $4.1M total.

Parameters—

Domainrag-retrievalagent-memory

Open SourceNo

WebsiteVisit

eval-frameworkhallucination-detectionSOC2production-monitoring50-evals

Capability Profile

Benchmark Scores

6 of 14 benchmarks

Data Transparency:6 estimated

Long-Context Retrieval

1/5

RULER

no data

NIAH

no data

LooGLE

no data

LongBench

603pEstimated

∞Bench

no data

Multi-Turn Recall

1/2

LoCoMo

77.268pEstimated

MemoryBank

no data

Cross-Session Memory

1/1

LongMemEval

78.365pEstimated

Multi-Hop QA

2/3

BABILong

no data

MultiHop-RAG

7579pEstimated

HotpotQA

77.789pEstimated

Agent Task Memory

1/1

AgentBench-Mem

7226pEstimated

Personalization

0/1

PerLTQA

no data

Factuality / Grounding

0/1

RAGAS

no data

Sources:Arena estimate — derived from capability profile, not independently verified