Selene: Frontier AI evaluation models

Get precise judgments on your AI app’s performance. Run evals with the Selene models, the most accurate LLM judges on the market.

Run evals with our LLM-as-a-Judge

Need to build trust with customers that your generative AI app is reliable? Judge your AI responses with our evaluation models and receive scores and actionable critiques.

Selene models

Choose the model size and implementation method that fit your evaluation needs.
Optimized for speed
Selene 1 Mini
The best evaluation model of its size (8B). Suitable for running evals at inference time.
Industry-leading accuracy
Selene 1
The best model for evaluation on the market. Capable of accurately judging a wide variety of eval tasks, as well as adapting to custom eval criteria. Suitable for pre-production evals.
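To illustrate what running evals at inference time can look like, here is a minimal Python sketch of a guardrail: each generated response is scored by a judge before it is returned, and low-scoring responses are retried or replaced with a fallback. The `generate` and `judge` callables, the `Judgment` type, the 1 to 5 score scale, and the threshold of 4 are hypothetical placeholders, not the Selene API; in practice `judge` would wrap a call to a small judge model such as Selene 1 Mini.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Judgment:
    """Hypothetical judge result: a numeric score plus a written critique."""
    score: int
    critique: str


# Hypothetical judge signature: (user_input, ai_response) -> Judgment.
# A real implementation would wrap a call to a judge model such as Selene 1 Mini.
JudgeFn = Callable[[str, str], Judgment]


def guarded_reply(
    user_input: str,
    generate: Callable[[str], str],
    judge: JudgeFn,
    min_score: int = 4,
    max_attempts: int = 2,
    fallback: str = "Sorry, I can't give a reliable answer to that right now.",
) -> tuple[str, list[Judgment]]:
    """Return the first response scoring >= min_score, else the fallback.

    Also returns every judgment made, so critiques can be logged for monitoring.
    """
    judgments: list[Judgment] = []
    for _ in range(max_attempts):
        response = generate(user_input)
        judgment = judge(user_input, response)
        judgments.append(judgment)
        if judgment.score >= min_score:
            return response, judgments
    # Every attempt scored below the threshold, so serve the safe fallback.
    return fallback, judgments


# Usage with stand-in functions (no real model calls).
def stub_generate(question: str) -> str:
    return "Paris is the capital of France."


def stub_judge(question: str, response: str) -> Judgment:
    ok = "Paris" in response
    return Judgment(score=5 if ok else 2, critique="Correct." if ok else "Missing the answer.")


reply, log = guarded_reply("What is the capital of France?", stub_generate, stub_judge)
print(reply, log[0].score)
```

Returning the judgments alongside the reply keeps the critiques available for logging, which is where a judge's written feedback tends to be most useful after deployment.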

A new standard for AI evaluations

01

State-of-the-art models

Selene outperforms frontier models on commonly used evaluation benchmarks, making it the most accurate and reliable model for evaluation.

02

Customize to your use case

Make your evals more fine-grained, format your score as you wish, and fit eval criteria to your use case with few-shot examples in our Eval Copilot (beta).

03

Accurate scores, actionable critiques

Designed for straightforward integration into existing workflows. Use our API to generate accurate eval scores with actionable critiques.
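A rough sketch of a pre-production eval run, assuming a judge that returns a score and a critique for each response: scores are aggregated into a mean and a pass rate, and critiques from failing cases are collected for review. `EvalCase`, `EvalReport`, `run_eval_suite`, the toy judge, and the pass threshold are hypothetical and stand in for real API calls.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable, Optional


@dataclass
class EvalCase:
    """One test case: a prompt and the app's response to it."""
    prompt: str
    response: str


@dataclass
class EvalReport:
    mean_score: Optional[float]
    pass_rate: Optional[float]
    failure_critiques: list[str]


def run_eval_suite(
    cases: Iterable[EvalCase],
    judge: Callable[[str, str], tuple[int, str]],  # hypothetical: returns (score, critique)
    pass_score: int = 4,
) -> EvalReport:
    """Judge every case, then summarize scores and keep critiques of failing cases."""
    scored = [(case, *judge(case.prompt, case.response)) for case in cases]
    if not scored:
        return EvalReport(None, None, [])
    scores = [score for _, score, _ in scored]
    failures = [critique for _, score, critique in scored if score < pass_score]
    return EvalReport(
        mean_score=mean(scores),
        pass_rate=(len(scored) - len(failures)) / len(scored),
        failure_critiques=failures,
    )


# Toy judge standing in for an API call: checks for an expected answer.
EXPECTED = {"What is 2 + 2?": "4", "What is the capital of Japan?": "Tokyo"}


def toy_judge(prompt: str, response: str) -> tuple[int, str]:
    expected = EXPECTED[prompt]
    if expected in response:
        return 5, "Correct and concise."
    return 1, f"Expected an answer mentioning {expected!r}."


report = run_eval_suite(
    [EvalCase("What is 2 + 2?", "4"), EvalCase("What is the capital of Japan?", "Kyoto")],
    toy_judge,
)
print(report)
```

Here the report has a mean score of 3, a pass rate of 0.5, and one critique pointing at the wrong answer, which is the kind of signal to act on before shipping.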

Introducing Selene 1: the world’s best LLM-as-a-Judge

Pricing plans

Free
Designed for hobbyists starting a solo project
Free credits per month:
1,000 free API calls (Selene)
3,333 free API calls (Selene Mini)
Receive an evaluation score and a critique for each API call
Upgrade any time
Graduate to the next tier by adding your billing details
Key features
API access
Build your own metrics on Eval Copilot
SOC 2 report available upon request
Shared Slack channel
Support SLA
Rate limits
100 requests / minute
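Since each tier caps requests per minute (100 on Free, 500 on Pro), a client may want to throttle itself rather than run into the limit. Below is a generic sliding-window limiter in Python, assuming the limit applies to any rolling 60-second window; how the service actually counts requests is not stated here, and `SlidingWindowLimiter` is a hypothetical helper, not part of any Selene SDK.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Client-side throttle: at most `max_requests` calls in any `window_s`-second window.

    Hypothetical helper, not part of a Selene SDK. `clock` and `sleep` are injectable
    so the limiter can be tested without real waiting.
    """

    def __init__(
        self,
        max_requests: int = 100,  # Free tier: 100 requests / minute
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self._clock = clock
        self._sleep = sleep
        self._sent: deque = deque()  # timestamps of requests still inside the window

    def acquire(self) -> None:
        """Block until one more request fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget requests that are at least `window_s` seconds old.
            while self._sent and now - self._sent[0] >= self.window_s:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest recorded request expires.
            self._sleep(self.window_s - (now - self._sent[0]))


# Usage: one limiter per API key, calling acquire() before every request.
pro_limiter = SlidingWindowLimiter(max_requests=500)  # Pro tier: 500 requests / minute
```

A sliding window is stricter than a fixed per-minute bucket, so it stays under the cap whichever of the two the server happens to use.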
Pro
Designed for startups with AI applications in production
After monthly free credits:
$10 / 1K API calls (Selene)
$3 / 1K API calls (Selene Mini)
Receive an evaluation score and a critique for each API call
5x higher rate limits than Free
Monitor model outputs at production scale
Key features
API access
Build your own metrics on Eval Copilot
SOC 2 report available upon request
Shared Slack channel
Support SLA
Rate limits
500 requests / minute
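A back-of-the-envelope monthly cost estimate for the Pro tier, using the listed prices. It assumes the free monthly credits act as one shared $10 balance, which is inferred from 1,000 × $10/1K = $10 and 3,333 × $3/1K ≈ $10 rather than stated on this page; the model keys and `monthly_cost` are hypothetical. `Decimal` keeps the per-call prices exact.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed Pro-tier prices, converted to dollars per single API call.
PRICE_PER_CALL = {
    "selene": Decimal("10") / 1000,      # $10 / 1K calls
    "selene-mini": Decimal("3") / 1000,  # $3 / 1K calls
}

# ASSUMPTION: free monthly credits act as one shared $10 balance
# (1,000 Selene calls at $10/1K = $10; 3,333 Selene Mini calls at $3/1K ≈ $10).
FREE_CREDIT_USD = Decimal("10")


def monthly_cost(calls: dict[str, int]) -> Decimal:
    """Estimated bill in USD for one month of usage, after free credits."""
    gross = sum((PRICE_PER_CALL[model] * n for model, n in calls.items()), Decimal("0"))
    net = max(gross - FREE_CREDIT_USD, Decimal("0"))
    return net.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(monthly_cost({"selene": 2_000, "selene-mini": 10_000}))
```

Under this assumption, 2,000 Selene calls plus 10,000 Selene Mini calls come to $20 + $30 - $10 = $40.00 for the month.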
Enterprise
Designed for teams with advanced security, deployment, and support needs
Enterprise-grade security and support
Secure VPC peering, private deployments, dedicated endpoints, and 24/7 priority support
Scalable pricing
Pricing options that scale with your evaluation volume.
Custom rate limits
Key features
API access
Build your own metrics on Eval Copilot
SOC 2 report available upon request
Shared Slack channel
Support SLA
Rate limits
Custom

Boost your GenAI accuracy

Run evals with Selene 1 and Selene Mini
Custom eval metric deployment using Eval Copilot (beta)
Free credits & usage-based pricing
Docs & guides