Blog

Identifying & auto-correcting agent failures: findings from τ-bench

Nina

April 29, 2025

Introducing the Atla MCP Server: purpose-built LLM Judges now at your command

Atla team

April 22, 2025

Latest posts

LLM Judges as Reward Models

Henry

October 31, 2024

Selecting a training objective for an AI evaluator (SFT vs. DPO vs. RPO)

Andrei

October 22, 2024

Evaluating GenAI applications with LLM‑as‑a‑judge

Kyle

October 8, 2024

“AI’s $600B Question” and AGI’s $34T Answer

Maurice

September 10, 2024

Scaling Alignment: Training AI Evaluators to Capture Human Preferences

Maurice

July 11, 2024