The definitive guide to evals.

New Posts

GPT-4o Model Card

“Evals are all you need”

Benchmarks

SWE-Bench: An Overview

HumanEval

Under construction 🚧

Eval Playbooks

Model vs Application Evals

Prompting

RAG

Agents

Metrics

Tools

Patterns and Architectures