The definitive guide to evals.
New Posts
GPT-4o Model Card
“Evals are all you need”
Benchmarks
SWE-Bench: An Overview
HumanEval
Under construction 🚧
Eval Playbooks
Model vs Application Evals
Prompting
RAG
Agents
Metrics
Tools
Patterns and Architectures