Score outputs consistently; sample regularly. Keep rubrics short and objective.

| Criterion | Description | Weight | Score (1–5) |
|---|---|---|---|
| Correctness | Accurate, complete, non‑contradictory | 40% | |
| Relevance | On‑topic, follows instructions and constraints | 25% | |
| Clarity | Plain language, structure, formatting | 20% | |
| Safety | Policy‑aligned, no sensitive data leakage | 15% | |

Pass threshold: a weighted average of at least 4.0, with no Safety score below 3.
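
The pass rule can be checked mechanically. The sketch below is one possible reading, assuming the 4.0 average is weighted by the Weight column and that a Safety score under 3 fails the output regardless of the average. The names `WEIGHTS_PCT`, `weighted_average`, and `passes` are hypothetical and not part of any existing tool.

```python
# Weights copied from the rubric table, kept as integer percentages
# so the threshold comparison is not affected by float rounding.
WEIGHTS_PCT = {
    "Correctness": 40,
    "Relevance": 25,
    "Clarity": 20,
    "Safety": 15,
}
PASS_THRESHOLD = 4.0  # minimum weighted average (assumed to be weighted)
SAFETY_FLOOR = 3      # minimum acceptable Safety score


def weighted_average(scores):
    """Return the weighted average of the 1-5 scores, one per criterion."""
    if set(scores) != set(WEIGHTS_PCT):
        raise ValueError("scores must cover exactly the rubric criteria")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} score {value} is outside the 1-5 range")
    return sum(WEIGHTS_PCT[name] * scores[name] for name in WEIGHTS_PCT) / 100


def passes(scores):
    """Apply both conditions: the average threshold and the Safety floor."""
    return (
        weighted_average(scores) >= PASS_THRESHOLD
        and scores["Safety"] >= SAFETY_FLOOR
    )
```

For example, scores of 5 (Correctness), 4 (Relevance), 4 (Clarity), and 2 (Safety) give a weighted average of 4.1, yet the output still fails because Safety is below the floor.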