Create systematic evaluation methods for AI-generated content quality.
Design an evaluation framework for: [YOUR AI APPLICATION]
**Context:**
- Output type: [text/code/structured data/etc.]
- Quality dimensions: [accuracy, helpfulness, safety, style]
- Stakeholders: [who defines "good"]
Create a comprehensive framework:
1. **Evaluation Dimensions**
- Define 5-7 quality dimensions
- Rubric for each (1-5 scale with descriptions)
- Weighting by importance (see the scoring sketch after this list)
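For example, the weighted roll-up for step 1 could look like the following Python sketch; the dimension names, weights, and level-5 descriptions are illustrative assumptions to replace with your own rubric.

```python
# Sketch: weighted rubric scoring. Dimensions, weights, and the
# 1-5 scale descriptions below are illustrative assumptions.

RUBRIC = {
    # dimension: (weight, description of a "5" on the 1-5 scale)
    "accuracy":     (0.30, "All claims verifiably correct"),
    "helpfulness":  (0.25, "Fully resolves the user's request"),
    "safety":       (0.20, "No harmful or policy-violating content"),
    "style":        (0.15, "Matches target tone and formatting"),
    "completeness": (0.10, "No required element missing"),
}

def weighted_score(scores: dict[str, int]) -> float:
    """Combine per-dimension 1-5 scores into one weighted 1-5 score."""
    assert all(1 <= s <= 5 for s in scores.values()), "scores must be 1-5"
    total_weight = sum(w for w, _ in RUBRIC.values())
    return sum(RUBRIC[d][0] * scores[d] for d in RUBRIC) / total_weight

print(weighted_score({"accuracy": 4, "helpfulness": 5, "safety": 5,
                      "style": 3, "completeness": 4}))  # -> approx 4.3
```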
2. **Automated Metrics**
- Programmatic checks (format, length, keywords; sketched below)
- Reference-based metrics (BLEU, ROUGE, etc.)
- Model-based evaluation (LLM as judge)
- Consistency checks
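A minimal sketch of the programmatic checks, plus a crude unigram-overlap F1 as a simplified stand-in for reference-based metrics such as BLEU or ROUGE. The JSON-format rule, length bounds, and keyword handling are assumptions; a real pipeline would use a metrics library and an LLM-judge call here.

```python
# Sketch: automated checks; rules below are placeholder assumptions.
import json
import re

def check_format(output: str) -> bool:
    """Programmatic check: output must parse as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def check_length(output: str, lo: int = 50, hi: int = 2000) -> bool:
    """Programmatic check: output length within expected bounds."""
    return lo <= len(output) <= hi

def check_keywords(output: str, required: list[str]) -> bool:
    """Programmatic check: all required keywords appear (case-insensitive)."""
    return all(re.search(re.escape(k), output, re.I) for k in required)

def overlap_f1(output: str, reference: str) -> float:
    """Unigram-overlap F1: a crude proxy for reference-based metrics."""
    out, ref = set(output.lower().split()), set(reference.lower().split())
    common = len(out & ref)
    if not out or not ref or not common:
        return 0.0
    p, r = common / len(out), common / len(ref)
    return 2 * p * r / (p + r)
```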
3. **Human Evaluation Protocol**
- Evaluator selection criteria
- Task instructions
- Annotation interface design
- Inter-rater reliability targets (see the kappa sketch below)
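Inter-rater reliability is commonly reported as Cohen's kappa; a self-contained sketch for two annotators follows. The example ratings and any target threshold (e.g. kappa of at least 0.7) are assumptions to calibrate for your task.

```python
# Sketch: Cohen's kappa for two annotators rating the same items
# on the 1-5 rubric. The target threshold is task-dependent.
from collections import Counter

def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    assert len(rater_a) == len(rater_b) > 0
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(ca[l] * cb[l] for l in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

print(cohens_kappa([5, 4, 3, 5, 2], [5, 4, 4, 5, 2]))  # -> approx 0.72
```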
4. **Test Set Design**
- Coverage across use cases
- Edge cases and adversarial inputs
- Golden answer creation (see the schema sketch below)
- Periodic refresh strategy
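One possible shape for a golden test case, plus a coverage check that flags under-represented use cases before a refresh cycle; all field names, categories, and the minimum-per-use-case threshold are assumptions to adapt to your application.

```python
# Sketch: golden test case record and coverage check (assumed schema).
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class GoldenCase:
    case_id: str
    prompt: str
    golden_answer: str          # reference output agreed by stakeholders
    use_case: str               # e.g. "summarization", "qa"
    is_edge_case: bool = False  # adversarial or boundary input
    tags: list[str] = field(default_factory=list)

def coverage_report(cases: list[GoldenCase],
                    min_per_use_case: int = 20) -> dict[str, int]:
    """Return use cases with fewer test cases than the assumed minimum."""
    counts = Counter(c.use_case for c in cases)
    return {uc: n for uc, n in counts.items() if n < min_per_use_case}
```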
5. **Reporting & Analysis**
- Score aggregation methods (see the aggregation sketch below)
- Trend visualization
- Failure analysis workflow
- Improvement prioritization
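A sketch of score aggregation and a naive trend check over recent runs; the run record format (one dict of per-dimension scores per evaluated item) is an assumption.

```python
# Sketch: aggregate per-dimension means and flag direction of travel.
from statistics import mean

def aggregate(run: list[dict[str, float]]) -> dict[str, float]:
    """Mean score per dimension over all items in one eval run."""
    return {d: mean(item[d] for item in run) for d in run[0]}

def trend(history: list[dict[str, float]], dim: str) -> str:
    """Crude trend over the last three aggregated runs for one dimension."""
    recent = [run[dim] for run in history[-3:]]
    if len(recent) < 2:
        return "insufficient data"
    return "improving" if recent[-1] >= recent[0] else "declining"
```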
6. **Integration into Development**
- Pre-deployment gates
- Continuous monitoring
- Regression alerts (see the gate sketch below)
- Feedback to prompt improvement
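A sketch of a pre-deployment gate suitable for CI: it fails the build when any dimension drops below a hard floor or regresses against a stored baseline. The file names, floors, and 5% tolerance are assumptions for your setup.

```python
# Sketch: CI gate comparing current eval scores to a baseline.
# File names, floors, and tolerance below are placeholder assumptions.
import json
import sys

MINIMUMS = {"accuracy": 4.0, "safety": 4.5}  # hard floors per dimension
TOLERANCE = 0.05                             # max relative drop vs. baseline

def gate(current: dict[str, float], baseline: dict[str, float]) -> list[str]:
    failures = [f"{d} below floor {MINIMUMS[d]}"
                for d in MINIMUMS if current.get(d, 0) < MINIMUMS[d]]
    failures += [f"{d} regressed vs. baseline"
                 for d in baseline
                 if current.get(d, 0) < baseline[d] * (1 - TOLERANCE)]
    return failures

if __name__ == "__main__":
    current = json.load(open("current_scores.json"))
    baseline = json.load(open("baseline_scores.json"))
    if problems := gate(current, baseline):
        print("\n".join(problems))
        sys.exit(1)  # fail the CI job, blocking deployment
```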
Output as an actionable evaluation plan with templates.

**Best for:** Teams needing systematic AI quality measurement.