# Measure AI Answer Quality with a Small Evaluation Set
## 1. Collect real prompts
Pick support tickets, search queries, or editor requests that represent normal work. Include easy, hard, and ambiguous examples.
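One way to keep such a set organized is a small list of labeled cases. The sketch below is illustrative only: the `EvalCase` fields, the difficulty labels, and the sample prompts are assumptions, not a required schema. It also shows a quick check that no difficulty level has been left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    source: str      # e.g. "support_ticket", "search_query", "editor_request"
    difficulty: str  # "easy", "hard", or "ambiguous"
    prompt: str


# Placeholder prompts for illustration; replace with real, safe-to-process inputs.
EVAL_SET = [
    EvalCase("t-001", "support_ticket", "easy", "How do I reset my password?"),
    EvalCase("s-001", "search_query", "hard", "export invoices by region for last quarter"),
    EvalCase("e-001", "editor_request", "ambiguous", "Make this paragraph better."),
]


def missing_difficulties(cases, required=("easy", "hard", "ambiguous")):
    """Return the difficulty labels that have no case in the set."""
    counts = Counter(case.difficulty for case in cases)
    return [level for level in required if counts[level] == 0]


print(missing_difficulties(EVAL_SET))
```

An empty list means every difficulty level is represented; anything else names the gap to fill before the set is used.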
## 2. Write expected traits
Do not require one exact answer. Instead, score each answer on traits: is it grounded in the available sources, complete, concise, and honest about missing information?
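Trait-based scoring can be as simple as a yes/no mark per trait, recorded by a reviewer. This is a minimal sketch under that assumption; the trait keys and the equal weighting are hypothetical choices, and a real rubric might weight traits differently or use a graded scale.

```python
# Hypothetical trait keys mirroring the four traits named above.
TRAITS = ("grounded", "complete", "concise", "honest_about_gaps")


def score_answer(marks):
    """Return the fraction of traits an answer satisfies.

    marks maps each trait name to True/False as judged by a reviewer.
    """
    missing = [trait for trait in TRAITS if trait not in marks]
    if missing:
        raise ValueError(f"missing marks for: {missing}")
    return sum(bool(marks[trait]) for trait in TRAITS) / len(TRAITS)


example_marks = {
    "grounded": True,
    "complete": True,
    "concise": False,
    "honest_about_gaps": True,
}
print(score_answer(example_marks))
```

Refusing to score when a trait is unmarked keeps incomplete reviews from silently counting as failures or passes.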
## 3. Compare every change
Run the same prompts before and after prompt edits, model changes, or retrieval tuning. Keep the results in a simple table.
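A before/after comparison might be sketched as follows, assuming each run produces one numeric score per case. The function names, the case IDs, and the sample scores are made up for illustration; the output is a Markdown table so it can be pasted next to the change being reviewed.

```python
def compare_runs(before, after):
    """Pair scores by case ID; each row is (case_id, before, after, delta)."""
    rows = []
    for case_id in sorted(before.keys() & after.keys()):
        delta = after[case_id] - before[case_id]
        rows.append((case_id, before[case_id], after[case_id], delta))
    return rows


def regressions(rows, tolerance=0.0):
    """Return the case IDs whose score dropped by more than the tolerance."""
    return [case_id for case_id, _, _, delta in rows if delta < -tolerance]


def to_markdown(rows):
    """Render the comparison rows as a simple Markdown table."""
    lines = ["| case | before | after | delta |", "| --- | --- | --- | --- |"]
    for case_id, before_score, after_score, delta in rows:
        lines.append(f"| {case_id} | {before_score:.2f} | {after_score:.2f} | {delta:+.2f} |")
    return "\n".join(lines)


# Illustrative scores only, not real results.
before = {"t-001": 1.0, "s-001": 0.75}
after = {"t-001": 1.0, "s-001": 0.5}

rows = compare_runs(before, after)
print(to_markdown(rows))
print("regressed:", regressions(rows))
```

Only cases present in both runs are compared, so adding new prompts to the set does not distort the delta for existing ones.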
## Checklist
- Confirm the input data is safe to process.
- Keep a human review path for uncertain results.
- Measure the workflow before adding more automation.

A small evaluation set will not prove perfection, but it prevents blind changes from quietly lowering quality.