RAG Evaluation Checklist (Triad)
- [ ] Context relevance (retrieved chunks topically relevant)
- [ ] Groundedness (answer supported by retrieved text)
- [ ] Answer relevance (addresses the query)
- [ ] Latency at p50 and p95 (ms); token counts (prompt/response)
- [ ] Citations contain stable doc IDs + spans
- [ ] Regression set: 10–20 fixed questions with expected sources
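
The latency and regression-set items above can be checked mechanically. The following is a minimal sketch, not a definitive harness: `RegressionCase`, `source_recall`, `latency_percentiles`, and `run_regression` are hypothetical names introduced here for illustration, and `retrieve` stands in for whatever retriever the pipeline actually uses, assumed to return a list of stable doc IDs for a question.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import Callable


@dataclass
class RegressionCase:
    """One fixed question plus the doc IDs it is expected to retrieve (hypothetical type)."""
    question: str
    expected_doc_ids: set[str]


def source_recall(expected: set[str], retrieved: list[str]) -> float:
    """Fraction of expected doc IDs that appear among the retrieved IDs."""
    if not expected:
        return 1.0
    return len(expected & set(retrieved)) / len(expected)


def latency_percentiles(latencies_ms: list[float]) -> tuple[float, float]:
    """Return (p50, p95) in ms, interpolating between observed samples.

    Needs at least two samples.
    """
    cuts = quantiles(latencies_ms, n=100, method="inclusive")  # 99 cut points
    return cuts[49], cuts[94]


def run_regression(
    cases: list[RegressionCase],
    retrieve: Callable[[str], list[str]],  # assumed: question -> retrieved doc IDs
) -> list[str]:
    """Return the questions for which some expected source was not retrieved."""
    return [
        case.question
        for case in cases
        if source_recall(case.expected_doc_ids, retrieve(case.question)) < 1.0
    ]
```

Running `run_regression` over the fixed question set after each change to chunking, embeddings, or prompts gives a list of questions whose expected sources went missing. The triad items (context relevance, groundedness, answer relevance) still need a human or model-based judge and are not covered by this sketch.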