Generative AI Evaluation Essentials (IEEE AI Test 2025)

Tutorial at IEEE AI Test 2025 on deploying evaluations for generative AI systems with real-world rigor. Covering system context, measurement and monitoring, agent behavior testing, and evaluation frameworks. Co-taught with Heather Frase, PhD and Sarah Luger, PhD.

How do you test generative AI systems when the system, the users, and the goals are all changing? This tutorial at IEEE AI Test 2025 covered four areas practitioners need to get right: defining what success looks like post-deployment, building measurement and monitoring that captures real-world outcomes rather than just task metrics, testing agent behavior across personas and edge cases, and selecting evaluation methods that scale as the system evolves. The course also addressed data strategies for meaningful evaluations (problem definition, human-in-the-loop design, dataset lifecycle management) and AI assessment tool selection for compliance and risk. Co-taught with Heather Frase, PhD (verAITech) and Sarah Luger, PhD (iMerit). A field guide summarizing the key frameworks is available on request.

Our other articles

All articles