AILuminate v1.0: AI Risk & Reliability Benchmark (MLCommons)

Co-author on the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability, developed through an open, cross-sector process at MLCommons.

AILuminate v1.0 was developed by the MLCommons AI Risk & Reliability working group, with participation from researchers and engineers across academia, industry, and civil society. The benchmark evaluates AI systems across twelve hazard categories using an extensive prompt dataset, a tuned ensemble of safety evaluation models, and a five-tier grading scale. Marisa Ferrara Boston contributed as a member of the working group that designed the assessment standard and evaluation methodology. The paper has been published on arXiv, and the benchmark is publicly available through MLCommons.

Read the paper (arXiv) | AILuminate benchmark site
