Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting
AI evaluation results are produced at scale but reported inconsistently across leaderboards, model cards, benchmark papers, and company blogs. The cost is interpretive: readers can...