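RAND - Best Practices for AI Evaluation and Governance - Proposal for a European Union General-Purpose AI Model Evaluation Standards Task Force (English)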
PATRICIA PASKOV, LISA SODER, EVERETT SMITH

Toward Best Practices for AI Evaluation and Governance
A Proposal for a European Union General-Purpose AI Model Evaluation Standards Task Force

Expert Insights
PERSPECTIVE ON A TIMELY POLICY ISSUE
June 2025

For more information on this publication, visit www.rand.org/t/PEA3624-1.

About RAND
RAND is a research organization that develops solutions to public policy challenges to help make communities throughout the world safer and more secure, healthier and more prosperous. RAND is nonprofit, nonpartisan, and committed to the public interest. To learn more about RAND, visit www.rand.org.

Research Integrity
Our mission to help improve policy and decisionmaking through research and analysis is enabled through our core values of quality and objectivity and our unwavering commitment to the highest level of integrity and ethical behavior. To help ensure our research and analysis are rigorous, objective, and nonpartisan, we subject our research publications to a robust and exacting quality-assurance process; avoid both the appearance and reality of financial and other conflicts of interest through staff training, project screening, and a policy of mandatory disclosure; and pursue transparency in our research engagements through our commitment to the open publication of our research findings and recommendations, disclosure of the source of funding of published research, and policies to ensure intellectual independence. For more information, visit www.rand.org/about/research-integrity.

RAND’s publications do not necessarily reflect the opinions of its research clients and sponsors.

Published by the RAND Corporation, Santa Monica, Calif.
© 2025 RAND Corporation
RAND® is a registered trademark.

Limited Print and Electronic Distribution Rights
This publication and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited; linking directly to its webpage on rand.org is encouraged. Permission is required from RAND to reproduce, or reuse in another form, any of its research products for commercial purposes. For information on reprint and reuse permissions, visit www.rand.org/about/publishing/permissions.

PE-A3624-1

About This Paper
A promising way of identifying and mitigating the systemic risks posed by artificial intelligence (AI) development and deployment is general-purpose AI (GPAI) evaluations. Although GPAI evaluations play an increasingly central role in institutional decisionmaking and policymaking, including the European Union (EU) AI Act’s mandate to conduct evaluations on GPAI models presenting systemic risk, no standards exist to promote GPAI evaluations’ quality. To strengthen GPAI evaluations in the EU, the first and only jurisdiction that mandates GPAI evaluations, we outline four desiderata for evaluations: internal validity, external validity, reproducibility, and portability.1
