Humanity's Last Exam score in 2025?

This market will resolve to the highest accuracy score (as a percentage) achieved by any AI model on the full, multi-modal Humanity's Last Exam at or before December 31, 2025, as reported on the official Scale AI leaderboard (https://scale.com/leaderboard/humanitys_last_exam) or other credible sources.
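The resolution rule takes the maximum reported accuracy among eligible results: full multi-modal runs achieved on or before the cutoff. Below is a minimal sketch. The `Report` record, its field names and the `resolution_value` helper are hypothetical and not part of any Manifold or Scale AI API. It also assumes the date a score was reported is an acceptable stand-in for the date it was achieved.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional

# Hypothetical record for one reported leaderboard result.
@dataclass(frozen=True)
class Report:
    model: str
    accuracy_pct: float      # accuracy as a percentage, e.g. 8.93
    reported_on: date        # assumption: report date approximates achievement date
    full_multimodal: bool    # True only for runs on the full, multi-modal exam

CUTOFF = date(2025, 12, 31)

def resolution_value(reports: Iterable[Report]) -> Optional[float]:
    """Return the highest eligible accuracy, or None if nothing qualifies."""
    eligible = [
        r.accuracy_pct
        for r in reports
        if r.full_multimodal and r.reported_on <= CUTOFF
    ]
    return max(eligible) if eligible else None
```

Text-only runs and results after the cutoff are filtered out before the maximum is taken, so neither can raise the resolution value.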

Background

Humanity's Last Exam is a challenging AI benchmark designed to test the limits of AI knowledge at the frontiers of human expertise. The exam consists of 3,000 questions across over 100 subjects, contributed by experts from over 500 institutions worldwide. As of early 2025, top-performing models include:

  • o1 (December 2024): 8.81% accuracy, 92.79% calibration error

  • Claude 3.7 Sonnet Thinking (February 2025): 8.93% accuracy

  • Gemini 2.0 Flash Thinking (January 2025): 7.22% accuracy, 90.58% calibration error

Other models, such as GPT-4o and Grok-2, score significantly lower, typically below 5% accuracy. The exam highlights the gap between current AI capabilities and expert-level human knowledge: most models answer fewer than 10% of the questions correctly.
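The calibration-error figures above measure how closely a model's stated confidence tracks its actual accuracy. The sketch below assumes the leaderboard uses a binned root-mean-square (RMS) calibration error over confidences in [0, 1]. The helper name `rms_calibration_error` and the fixed `bin_size` of 100 are illustrative assumptions, not Scale AI's published code. Any leftover answers in this sketch form their own final bin; the official implementation may handle that remainder differently.

```python
import math
from typing import Sequence

def rms_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    bin_size: int = 100,  # assumption: bins of 100 answers each
) -> float:
    """Binned RMS gap between stated confidence and observed accuracy."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    if any(not 0.0 <= c <= 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    # Sort by confidence so each bin groups answers of similar confidence.
    pairs = sorted(zip(confidences, correct))
    n = len(pairs)
    weighted_sq = 0.0
    for start in range(0, n, bin_size):
        chunk = pairs[start:start + bin_size]
        avg_conf = sum(c for c, _ in chunk) / len(chunk)
        accuracy = sum(1 for _, ok in chunk if ok) / len(chunk)
        # Weight each bin's squared gap by its share of all answers.
        weighted_sq += len(chunk) * (avg_conf - accuracy) ** 2
    return math.sqrt(weighted_sq / n)
```

For example, a model that reports 95% confidence on all of 100 questions but answers only 9 correctly scores roughly 0.86 (86%). This shows how accuracy below 10% can coexist with a calibration error near 90%.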

© Manifold Markets, Inc.