Will an AI get gold on any International Math Olympiad by the end of 2025?
Resolved N/A on Dec 9
https://bounded-regret.ghost.io/ai-forecasting-one-year-in/ This is from June: a great article on Hypermind forecasts for AI progress, and on how progress on the MATH dataset one year in was far faster than predicted.
https://ai.facebook.com/blog/ai-math-theorem-proving/
Seems relevant https://aimoprize.com/
A retracted, possibly wrong, possibly embargo-breaking online article said that DeepMind systems had hit IMO silver level.
It's over https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/
https://openai.com/index/learning-to-reason-with-llms/ Looks like you don't even need specific math fine-tuning to solve math competitions, you just need non-constant compute time for LLMs (So they spend more time on hard problems)

In Feb 2022, Paul Christiano wrote: Eliezer and I publicly stated some predictions about AI performance on the IMO by 2025.... My final prediction (after significantly revising my guesses after looking up IMO questions and medal thresholds) was:

I'd put 4% on "For the 2022, 2023, 2024, or 2025 IMO an AI built before the IMO is able to solve the single hardest problem" where "hardest problem" = "usually problem #6, but use problem #3 instead if either: (i) problem 6 is geo or (ii) problem 3 is combinatorics and problem 6 is algebra." (Would prefer just pick the hardest problem after seeing the test but seems better to commit to a procedure.)

Maybe I'll go 8% on "gets gold" instead of "solves hardest problem."
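
The "hardest problem" procedure quoted above can be written out as a small function. This is only an illustrative sketch: the function name and the topic labels ("geometry", "combinatorics", "algebra") are assumptions made here, since the quoted text does not say how a problem's topic would be classified.

```python
def hardest_problem(p3_topic: str, p6_topic: str) -> int:
    """Return which problem number (3 or 6) counts as the 'hardest problem'.

    Follows the quoted rule: usually problem 6, but use problem 3 if
    (i) problem 6 is geometry, or
    (ii) problem 3 is combinatorics and problem 6 is algebra.
    Topic labels are hypothetical strings chosen for this sketch.
    """
    p3 = p3_topic.strip().lower()
    p6 = p6_topic.strip().lower()

    # Condition (i): problem 6 is a geometry problem.
    if p6 == "geometry":
        return 3
    # Condition (ii): problem 3 is combinatorics and problem 6 is algebra.
    if p3 == "combinatorics" and p6 == "algebra":
        return 3
    # Default case: problem 6 is treated as the hardest.
    return 6
```

For example, `hardest_problem("combinatorics", "algebra")` selects problem 3, while any pairing that triggers neither condition falls back to problem 6.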

Eliezer spent less time revising his prediction, but said (earlier in the discussion):

My probability is at least 16% [on the IMO grand challenge falling], though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more.  Paul?

EDIT:  I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists.  I'll stand by a >16% probability of the technical capability existing by end of 2025

So I think we have Paul at <8%, Eliezer at >16% for AI made before the IMO is able to get a gold (under time controls etc. of grand challenge) in one of 2022-2025.


Resolves to YES if either Eliezer or Paul acknowledge that an AI has succeeded at this task.

Related market: https://manifold.markets/MatthewBarnett/will-a-machine-learning-model-score-f0d93ee0119b


Update: As noted by Paul, the qualifying years for IMO completion are 2023, 2024, and 2025.

Update 2024-06-21: Description formatting

Update 2024-07-25: Changed title from "by 2025" to "by the end of 2025" for clarity

This question is managed and resolved by Manifold.
bought Ṁ50 NO

did people forget that the initial cause of the market jump was AlphaProof, which hasn’t made any noise since July?

it's not surprising that it wouldn't make much noise over half of a year though. usually second iterations of google products like that don't come out that soon. also most likely alphaproof 2 will make noise around july of this year

@Bayesian the market in july probably had this priced in, but it doesn’t make sense to me that it spiked as much as it did because of o3

@fakebaechallenge Well, o3 provided another way that it could happen - perhaps AlphaProof does nothing but o4 is announced this year and it can get gold on IMO. That wasn’t as likely before

@dominic Well currently o1/o3 have trouble writing rigorous proofs.

@fakebaechallenge alphaproof was really good for some stuff like geometry that reasoning models are not that good at, and the reasoning models are more likely than not to be able to somewhat complement alphaproof-like stuff. so alphaproof+ + best reasoning model + lots of prestige from solving this

@nathanwei that is a good point, also the timed component adds some uncertainty (maybe labs prefer the glory of getting rly good IMO score than doing it in the allotted time limit, so market resolves NO, or something)

tbc i still think it will happen, it's like overdetermined, probably a couple labs will get Gold this year. that's a market i guess.

@Bayesian FYI there seems to be about 246 free mana available by arbitraging your limit order against a limit order at 75% in my 2026 market right now. Thanks to Floris for correcting this.

bought Ṁ100 YES

rStar-Math showing roughly equivalent performance to o1-mini on benchmarks with a 14b model: https://huggingface.co/papers/2501.04519

The jump to 85% is a pretty crazy vibeshift compared to a year ago

Has anyone systematically checked the performance of the new o1/pro models against the 2024 IMO? Or the 2024 IOI for that matter? There's no need to wait until July.

@AdamK OK, so who's benchmarking o3-mini against the 2024 IMO? We could have results within the week.

@AdamK Just checking for agreement on resolution criteria: I believe that this can only resolve on the actual 2025 IMO. So if you give o3-mini the 2024 IMO and it gets every question right, that will not count toward a YES resolution. Is that your understanding as well?

@EricNeyman Disagree. The original dialogue mentions any of "the 2022, 2023, 2024, or 2025 IMO". The fact that Eliezer + Paul were allowing the system to be run until the end of 2025 suggests that they are fine with retrodicting a model against the most recent IMO as long as the model had never seen those questions. I thus think benchmarking o3 against the 2024 IMO should count (assuming it gets Gold-medal performance within time constraints) if OAI can confirm directly or indirectly that o3 hadn't seen those problems.

bought Ṁ10,000 NO

@AdamK Quoting the description:

"'I'd put 4% on "For the 2022, 2023, 2024, or 2025 IMO an AI built before the IMO is able to solve the single hardest problem'...

So I think we have Paul at <8%, Eliezer at >16% for AI made before the IMO is able to get a gold (under time controls etc. of grand challenge) in one of 2022-2025.

Resolves to YES if either Eliezer or Paul acknowledge that an AI has succeeded at this task."

That's why I think o3 getting gold on 2024 IMO wouldn't count. Do you still disagree?

@EricNeyman My understanding is that this market depends on whether Paul/Eliezer agree that the feat of IMO Gold was accomplished. If OAI clarifies that o3-mini has never seen the 2024 IMO, and the system gets Gold, I would call on Eliezer and/or Paul to say their bet has been settled, at least in spirit. Again, assuming o3 actually scores a Gold within time constraints, I don't think there's any point in them waiting another 5 months to make a statement.

However, if they want to wait to resolve until after the 2025 IMO, that's fine by me as well.

@AdamK Huh, that's confusing. Suppose for instance that o3 scores a gold on the 2024 IMO, but no model built before the 2025 IMO scores a gold on the 2025 IMO. It seems pretty unambiguous to me that the market ought to resolve NO, given that the "built before the IMO" clause wasn't satisfied. I don't really see a reading of the question under which the market ought to resolve YES.

Or is your claim more like: "I'm very confident that if o3 scores a gold on the 2024 IMO, then the market will resolve YES after the 2025 IMO, so we might as well resolve it now"? (If so, I don't think that such evidence would meet my confidence bar for resolving the market now.)

(Also oops, didn't mean to thumbs-down your latest comment, pressed that button by accident.)

@AdamK The market relies on what Paul and Eliezer say. So they could always decide to resolve early for some reason.

However, I would guess based on their wording that they won't resolve for an AI trained after the IMO in question. Any AI trained after the IMO could have both the questions and answers from the IMO in its training data, and it's easy for a current AI to reproduce an answer directly from its training data. The point of the question is to see whether the AI can solve novel problems that it hasn't been trained on, and the only way you can ensure that is by taking an AI trained before the problems were released publicly.

I think there's some chance (but it's not certain) that Eliezer says this should resolve YES after the 2025 IMO, because he phrased things as the "technical capability existing by end of 2025", but I think it's unlikely he jumps the gun before the 2025 IMO.

I think this is somewhat plausible! probably like 25%?

@EricNeyman It's definitely up to Paul / Eliezer, and I won't speak on either of their behalves.

My opinion is that it is kosher to retrodict a new model against the most recent competition, as long as it hasn't seen the questions, and conclude "An AI got Gold in this Olympiad," and "the technical capability exists" for an AI to get a Gold Medal on the IMO. This is, for instance, how the resolution criteria are set up for the Metaculus 2025 IOI question. I agree that this market shouldn't resolve yet if we decide to interpret the "before the IMO" clauses strictly. With loans, I lose little if this market doesn't resolve soon, and NO traders might even burn more mana on cope.

bought Ṁ5,000 YES from 81% to 82%

@EricNeyman @AdamK I somewhat agree with both of you. As long as Eliezer or Paul make an unambiguous statement at some point, things will be fine. But I'm afraid something like "I'll count this as a correct prediction on my part, as this happened at least in spirit, but according to the actual letter of the bet I admit I was wrong" might happen.

@AdamK At this point I'd judge based only on the 2025 IMO. My criterion was "an AI system built before the IMO is able to get a gold," mirroring the IMO grand challenge. I expect Eliezer would agree with this interpretation.

@PaulChristiano Thank you for clarifying. This makes sense.
