Will an LLM become a Pokémon Master by the end of 2025?
51% chance

I'll give bounties to people who suggest reasonable improvements to the criteria.

https://www.twitch.tv/claudeplayspokemon

Anthropic has taken the benchmark world by storm by assessing model performance against Pokémon:

https://www.anthropic.com/news/visible-extended-thinking

Will any large language model become a Pokémon Master by the end of 2025? To count, it must:

  • Complete a regular Pokémon game (any of the base games, such as Red, Gold, Sapphire, or Black) by earning all gym badges and beating the Elite Four and the rival.

  • Do so without assistance or steering mid-game.

  • Use only minimal non-LLM programmatic assistance. To convey the spirit of this market: I think the automatic pathfinding that Claude uses is a little bit cheating. Something roughly twice as bad might start to not count.

Any number of "shots" is allowed; that is, the model can try as many times as it likes. I reserve the right to disqualify an attempt if it involves obscene abuse of save states, though.

RAG, knowledge files, custom system prompts, and interesting input/output schemes are all allowed. Anthropic has an interesting approach with Claude.

See also: /Sketchy/will-claude-become-a-pokemon-master-ng2zSA9ync

The Safari Zone is a massive problem: the step count is limited, and each attempt costs money. You need a system that is cracked at navigation to get past that.

Does this count as "assistance or steering mid-game"?

@Lorenzo ummm… I’m going to say no, but I won’t lie, it’s in part because it seems a shame to disqualify Claude for a small tweak this early. If they continually tweak it throughout the run, that feels unfair.

I will make some more explicit criteria around this soon, I guess.

Things Claude has hallucinated in the 5 minutes I've watched this stream:
- thinks Bulbasaur has a type disadvantage against Squirtle
- thinks the exit to Oak's lab is at the top of the screen
- successfully exited Oak's lab, and then went back into it, thinking it was Route 1
- went back to the top of the screen after re-entering Oak's lab

I don't think 3.7 Sonnet is going to be able to do this in any number of tries; I assume it ended up stuck in some sort of infinite loop in the midgame that it couldn't break out of.
