Is the LMSYS chatbot arena leaderboard trustworthy?
49% chance

LLMs can distinguish their own output from the output of different LLMs and they have a preference for their own output, so it's technically feasible to manipulate the leaderboard by throwing an LLM at the chatbot arena to upvote its own completions.

Has this happened yet? Will it happen soon?

Resolves NO iff, before 2027/7/1, credible media reports state that the lmsys leaderboard has been manipulated with sockpuppet accounts / fraudulent voting. A statement coming directly from lmsys would also count.

Resolves YES otherwise.


A Reddit user is claiming they manipulated results in favor of Gemini to win Polymarket bets: https://www.reddit.com/r/MachineLearning/s/VVJo2E38iI

@Vergissfunktor Fascinating. If the story is confirmed by LMSYS, I will resolve the market as NO.

LMSYS periodically releases a portion of user prompts and the corresponding votes, so anyone can check that data for suspicious activity.
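A rough sketch of what such a check might look like, assuming the released data can be loaded as a list of records with a per-voter identifier. The field names `voter`, `model_a`, `model_b`, and `winner` are hypothetical and not taken from any actual LMSYS release, and the thresholds are arbitrary. It flags voters who pick the same model in nearly every battle where it appears.

```python
from collections import Counter, defaultdict

def flag_suspicious_voters(votes, min_votes=20, threshold=0.9):
    """Return {voter: (model, share)} for voters who almost always pick one model.

    votes: iterable of dicts with hypothetical keys
           'voter', 'model_a', 'model_b', 'winner' (a model name, or e.g. 'tie').
    """
    appearances = defaultdict(Counter)  # voter -> model -> battles seen
    wins = defaultdict(Counter)         # voter -> model -> battles won
    for v in votes:
        for model in (v["model_a"], v["model_b"]):
            appearances[v["voter"]][model] += 1
        wins[v["voter"]][v["winner"]] += 1

    flagged = {}
    for voter, seen in appearances.items():
        for model, n in seen.items():
            # Only consider voters with enough battles involving this model.
            if n >= min_votes and wins[voter][model] / n >= threshold:
                flagged[voter] = (model, wins[voter][model] / n)
    return flagged
```

A high share alone would not prove fraud, since a voter may simply prefer a genuinely stronger model; a flag like this would only be a starting point for a closer look.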

Does this need to happen for only a single model among the hundreds on the list for this question to resolve as yes?

@MalachiteEagle yes (though note that the question would resolve NO in that case). I'm only counting 155 models now.
