An explicitly DPO-based technique is one that cites DPO as seed material for its creation.
Frontier labs currently include: OpenAI, DeepMind, Anthropic, and Google. I will modify this description if this changes (e.g. if Meta releases a SOTA LLM).
Public simply means that it has been announced or otherwise discovered that such a DPO-trained LLM has been trained.
Meta just released Llama 3: "Our approach to post-training is a combination of supervised fine-tuning (SFT), rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO)".
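For anyone unfamiliar with the last of those, here is a minimal sketch of the DPO objective from Rafailov et al. (2023) in PyTorch. The function and variable names are illustrative, not from Meta's (or anyone's) actual training code:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss (Rafailov et al. 2023).

    Inputs are per-example sequence log-probabilities of the chosen and
    rejected responses under the trained policy and under a frozen
    reference model (typically the SFT checkpoint).
    """
    # Margin by which the policy prefers chosen over rejected,
    # relative to how much the reference model already does.
    margin = (policy_chosen_logps - policy_rejected_logps) \
             - (ref_chosen_logps - ref_rejected_logps)
    # -log(sigmoid(beta * x)) == softplus(-beta * x), numerically stable.
    return F.softplus(-beta * margin).mean()
```

Intuitively, the loss widens the chosen-vs-rejected log-prob gap relative to the frozen reference model, with beta controlling how far the policy may drift from it.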
I think this prediction should resolve to yes.
@StephenMcAleese This market will likely resolve yes. Keep in mind that the largest Llama 3 variant has over 400B params and benchmarks worse than Opus at its current checkpoint. I will wait a few days after the model is widely available to decide whether I classify Meta as a frontier lab or not.
@1832489723645 Really? It says on the website that it has 70B parameters like Llama 2: https://ai.meta.com/blog/meta-llama-3/
@StephenMcAleese The model benchmarking close to Opus at the current checkpoint is the 400B model, which is not yet available for use.
Do you consider IPO (http://arxiv.org/abs/2310.12036) explicitly DPO-based? It is a generalisation of DPO (rough comparison sketched below).
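For context on how it generalises: the ΨPO framing in that paper recovers DPO as one choice of loss, and IPO swaps DPO's logistic loss for a squared loss on the same policy-vs-reference log-ratio margin. A minimal sketch, assuming the same sequence log-prob inputs as a standard DPO setup (names are mine, not from the paper):

```python
import torch

def preference_margin(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps):
    # The same log-ratio margin that the DPO loss is built on.
    return (policy_chosen_logps - policy_rejected_logps) \
           - (ref_chosen_logps - ref_rejected_logps)

def ipo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             tau: float = 0.1) -> torch.Tensor:
    """IPO (Azar et al. 2023): regress the margin toward the fixed
    target 1/(2*tau) with a squared loss, instead of DPO's
    -log sigmoid(beta * margin)."""
    margin = preference_margin(policy_chosen_logps, policy_rejected_logps,
                               ref_chosen_logps, ref_rejected_logps)
    return ((margin - 1.0 / (2.0 * tau)) ** 2).mean()
```

The squared loss bounds how large the margin is pushed, which is IPO's fix for DPO's tendency to overfit confident preference pairs.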
@HanchiSun I won't resolve based on this because I don't consider HuggingFace a frontier lab, but it's interesting that FOSS is starting to prefer DPO for smaller models.