Open-Source AI Judges Beat GPT-5.2 at 15x Lower Cost Using DPO Fine-Tuning
Together AI demonstrates that fine-tuned open-source LLMs can outperform GPT-5.2 as evaluation judges using just 5,400 preference pairs, cutting evaluation costs roughly 15-fold.
Nous-Hermes 2 Mixtral 8x7B Surpasses Mixtral Instruct in Benchmark
Nous-Hermes 2 Mixtral 8x7B surpasses Mixtral Instruct across benchmarks, released in both SFT and SFT+DPO variants using the ChatML prompt format.