At ODSC East in May 2025 in Boston, I walked through an end-to-end example of fine-tuning Meta's Llama 3.1 8B Instruct model (released in July 2024) to improve its reasoning on the math word problems in the GSM8K dataset (released by OpenAI in October 2021).
• The notebooks containing all the code are here: GitHub repo link
• A CSV containing the experiment results (see the GitHub repo for a detailed description) can be found here: download link
The official title of the talk was "Basic Theory and Practice of Fine-Tuning LLMs with PPO and GRPO" - the talk also included a slide deck in which I walked through the PPO and GRPO algorithms at a more conceptual level. Email me and I'll send you the slides!
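If you want a quick feel for what a GRPO fine-tuning loop looks like before diving into the notebooks, here is a minimal sketch using Hugging Face TRL's `GRPOTrainer`. To be clear, this is not the code from the talk's notebooks: the reward function, column handling, and hyperparameters are illustrative assumptions, and it presumes a recent `trl` release with GRPO support.

```python
# Illustrative sketch of GRPO fine-tuning on GSM8K with TRL.
# Not the talk's notebook code; assumes a trl version that ships GRPOTrainer.
import re

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K rows have "question" and "answer"; the gold answer follows "####".
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(
    lambda row: {
        "prompt": row["question"],
        "target": row["answer"].split("####")[-1].strip().replace(",", ""),
    }
)

def correctness_reward(completions, target, **kwargs):
    """Reward 1.0 when the last number in the completion matches the gold answer.

    TRL passes extra dataset columns (here, "target") to reward functions
    as keyword arguments, one value per sampled completion.
    """
    rewards = []
    for completion, gold in zip(completions, target):
        numbers = re.findall(r"-?\d+\.?\d*", completion.replace(",", ""))
        rewards.append(1.0 if numbers and numbers[-1] == gold else 0.0)
    return rewards

trainer = GRPOTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    reward_funcs=correctness_reward,
    args=GRPOConfig(output_dir="grpo-gsm8k", logging_steps=10),
    train_dataset=dataset,
)
trainer.train()
```

The key GRPO idea this illustrates: instead of a learned value model (as in PPO), the trainer samples a group of completions per prompt and uses the group's reward statistics as the baseline, so a simple programmatic correctness check like the one above is enough to drive training.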