An In-Depth Look at Group Relative Policy Optimization (GRPO)

In recent months, the DeepSeek team has achieved impressive results by fine-tuning large language models for advanced reasoning tasks with an innovative reinforcement learning technique called Group Relative Policy Optimization (GRPO). In this post, we’ll explore the theoretical background and core principles of GRPO, while also offering a primer on Reinforcement Learning (RL) and its …

Advancing LLM Fine-Tuning with Group Relative Policy Optimization (GRPO)

Reinforcement Learning (RL) has become a powerful technique for fine-tuning large models, especially Large Language Models (LLMs), to improve their performance on complex tasks. One of the most recent innovations in this area is Group Relative Policy Optimization (GRPO), an RL algorithm introduced by the DeepSeek team. GRPO was designed to tackle the challenges of …