Finding the “Best” You (Well, the Best Model Anyway!)

Welcome back to All that is under the Sun, your friendly guide through the fascinating and sometimes bewildering world of Artificial Intelligence. This month, we’re kicking off a brand-new series that gets to the very heart of how AI learns and improves. Think of it as the AI equivalent of a motivational seminar, a Marie …

An In-Depth Look at Group Relative Policy Optimization (GRPO)

In recent months, the DeepSeek team has showcased impressive results by fine-tuning large language models for advanced reasoning tasks using an innovative reinforcement learning technique called Group Relative Policy Optimization (GRPO). In this post, we’ll explore the theoretical background and core principles of GRPO while also offering a primer on Reinforcement Learning (RL) and its …
