Group Relative Policy Optimization
Group Relative Policy Optimization (GRPO) is a reinforcement learning algorithm that was used to train DeepSeek-R1.
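As a rough illustration of the "group relative" part of the name, the sketch below scores each sampled completion against the other completions drawn for the same prompt, so that no separate value network is needed as a baseline. It covers only this advantage-normalization step, not the full training objective. The function name `group_relative_advantages`, the `eps` stabilizer, the use of the population standard deviation, and the reward values are assumptions made for illustration, not details taken from this article.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. `eps` (an assumed stabilizer) avoids division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group's average receive a positive advantage and those below receive a negative one. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.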