How Entropy Shapes RL Performance

A simulated playground for controlling the entropy curve, based on the empirical and theoretical findings of Entrocraft.

[Interactive plot: Entropy 𝓗. Legend: Entrocraft Entropy, Recommended Range, GRPO Entropy. Drag any dot up or down to reshape entropy.]

[Plot: AIME-25 mean@32. Legend: Entrocraft, GRPO baseline.]

Linear annealing — performance steadily improves and stays stable.

Low Entropy

Fast initial gains, but the model collapses onto a few solutions and plateaus early at a low ceiling.

Annealing Entropy

A linearly decaying entropy curve (from ~0.6 to ~0.2) yields the best long-term accuracy with stable training dynamics.
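As a rough illustration of what a linearly decaying target looks like, the sketch below computes a target entropy per training step. Only the approximate endpoints (0.6 and 0.2) come from the description above; the function name `linear_entropy_target`, the `total_steps` parameter, and the clamping behavior are hypothetical choices made for this example. It does not show how Entrocraft actually steers entropy toward the target.

```python
def linear_entropy_target(step: int, total_steps: int,
                          start: float = 0.6, end: float = 0.2) -> float:
    """Target entropy at `step`, interpolated linearly from `start` to `end`.

    `start` and `end` mirror the approximate range quoted above; everything
    else here is an illustrative assumption, not Entrocraft's implementation.
    """
    if total_steps <= 1:
        # Degenerate schedule: nothing to interpolate over.
        return end
    # Fraction of training completed, clamped to [0, 1] so out-of-range
    # steps fall back to the nearest endpoint.
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * frac


# Example: a five-step schedule, evaluated at every step.
targets = [linear_entropy_target(s, 5) for s in range(5)]
```

With five steps, the targets fall in equal increments from the start value to the end value, and steps outside the schedule are pinned to the nearest endpoint.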

High Entropy

Excessive exploration destabilizes the training dynamics.