This repository is the official PyTorch implementation of Pion Optimizer, by Kexuan Shi, Hanxuan Li, Zeju Qiu, Yandong Wen, Simon Buchholz, Weiyang Liu.
The code is coming soon. Stay tuned. :)
We have open-sourced the two core implementations of Pion described in our paper: transported_ambient_ambient and lie_lie. Both can be found in pion.py. We also explored the gradient uniformization of Muon under Pion's update rule and provide the resulting variant in pion_msign.py.
For the exploration experiments mentioned in the follow-up paper, please run:
cd pion/Megatron-LM
bash opt_llama_60M_pion.sh
Modify the pion-update-side, pion-momentum, and pion-scaling parameters in the training script to conduct explorations.
To reproduce the bf16 pretraining experiments in the paper, please use:
bash opt_llama_1.3B_adamw.sh # AdamW
bash opt_llama_1.3B_muon.sh # Muon
bash opt_llama_1.3B_pion.sh # Pion
To reproduce the normalization-free experiments in the paper, please use:
bash opt_llama_60M_adamw_no_norm.sh # AdamW
bash opt_llama_60M_muon_no_norm.sh # Muon
bash opt_llama_60M_pion_no_norm.sh # Pion
The RL experiments are built on top of verl. Please follow the installation instructions in verl/README.md to set up the environment.
Before running, you need to edit the scripts and replace the placeholder paths:
- /path/to/your/dataset/: path to the preprocessed dataset (see verl data preparation)
- /path/to/your/model: path to the pretrained model
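The placeholder replacement can also be scripted instead of edited by hand. The following is only a minimal sketch: `fill_paths` is a hypothetical helper, not part of this repository, and it assumes the two placeholder strings appear literally in the script and that your paths contain no `|` or `&` characters.

```shell
# Hypothetical helper (not part of this repository):
# fill_paths <script> <dataset_dir> <model_dir>
fill_paths() {
    script="$1"
    dataset_dir="${2%/}"   # drop one trailing slash, if present
    model_dir="$3"
    # Edit in place and keep a backup copy as <script>.bak.
    # '|' is the sed delimiter, so the slashes in paths need no escaping.
    sed -i.bak \
        -e "s|/path/to/your/dataset/|${dataset_dir}/|g" \
        -e "s|/path/to/your/model|${model_dir}|g" \
        "$script"
}
```

For example, from inside the verl directory, `fill_paths examples/grpo_trainer/run_qwen3_1.7b_pion_deepmath.sh "$HOME/data/deepmath" "$HOME/models/Qwen3-1.7B"` would rewrite that script, where both locations are illustrative and should be replaced with your own.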
We provide ready-to-use scripts for training Qwen3-1.7B and DeepSeek-R1-Distilled-Qwen-1.5B on the DeepMath dataset using GRPO with the Pion optimizer:
cd verl
bash examples/grpo_trainer/run_qwen3_1.7b_pion_deepmath.sh # for Qwen3-1.7B
bash examples/grpo_trainer/run_distilled_pion_deepmath.sh # for DeepSeek-R1-Distilled-Qwen-1.5B
To run baseline comparisons with AdamW and Muon:
# Qwen3-1.7B
bash examples/grpo_trainer/run_qwen3_1.7b_adamw_deepmath.sh # AdamW
bash examples/grpo_trainer/run_qwen3_1.7b_muon_deepmath.sh # Muon
# DeepSeek-R1-Distilled-Qwen-1.5B
bash examples/grpo_trainer/run_distilled_adamw_deepmath.sh # AdamW
bash examples/grpo_trainer/run_distilled_muon_deepmath.sh # Muon