Chenluye99

Pinned

RLHFlow/Reinforce-Ada Public

An adaptive sampling framework for Reinforce-style LLM post training.

Python 96 17
RLHFlow/RLHF-Reward-Modeling Public

Recipes to train reward models for RLHF.

Python 1.5k 110
Adaptive-Layerwise-Perturbation Public

Python 2