ohyeah

Frontier Energy Efficiency Post Training

Problem: we wanted to find the maximum number of tokens we can serve per megawatt.
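To make the metric concrete, here is a minimal sketch of how "tokens per megawatt" could be computed from a single benchmark run. It assumes the metric means sustained throughput (tokens per second) divided by average power draw in megawatts; that reading, and the function and parameter names, are illustrative assumptions and not taken from the project code.

```python
def tokens_per_joule(tokens: int, seconds: float, avg_power_watts: float) -> float:
    """Tokens produced per joule of energy consumed during the run."""
    if seconds <= 0 or avg_power_watts <= 0:
        raise ValueError("seconds and avg_power_watts must be positive")
    # Energy in joules = average power (W) * duration (s).
    return tokens / (seconds * avg_power_watts)


def tokens_per_second_per_megawatt(tokens: int, seconds: float, avg_power_watts: float) -> float:
    """Throughput (tokens/s) normalized by power draw in megawatts.

    (tokens / s) / (W / 1e6) is the same quantity as tokens per joule * 1e6.
    """
    return tokens_per_joule(tokens, seconds, avg_power_watts) * 1e6


if __name__ == "__main__":
    # Hypothetical numbers: 120,000 tokens generated in 60 s at an average of 1,000 W.
    print(tokens_per_joule(120_000, 60.0, 1_000.0))
    print(tokens_per_second_per_megawatt(120_000, 60.0, 1_000.0))
```

Since tokens per second per megawatt is just tokens per joule scaled by a constant, optimizing either one ranks kernels identically.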

Solution: implement energy-efficient LLM inference kernels.

Our Solution:

Create an eval environment for LLM inference kernel research. It works with any LLM and hardware (we used Qwen 3 on B300).
Create an autoresearch harness that optimizes kernels against our research environment.
Train our own RL models on our autoeval solution.
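The core of an energy-focused eval environment like the one described above is turning power readings into a per-kernel energy score. The following is a minimal sketch under stated assumptions: power is sampled as (timestamp in seconds, watts) pairs by some external sampler, energy is estimated with trapezoidal integration, and "percent more efficient" is taken to mean the gain in tokens per joule over a baseline. All names here are hypothetical and this is not the project's actual harness.

```python
from typing import Sequence, Tuple

PowerSample = Tuple[float, float]  # (timestamp in seconds, power in watts)


def energy_joules(samples: Sequence[PowerSample]) -> float:
    """Estimate energy by trapezoidal integration over power samples."""
    if len(samples) < 2:
        raise ValueError("need at least two power samples")
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be ordered by timestamp")
        # Area of one trapezoid: mean power over the interval * interval length.
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def joules_per_token(samples: Sequence[PowerSample], tokens: int) -> float:
    """Energy cost of one generated token for a single run."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return energy_joules(samples) / tokens


def efficiency_gain_percent(baseline_j_per_token: float, candidate_j_per_token: float) -> float:
    """Percent gain in tokens per joule of a candidate kernel over a baseline.

    Assumption: efficiency is measured as tokens per joule, so a candidate
    that halves joules per token scores a 100% gain.
    """
    return (baseline_j_per_token / candidate_j_per_token - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical traces: both runs generate 100 tokens.
    baseline = [(0.0, 100.0), (1.0, 100.0), (2.0, 100.0)]
    candidate = [(0.0, 100.0), (1.0, 100.0)]
    gain = efficiency_gain_percent(
        joules_per_token(baseline, 100), joules_per_token(candidate, 100)
    )
    print(f"candidate gain: {gain:.1f}%")
```

A scalar like this gain could serve as the reward signal for an automated search or RL loop, though a real harness would also need to check that the candidate kernel produces correct outputs before scoring it.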

Findings: We (or our agents) found kernels that are 7% more energy efficient for full LLM inference and 30% more efficient on KernelBench.

Built With

claude
cuda
devin
huggingface
prime-intellect
pytorch

Updates

Armin S started this project — Jun 20, 2026 08:15 PM EDT
