Problem: We wanted to find the maximum number of tokens we can serve per megawatt.
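One way to make "tokens per megawatt" concrete is as a rate: tokens per second divided by average power in megawatts. Because 1 MW sustained for 1 s is 1 MJ, this is the same number as tokens per megajoule. The sketch below assumes you already have a token count, a wall-clock duration, and an average power reading for the run; the function name and the example numbers are hypothetical and are not measured results.

```python
def tokens_per_megawatt(tokens: int, seconds: float, avg_power_watts: float) -> float:
    """(tokens/s) per MW, which equals tokens per megajoule."""
    if seconds <= 0 or avg_power_watts <= 0:
        raise ValueError("seconds and avg_power_watts must be positive")
    energy_joules = avg_power_watts * seconds   # W * s = J
    tokens_per_joule = tokens / energy_joules
    return tokens_per_joule * 1e6               # J -> MJ

# Hypothetical run: 1,000,000 tokens in 100 s at an average draw of 1,000 W.
print(tokens_per_megawatt(1_000_000, 100.0, 1_000.0))
```

Working from energy (joules) rather than instantaneous power keeps the metric robust when power fluctuates during a run, as long as the average power is measured over the same window as the tokens.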

Solution: Implement energy-efficient LLM inference kernels.

Our approach:

  1. Create an eval environment for LLM inference kernel research. It works with any LLM and any hardware (we used Qwen 3 on B300).
  2. Create an autoresearch harness that optimizes kernels within our research environment.
  3. Train our own RL models on our auto-eval setup.

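To illustrate how an eval environment like this could score a candidate kernel for an autoresearch loop or an RL reward, here is a minimal sketch. It is hypothetical and not taken from our implementation: it assumes each run reports a token (or work-unit) count, the energy consumed in joules, and whether the candidate's outputs match the reference. Efficiency gain is defined here as the relative change in tokens per joule, and incorrect kernels score zero.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical record of one measured run.
    tokens: int           # tokens generated (or work units for a single-kernel benchmark)
    energy_joules: float  # energy measured over the run
    outputs_match: bool   # candidate output agrees with the reference within tolerance


def efficiency_gain(baseline: Run, candidate: Run) -> float:
    """Relative change in tokens per joule; 0.07 means 7% more energy-efficient."""
    base = baseline.tokens / baseline.energy_joules
    cand = candidate.tokens / candidate.energy_joules
    return cand / base - 1.0


def reward(baseline: Run, candidate: Run) -> float:
    # Incorrect kernels earn nothing, so efficiency cannot be bought with wrong answers.
    if not candidate.outputs_match:
        return 0.0
    # Clip regressions to zero (an assumed design choice; a signed reward is also reasonable).
    return max(0.0, efficiency_gain(baseline, candidate))


baseline = Run(tokens=1000, energy_joules=100.0, outputs_match=True)
candidate = Run(tokens=1070, energy_joules=100.0, outputs_match=True)
print(round(reward(baseline, candidate), 4))
```

Gating on correctness before measuring efficiency matters for kernel search: without it, an optimizer can "win" by skipping work.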
Findings: We (or rather, our agents) found kernels that are 7% more energy-efficient for full LLM inference and 30% more efficient on KernelBench.
