drubinstein
144 posts
drubinstein
@dsrubinstein
Making models go brrr | Engineering @reflection_ai | Occasional PufferLib contributor
github.com/drubinstein
Joined March 2024
84 Following · 582 Followers
  • Pinned
    drubinstein
    @dsrubinstein
    Mar 5, 2025
    Excited to finally share our progress in developing a reinforcement learning system to beat Pokémon Red. Our system successfully completes the game using a policy under 10M parameters, PPO, and a few novel techniques. Blog posted below
    Image
    56K
  • drubinstein
    @dsrubinstein
    Mar 5, 2025
    Replying to @dsrubinstein
    drubinstein.github.io
    Learning Pokémon With Reinforcement Learning
    Hi! Since 2020, we’ve been developing a reinforcement learning (RL) agent to beat the 1996 game Pokémon Red. As of February 2025, we are able to beat Pokémon Red with Reinforcement Learning using a...
    3K
  • drubinstein
    @dsrubinstein
    Mar 6, 2025
    A common problem we hit when using Pokémon for RL: eventually the policy collapses, and the agent begins to run a fixed loop over the map.
    Image
    1.1K
  • drubinstein
    @dsrubinstein
    Mar 5, 2025
    Replying to @dsrubinstein
    Big props to @DanAdvantage, @kywch500, @jsuarez, and @computerender. We are making something that's pretty spectacular.
    2.5K
  • drubinstein
    @dsrubinstein
    Mar 7, 2025
    I've been working at Reflection since September. It's been really fun and exciting!
    Misha Laskin
    @MishaLaskin
    Mar 7, 2025
    Replying to @MishaLaskin
    We're excited to partner with Sequoia, Lightspeed, and CRV and big thank you to @shiringhaffary for covering the story. bloomberg.com/news/articles/…
    1.3K
  • drubinstein
    @dsrubinstein
    Apr 4, 2025
    Closing in on working surf, but until then here are some fun videos from recent runs, starting with an agent using surf.
    Image
    2.9K
  • drubinstein
    @dsrubinstein
    Mar 6, 2025
    Replying to @dsrubinstein
    A possible solution? Increase the episode duration. In this experiment, we ran with episodes 5-10x longer than before. Suddenly, the agent started learning and using surf!
    Image
    477
  • drubinstein
    @dsrubinstein
    Mar 5, 2025
    Excited to finally share our progress in developing a reinforcement learning system to beat Pokémon Red. Our system successfully completes the game using a policy under 10M parameters, PPO, and a few novel techniques. drubinstein.github.io/pokerl/
    Image
    899
  • drubinstein
    @dsrubinstein
    Mar 14, 2025
    PokeRL Update: Recordings should be possible soon.
    Image
    274
  • drubinstein
    @dsrubinstein
    Mar 14, 2025
    That moment you realize that you recreated I-frames and P-frames. For PokeRL recordings: PyBoy save states are the I-frames and actions are the P-frames!
    245
  • drubinstein
    @dsrubinstein
    Mar 6, 2025
    Little hint as to what we're currently tackling. Can you guess where the player is?
    Image
    464
  • drubinstein
    @dsrubinstein
    Apr 4, 2025
    Replying to @dsrubinstein
    And an environment after policy collapse. Courtesy of @DanAdvantage
    Image
    263
  • drubinstein
    @dsrubinstein
    Apr 4, 2025
    Replying to @dsrubinstein and @DanAdvantage
    And what an environment looks like to the agent during a swarm
    Image
    210
  • drubinstein
    @dsrubinstein
    Mar 10, 2025
    Replying to @jsuarez
    This is actually Rock Tunnel!
    136
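
The Mar 14 post compares PokeRL recordings to video coding: save states act as I-frames (full snapshots) and actions act as P-frames (small deltas replayed on top of a snapshot). The sketch below illustrates that idea only; it is not the PokeRL implementation. `ToyEmulator`, `Recording`, `record`, and `seek` are hypothetical names, and the toy emulator is a deterministic stand-in rather than PyBoy's API. It assumes emulation is deterministic, so replaying the same actions from the same snapshot reproduces the same state.

```python
from dataclasses import dataclass, field


class ToyEmulator:
    """Hypothetical deterministic stand-in for a real emulator."""

    MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self):
        self.x, self.y, self.frame = 0, 0, 0

    def step(self, action):
        # Unknown actions leave the position unchanged but still advance time.
        dx, dy = self.MOVES.get(action, (0, 0))
        self.x += dx
        self.y += dy
        self.frame += 1

    def save_state(self):
        # Full snapshot: plays the role of an I-frame.
        return {"x": self.x, "y": self.y, "frame": self.frame}

    def load_state(self, state):
        self.x, self.y, self.frame = state["x"], state["y"], state["frame"]


@dataclass
class Recording:
    keyframes: dict = field(default_factory=dict)  # action index -> snapshot
    actions: list = field(default_factory=list)    # one entry per step (P-frames)


def record(emu, actions, keyframe_interval):
    """Step through `actions`, snapshotting every `keyframe_interval` steps."""
    rec = Recording()
    for i, action in enumerate(actions):
        if i % keyframe_interval == 0:
            rec.keyframes[i] = emu.save_state()  # state before action i
        emu.step(action)
        rec.actions.append(action)
    return rec


def seek(emu, rec, target):
    """Restore the state reached after `target` actions.

    Loads the nearest earlier snapshot, then replays only the actions
    between that snapshot and the target. Assumes a non-empty recording.
    """
    start = max(k for k in rec.keyframes if k <= target)
    emu.load_state(rec.keyframes[start])
    for action in rec.actions[start:target]:
        emu.step(action)
```

The trade-off mirrors video: a shorter keyframe interval makes seeking cheaper because fewer actions are replayed, at the cost of storing more full snapshots.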