drubinstein (@dsrubinstein) / X

144 posts

drubinstein

@dsrubinstein

Making models go brrr | Engineering @reflection_ai | Occasional PufferLib contributor

github.com/drubinstein

Joined March 2024

Pinned
drubinstein
@dsrubinstein
Mar 5, 2025
Excited to finally share our progress in developing a reinforcement learning system to beat Pokémon Red. Our system successfully completes the game using a policy under 10M parameters, PPO, and a few novel techniques. Blog posted below
56K
drubinstein
@dsrubinstein
Mar 5, 2025
Replying to @dsrubinstein
drubinstein.github.io
Learning Pokémon With Reinforcement Learning
Hi! Since 2020, we’ve been developing a reinforcement learning (RL) agent to beat the 1996 game Pokémon Red. As of February 2025, we are able to beat Pokémon Red with Reinforcement Learning using a...
3K
drubinstein
@dsrubinstein
Mar 6, 2025
A common problem we run into when using Pokémon for RL: eventually the policy collapses, and the agent begins to run a fixed loop over the map.
1.1K
drubinstein
@dsrubinstein
Mar 5, 2025
Replying to @dsrubinstein
Big props to @DanAdvantage @kywch500 @jsuarez @computerender We are making something that's pretty spectacular.
2.5K
drubinstein
@dsrubinstein
Mar 7, 2025
I've been working at Reflection since September. It's been really fun and exciting!
Misha Laskin
@MishaLaskin
Mar 7, 2025
Replying to @MishaLaskin
We're excited to partner with Sequoia, Lightspeed, and CRV and big thank you to @shiringhaffary for covering the story. bloomberg.com/news/articles/…
1.3K
drubinstein
@dsrubinstein
Apr 4, 2025
Closing in on working surf, but until then here are some fun videos from recent runs. Starting with an agent using surf.
2.9K
drubinstein
@dsrubinstein
Mar 6, 2025
Replying to @dsrubinstein
A possible solution? Increase the episode duration. In this experiment, we ran with episodes 5-10x longer than before. Suddenly, the agent started learning and using surf!
477
drubinstein
@dsrubinstein
Mar 5, 2025
Excited to finally share our progress in developing a reinforcement learning system to beat Pokémon Red. Our system successfully completes the game using a policy under 10M parameters, PPO, and a few novel techniques. drubinstein.github.io/pokerl/
899
drubinstein
@dsrubinstein
Mar 14, 2025
PokeRL Update: Recordings should be possible soon.
274
drubinstein
@dsrubinstein
Mar 14, 2025
That moment you realize that you recreated I-frames and P-frames. For PokeRL recordings: PyBoy save states are the I-frames and actions are the P-frames!
245
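The I-frame/P-frame analogy can be sketched in a few lines. This is only an illustration of the idea, not the PokeRL implementation: `ToyEmulator`, `Recording`, and `keyframe_interval` are hypothetical names, and the toy emulator stands in for PyBoy so that no real PyBoy calls are assumed. The sketch relies on the emulator being deterministic, so replaying the stored actions from a saved state always reproduces the same later state.

```python
class ToyEmulator:
    """Hypothetical deterministic stand-in for an emulator (not PyBoy)."""

    MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self):
        self.x = 0
        self.y = 0

    def step(self, action):
        # Apply one action; the result depends only on the state and the action.
        dx, dy = self.MOVES[action]
        self.x += dx
        self.y += dy

    def save_state(self):
        # Serialize the full state (the "I-frame").
        return f"{self.x},{self.y}".encode()

    def load_state(self, blob):
        x, y = blob.decode().split(",")
        self.x, self.y = int(x), int(y)


class Recording:
    """Stores full save states every `keyframe_interval` steps and actions in between."""

    def __init__(self, emu, keyframe_interval):
        self.keyframe_interval = keyframe_interval
        self.actions = []                      # the "P-frames"
        self.keyframes = {0: emu.save_state()}  # step index -> state after that many actions

    def record(self, emu, action):
        emu.step(action)
        self.actions.append(action)
        t = len(self.actions)
        if t % self.keyframe_interval == 0:
            self.keyframes[t] = emu.save_state()

    def seek(self, emu, t):
        """Put `emu` in the state reached after the first `t` recorded actions."""
        if not 0 <= t <= len(self.actions):
            raise IndexError(f"step {t} is outside the recording")
        # Jump to the nearest keyframe at or before t, then replay the actions after it.
        k = (t // self.keyframe_interval) * self.keyframe_interval
        emu.load_state(self.keyframes[k])
        for action in self.actions[k:t]:
            emu.step(action)


# Usage: record a short run, then seek into it with a fresh emulator.
emu = ToyEmulator()
rec = Recording(emu, keyframe_interval=3)
for a in ["right", "right", "down", "left", "down", "right", "right"]:
    rec.record(emu, a)

viewer = ToyEmulator()
rec.seek(viewer, 5)  # loads the keyframe at step 3, replays two actions
```

A shorter keyframe interval makes seeking faster at the cost of storing more full states, which is the same trade-off video codecs make.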
drubinstein
@dsrubinstein
Mar 6, 2025
Little hint as to what we're currently tackling. Can you guess where the player is?
464
drubinstein
@dsrubinstein
Apr 4, 2025
Replying to @dsrubinstein
And an environment after policy collapse. Courtesy of @DanAdvantage
263
drubinstein
@dsrubinstein
Apr 4, 2025
Replying to @dsrubinstein and @DanAdvantage
And what an environment looks like to the agent during a swarm
210
drubinstein
@dsrubinstein
Mar 10, 2025
Replying to @jsuarez
This is actually Rock Tunnel!
136