Kyoung Whan Choe (@kywch500) / X

1,377 posts

Kyoung Whan Choe

@kywch500

Robot Learning Engineer @ RLWRLD.ai

Mountain View, CA

Joined June 2020

Pinned
Kyoung Whan Choe
@kywch500
Apr 8
This was a very timely project, and I had so much fun and learned so much about agent evaluation. Kudos to the SkillsBench community!
Xiangyi Li
@xdotli
Apr 8
How good are agents at using the latest CLI tools like GWS CLI, and how safely can they use them? Introducing ClawsBench, the first benchmark that measures both LLM capability and safety in a set of high-fidelity, stateful environments and scenarios. We made 5 mock
Kyoung Whan Choe
@kywch500
Oct 18, 2024
Here is an RL quickstart guide, based on @jsuarez's opinionated suggestions: kywch.github.io/start-rl/ TL;DR: Start with CleanRL and Pufferlib.
Kyoung Whan Choe
@kywch500
Aug 10, 2024
Submitted to NeurIPS D&B, unlikely to make it. Still, proud to share a massive Neural MMO update: integrates previous competition know-how, 3x faster! By @jsuarez, @RyanSullyvan, and me! arXiv: arxiv.org/pdf/2406.05071 GitHub: github.com/kywch/meta-mmo
Kyoung Whan Choe
@kywch500
Aug 22, 2024
Pufferlib + PPO + CARBS is working its magic on Mujoco Ant-v4, reaching an episode return of 5000+ in 8 min on a gaming desktop. Hyperparameter sweeps with PPO look very promising.
Kyoung Whan Choe
@kywch500
Sep 10, 2024
Just adding LayerNorm to the critic networks makes SAC learn faster, hmm. I should try this. From the "learning to walk in 20 minutes" paper: arxiv.org/abs/2208.07860
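To make the "just add LayerNorm to the critic" idea concrete, here is a minimal stdlib-only sketch of a Q(s, a) critic with a LayerNorm after its hidden linear layer. Assumptions: the Linear -> LayerNorm -> ReLU -> Linear ordering is one common placement, not necessarily the exact one used in the paper; the LayerNorm has no learned gain/bias; and `LayerNormCritic`, `layer_norm`, `linear` and `relu` are hypothetical helper names for illustration only. There is no training loop here, so this shows only the architectural change, not the speedup.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, weights, bias):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class LayerNormCritic:
    """Hypothetical tiny critic: Q(s, a) = Linear(ReLU(LayerNorm(Linear([s, a]))))."""

    def __init__(self, obs_dim, act_dim, hidden=32, seed=0):
        rng = random.Random(seed)
        in_dim = obs_dim + act_dim
        s1 = 1.0 / math.sqrt(in_dim)
        s2 = 1.0 / math.sqrt(hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-s2, s2) for _ in range(hidden)]]
        self.b2 = [0.0]

    def q_value(self, obs, act):
        # Concatenate state and action, as SAC critics typically do.
        h = linear(list(obs) + list(act), self.w1, self.b1)
        # The one-line change: normalize the hidden pre-activations.
        h = relu(layer_norm(h))
        return linear(h, self.w2, self.b2)[0]


critic = LayerNormCritic(obs_dim=3, act_dim=2, hidden=8, seed=0)
print(critic.q_value([1.0, 2.0, 3.0], [0.5, -0.5]))
```

Because the hidden activations are re-centered and re-scaled at every forward pass, their magnitude stays bounded regardless of how large the raw inputs or weights grow; that boundedness is one plausible reason the change helps stabilize critic learning.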
Kyoung Whan Choe
@kywch500
Aug 8, 2024
Started messing with Stable Baselines. Now I understand why CleanRL was born, and why @jsuarez is making Pufferlib.
Kyoung Whan Choe
@kywch500
Oct 18, 2024
For my lit reviews, I want a whiteboard where I can move sticky notes around and write text here and there. I have used Google Slides, but I also want to add links and other stuff. So I made a React app called Card Table, with much help from Claude. kywch.github.io/card-table/
Kyoung Whan Choe
@kywch500
Aug 16, 2024
Mujoco Hopper training at 42k steps per second (sps): 15M steps in 6 min. Running a CARBS sweep now.
Joseph Suarez 🐡
@jsuarez
Jul 16, 2024
I finally got @imbue_ai CARBS hyperparam algo working with @wandb sweeps properly w/ correct graphs. It crushed the test env overnight. Continued performance improvements after >>50B steps worth of experiments!
Kyoung Whan Choe
@kywch500
Sep 26, 2024
It feels fresh to use a Linux desktop again. Pop!_OS comes with CUDA drivers correctly installed, and installing Vulkan was a one-liner.
Kyoung Whan Choe
@kywch500
Aug 23, 2024
Reach out to Joseph if you want to 100x your RL.
Joseph Suarez 🐡
@jsuarez
Aug 23, 2024
Live Reinforcement Learning Dev x.com/i/broadcasts/1…
Kyoung Whan Choe
@kywch500
Aug 11, 2024
Testing Pufferlib for robot sims; made a PR and it got merged. A good day.
Kyoung Whan Choe
@kywch500
Aug 20, 2024
It's so satisfying to see hyperparameter sweeps working on one GPU, of course. Quick experimentation rocks!
Kyoung Whan Choe
@kywch500
Aug 16, 2024
Check out this beautiful work in C: justine.lol/matmul/
Joseph Suarez 🐡
@jsuarez
Aug 15, 2024
Reinforcement learning but C x.com/i/broadcasts/1…
Kyoung Whan Choe
@kywch500
Aug 14, 2024
GitHub is down just as I'm about to test a gem from 2017: "Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution"
proceedings.mlr.press
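The core idea of the Beta-distribution paper is to replace the usual Gaussian policy with a Beta distribution, whose support is bounded, and map samples onto the action range instead of clipping Gaussian samples at the bounds. Below is a minimal stdlib-only sketch of that sampling step. Assumptions: the raw network outputs are passed through softplus + 1 so that alpha, beta > 1 (a common choice, assumed here); `beta_params`, `beta_log_prob` and `sample_action` are hypothetical names; and no policy network or gradient update is shown.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def beta_params(raw_alpha, raw_beta):
    # Assumed parameterization: +1 keeps alpha, beta > 1 (unimodal, bounded density).
    return softplus(raw_alpha) + 1.0, softplus(raw_beta) + 1.0


def beta_log_prob(x, alpha, beta):
    # Log density of Beta(alpha, beta) at x in (0, 1).
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log1p(-x) - log_norm


def sample_action(alpha, beta, low, high, rng):
    x = rng.betavariate(alpha, beta)        # sample on [0, 1]
    x = min(max(x, 1e-8), 1.0 - 1e-8)       # keep logs finite at the edges
    action = low + (high - low) * x         # affine map onto [low, high], no clipping
    # Change of variables: the density on [low, high] is divided by (high - low).
    log_prob = beta_log_prob(x, alpha, beta) - math.log(high - low)
    return action, log_prob


rng = random.Random(0)
alpha, beta = beta_params(0.3, -0.2)
print(sample_action(alpha, beta, -2.0, 2.0, rng))
```

Every sampled action lands inside the bounds by construction, so the log-probability used by the policy gradient matches the action the environment actually receives, which is the mismatch that clipping a Gaussian introduces.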