Kyoung Whan Choe
1,377 posts
Kyoung Whan Choe
@kywch500
Robot Learning Engineer @ RLWRLD.ai
Mountain View, CA
Joined June 2020
2,030
Following
1,105
Followers
  • Pinned
    Kyoung Whan Choe
    @kywch500
    Apr 8
    This was a very timely project, and I had so much fun and learned so much about agent evaluation. Kudos to the SkillsBench community!
    Xiangyi Li
    @xdotli
    Apr 8
    How good are agents at using the latest CLI tools like GWS CLI, and how safely can they use them? Introducing ClawsBench, the first benchmark that measures both LLM capability and safety in a set of high-fidelity, stateful environments and scenarios. We made 5 mock
    1.1K
  • Kyoung Whan Choe
    @kywch500
    Oct 18, 2024
    Here is an RL quickstart guide, based on @jsuarez's opinionated suggestions: kywch.github.io/start-rl/ TL;DR: Start with CleanRL and PufferLib.
    25K
  • Kyoung Whan Choe
    @kywch500
    Aug 10, 2024
    Submitted to NeurIPS D&B, unlikely to make it. Still, proud to share a massive Neural MMO update: integrates previous competition know-how, 3x faster! By @jsuarez, @RyanSullyvan, and me! arXiv: arxiv.org/pdf/2406.05071 GitHub: github.com/kywch/meta-mmo
    9.3K
  • Kyoung Whan Choe
    @kywch500
    Aug 22, 2024
    PufferLib + PPO + CARBS is working its magic on MuJoCo Ant-v4, reaching 5000+ in 8 min on a gaming desktop. Hyperparameter sweeps with PPO seem very promising.
    5.4K
  • Kyoung Whan Choe
    @kywch500
    Sep 10, 2024
    Just adding LayerNorm to critic networks makes SAC learn faster, hmm. I should try this. From the learning-to-walk-in-20-min paper: arxiv.org/abs/2208.07860
    3.7K
  • Kyoung Whan Choe
    @kywch500
    Aug 8, 2024
    Started to mess with Stable Baselines. Started to understand why CleanRL was born, and why @jsuarez is making PufferLib.
    1.7K
  • Kyoung Whan Choe
    @kywch500
    Oct 18, 2024
    For my lit reviews, I want a whiteboard where I can move sticky notes around and write text here and there. I have used Google Slides, but I also want to add links and other stuff. So I made a React app called Card Table, with much help from Claude. kywch.github.io/card-table/
    1.4K
  • Kyoung Whan Choe
    @kywch500
    Aug 16, 2024
    MuJoCo Hopper training at 42k sps, 15M steps in 6 min. Running a CARBS sweep now.
    Joseph Suarez 🐡
    @jsuarez
    Jul 16, 2024
    I finally got @imbue_ai CARBS hyperparam algo working with @wandb sweeps properly w/ correct graphs. It crushed the test env overnight. Continued performance improvements after >>50B steps worth of experiments!
    3.2K
  • Kyoung Whan Choe
    @kywch500
    Sep 26, 2024
    It feels fresh to use a Linux desktop again. Pop!_OS comes with CUDA drivers correctly installed, and installing Vulkan was a one-liner.
    473
  • Kyoung Whan Choe
    @kywch500
    Aug 23, 2024
    Reach out to Joseph if you want to 100x your RL.
    Joseph Suarez 🐡
    @jsuarez
    Aug 23, 2024
    Live Reinforcement Learning Dev x.com/i/broadcasts/1…
    749
  • Kyoung Whan Choe
    @kywch500
    Aug 11, 2024
    Testing PufferLib for robot sims; made a PR and it got merged. A good day.
    321
  • Kyoung Whan Choe
    @kywch500
    Aug 20, 2024
    It's so satisfying to see hyperparameter sweeps working with one GPU, of course. Quick experimentation rocks!
    472
  • Kyoung Whan Choe
    @kywch500
    Aug 16, 2024
    Check out this beautiful work in C: justine.lol/matmul/
    Joseph Suarez 🐡
    @jsuarez
    Aug 15, 2024
    Reinforcement learning but C x.com/i/broadcasts/1…
    506
  • Kyoung Whan Choe
    @kywch500
    Aug 14, 2024
    GitHub is down just as I am about to test a gem from 2017: "Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution"
    proceedings.mlr.press
    Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using...
    Recently, reinforcement learning with deep neural networks has achieved great success in challenging continuous control problems such as 3D locomotion and ro...
    450
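
For readers curious about the Beta-distribution idea named in the last post, here is a rough sketch of sampling a bounded action from a Beta policy. It is only an illustration, not the paper's code: raw_alpha and raw_beta stand in for hypothetical policy-network outputs, the "softplus + 1" parameterization is an assumption made for this example, and all function names are made up. It uses only Python's standard library.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def beta_params(raw_alpha: float, raw_beta: float) -> tuple:
    # Assumption for this sketch: softplus(.) + 1 keeps both shape
    # parameters at or above 1, so the density has no spikes at the bounds.
    return softplus(raw_alpha) + 1.0, softplus(raw_beta) + 1.0


def sample_action(raw_alpha: float, raw_beta: float,
                  low: float, high: float, rng: random.Random) -> float:
    """Draw u ~ Beta(alpha, beta) on [0, 1] and rescale it to [low, high]."""
    alpha, beta = beta_params(raw_alpha, raw_beta)
    u = rng.betavariate(alpha, beta)
    return low + (high - low) * u


def mean_action(raw_alpha: float, raw_beta: float,
                low: float, high: float) -> float:
    """Mean of the rescaled Beta: low + (high - low) * alpha / (alpha + beta)."""
    alpha, beta = beta_params(raw_alpha, raw_beta)
    return low + (high - low) * alpha / (alpha + beta)


if __name__ == "__main__":
    rng = random.Random(0)
    actions = [sample_action(0.3, -1.2, -1.0, 1.0, rng) for _ in range(5)]
    print(actions)  # every value lies inside [-1, 1] by construction
```

Because the Beta distribution has support on [0, 1], the rescaled action never leaves [low, high], so no clipping is needed, unlike with a Gaussian policy.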
