The future of robotics, generalist or specialist?

You Liang Tan — Mon, 13 Nov 2023 06:55:01 GMT

This past month, I attended two robotics conferences, ROSCon and CoRL 2023 (Conference on Robot Learning). For such a niche field as robotics software, I noticed a great divide between the two audiences. The former is geared toward classical robotics developers, and the latter toward an academic, machine-learning audience. Here, I will distill some of my thoughts after participating in both conferences and discuss the trend of using end-to-end robot learning to create a generalist agent.

CoRL 2023 in Atlanta, USA

ROSCon 2023 in New Orleans, USA

To briefly share my journey in robotics: it started when I took part in robotics competitions as a 10-year-old in Malaysia. Along the way, my passion for robotics never faded. I participated in competitions, interned at a self-driving car company, and worked on robotics at various companies and organizations. Subsequently, I joined Open Robotics to build open-source robotics software for the robotics community. My work mainly involves developing Open-RMF and other ROS-related consulting projects. Although I really enjoy contributing to the open-source community, something has stuck in my mind: there is a huge gap between people’s perception and the reality of robotics. Robots are still not “intelligent”, robotics companies are hard-to-scale money furnaces, and scripting and plumbing make up most of the work of “programming” a robot. Most importantly, I am incredibly frustrated that progress in robotics is not moving fast enough compared to progress at internet-based and “AI” companies.

With that, I decided to pursue grad school at Georgia Tech and use this chance to immerse myself fully in the progress of AI. Might it hold the key to creating the next generalist foundation robot? One year into GT, and having published a paper on imitation learning, I have managed to catch up with the trend in robot learning. Inspired by LLMs, the current trend is to train a foundation model for robotics (RT-2, RT-X) through brute force, to learn the relationship between perception and robot action. In fact, I am currently working on such a robotics foundation model at Berkeley AI Research. Yet I constantly think hard about the long-term prospects of robotics.

RT-X, taken from: https://github.com/google-deepmind/open_x_embodiment

Let's discuss the conferences for a moment. At ROSCon, attendees focus on topics such as software engineering, ROS, and Gazebo simulation, while at CoRL, discussions revolve around RL, transformers, MuJoCo, data, and scale. The two communities work on vastly different tech stacks. I appreciate the ambition of the research community at CoRL, but ROSCon seems more grounded in reality. On the other hand, I believe we should always be forward-thinking in pushing the boundaries of science and technology.

The Bitter Lesson states the obvious: we shouldn’t handcraft the sense-plan-act pipeline, because doing so introduces our own inductive bias into the solution. By that argument, training end-to-end is the way. However, robot data is expensive, and differences in embodiment make it hard to transfer from one robot to another (e.g., locomotion for an elephant vs. a fish). Looking back at the short history of natural language processing (NLP), we progressed from LSTMs to gigantic transformer-based LLMs, and now to multimodal LLMs (GPT-4). Think about it: a robot foundation model can be treated as a multimodal model with robot actions as output.
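To make "robot action as output" a bit more concrete, here is a minimal sketch of one common way to let a token-based model emit actions: each continuous action dimension is discretized into a fixed number of bins, so an action becomes a short sequence of integer tokens. The bin count, the action ranges, and the function names below are assumptions for illustration, not the exact recipe of any particular model.

```python
import numpy as np

NUM_BINS = 256  # assumed vocabulary size per action dimension


def actions_to_tokens(action, low, high, num_bins=NUM_BINS):
    """Map a continuous action vector to integer bin indices in [0, num_bins - 1]."""
    action = np.clip(action, low, high)
    normalized = (action - low) / (high - low)  # each entry now lies in [0, 1]
    tokens = (normalized * num_bins).astype(int)
    return np.minimum(tokens, num_bins - 1)  # the upper bound falls in the last bin


def tokens_to_actions(tokens, low, high, num_bins=NUM_BINS):
    """Map bin indices back to continuous values at the bin centers."""
    return low + (tokens + 0.5) / num_bins * (high - low)


# Hypothetical 3-DoF action with every dimension bounded to [-1, 1].
low = np.array([-1.0, -1.0, -1.0])
high = np.array([1.0, 1.0, 1.0])
action = np.array([-1.0, 0.0, 1.0])

tokens = actions_to_tokens(action, low, high)
recovered = tokens_to_actions(tokens, low, high)
```

The round trip is lossy: the recovered action differs from the original by at most half a bin width per dimension, which is part of the price of treating actions as just another output modality.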

CoRL Workshop Debate: Is Scaling Enough to Deploy General Purpose Robots: Chelsea Finn, Russ Tedrake, Sergey Levine, Xiaolong Wang, Emo Todorov, Scott Kuindersma, Stefan Schaal (video link)

However, there are fundamental issues with the current multi-embodiment RT-X model. For example, in robotics, SE(3) transformations and frames of reference are essential concepts for understanding the spatial relationships between different sensors and actuators. Certain robotics foundation models abstract these away on the assumption that they are learnable, but what is learned often does not generalize to a new embodiment or environment. Also, the formulation of multi-modality for cross-embodiment learning is problematic. Until we have infinite data, the RT-X end-to-end generalist approach is still not the ultimate solution. (Other exciting ideas leverage prior knowledge from LLMs for RL: Eureka, Lang2Rewards.)
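For readers less familiar with frames of reference, here is a small sketch of what a classical stack writes down explicitly and an end-to-end model has to learn implicitly. It expresses a point seen by a camera in the robot's base frame using a 4x4 homogeneous SE(3) matrix. The frame names, the camera mounting pose, and the helper functions are hypothetical and chosen only for illustration.

```python
import numpy as np


def rot_z(theta):
    """3x3 rotation matrix for a rotation of theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def make_transform(rotation, translation):
    """Build a 4x4 homogeneous SE(3) matrix from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def invert_transform(T):
    """Invert an SE(3) matrix using R^T and -R^T t instead of a general inverse."""
    R, t = T[:3, :3], T[:3, 3]
    return make_transform(R.T, -R.T @ t)


def transform_point(T, point):
    """Apply an SE(3) transform to a 3D point."""
    return (T @ np.append(point, 1.0))[:3]


# Hypothetical setup: the camera is mounted 0.5 m ahead of and 1.0 m above
# the robot base, rotated 90 degrees about the base z-axis.
T_base_camera = make_transform(rot_z(np.pi / 2), np.array([0.5, 0.0, 1.0]))

p_camera = np.array([1.0, 0.0, 0.0])               # point in the camera frame
p_base = transform_point(T_base_camera, p_camera)  # approximately [0.5, 1.0, 1.0]

# Going back from the base frame to the camera frame recovers the original point.
p_camera_again = transform_point(invert_transform(T_base_camera), p_base)
```

Move the camera or swap the robot and only `T_base_camera` changes; a model that has absorbed this relationship into its weights has no such single parameter to update.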

The challenges in robotics are not just algorithmic and data problems, but also problems of system design and business use cases. A full robotics system requires domain knowledge spanning mechanical and electronics engineering, machine learning, fleet management, and cloud infrastructure. On the business side, one should always consider whether robotics is the right solution to the problem. I often think that most robotics solutions are over-engineered for the problems they target. For example, compare a conveyor belt with an AGV, CCTV with a patrolling legged robot, or a coffee machine with a cobot making coffee. Robotics should be considered a feasible solution only when it is the most efficient and reliable option available.

Returning to the fundamental question: do we need a specialist or a generalist robot? The dilemma is that once we solve a task with a generalist, we then hope it becomes a faster, better, and easily parallelizable specialist. The eventual end goal of autonomous generalists is to become boring automated specialists. However, there is still a place for generalists. The world is highly dynamic, and we want robots to be as adaptable and agile as humans are. This is important for flexible manufacturing systems and highly dynamic human environments.

There are definitely markets for both generalist and specialist robots, just as there are for ChatGPT (a generalist) and a fine-tuned ResNet-50 (a specialist). The hopeful part is that the academic community is constantly seeking novelty and pushing the boundaries of robotics. However, building an ImageNet for robotics is not a straightforward task. Until we have a ChatGPT moment for robotics, market forces will tend to favor automated “specialists” over autonomous “generalists”.

— — — — — —

Please do share your thoughts.
