Google DeepMind Robotics
My initial brief from the robotics team at Google DeepMind was to create a single animation for a blog post. I designed a character that illustrated the evolution of a robot, and the team was so taken with the concept that the design was adopted as the official mascot. This led to a series of follow-up projects, including multiple variations of the mascot and a robotic dog. One of my animations was recently featured at ICRA 2024 in Yokohama, where it was shown as part of the team's presentation, which went on to win an award. The bots were also adopted for swag that the team wears to Google events.
SARA: Self-Adaptive Robust Attention
Google DeepMind introduces SARA-RT, a method that improves the efficiency of Robotics Transformer (RT) models. SARA-RT makes these complex models smaller and faster while maintaining their accuracy, making them easier to deploy on robots in real-world settings.
SARA-RT was awarded Best Paper in Robotic Manipulation at ICRA 2024.
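The summary above doesn't go into the mechanism, but the gist is that attention is the expensive part of a Robotics Transformer. As a rough, illustrative sketch (not DeepMind's code), the snippet below contrasts standard softmax attention, whose cost grows quadratically with sequence length, with a linear-attention approximation of the kind efficiency-focused methods build on; all function names and shapes are my own assumptions.

```python
import torch

def softmax_attention(q, k, v):
    # Standard attention: the (T x T) score matrix makes cost quadratic in sequence length.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Linear-attention approximation: a positive feature map replaces the softmax,
    # so key/value statistics are summed once and reused, giving cost linear in T.
    q, k = torch.nn.functional.elu(q) + 1, torch.nn.functional.elu(k) + 1
    kv = k.transpose(-2, -1) @ v                                   # (d x d) key/value summary
    norm = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)       # per-query normalizer
    return (q @ kv) / (norm + eps)

# Toy comparison on random data (batch=1, T=512 tokens, d=64 channels).
q = k = v = torch.randn(1, 512, 64)
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)
```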
RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches
RT-Sketch introduces hand-drawn sketches as a new way for robots to learn. Sketches offer a clear and flexible way to specify goals, allowing robots to understand the task and handle ambiguity better than natural-language instructions or photographic goal images.
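To make "goal-conditioned" concrete, here is a minimal toy policy that takes both the current camera frame and a goal sketch (rendered as an image) and predicts an action. The encoder, module sizes, and action dimension are purely illustrative assumptions, not RT-Sketch's actual architecture.

```python
import torch
import torch.nn as nn

class SketchConditionedPolicy(nn.Module):
    """Toy goal-conditioned policy: encode observation and goal sketch, predict an action."""

    def __init__(self, img_channels=3, action_dim=7):
        super().__init__()
        # Shared tiny CNN encoder for both the camera frame and the goal sketch (illustrative).
        self.encoder = nn.Sequential(
            nn.Conv2d(img_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32 * 2, action_dim)  # concatenated features -> action

    def forward(self, observation, goal_sketch):
        feats = torch.cat([self.encoder(observation), self.encoder(goal_sketch)], dim=-1)
        return self.head(feats)

policy = SketchConditionedPolicy()
obs = torch.randn(1, 3, 64, 64)      # current camera frame
sketch = torch.randn(1, 3, 64, 64)   # hand-drawn goal sketch, rendered as an image
print(policy(obs, sketch).shape)     # -> torch.Size([1, 7])
```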
DAY/NIGHT: Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Our robots can be taught new tasks using large pre-trained foundation models (LLMs and VLMs) together with in-context learning (ICL), which enables both high-level and low-level teaching.
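As a loose sketch of what in-context teaching looks like, the loop below appends each round of human feedback to the model's prompt so that the next plan improves without any weight updates. The `query_model` function is a stand-in invented for illustration, not a real DeepMind or Google API.

```python
# Minimal sketch of an in-context teaching loop: human feedback is appended to the
# model's context so the next response improves without updating model weights.

def query_model(context: str) -> str:
    # Placeholder: a real system would call a large pre-trained LLM/VLM here.
    return f"<robot plan conditioned on {len(context)} chars of context>"

def teach_by_feedback(task: str, feedback_rounds: list[str]) -> str:
    context = f"Task: {task}\n"
    plan = query_model(context)
    for feedback in feedback_rounds:
        # In-context learning: the prior attempt plus the human correction become part of the prompt.
        context += f"Previous plan: {plan}\nHuman feedback: {feedback}\n"
        plan = query_model(context)
    return plan

print(teach_by_feedback("wipe the table", ["move slower", "start from the left edge"]))
```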
Scaling up learning across many different robot types
Together with partners from 33 academic labs, we have pooled data from 22 different robot types to create the Open X-Embodiment dataset and the RT-X model.
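For intuition, here is a toy sketch of what pooling data across embodiments can look like at training time: episodes from robots with different observation and action spaces are tagged with their embodiment and mixed into one batch stream. The dataset names and fields are invented for illustration and are not the Open X-Embodiment schema.

```python
import random

# Illustrative only: a toy version of pooling trajectories from different robot types
# into one training stream. Field names are assumptions, not the Open X-Embodiment schema.
datasets = {
    "robot_arm_a": [{"obs": [0.1, 0.2], "action": [0.0]} for _ in range(100)],
    "mobile_robot_b": [{"obs": [0.3], "action": [0.1, 0.2]} for _ in range(50)],
}

def sample_mixed_batch(datasets, batch_size=4, seed=0):
    """Sample a batch that mixes embodiments, weighting each dataset by its size."""
    rng = random.Random(seed)
    names = list(datasets)
    weights = [len(datasets[n]) for n in names]
    batch = []
    for _ in range(batch_size):
        name = rng.choices(names, weights=weights)[0]
        step = rng.choice(datasets[name])
        batch.append({"embodiment": name, **step})
    return batch

for step in sample_mixed_batch(datasets):
    print(step["embodiment"], len(step["obs"]), "obs dims ->", len(step["action"]), "action dims")
```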
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer.
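The sketch below is a heavily simplified, illustrative take on that idea: a small Transformer reads the observation plus one token per action dimension and outputs Q-values over discretized action bins, with a basic bootstrapped TD-style target. The network sizes, binning, and target computation are my own assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class TinyQTransformer(nn.Module):
    """Toy sketch: per-action-dimension Q-values over discretized bins (shapes illustrative)."""

    def __init__(self, obs_dim=10, action_dims=3, bins=8, width=64):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, width)
        self.action_tokens = nn.Embedding(action_dims, width)     # one token per action dimension
        layer = nn.TransformerEncoderLayer(d_model=width, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.q_head = nn.Linear(width, bins)                      # Q-value per discretized action bin

    def forward(self, obs):
        b = obs.shape[0]
        tokens = torch.cat(
            [self.obs_proj(obs).unsqueeze(1),
             self.action_tokens.weight.unsqueeze(0).expand(b, -1, -1)], dim=1)
        h = self.backbone(tokens)
        return self.q_head(h[:, 1:])                              # (batch, action_dims, bins)

q_net = TinyQTransformer()
obs, next_obs = torch.randn(2, 10), torch.randn(2, 10)
reward, gamma = torch.tensor([1.0, 0.0]), 0.99
# Simplified TD target: bootstrap from the greedy bin of the next state (per action dimension).
with torch.no_grad():
    target = reward.unsqueeze(-1) + gamma * q_net(next_obs).max(dim=-1).values
print(q_net(obs).shape, target.shape)   # (2, 3, 8) and (2, 3)
```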
RT-2: Vision-Language-Action Models
Robotic Transformer 2 (RT-2) is a novel vision-language-action (VLA) model that learns from both web and robotics data, and translates this knowledge into generalised instructions for robotic control.
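One way to picture a vision-language-action model is that robot actions are written in the same kind of token vocabulary the model already uses for text, so a single sequence model can read web data and emit robot commands. The toy round-trip below illustrates that discretization idea; the bin count and value range are assumptions, not RT-2's actual tokenization.

```python
# Toy illustration of the "actions as tokens" idea behind VLA models: continuous robot
# actions are discretized into a small vocabulary of integer tokens, so the same sequence
# model that handles web text can also emit robot commands. Bin count/range are assumptions.

N_BINS = 256  # number of tokens reserved for action values

def action_to_tokens(action, low=-1.0, high=1.0):
    """Map each continuous action dimension in [low, high] to an integer token."""
    tokens = []
    for a in action:
        a = min(max(a, low), high)
        tokens.append(round((a - low) / (high - low) * (N_BINS - 1)))
    return tokens

def tokens_to_action(tokens, low=-1.0, high=1.0):
    """Inverse mapping: decode action tokens back into continuous values."""
    return [low + t / (N_BINS - 1) * (high - low) for t in tokens]

action = [0.25, -0.7, 0.0]                 # e.g. end-effector deltas
tokens = action_to_tokens(action)
print(tokens, tokens_to_action(tokens))    # round-trips up to quantization error
```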