A one-armed robot stood in front of a table holding three plastic figurines: a lion, a whale and a dinosaur. An engineer gave the robot a command: “Pick up the extinct animal.” The robot whirred for a moment, then its arm reached out, its claw opened and lowered, and it picked up the dinosaur.
This demonstration, which I attended during an interview for my podcast at Google’s robotics division in Mountain View, California, would have been impossible until very recently, because robots couldn’t reliably handle objects they had never seen before. They certainly could not make the logical leap from “extinct animal” to “plastic dinosaur.”
The robotics industry is approaching a true revolution, driven by recent developments in so-called “large language models,” the same kind of artificial intelligence that powers chatbots like ChatGPT and Bard.
Google has recently started equipping its robots with sophisticated language models that act as artificial brains. This secretive program has made the robots smarter and given them new powers to understand and solve problems.
During a demonstration of its latest robotics work, Google unveiled RT-2, a first step toward what company executives described as a quantum leap in the way robots are built and programmed.
Vincent Vanhoucke, head of robotics at Google DeepMind, said: “As a result of this change, we had to rethink our entire research program, because many of the designs we had previously worked on lost their viability.”
A promising breakthrough
Ken Goldberg, a professor of robotics at the University of California, Berkeley, said robots still fall short of human-level ability and fail at some basic tasks, but that Google’s use of AI language models to give them skills in reasoning and improvisation represents a promising breakthrough.
He added: “What’s really fascinating is connecting verbal meanings to robots. This is very exciting for the world of robotics.”
But to understand the significance of this development, it helps to know a little about how robots have traditionally been built.
For years, engineers at Google and other companies trained robots to perform mechanical tasks, like flipping a burger, by programming them with a list of specific instructions. The robots would repeat the task again and again, and engineers would adjust the instructions until the robots got it right.
This approach has worked in some limited applications, but training robots this way is slow and labor-intensive because it requires collecting a lot of data from real-world experiments. If you want to teach a robot a new task, like flipping a pancake instead of a burger, you have to reprogram it from scratch.
These limitations are part of the reason hardware robots have progressed more slowly than their software-based counterparts. OpenAI, the developer of ChatGPT, disbanded its robotics team in 2021, citing slow progress and a lack of high-quality training data. In 2017, Google’s parent company, Alphabet, sold its robotics subsidiary Boston Dynamics.
But in recent years, an idea struck Google engineers: What if, instead of being programmed for one task at a time, robots could use AI language models trained on vast amounts of web text to acquire new skills?
“Vision and Action”
Karol Hausman, a research scientist at Google, said the team began exploring these language models about two years ago and then started connecting them to robots.
Google’s first effort to combine robots and language models was the PaLM-SayCan project, announced last year. The project attracted some interest, but its usefulness was limited because its robots lacked the ability to interpret images, a basic skill they would need to navigate the world. These robots could write out detailed, step-by-step instructions for various tasks, but they were unable to translate those instructions into action.
Google’s new model, RT-2, can do this, which is why the company calls it a “vision-language-action” model: an artificial intelligence system that not only sees and analyzes the world around it, but also tells the robot how to move.
The model does this by translating the robot’s movements into a sequence of numbers, a process called tokenization, and incorporating these tokens into the same training data used for the language model. Ultimately, just as Bard or ChatGPT learns to guess the next words in a poem or a history essay, RT-2 can learn to guess how a robotic arm should move to pick up a ball or throw a can into a trash can.
“In other words, this model can learn how to speak the language of robots,” Hausman said.
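To make the idea of turning movements into numbers concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Google's implementation: the bin count of 256, the normalized range of -1 to 1, the four-value command and every function name are hypothetical choices made for this example. It shows one simple way a continuous arm command could be rounded into integer tokens, written out as text a language model could predict, and converted back into an approximate movement.

```python
# Illustrative sketch only. NUM_BINS, the value range and all names below
# are assumptions made for this example, not details from the article.
NUM_BINS = 256          # assumed number of discrete levels per action value
LOW, HIGH = -1.0, 1.0   # assumed normalized range of each action value


def encode_action(action):
    """Map each continuous value in [LOW, HIGH] to an integer token in [0, NUM_BINS - 1]."""
    tokens = []
    for value in action:
        clipped = min(max(value, LOW), HIGH)        # keep the value inside the range
        fraction = (clipped - LOW) / (HIGH - LOW)   # rescale to [0, 1]
        tokens.append(min(int(fraction * NUM_BINS), NUM_BINS - 1))
    return tokens


def decode_tokens(tokens):
    """Map each token back to the center of its bin (an approximation of the original value)."""
    width = (HIGH - LOW) / NUM_BINS
    return [LOW + (token + 0.5) * width for token in tokens]


def to_text(tokens):
    """Render tokens as a space-separated string, the kind of text a language model emits."""
    return " ".join(str(token) for token in tokens)


# Hypothetical arm command: x, y, z displacement plus gripper opening.
command = [0.10, -0.25, 0.0, 1.0]
tokens = encode_action(command)
print(to_text(tokens))
print(decode_tokens(tokens))
```

Under these assumptions, the example command becomes the token string "140 96 128 255". Decoding returns values within half a bin width of the originals, which is the precision given up in exchange for letting one model treat words and movements alike.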
During the hour-long demonstration, my podcast co-host and I saw RT-2 perform a number of impressive tasks. One was successfully following a complex instruction, “Move the Volkswagen to the German flag,” which the robot carried out by finding a model Volkswagen bus, picking it up and setting it down on a small German flag a few meters away.
The robot also demonstrated the ability to follow instructions in languages other than English, and even to make abstract connections between related concepts. When I wanted RT-2 to pick up a soccer ball, I told it, “Pick up Lionel Messi,” and it succeeded on the first try.
However, the robot was not perfect. It misidentified the flavor of a can of soft drink placed on the table in front of it. (The can was lemon-flavored, but the robot guessed orange.) Another time, when asked what kind of fruit was on the table, the robot replied “white” (it was a banana). A Google spokesperson explained the error by saying that the robot had reused the answer to a previous test question because its Wi-Fi connection was momentarily lost.
Google doesn’t currently plan to sell RT-2 or make it more widely available, but its researchers believe these new machines equipped with language models will eventually be useful for more than parlor tricks. For example, such robots could work in warehouses, in the medical field or even as household assistants, folding laundry, emptying the dishwasher or tidying up the house.
Vanhoucke concluded: “This development opens the door to the use of robots in human environments, in the office, home and anywhere physical work is required.”
* The New York Times Service