What is Perception?
How do robots see? Why is it important that robots can perceive their environment? How are artificial intelligence and machine learning leveraged for more accurate computer vision? We sat down with Perception Engineer Bob DeBortoli and Machine Learning Engineer Shelby Cass to dig into the complex topic of robot perception.
According to DeBortoli, “Perception is analyzing the sensing data that the robot gets, and outputting useful information for downstream tasks.” In Digit’s case, these “downstream tasks” could be manipulating totes, avoiding obstacles, or simply knowing where to place its feet. And as the platform develops, more and more tasks will need to make use of perception data.
Cass said that generalized, or multipurpose, robots aren’t quite possible without actively perceiving and understanding the environment that surrounds them. But something that is second nature for humans is a monumental challenge for robots.
Consider an image of a frozen cube of broccoli. You would probably recognize it almost instantly, without even thinking about it, but the same task could prove quite confusing for a computer vision algorithm. Most people take for granted how quickly and how accurately their eyes detect what is in front of them, even in new scenarios.
The perception work that Agility is doing differs from many other computer vision applications. These perception algorithms have to work for Digit, a physical, mobile platform designed to be out in the world doing real work. And what’s more, they have to perform in real time.
“Digit is a mobile platform, and its environment interactions are real-time,” Cass said. “With static cameras, you can go and play back images. They have no motion blur. You can really control your environment a lot. But with Digit moving around, how you detect the first thing subsequently affects everything after it.”
This becomes incredibly important when tackling challenges like locomotion. The quality of a terrain map can be the difference between a balanced, stable gait and a stumble to the ground. The same challenges apply to obstacle avoidance and object detection. What’s more, the difficulty of these tasks is compounded by the unpredictability and randomness of the world.
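To make that concrete, here is a minimal sketch of the kind of terrain map a legged robot might build: depth points rasterized into a grid of heights that a footstep planner could query. This is an illustration of the general technique, not Agility’s actual pipeline, and the point cloud, frame, and resolution are all illustrative assumptions.

```python
import numpy as np

def height_map(points, cell=0.05, extent=2.0):
    """Rasterize an (N, 3) point cloud into a 2-D grid of terrain heights.

    points: XYZ points in the robot's frame (meters); z is up.
    cell:   grid resolution in meters.
    extent: half-width of the mapped square around the robot.
    """
    n = int(2 * extent / cell)
    grid = np.full((n, n), np.nan)  # NaN marks cells with no sensor data
    # Keep only points that fall inside the mapped square.
    mask = (np.abs(points[:, 0]) < extent) & (np.abs(points[:, 1]) < extent)
    pts = points[mask]
    ix = ((pts[:, 0] + extent) / cell).astype(int)
    iy = ((pts[:, 1] + extent) / cell).astype(int)
    # Record the highest point seen in each cell (conservative for foot placement).
    for x, y, z in zip(ix, iy, pts[:, 2]):
        if np.isnan(grid[x, y]) or z > grid[x, y]:
            grid[x, y] = z
    return grid

# Toy example: a flat floor plus a 10 cm step hazard.
rng = np.random.default_rng(0)
floor = rng.uniform(-2, 2, size=(5000, 3)); floor[:, 2] = 0.0
step = rng.uniform(0.5, 1.0, size=(500, 3)); step[:, 2] = 0.10
grid = height_map(np.vstack([floor, step]))
print(np.nanmax(grid))  # ~0.10: the step shows up as elevated terrain
```

A noisy or incomplete grid here is exactly the failure mode described above: if the step’s height is wrong, so is the footstep plan.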
“One of the big challenges we have on the perception team is robustness, especially to new scenarios that we haven’t seen before,” said DeBortoli.
Unsurprisingly, one of the most powerful tools to tackle these problems is machine learning.
“You can do vision without AI,” said Cass. “You can do things like object recognition. There are traditional algorithms for that. But ever since AlexNet came out, it has beaten every traditional model on this large database called ImageNet. So there’s really no better tool to use.”
AlexNet is a convolutional neural network: a computing system that learns layered visual features directly from data, loosely inspired by the human brain. Designed by Alex Krizhevsky, AlexNet excels at image classification. And when using machine learning and neural networks, there is one resource valued above all else.
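For readers who want to see this in action, the ImageNet-pretrained AlexNet that Cass references ships with torchvision. The sketch below loads it and classifies a single image; the file name is hypothetical, and this illustrates the off-the-shelf model rather than Digit’s onboard software.

```python
import torch
from torchvision import models
from torchvision.models import AlexNet_Weights

# Load AlexNet with weights pretrained on ImageNet, the benchmark Cass mentions.
weights = AlexNet_Weights.IMAGENET1K_V1
model = models.alexnet(weights=weights).eval()
preprocess = weights.transforms()  # resize, crop, normalize as the model expects

def classify(image):
    """Return the top ImageNet label and its confidence for a PIL image."""
    batch = preprocess(image).unsqueeze(0)  # shape (1, 3, 224, 224)
    with torch.no_grad():
        logits = model(batch)
    probs = logits.softmax(dim=1)
    score, idx = probs.max(dim=1)
    return weights.meta["categories"][idx.item()], score.item()

# Usage (hypothetical file name):
# from PIL import Image
# label, confidence = classify(Image.open("broccoli.jpg"))
```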
“Data is the most important thing in AI,” said Cass. “Without good data, we cannot build good AI. So going and collecting more and more data as we go to more and more environments will help us get even better at that general base model.”
DeBortoli stressed the importance of continually challenging the perception models with new and varied data.
“If we gather a lot of training data for the machine learning model in a certain environment, and then we deploy in a different environment, that’s going to start to challenge our perception algorithms,” he said. This is why it is crucial to give the models as much practice as possible, even in scenarios that seem unlikely or unrealistic.
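One common way to give a model that kind of practice, sketched below as an illustration of the general technique rather than Agility’s approach, is to randomize images at training time so that new lighting, viewpoints, and motion blur look less foreign when the robot meets them in the real world.

```python
from torchvision import transforms

# Randomize lighting, viewpoint, scale, and blur at training time so a model
# trained in one environment is less surprised by another.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.RandomRotation(degrees=10),
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.GaussianBlur(kernel_size=5),  # mimic motion blur from a moving robot
    transforms.ToTensor(),
])

# Usage: augmented = augment(pil_image), applied fresh each training epoch.
```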
Currently, perception algorithms require lots of human intervention and maintenance, and they are mostly tailored to specific tasks. Engineers have to manually filter through training data to help teach the model what's what. But the world isn't predictable, and it isn't easily modeled. It's full of randomness and uncertainties. So general-purpose robots will need to be able to adapt on the fly.
“Instead of developing a perception algorithm to run once on one set of data, you want to be developing an algorithm that's robust to unseen scenarios,” DeBortoli said. “One thing that we're constantly thinking about is identifying those cases where the perception algorithm fails, to filter that information back to the engineers, and then we can design different algorithmic solutions for that.”
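One simple proxy for “the algorithm may have failed” is low prediction confidence. The sketch below flags low-confidence frames for engineer review; the threshold, queue, and function names are illustrative assumptions, not Agility’s actual feedback pipeline.

```python
import torch

CONFIDENCE_FLOOR = 0.6  # assumed threshold; would be tuned per task

def flag_for_review(logits, frame_id, review_queue):
    """Route low-confidence frames back to engineers for inspection.

    logits: raw model outputs for one frame, shape (num_classes,).
    """
    probs = torch.softmax(logits, dim=-1)
    confidence, label = probs.max(dim=-1)
    if confidence.item() < CONFIDENCE_FLOOR:
        # The model is unsure: save this frame for offline review and,
        # eventually, for relabeling and retraining.
        review_queue.append({"frame": frame_id,
                             "guess": label.item(),
                             "confidence": confidence.item()})
    return label.item()
```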
The goal is that someday robots will be able to do this themselves, using their onboard systems to detect failures and update the algorithm accordingly.
“We have so much complexity, so many things that come into play,” said Cass, “and we want it to be able to interact with all of those things and handle them dynamically, handle people and situations in totally unknown locations, and still be able to do its job safely.”