EXPLAINED: Data Collection & Humanoid Robots
Agility CTO Pras Velagapudi gives an overview of the kinds of data we collect at our real-world customer deployments and explains very broadly what we use it for.
Full Transcript of the Conversation Below
Yeah, so one of the huge advantages we have at Agility is that we actually have robots deployed in real-world environments, not just in lab environments. And that means we can collect data from the things that they're actually doing for value at customer sites. And that's really important because that's the true ground truth. As much as you try and replicate something in a lab, you don't necessarily know exactly how it might differ from the real world.
Maybe your real environment is dustier or maybe there's more sunlight coming in, or maybe there's less sunlight coming in, it's really dark, or things like that. Maybe the robot makes a bunch of errors, or there's a bunch of garbage placed in front of it all the time, and it has to deal with that. Getting data from a customer site tells you what is really happening when your robot is out in the world. So that's a huge luxury that we have access to that data because we're out in actual customer deployments.
When we're talking about the types of data that are available, you can think of it as three broad categories of data that we're interested in. The first is the raw sensory data of what the robot is observing around it in the environment. There's camera data coming in from the robot's cameras, LiDAR data coming in from the robot's LiDAR sensors, and other types of information like when the robot makes contact with the environment or its inertial measurements. Things that tell us about the environment.
We can fuse all of that together to build models of what is going on in the world in these facilities. Now, of course, when you're dealing with live customer data, that type of sensory information needs to be protected. There could be privacy concerns or personally identifiable information in there. And so we need to be careful about exactly how we transmit that, what parts of it we actually encode and store.
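To give a feel for what that kind of selective encoding might involve, here is a minimal sketch in Python; the field names, blur method, and keep/drop policy are illustrative assumptions, not Agility's actual pipeline.

```python
# Hypothetical sketch of a selective encode-and-store step for camera data.
# Field names, the blur method, and the keep/drop policy are illustrative
# assumptions, not Agility's actual pipeline.
import numpy as np

def redact_and_select(frame: np.ndarray,
                      pii_boxes: list[tuple[int, int, int, int]],
                      keep_pixels: bool = False) -> dict:
    """Blur known PII regions, then decide which parts of the frame to store."""
    redacted = frame.copy()
    for x0, y0, x1, y1 in pii_boxes:
        region = redacted[y0:y1, x0:x1]
        # Crude anonymization: replace the region with its average color.
        redacted[y0:y1, x0:x1] = region.mean(axis=(0, 1), keepdims=True)
    record = {"resolution": redacted.shape[:2], "num_redactions": len(pii_boxes)}
    # Only keep the (redacted) pixels when a downstream consumer needs them;
    # otherwise store just the derived metadata.
    record["pixels"] = redacted if keep_pixels else None
    return record

# Example with a blank frame and one marked region.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
print(redact_and_select(frame, pii_boxes=[(100, 100, 160, 160)])["num_redactions"])  # 1
```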
Handling that carefully is part of the pipeline that we build. But that information is a really rich source of data about the environment that we're interacting with. The second type of data that we're getting is the really low-level cyclic data about how the robot is responding to that environment: I saw these things, and I exerted these forces. I put my feet in these places. I pushed with my hands, and I grabbed this thing.
You're going one level up: at the base you have the environment, and now it's what did I do about that environment, at maybe a hundred or a thousand hertz? That's also a really rich, dense data set that's telling you a lot about what you are accurately or inaccurately modeling about your world. If you pushed too hard, why did you push too hard? What did you see, and why did you think you needed to do that?
As we're figuring out how to make the robot more reliable and how to leave it deployed for longer without interruption, that level of information is really important because it tells you what is accurate or inaccurate about your particular response to the world, however the robot is doing that, whether it's learned models or whether it's model-based, like model predictive control optimization methods for deciding how to react to the world.
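Purely as an illustration of how that high-rate data can flag modeling errors, here is a small sketch; the signal names, shapes, and threshold are assumptions, not Agility's actual control stack.

```python
# Illustrative sketch: using high-rate control logs to spot where the robot's
# model of the world disagrees with what it actually measured.
# Signal names, shapes, and the threshold are assumptions, not real telemetry.
import numpy as np

def find_model_mismatches(predicted_force: np.ndarray,
                          measured_force: np.ndarray,
                          threshold_n: float = 20.0) -> np.ndarray:
    """Return indices of control cycles where force prediction error is large.

    Both arrays are shaped (num_cycles, 3) for x/y/z force, logged at
    several hundred hertz alongside the rest of the robot's state.
    """
    residual = np.linalg.norm(predicted_force - measured_force, axis=1)
    return np.flatnonzero(residual > threshold_n)

# Example: flag the cycle where the robot pushed much harder than its model expected.
pred = np.zeros((1000, 3))
meas = np.zeros((1000, 3))
meas[412] = [0.0, 0.0, 35.0]              # a single surprising contact spike
print(find_model_mismatches(pred, meas))  # -> [412]
```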
Then you go one step above that, which is what types of tasking the robot was doing, or what types of task-level decision-making it was doing. And so you now have this environment representation and how you responded to the environment; then, why were you responding to the environment? Why were you doing the things that you were doing? That's slightly higher-level task information, and that's the stuff that gets transmitted out all the time on Agility robots. It's what we call telemetry: basically abstracted metrics and decisional information about why the robot is doing what it's doing, what behaviors it's trying to do, what tasks it's trying to complete, and what its own measurements of its health status are that might be driving its decisions to do things or not do things.
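To make those three levels concrete, here is a minimal sketch of how the streams might be represented; the field names and types are illustrative assumptions, not Agility's actual schema.

```python
# Minimal sketch of the three broad data categories described above.
# Field names and types are illustrative assumptions, not Agility's schema.
from dataclasses import dataclass, field

@dataclass
class SensorFrame:            # 1) raw sensory data about the environment
    timestamp_ns: int
    camera_jpeg: bytes | None
    lidar_points: list[tuple[float, float, float]] = field(default_factory=list)
    imu_accel: tuple[float, float, float] = (0.0, 0.0, 0.0)

@dataclass
class ControlCycle:           # 2) low-level cyclic data, roughly 100-1000 Hz
    timestamp_ns: int
    joint_torques: list[float]
    contact_forces: list[float]
    foot_positions: list[tuple[float, float, float]]

@dataclass
class TelemetryEvent:         # 3) task-level telemetry, streamed continuously
    timestamp_ns: int
    behavior: str             # e.g. "pick_tote"
    outcome: str              # e.g. "success" / "failed_grasp"
    health: dict[str, float]  # battery, temperatures, etc.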
So that's another really important, rich type of data that adds another level of context and gives you another level of introspection into what exactly is going on in the world and what you're trying to do with it. You put these pieces of information together, and you have a pretty complete view of how the robots interact with the world, which we can use to improve them at various levels. Going in reverse order, the telemetry can tell you when you're doing the wrong thing. I was supposed to pick up the tote, but instead I failed to pick up the tote. I did that 400 times, and I failed 5% of the time. Okay, well, now I can look at that and say I need that to be 2% or 1% or 0.1%. So what were the reasons that this thing I was trying to do didn't work?
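As a toy illustration of that kind of telemetry roll-up, here is a sketch using made-up events in the spirit of the numbers above, not real deployment data:

```python
# Toy sketch: rolling up task-level telemetry into failure rates and reasons.
# Event fields and reason strings are illustrative, not real telemetry.
from collections import Counter

events = (
    [{"behavior": "pick_tote", "outcome": "success", "reason": None}] * 380
    + [{"behavior": "pick_tote", "outcome": "failure", "reason": "missed_grasp"}] * 12
    + [{"behavior": "pick_tote", "outcome": "failure", "reason": "tote_deformed"}] * 8
)

picks = [e for e in events if e["behavior"] == "pick_tote"]
failures = [e for e in picks if e["outcome"] == "failure"]
rate = len(failures) / len(picks)
print(f"failure rate: {rate:.1%}")             # 5.0% of 400 attempts
print(Counter(e["reason"] for e in failures))  # which failure reasons dominate
```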
If I see something like that, I'm going to drill down another layer. Okay, what was the robot trying to do that it didn't do well? Why did these events occur? Usually, it comes down to either the robot trying to do the wrong thing or the robot having the wrong idea about the world for some reason. And so as you're trying to go out there and do the correct thing all the time, in all of the cases, which is the measure of success for a good robot, what you really do is build up these large data sets and then drill into them to find all the places where you deviate for some unexpected reason.
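A sketch of what that drill-down could look like in code, under the assumption that the lower-level logs are timestamped records you can slice around a failure; the storage layout and field names are hypothetical.

```python
# Sketch of the drill-down: given a failed task event, pull the window of
# lower-level control and sensor data around it for inspection.
# The storage layout and field names are assumptions for illustration.
def window_around(records: list[dict], t_fail_ns: int,
                  margin_ns: int = 2_000_000_000) -> list[dict]:
    """Return records within +/- margin (2 s by default) of the failure time."""
    return [r for r in records if abs(r["timestamp_ns"] - t_fail_ns) <= margin_ns]

# Example with a fake 100 Hz control log covering 10 seconds.
control_cycles = [{"timestamp_ns": t * 10_000_000} for t in range(1000)]
print(len(window_around(control_cycles, t_fail_ns=5_000_000_000)))  # cycles near t = 5 s
```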
Something was outside what you planned for, and you can use that to improve what the robot's doing. You can either take that piece of data and add it to a training set or a test set that verifies or learns about that type of condition, or it might change your approach for how you're encoding what the robot is doing. Maybe you add a check for an edge case where you say, hey, if the tote doesn't really look like the right shape, maybe it's broken, because you've been picking up a lot of these totes that are broken and you keep dropping them because they're not really grabbable, or something like that.
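A hypothetical sketch of that kind of edge-case check; the nominal tote dimensions and tolerance are made up for illustration.

```python
# Hypothetical sketch of the edge-case check described above: if the perceived
# tote doesn't match the expected shape, flag it instead of attempting the grasp.
# Dimensions and tolerance are made up for illustration.
EXPECTED_TOTE_WD = (0.60, 0.40)   # nominal width x depth in meters (assumed)
TOLERANCE_M = 0.05                # allowed deviation in meters (assumed)

def tote_looks_intact(measured_width: float, measured_depth: float) -> bool:
    """Reject totes whose measured footprint deviates too far from nominal."""
    dw = abs(measured_width - EXPECTED_TOTE_WD[0])
    dd = abs(measured_depth - EXPECTED_TOTE_WD[1])
    return dw <= TOLERANCE_M and dd <= TOLERANCE_M

print(tote_looks_intact(0.61, 0.39))  # True: close enough to nominal
print(tote_looks_intact(0.48, 0.41))  # False: likely crushed or broken, skip the pick
```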
So that's how you create a system that can really run for thousands or tens of thousands of hours reliably. There's really no way around it. You have to kind of build up this database of what the robot's execution looks like, and then drill down into these long tails and use that to build up a more robust robot control system.
So we're basically following this kind of path: as we deploy to customers, we build up this data, we use it to figure out where the deviations are and where interesting new data is, and then we can put it back into our system to build a broader and more capable platform.