Understanding the journey to embodied AI
When we started Agility Robotics 10 years ago, we told people that humanoid robots would be a part of everyday life. At the time, this was a bold prediction, often met with skepticism. Many saw it as a distant possibility, a 50-year venture. But the world is recognizing that the time is now. The scale of the opportunity is massive: the market is projected to be larger than the automotive industry, potentially reaching trillions of dollars. The surge of new and established businesses entering the field, along with significant investment, reflects a growing understanding of the transformative potential of embodied AI.
The beachhead for humanoid robots is semi-structured industrial settings, such as logistics and manufacturing. From there, they will rapidly expand into less structured environments such as grocery and retail stock rooms. After that, they will roam store floors stocking shelves, then move out into construction, health care, and other industries that demand greater generality in perception, intelligence, decision-making, manipulation, locomotion, and navigation. Safety is paramount throughout this evolution and remains the biggest barrier to mass adoption. Ultimately, robots like Digit will enter our homes, but that requires an even higher bar for safety and capability. As one expert put it, "Until you can prove that the robot is not going to fall on a baby, it's not going into the home." The warehouse is our proving ground, our practice arena for achieving that ultimate standard.
Our goal isn't to build a robot that looks like us. Instead, we aim for a useful robot that can operate effectively in our spaces. It’s easy to build a robot that resembles a person, but much more challenging to build one that can perform the same physical tasks that we can. Our world was designed for people, and we are finding that a human-centric, multi-purpose robot ends up with a roughly humanoid configuration. Regardless of its form, a robot is only as good as the useful tasks it can accomplish.
Critics might then ask: what defines useful? In the most general sense, it is the ability of humanoids to complete work at a quality and reliability that a third party is willing to pay for, creating economic value for both parties. In other words, good product-market fit.
Our path to product-market fit was not linear. We considered hundreds of use cases and engaged with almost that many potential customers to find the right set of problems that our uniquely capable technology could solve. We found that the logistics and manufacturing industries face a significant labor shortage, created by an aging workforce and the lack of interest among younger generations in jobs that have become increasingly structured, repetitive, and robotic. Meanwhile, demand for next-day delivery has never been higher, pushing companies to do more at a faster pace. All of this combines to create a labor gap that continues to grow.
Humanoid robots can address that need and provide a reliable source of labor. To do so, they must meet a deep list of requirements and performance milestones. There are, however, a few basic requirements that apply to almost every facility and bulk material handling task.
These have become the minimum requirements for light industrial work, and they heavily influence Digit’s design.
To operate in narrow footprints and lift heavy objects up high, a robot must be dynamically stable, much like humans and animals. Unlike statically stable robots, a dynamically balanced robot can shift its center of mass to accommodate loads without tipping over.
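The load-shifting argument above can be illustrated with a minimal one-dimensional sketch. It is a simplified static calculation, not Digit's controller: the masses, reach, and foot size are hypothetical numbers chosen for illustration, and the function names are invented here. It only shows why a fixed base tips when the combined center of mass leaves the support area, and how far the body would have to shift to bring it back.

```python
def combined_com(robot_mass, robot_x, load_mass, load_x):
    """Horizontal center of mass (metres) of robot plus load, in 1D."""
    return (robot_mass * robot_x + load_mass * load_x) / (robot_mass + load_mass)


def is_over_support(com_x, support_half_width):
    """True if the center of mass projects inside the support area."""
    return abs(com_x) <= support_half_width


def body_shift_to_balance(robot_mass, load_mass, load_x):
    """Body offset that returns the combined center of mass to x = 0."""
    return -load_mass * load_x / robot_mass


# Hypothetical values, for illustration only.
ROBOT_MASS = 100.0         # kg
LOAD_MASS = 25.0           # kg
LOAD_REACH = 0.6           # m, load held in front of the support centre
SUPPORT_HALF_WIDTH = 0.1   # m, half the length of the support area

# Body held rigidly over the support centre: the load pulls the COM forward.
com_fixed = combined_com(ROBOT_MASS, 0.0, LOAD_MASS, LOAD_REACH)

# Body allowed to shift backward to counterbalance the same load.
shift = body_shift_to_balance(ROBOT_MASS, LOAD_MASS, LOAD_REACH)
com_shifted = combined_com(ROBOT_MASS, shift, LOAD_MASS, LOAD_REACH)

print("fixed body stays over support:", is_over_support(com_fixed, SUPPORT_HALF_WIDTH))
print("shifted body stays over support:", is_over_support(com_shifted, SUPPORT_HALF_WIDTH))
```

A real robot does this continuously and in three dimensions while also accounting for momentum, which is what makes the balance dynamic rather than static.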
Wheels might seem simpler, but a robot that balances dynamically on wheels (like a ballbot or Segway) is much less stable. A foot can lift and step to a new location for balance, whereas a wheel must accelerate in a line to the new location. If anything impedes that motion, such as a bump, a slippery spot, or insufficient motor torque, the base can’t move back under the center of mass, and the robot will keep accelerating in the direction it’s falling. In addition, a wheeled base still requires a mechanism to bend down and pick things up, often approaching the complexity of a bipedal leg.
The perceived complexity of legs is largely a scientific problem that, once understood, becomes manageable. Our early research at Oregon State, particularly with the "ATRIAS" robot, demonstrated the ability to reproduce human walking dynamics and achieve remarkable robustness without complex sensing or planning. It was foundational work that made legged loco-manipulation tractable.
Our early Cassie robot, essentially a bipedal "basketball," struggled with yaw control and stability. Adding an upright torso provided the ideal place to mount arms, and the combination provided several benefits:
Adding a "head" to the robot might seem purely aesthetic, but it serves practical and social purposes.
Industrial design plays a crucial role in how people perceive and interact with almost everything, and humanoid robots are no exception. These robots must be seen as helpful machines that empower people to do more than they ever could before, and that means overcoming images of less-friendly robots that people have seen in movies and media.
Ultimately, meeting the base requirement list calls for the dynamic stability and reach of a biped, plus an upright torso for height, sensor placement, inertial actuation, and a good location to mount arms. Those arms need to be bimanual for manipulation, reachability, and fall protection. All of this combined leads to a humanoid form factor. Given that these robots are intended to work alongside people and enter our spaces, it’s crucial to incorporate a significant array of sensors for safety and to ensure the robots are visually appealing, communicating the expected social and physical cues. With all these requirements, these robots will be heavy (over 100 kg), tall (approximately 6 feet), and focused on practical tasks rather than acrobatics.
Now that we have a common understanding of what useful work entails, the conversation should shift to how we train and teach humanoids to perform that work. In addition to advances in battery technology, actuators, cameras, and sensors, artificial intelligence has propelled the industry forward. Most people think of AI in terms of ChatGPT and other generative tools that help leverage the collective written work of humanity to answer questions, provide information, or create new material.
While powerful, this is only one part of the puzzle for an embodied intelligence. Our approach recognizes a hierarchy of intelligence, differentiated by response speed and the amount of information required.
This hierarchical structure is analogous to how humans learn. Playing the violin, for example, requires physical ability, basic instruction, extensive practice, and broad understanding. We believe in using the right tools at the right level of the hierarchy, and that there is no single "black box" AI solution. Robots will always be engineered systems, albeit with increasingly powerful AI tools.
To simplify further, you can divide the hierarchy into two groups: Semantic AI and Physical AI. Semantic intelligence, the planning and cognition layers, is fast becoming a commodity. Every major tech company is building and sharing such models. Some are open source, and others have a cost, but we can use any of them, as can any other humanoid developer. Physical AI, in contrast, is more proprietary in nature, combining specialized hardware and the physics it enables with a dynamic control layer, and some overlapping innovation in the planning layer. The key to enabling intelligent robots will be physical AI.
No matter how capable a robot is, it cannot scale without being demonstrably safe around people. This isn't just about avoiding damage to the robot; it's about preventing injury to humans. Proving safety is a rigorous process that involves extensive hazard and risk assessments, which are validated by third parties such as TÜV Rheinland. This often involves detailed documentation and significant investment, eventually leading to industry regulations.
Our current strategy for safe human interaction involves the robot slowing down and adopting a lower-energy state as a person approaches, so that any physical contact is safe. This "sit down" behavior allows for coexistence even before robots can operate in close proximity to people.
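As a rough illustration of the slow-down strategy described above, the sketch below maps the distance to the nearest person onto an operating mode and a speed limit. It is a hypothetical example, not Agility's implementation and not a certified safety function: the radii, the maximum speed, the linear ramp, and all names are assumptions made for illustration.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"          # no person nearby, full speed allowed
    SLOW = "slow"              # person approaching, speed is reduced
    LOW_ENERGY = "low_energy"  # person close, robot stops and sits down


# Hypothetical thresholds, for illustration only.
SLOW_RADIUS_M = 4.0
SIT_RADIUS_M = 1.5
MAX_SPEED_MPS = 1.5


def select_mode(person_distance_m):
    """Pick an operating mode from the distance to the nearest person."""
    if person_distance_m <= SIT_RADIUS_M:
        return Mode.LOW_ENERGY
    if person_distance_m < SLOW_RADIUS_M:
        return Mode.SLOW
    return Mode.NORMAL


def speed_limit(person_distance_m):
    """Speed limit in m/s, ramping linearly from zero up to the maximum."""
    if person_distance_m <= SIT_RADIUS_M:
        return 0.0
    if person_distance_m >= SLOW_RADIUS_M:
        return MAX_SPEED_MPS
    fraction = (person_distance_m - SIT_RADIUS_M) / (SLOW_RADIUS_M - SIT_RADIUS_M)
    return MAX_SPEED_MPS * fraction


for distance in (10.0, 2.75, 1.0):
    print(distance, select_mode(distance).value, speed_limit(distance))
```

A deployed system would derive such thresholds from the hazard and risk assessment rather than from fixed constants like these.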
The journey to widespread in-home robotics will be a long one, but the commercial opportunities along the way are immense. Moving bins and totes in warehouses alone presents a multi-billion-dollar market. As robots become more capable and safer, they will gradually enter retail and grocery store rooms, then stock shelves in retail spaces (initially at night), and eventually become part of our daily lives, assisting us in countless ways.
The first commercial humanoids are big and heavy, with significant engineering focused on safety behaviors designed holistically into all corners of the hardware and software. They are designed for a few specific initial use cases, on the path to broad generality. In fact, doing it that way is the only possible path to generality. The future of humanoid robots is not just about technological advancement and demonstration, but about a thoughtful and deliberate integration into our world, prioritizing safety, utility, and human understanding.