Humanoid Robots: From the Warehouse to Your House

Understanding the journey to embodied AI

Published on July 15, 2025

When we started Agility Robotics 10 years ago, we told people that humanoid robots would be a part of everyday life. At the time, this was a bold prediction, often met with skepticism. Many saw it as a distant possibility, a 50-year venture. But the world is recognizing that the time is now. The market is projected to be massive: larger than the automotive industry, potentially reaching trillions of dollars. The growing surge of new and old businesses entering the field, along with significant investment, reflects an increasing understanding of the transformative potential of embodied AI.

The beachhead for humanoid robots is semi-structured industrial settings, such as logistics and manufacturing. From there, they will rapidly expand into less structured environments such as grocery and retail stock rooms. After that, they will roam store floors stocking shelves, then move out into construction, health care, and other industries that demand greater generality in their perception, intelligence, decision-making, manipulation, locomotion, and navigation. Safety is paramount throughout this evolution and remains the biggest barrier to mass adoption. Ultimately, robots like Digit will enter our homes, but that requires an even higher bar for safety and capability. As one expert put it, "Until you can prove that the robot is not going to fall on a baby, it's not going into the home." The warehouse is our proving ground, our practice arena for achieving that ultimate standard.

As humanoid robots clear major safety hurdles, they will expand into industries that require closer interaction with people in less structured environments.

Designing a Useful Humanoid: Form Follows Function

Our goal isn't to build a robot that looks like us. Instead, we aim for a useful robot that can operate effectively in our spaces. It’s easy to build a robot that resembles a person, but much more challenging to build one that can perform the same physical tasks that we can. Our world was designed for people, and we are finding that a human-centric, multi-purpose robot ends up with a roughly humanoid configuration. Regardless of its form, a robot is only as good as the useful tasks it can accomplish.

Critics would then ask: what defines useful? In the most generic sense, it’s the ability of humanoids to complete work of a quality and reliability that a third party is willing to pay for, creating economic value for both parties. In other words, it means a good product-market fit.

Our path to product-market fit was not linear. We considered hundreds of use cases and engaged with almost that many potential customers to find the right set of problems that our uniquely capable technology could solve. We found that within the logistics and manufacturing industries, a significant labor shortage exists, created by an aging workforce and the disinterest of younger generations in taking on jobs that have become increasingly structured, repetitive, and robotic. Meanwhile, demand for next-day delivery has never been higher, pushing companies to do more and at a faster pace. All of this combines to create a labor gap that continues to grow.

Humanoid robots can address that need and provide a reliable source of labor. To do so, they must meet a deep list of requirements and performance milestones. There are, however, a few basic requirements that apply to almost every facility and bulk material handling task.

These have become the minimum requirements for light industrial work, and they heavily influence Digit’s design.

Base Requirements

Lifting capacity: 25 kilograms (meeting OSHA limits).
Footprint: The ability to operate in narrow aisles.
Reach: Pick up objects from the floor and lift them over six feet high.
Battery life: At least 90 minutes between self-charges on its docking station.
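
As a rough illustration, the numeric base requirements can be written down as a checklist that a candidate task is tested against. This is a hypothetical sketch: the `Task` fields, the `meets_base_requirements` helper, and any task numbers are invented for illustration. Only the 25 kg, six-foot, and 90-minute figures come from the list above, and they are treated conservatively as the edge of the envelope. Aisle width is left out because the list gives no number for it.

```python
from dataclasses import dataclass

# Figures taken from the base requirements list above.
MAX_PAYLOAD_KG = 25.0
MAX_REACH_M = 6 * 0.3048      # six feet, converted to meters (about 1.83 m)
MIN_RUNTIME_MIN = 90.0        # minutes of work between charges


@dataclass
class Task:
    """Hypothetical description of a material handling task."""
    payload_kg: float         # heaviest object to lift
    highest_place_m: float    # tallest shelf or conveyor the object must reach
    shift_minutes: float      # continuous work needed before a recharge


def meets_base_requirements(task: Task) -> bool:
    """Return True if the task fits inside the base requirement envelope."""
    return (
        task.payload_kg <= MAX_PAYLOAD_KG
        and task.highest_place_m <= MAX_REACH_M
        and task.shift_minutes <= MIN_RUNTIME_MIN
    )
```

For example, `meets_base_requirements(Task(20.0, 1.5, 60.0))` passes, while a 30 kg payload fails on lifting capacity alone.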

The Case for a Biped

To operate in narrow footprints and lift heavy objects up high, a robot must be dynamically stable, much like humans and animals. Unlike statically stable robots, a dynamically balanced robot can shift its center of mass to accommodate loads without tipping over.

Wheels might seem simpler, but a robot that balances dynamically on wheels (like a ballbot or Segway) is much less stable. A foot can lift and step to a new location for balance, whereas a wheel must accelerate along the ground to reach the new location. If anything impedes that motion, such as a bump, a slippery spot, or insufficient motor torque, the base can’t move back under the center of mass, and the robot will continue to accelerate in the direction it’s falling. In addition, a wheeled base still requires a mechanism to bend down and pick things up, often approaching the complexity of a bipedal leg.
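
One common textbook way to quantify "stepping to a new location for balance" is the capture point of the linear inverted pendulum model: the spot on the ground where a foot would need to land for the robot to come to rest. This is a standard simplified model used here for illustration, not a description of Digit's controller. The function name and the example numbers are assumptions.

```python
import math

GRAVITY = 9.81  # m/s^2


def capture_point(com_position_m: float, com_velocity_mps: float,
                  com_height_m: float) -> float:
    """Capture point of a linear inverted pendulum along one horizontal axis.

    x_cp = x + v * sqrt(z / g)
    """
    if com_height_m <= 0:
        raise ValueError("center-of-mass height must be positive")
    return com_position_m + com_velocity_mps * math.sqrt(com_height_m / GRAVITY)


# Made-up example: center of mass 1.0 m high, pushed to 0.5 m/s.
offset = capture_point(0.0, 0.5, 1.0)
print(f"step about {offset:.2f} m ahead to recover")
```

A legged robot can place a foot at that point in a single step. A wheeled base has to drive there continuously, which is where bumps, slip, or torque limits get in the way.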

The perceived complexity of legs is largely a scientific problem that, once understood, becomes manageable. Our early research at Oregon State, particularly with the "ATRIAS" robot, demonstrated the ability to reproduce human walking dynamics and achieve remarkable robustness without complex sensing or planning. It was foundational work that made the complexity of legged loco-manipulation very doable.

Why a Torso & Arms?

Our early Cassie robot, essentially a bipedal "basketball," struggled with yaw control and stability. Adding an upright torso provided the ideal place to mount arms, and the combination provided several benefits:

Startup and stopping: A torso allows for leaning forward or backward to initiate and halt movement, rather than relying on stepping back.
Inertial actuation: Arms mounted on a torso can be used for effective yaw and pitch control, similar to how people windmill their arms for balance and swing their arms while walking or running.
Fall protection: Arms can extend to decelerate a fall, protecting internal components.
Manipulation: One of the primary goals of adding arms is to perform practical work, and a pair of arms mounted on either side near the top of the torso is ideal for reachability and bimanual capability.

Does it need a head?

Adding a "head" to the robot might seem purely aesthetic, but it serves practical and social purposes.

Sensor placement: Essential sensors, antennas, and cameras find a natural home here, providing an elevated and clear line of sight over the shoulders and other obstacles.
Intuitive interface: A simple, expressive "face" can communicate the robot's status, direction of movement, and overall awareness, preventing surprises and building trust with human co-workers.

Industrial design plays a crucial role in how people perceive and interact with almost everything, and humanoid robots are no exception. These robots must be seen as helpful machines to empower people to do more than they ever could before, and that means overcoming images of less-friendly robots that people have seen in movies and media.

Size

Ultimately, meeting the base requirements takes the dynamic stability and reach of a biped, plus an upright torso for height, sensor placement, inertial actuation, and a good location to mount arms. Those arms need to be bimanual for manipulation, reachability, and fall protection. All of this combined leads to a humanoid form factor. Because these robots are intended to work alongside people and enter our spaces, it’s crucial that they incorporate a significant array of sensors for safety and that they be visually appealing, communicating the expected social and physical cues. With all these requirements, these robots will be heavy (over 100 kg), tall (approximately 6 feet), and focused on practical tasks rather than acrobatics.

Embodied AI

Now that we have a common understanding of what useful work entails, the conversation should shift to how we train and teach humanoids to perform that work. In addition to advances in battery technology, actuators, cameras, and sensors, artificial intelligence has propelled the industry forward. Most people think of AI in terms of ChatGPT and other generative tools that help leverage the collective written work of humanity to answer questions, provide information, or create new material.

While powerful, this is only one part of the puzzle for an embodied intelligence. Our approach recognizes a hierarchy of intelligence, differentiated by response speed and the amount of information required.

Physics (Fast, Low Information): This foundational layer deals with immediate, high-bandwidth interactions like impacts. It requires a deep understanding of hardware design, actuation, and compliance. Our robots' ability to maintain force control even when encountering unexpected contact with the ground or other objects is a testament to the value of the right hardware and how the physics layer enables function.
Controls (Medium-Fast, Medium Information): This layer coordinates individual joints for balance and whole-body control. Reinforcement learning in simulation is crucial here, as there is no pre-existing data for a robot's unique dynamics. Behaviors such as robust balancing and self-righting are learned through extensive exploration in simulated environments and can also be refined on the physical robot to learn the physical interaction details that are difficult to simulate.
Planning (Medium-Slow, Medium-High Information): This layer involves tasks such as grasping objects or navigating complex environments. This is where learning by demonstration, motion capture, and teleoperation become valuable, providing starting points for the robot to refine its plans. Our robot, for example, can plan how to grasp and move a tote, executing the multi-step process autonomously once the high-level goal is set.
Cognition (Slow, High Information): This is the semantic AI, leveraging large language models and vision models to understand human context, interpret commands, and make high-level decisions. We can integrate language and perception models to allow our robot to understand complex requests like "clean up this mess."
Coordination (Slow, High Information): As fleets of robots scale, coordination between humanoids, AMRs, warehouse management systems, and human workers becomes increasingly essential. Our cloud-based software Agility ARC facilitates this and provides a platform for fleet management.

This hierarchical structure is analogous to how humans learn (e.g., playing the violin – requiring physical ability, basic instruction, extensive practice, and broad understanding). We believe in using the right tools at the right level of the hierarchy, and that there is no single "black box" AI solution. Robots will always be engineered systems, albeit with increasingly powerful AI tools.

To simplify it further, you can categorize the hierarchy into two groups: Semantic AI and Physical AI. Semantic intelligence, the planning and cognition layers, is fast becoming a commodity. Every major tech company is building and sharing these models. Some are open source, and others have a cost, but we can use any of them, as can any other humanoid developer. Physical AI, in contrast, is more proprietary in nature, combining specialized hardware and the physics it enables with a dynamic control layer, and some overlapping innovation in the planning layer. The key to enabling intelligent robots will be physical AI.
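
For readers who think in data structures, the hierarchy and its two-way split can be summarized as a small table in code. This is only a restatement of the text, not any real Agility software: the `Layer` class and `layers_in` helper are hypothetical names. Coordination is left unassigned because the text does not place it in either group.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    name: str
    speed: str               # response speed, as labeled in the article
    information: str         # amount of information required
    group: Optional[str]     # "physical", "semantic", or None if not stated


# Ordered fastest to slowest, following the article's list.
HIERARCHY = [
    Layer("Physics", "fast", "low", "physical"),
    Layer("Controls", "medium-fast", "medium", "physical"),
    # Planning is grouped with semantic AI, though the article notes that
    # physical AI overlaps with it.
    Layer("Planning", "medium-slow", "medium-high", "semantic"),
    Layer("Cognition", "slow", "high", "semantic"),
    Layer("Coordination", "slow", "high", None),
]


def layers_in(group: str) -> List[str]:
    """Names of the layers assigned to the given group, fastest first."""
    return [layer.name for layer in HIERARCHY if layer.group == group]
```

Here `layers_in("physical")` returns the physics and controls layers, and `layers_in("semantic")` returns planning and cognition.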

Safety: The Key to Scaling

No matter how capable a robot is, it cannot scale without being demonstrably safe around people. This isn't just about avoiding damage to the robot; it's about preventing injury to humans. Proving safety is a rigorous process that involves extensive hazard and risk assessments, which are validated by third parties such as TÜV Rheinland. This often involves detailed documentation and significant investment, eventually leading to industry regulations.

Safety Considerations:

Falling is unavoidable: Dynamically stable robots will fall. The design must ensure that these falls do not cause injury.
Judgment errors: In unstructured environments, robots must avoid errors that could lead to harm (e.g., pouring hot tea on someone or failing to detect that a pet has crawled into the clothes dryer). Proving the absence of such errors is incredibly challenging.
Force control: Robots strong enough to do useful work are also strong enough to cause harm. Unlike bolted-down industrial cobots, balancing robots cannot simply limit force output.
Detecting people: Accurate and reliable human detection is crucial, even distinguishing between a person and a mannequin. This requires an "epic array of sensors" (RGB, IR, LIDAR, depth cameras).
Supervisory systems: An independent system on the robot must monitor the robot’s behavior and safely intervene if necessary.
Falling with a payload: Robots must be able to fall safely, even while carrying objects, and avoid hazards like stairs.
AI trust: Ultimately, trust in AI models will come from millions of hours of testing in deployed environments.

Our current strategy for safe human interaction is for the robot to slow down and adopt a lower-energy state as a person approaches, so that any physical contact is safe. This "sit down" behavior allows for coexistence even before robots can operate in close proximity to people.
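
A minimal sketch of that kind of proximity rule is shown here, assuming the robot has an estimate of the distance to the nearest detected person. The mode names, both distance thresholds, and the linear speed ramp are hypothetical choices made to keep the example concrete. They are not Digit's actual parameters or logic.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    NORMAL = "normal"
    SLOW = "slow"
    SIT = "sit"   # the lower-energy "sit down" state


# Hypothetical thresholds, chosen only for illustration.
SLOW_DISTANCE_M = 4.0
SIT_DISTANCE_M = 1.5


def select_mode(nearest_person_m: Optional[float]) -> Mode:
    """Pick a mode from the distance to the nearest detected person.

    None means no person is currently detected.
    """
    if nearest_person_m is None or nearest_person_m > SLOW_DISTANCE_M:
        return Mode.NORMAL
    if nearest_person_m > SIT_DISTANCE_M:
        return Mode.SLOW
    return Mode.SIT


def speed_scale(nearest_person_m: Optional[float]) -> float:
    """Fraction of normal speed allowed, ramping linearly from 1.0 to 0.0."""
    mode = select_mode(nearest_person_m)
    if mode is Mode.NORMAL:
        return 1.0
    if mode is Mode.SIT:
        return 0.0
    return (nearest_person_m - SIT_DISTANCE_M) / (SLOW_DISTANCE_M - SIT_DISTANCE_M)
```

In this sketch a person 2.75 m away halves the allowed speed, and anyone within 1.5 m triggers the sit state. A real system would also need to handle missed detections, which is one role of the independent supervisory system listed above.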

The Path

The journey to widespread in-home robotics will be a long one, but the commercial opportunities along the way are immense. Moving bins and totes in warehouses alone presents a multi-billion-dollar market. As robots become more capable and safer, they will gradually enter retail and grocery store rooms, then stock shelves in retail spaces (initially at night), and eventually become part of our daily lives, assisting us in countless ways.

The first commercial humanoids are big and heavy, with significant engineering focused on safety behaviors designed holistically into all corners of the hardware and software. They are designed for a few specific initial use cases, on the path to broad generality. In fact, starting with specific use cases is the only possible path to generality. The future of humanoid robots is not just about technological advancement and demonstration, but about a thoughtful and deliberate integration into our world, prioritizing safety, utility, and human understanding.

Jonathan Hurst, Chief Robot Officer & Co-Founder
