NAO Robot Autonomous Ball Search and Kick System
Embodied AI system for autonomous perception, decision-making, and control on NAO humanoid robots
Overview
This project implements a complete embodied AI system on a NAO humanoid robot, enabling the robot to autonomously search for a ball, approach it, align its body, and execute a kicking motion.
The system is designed as a closed-loop pipeline:
visual perception -> target localization -> behavior planning -> humanoid motion control
The project integrates real-time computer vision, finite-state behavior planning, head scanning, walking control, and kicking motion execution. Unlike a static vision demo, the robot must continuously act, observe the consequences of its actions, and update its decisions in a dynamic physical environment.
This project gave me an early and concrete understanding of embodied intelligence: perception is useful only when it can support reliable action.
Project Thumbnail
This thumbnail summarizes the closed-loop embodied AI pipeline: the NAO robot captures images from its onboard camera, detects and localizes the ball, estimates the relative position, uses a finite-state machine to decide behavior, and executes walking, alignment, or kicking actions through NAOqi-based motion control.
Key capabilities include:
- real-time ball detection and localization;
- active visual search through head movement;
- autonomous approach and alignment;
- behavior control via finite-state machine;
- humanoid walking and kicking through NAOqi control.
Motivation
Robotics systems are difficult because perception, decision-making, and control are tightly coupled. A small perception error can lead to a wrong movement, and a wrong movement changes the next observation. This project was designed to practice the full loop rather than only one isolated module.
The task is simple enough to be testable but rich enough to involve the key components of an embodied system:
- the robot must detect the target;
- it must reason about relative position;
- it must search when the target is lost;
- it must approach without overshooting;
- it must align before kicking;
- it must execute motion while maintaining stability.
The resulting system uses a modular and interpretable design, which makes it easier to debug on real hardware.
System Architecture
The system operates as a continuous perception–action loop:
- capture image from the onboard RGB camera;
- detect the ball in image space;
- estimate the ball’s relative direction and distance;
- update the behavior state;
- execute head scanning, walking, alignment, or kicking;
- observe the new scene and repeat the loop.
Repeating this loop lets the robot react to each new observation and interact with its environment without operator input.
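The loop above can be sketched in a few lines. This is a minimal illustration, not the project's code: `capture`, `detect`, `decide`, and `act` are hypothetical callables standing in for the camera, perception, FSM, and motion layers.

```python
from typing import Callable, Optional, Tuple

# (horizontal offset, distance in meters) of the ball, or None when not visible
Observation = Optional[Tuple[float, float]]


def run_loop(capture: Callable[[], object],
             detect: Callable[[object], Observation],
             decide: Callable[[str, Observation], str],
             act: Callable[[str, Observation], None],
             max_steps: int,
             state: str = "search") -> str:
    """Run the perception-action loop for max_steps iterations."""
    for _ in range(max_steps):
        image = capture()            # capture an image from the camera
        obs = detect(image)          # detect the ball, estimate offset and distance
        state = decide(state, obs)   # update the behavior state
        act(state, obs)              # scan, walk, align, or kick
        # The next iteration observes the scene that this action produced.
    return state
```

Passing the four stages in as callables keeps each one replaceable, so the loop can be exercised with recorded frames before it is run on the robot.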
Perception Module
Ball Detection (Lightweight Vision Pipeline)
The perception module uses a lightweight classical vision pipeline, which is suitable for NAO’s onboard computation and real-time control requirements.
The pipeline includes:
- conversion from RGB to HSV color space, which separates hue from brightness so that color thresholds are less sensitive to lighting changes;
- color thresholding to segment the ball region;
- morphological filtering to remove noise;
- contour extraction;
- filtering by area, circularity, and position;
- temporal smoothing of detected ball centers.
The final output is the ball center position in image coordinates and a confidence estimate indicating whether the ball is currently visible.
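The last two pipeline stages (geometric filtering and temporal smoothing) can be sketched in plain Python. This is an illustration under assumptions: the `Candidate` fields, all thresholds, and the use of an exponential moving average for smoothing are hypothetical choices, since the text only says that centers are smoothed over time. In practice the area and perimeter would come from the contour extraction step.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Candidate:
    cx: float         # contour center x, pixels
    cy: float         # contour center y, pixels
    area: float       # contour area, pixels^2
    perimeter: float  # contour perimeter, pixels


def circularity(c: Candidate) -> float:
    """4*pi*A / P^2: 1.0 for a perfect circle, smaller for other shapes."""
    if c.perimeter <= 0:
        return 0.0
    return 4.0 * math.pi * c.area / (c.perimeter ** 2)


def pick_ball(cands: Iterable[Candidate],
              min_area: float = 50.0,         # assumed threshold
              min_circularity: float = 0.7,   # assumed threshold
              min_cy: float = 0.0) -> Optional[Candidate]:
    """Return the most circular candidate that passes all three filters."""
    ok = [c for c in cands
          if c.area >= min_area
          and circularity(c) >= min_circularity
          and c.cy >= min_cy]  # position filter: drop blobs above a chosen image row
    return max(ok, key=circularity, default=None)


class CenterSmoother:
    """Exponential moving average over detected ball centers."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha
        self.state: Optional[Tuple[float, float]] = None

    def update(self, cx: float, cy: float) -> Tuple[float, float]:
        if self.state is None:
            self.state = (cx, cy)  # first detection initializes the filter
        else:
            sx, sy = self.state
            a = self.alpha
            self.state = (a * cx + (1 - a) * sx, a * cy + (1 - a) * sy)
        return self.state
```

A larger `alpha` follows the ball more quickly but passes more jitter through to the controller, so the value is a tuning choice.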
Distance Estimation
Distance is estimated using a simplified pinhole camera model. Given the observed ball radius in the image, the approximate distance can be estimated as:
\[ Z \approx \frac{fR}{r} \]
where:
- \(f\) is the focal length in pixels;
- \(R\) is the physical ball radius;
- \(r\) is the observed image radius in pixels;
- \(Z\) is the estimated distance, in the same unit as \(R\).
The image-space horizontal offset provides the relative heading direction. Together, distance and heading are sufficient for approach, alignment, and kicking decisions.
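As a worked sketch of these two estimates, the functions below apply the pinhole relation for distance and convert the horizontal pixel offset into a bearing angle. The function names and the numbers in the usage note are hypothetical; the real focal length would come from camera calibration.

```python
import math


def estimate_distance(focal_px: float, ball_radius_m: float,
                      image_radius_px: float) -> float:
    """Pinhole estimate Z ~ f * R / r, in meters."""
    if image_radius_px <= 0:
        raise ValueError("image radius must be positive")
    return focal_px * ball_radius_m / image_radius_px


def estimate_heading(cx_px: float, image_width_px: int,
                     focal_px: float) -> float:
    """Bearing of the ball relative to the optical axis, in radians.

    Positive when the ball is to the right of the image center.
    """
    dx = cx_px - image_width_px / 2.0
    return math.atan2(dx, focal_px)
```

For example, with an assumed focal length of 560 px, a ball radius of 0.03 m, and an observed radius of 20 px, `estimate_distance(560, 0.03, 20)` gives about 0.84 m. Because the estimate scales with 1/r, a one-pixel error in the radius matters much more when the ball is far away and appears small.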
Behavior Planning (Core Contribution)
A finite-state machine (FSM) was designed to control robot behavior. The FSM makes the system interpretable and robust to intermittent perception failures.
Main States
- Search: scan the environment when the ball is not visible;
- Track: keep the ball near the image center using head and body adjustment;
- Approach: move toward the ball while maintaining heading alignment;
- Alignment: adjust body position before kicking;
- Kick: execute the kicking motion;
- Recovery: stop or re-search when the ball is lost during approach.
State Transitions
Transitions are triggered by:
- ball visibility;
- estimated distance;
- horizontal image offset;
- confidence of detection;
- completion of walking or kicking motion.
This design is intended to provide:
- stable behavior transitions;
- interpretable decision logic;
- robustness in dynamic environments;
- easy debugging on real robot hardware.
The FSM is designed to prevent unstable behavior such as kicking too early, walking while the ball is lost, or oscillating between search and approach states.
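The states and transition triggers described above can be sketched as a small transition function. This is one possible arrangement, not the project's actual implementation: the `Percept` fields, every threshold, and the lost-frame counter used to tolerate brief detection dropouts are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    SEARCH = "search"
    TRACK = "track"
    APPROACH = "approach"
    ALIGN = "align"
    KICK = "kick"
    RECOVERY = "recovery"


@dataclass
class Percept:
    visible: bool
    distance_m: Optional[float] = None  # None when the ball is not visible
    offset: float = 0.0                 # normalized horizontal offset in [-1, 1]
    motion_done: bool = True            # last walk or kick command has finished


# All thresholds below are assumed values for illustration.
KICK_DISTANCE_M = 0.20
CENTER_TOL = 0.15
ALIGN_TOL = 0.05
MAX_LOST_FRAMES = 5


class BallFSM:
    def __init__(self) -> None:
        self.state = State.SEARCH
        self.lost_frames = 0

    def step(self, p: Percept) -> State:
        # Count consecutive misses so one dropped frame does not change state.
        self.lost_frames = 0 if p.visible else self.lost_frames + 1
        lost = self.lost_frames > MAX_LOST_FRAMES
        s = self.state

        if s is State.SEARCH:
            if p.visible:
                s = State.TRACK
        elif s is State.KICK:
            if p.motion_done:           # wait for the kick to finish
                s = State.SEARCH
        elif s is State.RECOVERY:
            s = State.TRACK if p.visible else State.SEARCH
        elif lost:
            # Ball lost while walking: go through RECOVERY before re-searching.
            s = State.RECOVERY if s in (State.APPROACH, State.ALIGN) else State.SEARCH
        elif p.visible:
            if s is State.TRACK and abs(p.offset) < CENTER_TOL:
                s = State.APPROACH
            elif (s is State.APPROACH and p.distance_m is not None
                  and p.distance_m < KICK_DISTANCE_M):
                s = State.ALIGN
            elif s is State.ALIGN and abs(p.offset) < ALIGN_TOL and p.motion_done:
                s = State.KICK

        self.state = s
        return s
```

Requiring several consecutive missed frames before leaving a state is one simple way to avoid oscillating between search and approach when detection flickers.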
Motion Control
Locomotion Control
Locomotion is controlled through the NAOqi motion API, using calls such as setWalkTargetVelocity. Forward and angular velocity are adjusted based on the estimated ball position.
A simple proportional rule is used for heading correction:
\[ \omega = k_\theta \cdot \Delta x \]
where \(\omega\) is the commanded angular velocity, \(k_\theta\) is the proportional gain, and \(\Delta x\) is the horizontal offset between the ball center and the image center.
This allows the robot to turn toward the ball while moving forward.
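The proportional rule can be sketched as follows. The gain, the speed limits, and the rule that slows the forward speed while turning are assumptions for illustration. The sketch also assumes that the image x-axis points right and that positive theta turns the robot counterclockwise, in which case the gain is negative. `motion` is expected to be an ALMotion proxy, whose `setWalkTargetVelocity(x, y, theta, frequency)` takes normalized velocities.

```python
from typing import Tuple


def clamp(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, v))


def walk_command(dx: float, distance_m: float,
                 k_theta: float = -0.5,          # assumed gain, see sign note above
                 stop_distance_m: float = 0.20,  # assumed stopping distance
                 max_forward: float = 0.6) -> Tuple[float, float, float]:
    """Return normalized (x, y, theta) walk velocities.

    dx is the normalized horizontal offset of the ball from the image center.
    """
    omega = clamp(k_theta * dx, -1.0, 1.0)  # omega = k_theta * dx
    if distance_m <= stop_distance_m:
        forward = 0.0                       # close enough: stop advancing
    else:
        forward = max_forward * (1.0 - abs(omega))  # slow down while turning
    return forward, 0.0, omega


def send_walk(motion, cmd: Tuple[float, float, float],
              frequency: float = 0.5) -> None:
    """Forward a command to an ALMotion-like proxy."""
    x, y, theta = cmd
    motion.setWalkTargetVelocity(x, y, theta, frequency)
```

Clamping keeps the command inside the normalized range even when the gain is tuned aggressively, and reducing the forward speed during large turns lowers the risk of walking past a ball that is far off-center.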
Head Control
The robot actively scans the environment using head yaw control. When the ball is not visible, the robot sweeps its head across a range of yaw angles. If the ball is detected, the head tracks the ball to keep it near the center of the image.
This active perception loop improves recovery when the target is temporarily lost.
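A minimal sketch of the two head behaviors, sweeping and tracking, is shown below. The sweep range, step count, tracking gain, and yaw limit are assumed values, and `motion` is expected to be an ALMotion proxy whose `setAngles(names, angles, fractionMaxSpeed)` call sets the joint target.

```python
import itertools
from typing import Iterator


def scan_yaw_targets(max_yaw: float = 1.0, n_steps: int = 4) -> Iterator[float]:
    """Yield yaw targets (radians) sweeping -max_yaw..+max_yaw and back, forever."""
    forward = [-max_yaw + 2.0 * max_yaw * i / n_steps for i in range(n_steps + 1)]
    # Append the return sweep without repeating either end point.
    return itertools.cycle(forward + forward[-2:0:-1])


def track_yaw(current_yaw: float, dx: float,
              k_head: float = -0.3,    # assumed gain; negative because positive yaw turns left
              limit: float = 2.0) -> float:
    """Move the head toward the ball; dx is the normalized image offset."""
    return max(-limit, min(limit, current_yaw + k_head * dx))


def apply_head_yaw(motion, yaw: float, speed_fraction: float = 0.2) -> None:
    """Send the target to an ALMotion-like proxy."""
    motion.setAngles("HeadYaw", yaw, speed_fraction)
```

During search the controller would step through `scan_yaw_targets()`. Once the ball is visible it would switch to `track_yaw`, which nudges the head by an amount proportional to the image offset.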
Kicking Control
The kicking action is implemented using predefined joint trajectories. Hip, knee, and ankle movements are coordinated to produce a stable kick while preserving balance.
The kick is executed only when the FSM confirms:
- the ball is close enough;
- the ball is approximately aligned with the kicking leg;
- the robot body orientation is stable.
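These three conditions can be expressed as a small gate that the FSM queries before triggering the kick. The sketch below is hypothetical: the field names, the thresholds, and the idea of requiring the conditions to hold for several consecutive frames are assumptions, not details taken from the project.

```python
from dataclasses import dataclass


@dataclass
class KickGate:
    max_distance_m: float = 0.20  # "close enough" threshold (assumed)
    foot_offset_m: float = 0.05   # lateral offset of the kicking foot (assumed)
    lateral_tol_m: float = 0.02   # allowed misalignment with the foot (assumed)
    max_tilt_rad: float = 0.10    # allowed body tilt, e.g. from the IMU (assumed)
    required_frames: int = 3      # consecutive frames the conditions must hold
    _streak: int = 0

    def update(self, distance_m: float, lateral_m: float, tilt_rad: float) -> bool:
        """Return True once all conditions have held for required_frames in a row."""
        ok = (distance_m <= self.max_distance_m
              and abs(lateral_m - self.foot_offset_m) <= self.lateral_tol_m
              and abs(tilt_rad) <= self.max_tilt_rad)
        self._streak = self._streak + 1 if ok else 0  # any failure resets the count
        return self._streak >= self.required_frames
```

Requiring the conditions to persist for a few frames guards against kicking on a single noisy measurement.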
Technical Stack
Hardware
- NAO humanoid robot;
- onboard RGB camera;
- IMU and joint encoders.
Software
- NAOqi framework;
- Python and C++ APIs.
Computer Vision
- HSV segmentation;
- morphological filtering;
- contour detection;
- geometric distance estimation.
Algorithms
- finite-state machine;
- proportional heading control;
- active visual search;
- humanoid walking and kicking control.
Outcome
The system enables the robot to:
- detect and track the ball in real time;
- actively search when the ball is not visible;
- approach the ball while correcting body orientation;
- align before kicking;
- execute stable kicking actions.
Key Takeaways
- Built a complete embodied AI system that connects perception, decision, and control.
- Designed and implemented FSM-based behavior planning on real robot hardware.
- Learned how sensing uncertainty affects motion decisions.
- Gained hands-on experience with debugging closed-loop robotic behavior.
This project provided early exposure to closed-loop intelligent systems, which later influenced my research interests in autonomous driving, embodied perception, and multi-agent perception systems.