NAO Robot Autonomous Ball Search and Kick System

Overview

This project implements a complete embodied AI system on a NAO humanoid robot, enabling the robot to autonomously search for a ball, approach it, align its body, and execute a kicking motion.

The system is designed as a closed-loop pipeline:

visual perception -> target localization -> behavior planning -> humanoid motion control

The project integrates real-time computer vision, finite-state behavior planning, head scanning, walking control, and kicking motion execution. Compared with a static vision demo, the main challenge is that the robot must continuously act, observe the consequence of its action, and update its decision in a dynamic physical environment.

This project gave me an early and concrete understanding of embodied intelligence: perception is useful only when it can support reliable action.

Project Thumbnail

[Image: NAO robot autonomous ball search and kick system thumbnail]

This thumbnail summarizes the closed-loop embodied AI pipeline: the NAO robot captures images from its onboard camera, detects and localizes the ball, estimates the relative position, uses a finite-state machine to decide behavior, and executes walking, alignment, or kicking actions through NAOqi-based motion control.

Key capabilities include:

real-time ball detection and localization;
active visual search through head movement;
autonomous approach and alignment;
behavior control via finite-state machine;
humanoid walking and kicking through NAOqi control.

Motivation

Robotic systems are difficult because perception, decision-making, and control are tightly coupled. A small perception error can lead to a wrong movement, and a wrong movement changes the next observation. This project was designed to practice the full loop rather than only one isolated module.

The task is simple enough to be testable but rich enough to involve the key components of an embodied system:

the robot must detect the target;
it must reason about relative position;
it must search when the target is lost;
it must approach without overshooting;
it must align before kicking;
it must execute motion while maintaining stability.

The resulting system uses a modular and interpretable design, which makes it easier to debug on real hardware.

System Architecture

The system operates as a continuous perception–action loop:

capture image from the onboard RGB camera;
detect the ball in image space;
estimate the ball’s relative direction and distance;
update the behavior state;
execute head scanning, walking, alignment, or kicking;
observe the new scene and repeat the loop.

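Closing the loop in this way lets the robot check each action against a fresh observation instead of relying on a single, possibly outdated, estimate of the ball position.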

Perception Module

Ball Detection (Lightweight Vision Pipeline)

The perception module uses a lightweight classical vision pipeline, which is suitable for NAO’s onboard computation and real-time control requirements.

The pipeline includes:

conversion from RGB to HSV color space, which separates hue from brightness and makes color thresholds less sensitive to lighting changes;
color thresholding to segment the ball region;
morphological filtering to remove noise;
contour extraction;
filtering by area, circularity, and position;
temporal smoothing of detected ball centers.

The final output is the ball center position in image coordinates and a confidence estimate indicating whether the ball is currently visible.
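
The circularity filter and the temporal smoothing step can be illustrated without any vision library. The sketch below assumes that each contour's area and perimeter have already been measured by a contour-extraction routine. The function names, the 0.8 circularity threshold, the minimum area, and the smoothing factor are illustrative assumptions, not values taken from the project.

```python
import math


def circularity(area, perimeter):
    """Return 4*pi*A / P^2: 1.0 for a perfect circle, lower for other shapes."""
    if perimeter <= 0:
        return 0.0
    return 4.0 * math.pi * area / (perimeter ** 2)


def is_ball_candidate(area, perimeter, min_area=50.0, min_circularity=0.8):
    """Hypothetical filter: keep contours that are large enough and round enough."""
    return area >= min_area and circularity(area, perimeter) >= min_circularity


class CenterSmoother:
    """Exponential moving average over detected ball centers (x, y)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha   # weight of the newest measurement (assumed value)
        self.state = None    # no estimate until the first detection arrives

    def update(self, center):
        if self.state is None:
            self.state = (float(center[0]), float(center[1]))
        else:
            a = self.alpha
            self.state = (a * center[0] + (1.0 - a) * self.state[0],
                          a * center[1] + (1.0 - a) * self.state[1])
        return self.state
```

A square contour has a circularity of about 0.785, so it is rejected at the assumed 0.8 threshold, while a near-circular blob passes. A smaller alpha gives a steadier center at the cost of more lag when the ball moves.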

Distance Estimation

Distance is estimated using a simplified pinhole camera model. Given the observed ball radius in the image, the approximate distance can be estimated as:

\[ Z \approx \frac{f R}{r} \]

where:

\(f\) is the focal length, expressed in pixels;
\(R\) is the physical ball radius;
\(r\) is the observed image radius, in pixels;
\(Z\) is the estimated distance, in the same unit as \(R\).

The image-space horizontal offset provides the relative heading direction. Together, distance and heading are sufficient for approach, alignment, and kicking decisions.
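
As a worked illustration of the two estimates, the sketch below computes distance from the pinhole relation and heading from the horizontal pixel offset. The function names and the sign convention (positive heading when the ball is right of the image center) are assumptions made for this example, and the focal length must come from a camera calibration that is not described here.

```python
import math


def estimate_distance(focal_px, ball_radius_m, image_radius_px):
    """Pinhole estimate Z ~= f * R / r; returns None for a non-positive radius."""
    if image_radius_px <= 0:
        return None
    return focal_px * ball_radius_m / image_radius_px


def estimate_heading(ball_x_px, image_width_px, focal_px):
    """Angle (rad) between the optical axis and the ball.

    Assumed convention: positive when the ball is right of the image center.
    """
    dx = ball_x_px - image_width_px / 2.0
    return math.atan2(dx, focal_px)
```

For example, with an assumed focal length of 560 pixels, a ball of radius 0.03 m that appears with a 20-pixel radius is estimated to be about 0.84 m away. Because the estimate is inversely proportional to the image radius, a one-pixel radius error matters far more at long range than at close range.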

Behavior Planning (Core Contribution)

A finite-state machine (FSM) was designed to control robot behavior. The FSM makes the system interpretable and robust to intermittent perception failures.

Main States

Search: scan the environment when the ball is not visible;
Track: keep the ball near the image center using head and body adjustment;
Approach: move toward the ball while maintaining heading alignment;
Alignment: adjust body position before kicking;
Kick: execute the kicking motion;
Recovery: stop or re-search when the ball is lost during approach.

State Transitions

Transitions are triggered by:

ball visibility;
estimated distance;
horizontal image offset;
confidence of detection;
completion of walking or kicking motion.

This design is intended to provide:

stable behavior transitions;
interpretable decision logic;
robustness in dynamic environments;
easy debugging on real robot hardware.

The FSM prevents unstable behavior such as kicking too early, walking while the ball is lost, or oscillating between search and approach states.
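
One way to express the states and transition triggers above is a single pure function that maps the current state and the latest observation to the next state. This is a minimal sketch rather than the project's actual transition table: every threshold, the lost-frame counter, and the re-approach factor are hypothetical values chosen for illustration.

```python
from enum import Enum


class State(Enum):
    SEARCH = "search"
    TRACK = "track"
    APPROACH = "approach"
    ALIGNMENT = "alignment"
    KICK = "kick"
    RECOVERY = "recovery"


# Hypothetical thresholds; real values would be tuned on the robot.
KICK_DISTANCE_M = 0.20     # close enough to kick
ALIGN_DISTANCE_M = 0.35    # switch from approach to alignment
REAPPROACH_FACTOR = 1.2    # hysteresis so the robot does not flip states
CENTER_TOL_PX = 20         # ball roughly centered, safe to start walking
ALIGN_TOL_PX = 10          # ball lined up well enough to kick
MAX_LOST_FRAMES = 10       # frames to wait before a full re-search


def next_state(state, visible, distance_m, offset_px, lost_frames, motion_done):
    """Return the next FSM state from the current state and observation."""
    if state is State.KICK:
        # Never interrupt a kick; search again once the motion has finished.
        return State.SEARCH if motion_done else State.KICK
    if not visible:
        if state is State.SEARCH or lost_frames >= MAX_LOST_FRAMES:
            return State.SEARCH
        return State.RECOVERY  # short dropout: pause instead of walking blind
    if state in (State.SEARCH, State.RECOVERY):
        return State.TRACK
    if state is State.TRACK:
        return State.APPROACH if abs(offset_px) <= CENTER_TOL_PX else State.TRACK
    if state is State.APPROACH:
        return State.ALIGNMENT if distance_m <= ALIGN_DISTANCE_M else State.APPROACH
    # Remaining case: state is ALIGNMENT.
    if distance_m > ALIGN_DISTANCE_M * REAPPROACH_FACTOR:
        return State.APPROACH
    if distance_m <= KICK_DISTANCE_M and abs(offset_px) <= ALIGN_TOL_PX:
        return State.KICK
    return State.ALIGNMENT
```

Keeping the transition logic free of side effects means it can be unit-tested with recorded observations before it is run on the robot. The gap between the alignment threshold and the re-approach threshold is one simple way to avoid oscillating between neighboring states.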

Motion Control

Locomotion Control

Locomotion is controlled through the NAOqi motion API, using calls such as setWalkTargetVelocity. Forward and angular velocities are adjusted based on the estimated ball position.

A simple proportional rule is used for heading correction:

\[ \omega = k_\theta \cdot \Delta x \]

where \(\omega\) is the commanded angular velocity, \(k_\theta\) is the proportional gain, and \(\Delta x\) is the horizontal offset between the ball center and the image center.

This allows the robot to turn toward the ball while moving forward.
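
The proportional rule can be sketched as a small function that turns a pixel offset into a normalized walk command. Several details are assumptions made for this example: the offset is normalized by half the image width before the gain is applied, the gain is negative because the example treats a positive offset as "ball to the right" and a positive turn rate as counter-clockwise, and the forward speed is reduced while the heading error is large. The gain and speed values are illustrative only.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def walk_command(offset_px, image_width_px, k_theta=-0.8, forward=0.5):
    """Return a normalized (x, y, theta) walk command, each within [-1, 1].

    offset_px: ball center minus image center (assumed positive to the right).
    k_theta:   hypothetical gain; negative so a ball on the right gives a
               clockwise (negative) turn under the assumed sign convention.
    """
    dx = offset_px / (image_width_px / 2.0)     # normalized offset
    theta = clamp(k_theta * dx, -1.0, 1.0)      # omega = k_theta * dx
    x = forward * (1.0 - min(1.0, abs(dx)))     # slow down when badly aligned
    return x, 0.0, theta


# The tuple could then be handed to the motion layer, for example as the
# x, y, theta arguments of setWalkTargetVelocity together with a step
# frequency (not executed here, since that requires a robot connection).
```

Clamping keeps the command valid even when the ball is at the edge of the image, and scaling the forward speed by the heading error reduces the chance of walking past the ball while still turning toward it.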

Head Control

The robot actively scans the environment using head yaw control. When the ball is not visible, the robot sweeps its head across a range of yaw angles. If the ball is detected, the head tracks the ball to keep it near the center of the image.

This active perception loop improves recovery when the target is temporarily lost.
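
The scan-then-track behavior can be sketched as two helpers: one that generates the yaw targets for a sweep, and one that nudges the head toward the ball once it is visible. The sweep amplitude, step count, tracking gain, and soft yaw limit are assumed values for illustration, and the example assumes the heading error uses the same sign as the yaw angle. Sending the resulting angle to the head joint is left out because it needs a robot connection.

```python
def sweep_angles(max_yaw=1.0, steps=4):
    """Return one sweep cycle of yaw targets in radians.

    The sequence runs 0 -> +max_yaw -> -max_yaw -> back toward 0 and stops one
    step before 0, so repeating the list gives a continuous triangular sweep.
    """
    up = [max_yaw * i / steps for i in range(steps)]
    down = [max_yaw * i / steps for i in range(steps, -steps, -1)]
    back = [max_yaw * i / steps for i in range(-steps, 0)]
    return up + down + back


def track_yaw(current_yaw, heading_error_rad, gain=0.5, limit=2.0):
    """Move the head yaw a fraction of the way toward the ball.

    limit is an assumed soft limit meant to stay inside the joint's range.
    """
    target = current_yaw + gain * heading_error_rad
    return max(-limit, min(limit, target))
```

In Search the controller would step through the sweep list and restart it when it ends. In Track it would call track_yaw on every frame, so the head settles on the ball gradually instead of jumping to each noisy measurement.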

Kicking Control

The kicking action is implemented using predefined joint trajectories. Hip, knee, and ankle movements are coordinated to produce a stable kick while preserving balance.

The kick is executed only when the FSM confirms:

the ball is close enough;
the ball is approximately aligned with the kicking leg;
the robot body orientation is stable.
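
A single good frame is weak evidence that all three conditions hold, so one option is to require them to hold for several consecutive frames before the kick is released. The sketch below shows that idea. It is an assumed debouncing scheme rather than the project's documented criterion, the thresholds are hypothetical, and the body_stable flag stands in for whatever stability check the robot provides.

```python
class KickGate:
    """Release the kick only after the conditions hold for N frames in a row."""

    def __init__(self, max_distance_m=0.20, max_offset_px=10, required_frames=3):
        self.max_distance_m = max_distance_m    # hypothetical kick range
        self.max_offset_px = max_offset_px      # hypothetical alignment tolerance
        self.required_frames = required_frames  # hypothetical debounce length
        self._streak = 0

    def update(self, visible, distance_m, offset_px, body_stable):
        """Feed one frame of observations; return True when kicking is allowed."""
        ok = bool(visible and body_stable
                  and distance_m <= self.max_distance_m
                  and abs(offset_px) <= self.max_offset_px)
        # Any failed frame resets the streak, so brief dropouts block the kick.
        self._streak = self._streak + 1 if ok else 0
        return self._streak >= self.required_frames
```

The price of the extra confidence is a delay of a few frames before each kick, which is usually acceptable when the ball is stationary.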

Technical Stack

Hardware

NAO humanoid robot;
onboard RGB camera;
IMU and joint encoders.

Software

NAOqi framework;
Python and C++ APIs.

Computer Vision

HSV segmentation;
morphological filtering;
contour detection;
geometric distance estimation.

Algorithms

finite-state machine;
proportional heading control;
active visual search;
humanoid walking and kicking control.

Outcome

The system enables the robot to:

detect and track the ball in real time;
actively search when the ball is not visible;
approach the ball while correcting body orientation;
align before kicking;
execute stable kicking actions.

Key Takeaways

Built a complete embodied AI system that connects perception, decision, and control.
Designed and implemented FSM-based behavior planning on real robot hardware.
Learned how sensing uncertainty affects motion decisions.
Gained hands-on experience with debugging closed-loop robotic behavior.

This project provided early exposure to closed-loop intelligent systems, which later influenced my research interests in autonomous driving, embodied perception, and multi-agent perception systems.