Jetson-Based Autonomous Driving System

Overview

This project implements an embedded autonomous driving prototype on the NVIDIA Jetson platform. The system was designed as a complete closed loop:

camera/radar sensing -> perception -> scene understanding -> behavior decision -> steering and speed control

The goal was not only to run an object detector on an embedded GPU, but also to build a small autonomous driving stack that connects perception outputs to real driving behaviors. The project therefore combines deep-learning-based object detection, classical computer vision for lane perception, camera-radar fusion, rule-based planning, and low-latency control under embedded resource constraints.

This project helped me understand autonomous driving as a system problem: perception accuracy, latency, decision stability, sensor calibration, and control smoothness all interact with each other.

Project Thumbnail

Figure: Jetson-based autonomous driving system thumbnail.

This thumbnail summarizes the project pipeline: camera input is processed by YOLOv5-based object detection and OpenCV-based lane detection, then a rule-based finite-state machine generates driving decisions, and the Jetson platform outputs steering and speed control commands in a closed loop.

The prototype supports:

real-time object detection and lane perception;
traffic-light and obstacle-aware behavior decisions;
camera-radar distance refinement;
steering and speed control in a closed loop;
deployment on an embedded GPU platform.

Motivation

Autonomous driving is often introduced as a perception task, but a working system requires much more than perception. A detector may recognize cars and traffic lights, yet the vehicle still needs to decide when to slow down, when to stop, how to keep the lane center, and how to react when perception is uncertain.

This project was motivated by three questions:

How can a real-time perception system be deployed on embedded hardware?
How can perception outputs be converted into interpretable driving decisions?
How can the system maintain stable control when detection, lane estimation, and distance measurements are noisy?

The resulting design favors modularity and interpretability. Each module has a clear input, output, and failure mode, making the behavior easier to debug than a monolithic end-to-end controller.

System Architecture

The system follows a modular architecture:

sensor input -> perception -> environment model -> behavior planning -> control output

At each control cycle:

capture RGB camera frames and radar measurements;
detect vehicles, pedestrians, traffic lights, and crosswalk-related cues;
estimate lane geometry and lateral offset from the lane center;
associate radar returns with visual detections when possible;
build a lightweight scene state for planning;
update the finite-state machine;
output steering and speed commands.

All modules run in real time on board the Jetson.

The system is intentionally split into independent modules so that perception, planning, and control can be tested separately before integration.

Perception Module

Object Detection (YOLOv5)

The object detection module uses YOLOv5 to detect driving-relevant objects:

vehicles;
pedestrians;
traffic lights;
crosswalk-related objects and signs when available.

The model was trained with data augmentation such as brightness changes, scaling, and viewpoint variation. For embedded deployment, the inference path was optimized with TensorRT to reduce latency and improve frame rate on the Jetson GPU.

The detector outputs bounding boxes, class labels, confidence scores, and traffic-light state estimates. These outputs are then converted into symbolic scene facts for the decision module.
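
The conversion from raw detections to symbolic scene facts can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Detection` container, the `to_scene_facts` function, the label strings, and the 0.5 confidence threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Detection:
    """One detector output (hypothetical container, not the YOLOv5 API)."""
    label: str                                # e.g. "vehicle", "pedestrian", "traffic_light"
    confidence: float                         # detector score in [0, 1]
    box: Tuple[float, float, float, float]    # (x1, y1, x2, y2) in pixels
    light_state: Optional[str] = None         # "red" / "green", traffic lights only


def to_scene_facts(detections: List[Detection], min_conf: float = 0.5) -> Dict[str, object]:
    """Collapse per-frame detections into a few symbolic facts for the planner."""
    light_state = "none"
    best_light_conf = 0.0
    pedestrian_present = False
    vehicle_count = 0

    for det in detections:
        if det.confidence < min_conf:
            continue  # drop weak detections (threshold is an assumed value)
        if det.label == "traffic_light":
            # Keep the state reported by the most confident traffic light.
            if det.light_state is not None and det.confidence > best_light_conf:
                best_light_conf = det.confidence
                light_state = det.light_state
        elif det.label == "pedestrian":
            pedestrian_present = True
        elif det.label == "vehicle":
            vehicle_count += 1

    return {
        "traffic_light": light_state,
        "pedestrian_present": pedestrian_present,
        "vehicle_count": vehicle_count,
    }
```

Keeping the facts in a small dictionary like this means the planner never has to touch bounding boxes directly, which is one way to keep the perception and decision modules testable in isolation.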

Lane Detection (Classical CV)

The lane perception module uses classical computer vision rather than a neural network, which makes it lightweight and interpretable on embedded hardware.

The pipeline includes:

perspective transformation to approximate a bird’s-eye view;
color and gradient filtering;
Canny edge detection;
Hough transform for lane candidate extraction;
polynomial fitting and temporal smoothing;
lane-center estimation.

The final output is the lateral deviation \(e\) between the vehicle center and the estimated lane center. This value is used directly by the steering controller.
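
The last two pipeline steps (temporal smoothing and lane-center estimation) could look roughly like the sketch below. It assumes the polynomial fits for the left and right lane lines are already available as coefficient lists in the bird's-eye image, that the camera is mounted on the vehicle centerline, and that a fixed meters-per-pixel scale is known. The class name, the exponential smoothing factor `alpha`, and the sign convention (positive when the lane center is to the right of the vehicle) are illustrative assumptions.

```python
from typing import Optional, Sequence


def eval_poly(coeffs: Sequence[float], y: float) -> float:
    """Evaluate x(y) for coefficients given highest power first (Horner's rule)."""
    x = 0.0
    for c in coeffs:
        x = x * y + c
    return x


class LaneOffsetEstimator:
    """Turns left/right lane fits into a smoothed lateral deviation in meters."""

    def __init__(self, image_width: int, meters_per_pixel: float, alpha: float = 0.3):
        self.image_width = image_width
        self.meters_per_pixel = meters_per_pixel  # assumed constant in bird's-eye view
        self.alpha = alpha                        # weight of the newest measurement
        self._smoothed: Optional[float] = None

    def update(self, left_fit: Sequence[float], right_fit: Sequence[float],
               y_eval: float) -> float:
        # Lane-line x positions at the image row closest to the vehicle.
        left_x = eval_poly(left_fit, y_eval)
        right_x = eval_poly(right_fit, y_eval)
        lane_center = 0.5 * (left_x + right_x)
        vehicle_center = self.image_width / 2.0   # camera assumed centered on the vehicle
        raw = (lane_center - vehicle_center) * self.meters_per_pixel

        # Exponential moving average as a simple form of temporal smoothing.
        if self._smoothed is None:
            self._smoothed = raw
        else:
            self._smoothed = self.alpha * raw + (1.0 - self.alpha) * self._smoothed
        return self._smoothed
```

A smaller `alpha` gives a steadier offset at the cost of slower reaction to real lane changes, so the value would need tuning against the control loop rate.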

Sensor Fusion (Camera + Radar)

To improve robustness, a camera-radar fusion module was implemented.

The two sensors provide complementary information:

camera: semantic understanding and object category;
radar: more reliable range measurement for obstacles.

Fusion steps include:

coordinate alignment between radar and camera frames;
projection of radar returns into the image plane;
association between radar returns and visual bounding boxes;
distance refinement for detected obstacles;
safety-region checking using fused object distance.

This fusion design improves obstacle localization in cases where monocular depth is unreliable, especially under lighting changes or when the object scale is ambiguous.
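
The fusion steps can be illustrated with a small pinhole-camera sketch. Everything here is an assumption for illustration: the extrinsic rotation and translation, the intrinsics `fx`, `fy`, `cx`, `cy`, the camera-frame convention (x right, y down, z forward), and the simple "radar return falls inside the box" association rule. The real system may use a different association strategy.

```python
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def radar_to_camera(p_radar: Point3, rotation: Sequence[Sequence[float]],
                    translation: Point3) -> Point3:
    """Coordinate alignment: p_cam = R * p_radar + t (extrinsics from calibration)."""
    x, y, z = (
        sum(rotation[i][j] * p_radar[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    return (x, y, z)


def project_to_image(p_cam: Point3, fx: float, fy: float,
                     cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection; returns None for points behind the camera."""
    x, y, z = p_cam
    if z <= 0.0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def refine_distances(boxes: List[Box], radar_points_cam: List[Point3],
                     fx: float, fy: float, cx: float, cy: float) -> List[Optional[float]]:
    """For each box, return the range of the nearest radar return projected inside it."""
    distances: List[Optional[float]] = [None] * len(boxes)
    for p in radar_points_cam:
        uv = project_to_image(p, fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = uv
        rng = (p[0] ** 2 + p[1] ** 2 + p[2] ** 2) ** 0.5
        for i, (x1, y1, x2, y2) in enumerate(boxes):
            inside = x1 <= u <= x2 and y1 <= v <= y2
            if inside and (distances[i] is None or rng < distances[i]):
                distances[i] = rng
    return distances
```

Boxes that receive no radar return keep `None`, so the planner can fall back to a camera-only estimate or treat the distance as unknown.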

Behavior Planning (Core Contribution)

A finite-state machine (FSM) was designed to translate perception outputs into stable driving behaviors. This was the main system-level contribution of the project.

Scene State

The planner receives a compact scene state:

lane offset and lane confidence;
traffic-light state;
obstacle distance and category;
crosswalk or pedestrian warning;
current vehicle state.

Decision Rules

Rules are organized around safety and driving priority:

red traffic light -> stop;
green light and clear safety region -> proceed;
obstacle inside warning region -> slow down;
obstacle inside stop region -> stop;
crosswalk or pedestrian cue -> reduce speed;
valid lane estimate -> maintain lane center;
low-confidence lane estimate -> reduce speed and keep conservative steering.
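
One way to read these rules is as a priority-ordered check, with the most safety-critical conditions evaluated first. The sketch below is an assumption about how that ordering might be encoded: the `SceneState` fields, the state names, the distance thresholds, and the lane-confidence threshold are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative thresholds; real values would come from vehicle testing.
STOP_DISTANCE_M = 1.0
WARNING_DISTANCE_M = 3.0
MIN_LANE_CONFIDENCE = 0.5


@dataclass
class SceneState:
    lane_offset_m: float
    lane_confidence: float
    traffic_light: str                     # "red", "green", or "none"
    obstacle_distance_m: Optional[float]   # None when no obstacle is tracked
    pedestrian_or_crosswalk: bool


def decide(scene: SceneState) -> str:
    """Return the target behavior, checking the highest-priority rules first."""
    d = scene.obstacle_distance_m
    if scene.traffic_light == "red":
        return "TRAFFIC_LIGHT_STOP"
    if d is not None and d <= STOP_DISTANCE_M:
        return "OBSTACLE_STOP"
    if d is not None and d <= WARNING_DISTANCE_M:
        return "OBSTACLE_SLOWDOWN"
    if scene.pedestrian_or_crosswalk:
        return "CROSSWALK_SLOWDOWN"
    if scene.lane_confidence < MIN_LANE_CONFIDENCE:
        return "RECOVERY"
    return "LANE_FOLLOWING"  # green or no light, clear safety region, valid lane
```

Because the function is pure (same scene in, same decision out), each rule can be unit-tested without a camera or vehicle attached.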

FSM Design

The system defines states such as:

Lane Following
Traffic-Light Stop
Obstacle Slowdown
Obstacle Stop
Crosswalk Slowdown
Recovery / Low Confidence

Transitions are triggered by perception outputs. The design goals are:

stable behavior transitions;
interpretable logic;
reduced oscillation between actions;
real-time responsiveness.

The FSM also includes simple hysteresis-like logic to avoid rapid switching when detection confidence fluctuates around a threshold.
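
A common way to get this kind of behavior is a two-threshold flag: a condition switches on at one confidence level and switches off only at a lower one. The sketch below shows that general technique; the class name and the 0.6 / 0.4 thresholds are assumptions, and the project's own hysteresis-like logic may differ in detail.

```python
class HysteresisFlag:
    """Boolean flag with separate switch-on and switch-off thresholds."""

    def __init__(self, on_threshold: float = 0.6, off_threshold: float = 0.4):
        if off_threshold > on_threshold:
            raise ValueError("off_threshold must not exceed on_threshold")
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.active = False

    def update(self, confidence: float) -> bool:
        if self.active:
            # Already on: only a clear drop below the lower threshold turns it off.
            if confidence < self.off_threshold:
                self.active = False
        elif confidence >= self.on_threshold:
            # Currently off: only a clear rise above the upper threshold turns it on.
            self.active = True
        return self.active


if __name__ == "__main__":
    flag = HysteresisFlag()
    # Confidence wobbling around 0.5 does not toggle the flag once it is set.
    print([flag.update(c) for c in [0.55, 0.62, 0.58, 0.45, 0.61, 0.39, 0.50]])
```

With a single threshold at 0.5, the same input sequence would flip the flag several times; the gap between the two thresholds is what absorbs the fluctuation.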

Control Module

Steering Control

A proportional controller is used for steering:

\[ \delta = k_p \cdot e \]

\(e\): lateral deviation from the lane center;
\(k_p\): proportional gain;
\(\delta\): steering command.

The controller is intentionally simple: a single gain is easy to tune and to reason about on an embedded prototype.
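
In code, the control law is a one-liner. The sketch below adds an output clamp, which is an assumption on top of the formula above (actuators normally have a limited steering range); the gain of 0.8 and the 0.5 rad limit are placeholder values, not the tuned ones.

```python
def steering_command(lateral_error_m: float, kp: float = 0.8,
                     max_steer_rad: float = 0.5) -> float:
    """Proportional steering: delta = kp * e, clamped to the actuator range."""
    delta = kp * lateral_error_m
    # Clamp so a large lane offset cannot request more than the assumed limit.
    return max(-max_steer_rad, min(max_steer_rad, delta))
```

For example, with these placeholder values a lateral deviation of 0.25 m produces a command of 0.2 rad, while a 2 m deviation saturates at the 0.5 rad limit.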

Speed Control

Speed is determined by the active behavior state:

normal speed for lane following;
reduced speed for crosswalk and warning-region obstacles;
zero speed for red lights and stop-region obstacles;
conservative speed when perception confidence is low.
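
This mapping from behavior state to speed can be expressed as a simple lookup table. The state names and speed values below are illustrative assumptions, as is the choice to command zero speed for any unrecognized state.

```python
# Target speed per behavior state, in meters per second (placeholder values).
TARGET_SPEED_MPS = {
    "LANE_FOLLOWING": 1.0,       # normal speed
    "CROSSWALK_SLOWDOWN": 0.5,   # reduced speed
    "OBSTACLE_SLOWDOWN": 0.5,    # reduced speed
    "RECOVERY": 0.3,             # conservative speed under low confidence
    "TRAFFIC_LIGHT_STOP": 0.0,
    "OBSTACLE_STOP": 0.0,
}


def target_speed(state: str) -> float:
    """Look up the speed for a state; unknown states stop the vehicle (assumed fail-safe)."""
    return TARGET_SPEED_MPS.get(state, 0.0)
```

Keeping the speeds in one table makes them easy to retune without touching the state-transition logic.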

Embedded Optimization

To achieve real-time performance on Jetson:

TensorRT acceleration was used for YOLOv5 inference;
input resolution was selected to balance accuracy and latency;
perception and control were separated into asynchronous loops;
image preprocessing was optimized to reduce CPU overhead;
lightweight classical vision was used for lane detection;
logging and visualization were separated from the real-time path.

The optimization process made the system responsive enough for closed-loop testing while preserving the modularity needed for debugging.
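
The separation of perception and control into asynchronous loops can be sketched with a single-slot mailbox: the perception thread overwrites the newest result, and the control loop reads whatever is latest without waiting for inference. This is a generic pattern shown under assumptions; the class and function names, the loop periods, and the staleness limit are hypothetical, and the placeholder scene stands in for real perception output.

```python
import threading
import time
from typing import Any, Tuple


class LatestValue:
    """Thread-safe single slot: writers overwrite, readers always see the newest value."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value: Any = None
        self._stamp = 0.0
        self._has_value = False

    def put(self, value: Any) -> None:
        with self._lock:
            self._value = value
            self._stamp = time.monotonic()
            self._has_value = True

    def get(self) -> Tuple[Any, float]:
        """Return (value, age in seconds); age is infinite before the first put."""
        with self._lock:
            if not self._has_value:
                return None, float("inf")
            return self._value, time.monotonic() - self._stamp


def perception_loop(slot: LatestValue, stop: threading.Event,
                    period_s: float = 0.01) -> None:
    """Stand-in for the perception thread; sleep replaces real inference time."""
    frame_id = 0
    while not stop.is_set():
        frame_id += 1
        slot.put({"frame_id": frame_id, "lane_offset_m": 0.0})  # placeholder scene
        time.sleep(period_s)


def control_step(slot: LatestValue, max_age_s: float = 0.2) -> str:
    """One control iteration: act on the newest scene, or stop if it is missing or stale."""
    scene, age = slot.get()
    if scene is None or age > max_age_s:
        return "stop"
    return "drive"
```

The staleness check matters as much as the decoupling: if perception stalls, the control loop notices that its data is old and falls back to a safe command instead of steering on an outdated scene.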

Technical Stack

Hardware

NVIDIA Jetson embedded GPU;
monocular RGB camera;
millimeter-wave radar;
vehicle control interface.

Software

Python;
PyTorch;
TensorRT;
OpenCV.

Algorithms

YOLOv5 object detection;
classical lane detection;
camera-radar association;
rule-based planning and FSM;
proportional steering control.

Outcome

The system demonstrates a complete embedded autonomous driving pipeline capable of:

real-time perception on embedded hardware;
lane following based on image geometry;
traffic-light and obstacle-aware behavior decisions;
improved distance awareness through radar fusion;
interpretable FSM-based planning;
closed-loop control under resource constraints.

Key Takeaways

Built a full perception-decision-control system from scratch.
Learned how embedded constraints shape perception and planning choices.
Practiced the integration of deep learning, classical vision, sensor fusion, and control.
Developed a stronger understanding of autonomous driving as a full-stack system problem rather than an isolated perception task.

This project marked my transition from general computer vision toward autonomous driving systems, and later motivated my research in multi-agent perception, semantic occupancy prediction, and communication-efficient collaboration.