Object Detection and Tracking

Object Detection and Tracking with YOLO

Object detection and tracking are fundamental capabilities in computer vision and robotics, enabling systems to identify, locate, and follow objects of interest within images or video streams. YOLO (You Only Look Once) is a state-of-the-art, real-time object detection system known for its speed and accuracy. This page explores how to perform object detection and extend it to tracking using YOLO, particularly with the user-friendly Ultralytics YOLO framework.

1. Understanding Object Detection with YOLO

What is Object Detection? Object detection is the process of identifying and locating one or more objects within an image or video. It involves drawing bounding boxes around detected objects and assigning class labels (e.g., "car," "person," "dog") to them.

What is YOLO? YOLO (You Only Look Once) is a revolutionary object detection algorithm that processes images in a single pass, making it exceptionally fast and suitable for real-time applications. Unlike traditional methods that perform detection in multiple stages, YOLO treats object detection as a single regression problem, directly predicting bounding box coordinates and class probabilities from the full image.

How YOLO Works (High-Level Overview):

  • Grid Creation: YOLO divides the input image into an S x S grid of cells.

  • Bounding Box Prediction: Each grid cell is responsible for detecting objects whose centers fall within it. Each cell predicts 'B' bounding boxes and a confidence score for each box. The confidence score reflects how certain the model is that the box contains an object and how accurate it believes the box is (in the original YOLO paper, confidence = Pr(Object) x IoU between the predicted and ground-truth boxes).

  • Class Probability Prediction: Independently, each grid cell also predicts 'C' conditional class probabilities: the probability that a detected object belongs to a particular class (e.g., car, person, dog), given that an object is present.

  • Non-Max Suppression (NMS): YOLO's raw output often includes multiple bounding boxes for the same object. NMS is a post-processing step that filters these detections, discarding boxes with lower confidence scores and high overlap (Intersection over Union, IoU) with higher-confidence boxes, thus retaining only the most accurate bounding box for each detected object.

YOLO models are often pre-trained on large datasets like COCO (Common Objects in Context), which contains 80 object classes commonly found in everyday scenes. Ultralytics YOLO (e.g., YOLOv8) provides a very accessible Python API for performing object detection with pre-trained or custom-trained models.
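
To make the NMS step concrete, below is a minimal sketch of greedy non-max suppression over a few hand-written boxes. It is illustrative only: Ultralytics applies NMS internally, and the box coordinates, scores, and the 0.5 IoU threshold here are made-up values.

    import numpy as np

    def iou(a, b):
        """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    def nms(boxes, scores, iou_threshold=0.5):
        """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
        order = list(np.argsort(scores)[::-1])  # indices sorted by descending confidence
        keep = []
        while order:
            best = order.pop(0)
            keep.append(best)
            order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
        return keep

    # Two overlapping detections of the same object plus one separate object (made-up values)
    boxes = [(100, 100, 200, 200), (110, 105, 210, 205), (300, 300, 380, 400)]
    scores = [0.90, 0.75, 0.60]
    print(nms(boxes, scores))  # -> [0, 2]: the redundant 0.75 box is suppressed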

2. DIY Object Detection with Ultralytics YOLO

Steps & Code Snippet:

  1. Installation: First, install the Ultralytics library.

    pip install ultralytics
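
    To verify the install, the Ultralytics quickstart suggests running its built-in environment check, which prints the package version, Python version, and available hardware:

    python -c "import ultralytics; ultralytics.checks()"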
  2. Perform Detection: Create a Python script to load a pre-trained YOLO model and run detection on an image.

    from ultralytics import YOLO
    import cv2  # OpenCV for drawing and displaying results
    
    # Load a pre-trained YOLOv8n model (n for nano, a small and fast version)
    model = YOLO("yolov8n.pt")
    
    # Define the path to your image
    image_path = 'path_to_your_image.jpg' # Replace with your image path
    
    # Perform object detection
    results = model(image_path)
    
    # Process results
    # results is a list of Results objects.
    for r in results:
        # Each 'r' is a Results object for a single image.
        # r.show()  # Display the image with detections (opens a new window)
        # r.save(filename='result.jpg') # Save the image with detections
    
        # To manually access and draw bounding boxes using OpenCV:
        img = cv2.imread(image_path)
        for box in r.boxes:
            # Bounding box coordinates
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            # Confidence score
            confidence = box.conf[0]
            # Class ID
            class_id = int(box.cls[0])
            # Get class name from model
            class_name = model.names[class_id]
    
            # Draw bounding box and label
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
            label = f"{class_name}: {confidence:.2f}"
            cv2.putText(img, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    
        # Display the image with OpenCV
        cv2.imshow("YOLOv8 Detection", img)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    (Ensure you have an image file at path_to_your_image.jpg or update the path)
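
In practice you rarely want every raw detection. The Ultralytics predict call accepts filtering arguments; for instance, conf sets a minimum confidence and classes restricts the output to specific class IDs. A short sketch (the 0.5 threshold and the class list are arbitrary choices for illustration):

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")

    # Keep only detections at least 50% confident that are COCO class 0 ('person') or 2 ('car')
    results = model('path_to_your_image.jpg', conf=0.5, classes=[0, 2])

    for r in results:
        for box in r.boxes:
            print(model.names[int(box.cls[0])], float(box.conf[0]))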

3. Understanding Object Tracking

What is Object Tracking? Object tracking extends object detection by not only identifying objects but also assigning and maintaining a unique ID for each detected object as it moves across frames in a video. This allows the system to follow individual objects over time.

Why is it Useful? Tracking is critical for applications like surveillance (monitoring individuals), traffic analysis (vehicle movement), sports analytics (player tracking), and robotics (following targets).

Ultralytics YOLO supports multiple tracking algorithms out of the box, making it easy to implement robust tracking. When tracking, the output includes an object ID alongside each bounding box and class label; these IDs maintain an object's identity across frames and can be combined with bounding-box center points to plot movement trails, as shown in the sketch at the end of the next section.

4. DIY Object Tracking with Ultralytics YOLO

Ultralytics YOLO provides a simple track() method for performing multi-object tracking on video streams.

Steps & Code Snippet:

  1. Installation: (If not already done)

    pip install ultralytics
  2. Perform Tracking on a Video: Create a Python script to load a YOLO model and track objects in a video.

    from ultralytics import YOLO
    import cv2
    
    # Load a pre-trained YOLOv8n model
    model = YOLO("yolov8n.pt")
    
    # Define the path to your video file or use 0 for webcam
    video_path = 'path_to_your_video.mp4' # Replace with your video path or 0 for webcam
    # For a webcam, pass source=0 to model.track() below instead of a file path
    
    # Perform object tracking on the video source
    # The 'tracker' argument specifies the tracking algorithm.
    # BoT-SORT and ByteTrack are common choices. Default is BoT-SORT.
    # 'persist=True' tells the tracker that the current image or frame is the next in a sequence.
    results = model.track(source=video_path, show=True, tracker="bytetrack.yaml", persist=True)
    
    # Note: without stream=True, 'results' is a list of per-frame Results objects;
    # pass stream=True to receive them as a memory-efficient generator instead.
    # The 'show=True' argument displays the video with tracking annotations.
    # If you want to process frames manually:
    # for r in model.track(source=video_path, stream=True, persist=True):
    #     annotated_frame = r.plot() # r.plot() returns an annotated frame
    #     # Access tracked objects:
    #     if r.boxes.id is not None: # Check if tracking IDs are present
    #         object_ids = r.boxes.id.int().cpu().tolist()
    #         print(f"Tracked object IDs: {object_ids}")
    #
    #     cv2.imshow("YOLOv8 Tracking", annotated_frame)
    #     if cv2.waitKey(1) & 0xFF == ord('q'):
    #         break
    # cv2.destroyAllWindows()

    (Ensure you have a video file at path_to_your_video.mp4 or update the path. You can also use an integer like 0 for your default webcam as the source).
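
Building on this, the persistent track IDs make it straightforward to visualize movement trails: collect each object's bounding-box center per frame, keyed by its track ID, and draw the accumulated points. Below is a minimal sketch along the lines of the Ultralytics tracking examples; the video path, trail color, and line thickness are placeholder choices.

    from collections import defaultdict

    import cv2
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")
    trails = defaultdict(list)  # track ID -> list of (x, y) center points

    # stream=True yields one Results object per frame; persist=True keeps IDs across frames
    for r in model.track(source='path_to_your_video.mp4', stream=True, persist=True):
        frame = r.plot()  # frame annotated with boxes, labels, and track IDs
        if r.boxes.id is not None:
            ids = r.boxes.id.int().cpu().tolist()
            for box, track_id in zip(r.boxes.xyxy, ids):
                x1, y1, x2, y2 = box
                trails[track_id].append((int((x1 + x2) / 2), int((y1 + y2) / 2)))
                # Draw the trail accumulated so far for this object
                for p, q in zip(trails[track_id], trails[track_id][1:]):
                    cv2.line(frame, p, q, (0, 255, 0), 2)
        cv2.imshow("YOLOv8 Trails", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cv2.destroyAllWindows()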

Reference Links

  • Ultralytics YOLO Documentation (Tracking): https://docs.ultralytics.com/modes/track/
  • PyImageSearch - Object Tracking with YOLOv8: https://pyimagesearch.com/2024/06/17/object-tracking-with-yolov8-and-python/
  • YouTube - Multi-Object Tracking with Ultralytics YOLO: https://www.youtube.com/watch?v=vi2K3NmKHfA
  • Encord - YOLO Object Detection Explained: https://encord.com/blog/yolo-object-detection-guide/
  • Neptune.ai - Object Detection with YOLO: https://neptune.ai/blog/object-detection-with-yolo-hands-on-tutorial
  • GitHub - YOLO Object Detection with OpenCV (YOLOv3 example): https://github.com/yash42828/YOLO-object-detection-with-OpenCV
  • Core Electronics - YOLO on Raspberry Pi AI Hat: https://core-electronics.com.au/guides/yolo-object-detection-on-the-raspberry-pi-ai-hat-writing-custom-python/