Lecture

Object Detection & Tracking

03 Jun 2026

Datasets

VOC (Visual Object Classes)

VOC2012 : 20 classes, 11,530 annotated images containing 27,450 ROI-annotated objects

COCO Dataset

80 classes, 330K images, 1.5M object instances

ImageNet

1000 object classes, 1,281,167 training images, 50,000 validation images and 100,000 test images

Open Images Dataset

Open Images is a dataset of ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives

Roboflow

https://universe.roboflow.com

Label tools

labelme

pip install labelme
labelme pic123.jpg

Labelme2YOLO

pip install labelme2yolo

Convert the JSON files and split them into training and validation sets with --val_size:
python labelme2yolo.py --json_dir /home/username/labelme_json_dir/ --val_size 0.2
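
To illustrate what this conversion step involves, here is a minimal sketch. It is not the actual Labelme2YOLO implementation: the function names `labelme_to_yolo_lines` and `split_train_val`, the sample JSON, and the class list are hypothetical. It assumes the usual labelme JSON layout (`imageWidth`, `imageHeight`, and a `shapes` list with `label` and `points`) and writes each shape as a normalized bounding box.

```python
import json
import random


def labelme_to_yolo_lines(labelme_json, class_names):
    """Turn the shapes of one labelme JSON document into YOLO label lines."""
    data = json.loads(labelme_json)
    img_w, img_h = data["imageWidth"], data["imageHeight"]
    lines = []
    for shape in data["shapes"]:
        # Use the bounding box of all points (works for rectangles and polygons).
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        x_min, x_max = min(xs), max(xs)
        y_min, y_max = min(ys), max(ys)
        class_id = class_names.index(shape["label"])
        xc = (x_min + x_max) / 2 / img_w
        yc = (y_min + y_max) / 2 / img_h
        w = (x_max - x_min) / img_w
        h = (y_max - y_min) / img_h
        lines.append(f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


def split_train_val(items, val_size=0.2, seed=0):
    """Shuffle items and hold out a val_size fraction for validation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_size)
    return items[n_val:], items[:n_val]


# Hypothetical sample annotation: one rectangle on a 200x100 image.
SAMPLE_JSON = json.dumps({
    "imageWidth": 200,
    "imageHeight": 100,
    "shapes": [{"label": "cat", "shape_type": "rectangle",
                "points": [[50, 25], [150, 75]]}],
})

print(labelme_to_yolo_lines(SAMPLE_JSON, ["cat"]))
train, val = split_train_val([f"img{i}.json" for i in range(10)], val_size=0.2)
print(len(train), len(val))
```

With `val_size=0.2`, ten annotation files are split into eight for training and two for validation.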

LabelImg

pip install labelImg

labelImg
labelImg [IMAGE_PATH] [PRE-DEFINED CLASS FILE]

VOC .xml convert to YOLO .txt

cd ~/tf/raccoon/annotations
python ~/tf/xml2yolo.py
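
The contents of xml2yolo.py are not shown here, so the following is only a sketch of what a VOC-to-YOLO conversion typically does, not that script itself. The function name `voc_xml_to_yolo`, the sample XML, and the `raccoon` class list are assumptions for illustration. It reads the image size and each `bndbox` from a Pascal VOC annotation and emits one normalized YOLO line per object.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_yolo(xml_text, class_names):
    """Convert one Pascal VOC annotation (XML text) to a list of YOLO label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in class_names:
            continue  # skip classes that are not in the target list
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        # VOC stores pixel corners; YOLO stores a normalized center and size.
        xc = (xmin + xmax) / 2 / img_w
        yc = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{class_names.index(name)} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


# Hypothetical sample annotation: one box on a 400x200 image.
SAMPLE_XML = """<annotation>
  <size><width>400</width><height>200</height><depth>3</depth></size>
  <object>
    <name>raccoon</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>"""

print(voc_xml_to_yolo(SAMPLE_XML, ["raccoon"]))
```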

YOLO Annotation formats (.txt)

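class_num x_center y_center width height (one object per line; coordinates are normalized to 0-1 by the image width and height)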

0 0.5222826086956521 0.5518115942028986 0.025 0.010869565217391304
0 0.5271739130434783 0.5057971014492754 0.013043478260869565 0.004347826086956522
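
As a rough illustration of how such a label line maps back to pixels, the sketch below converts a normalized YOLO box to corner coordinates. The helper name `yolo_to_pixel_box`, the example line, and the 640x480 image size are assumptions chosen for the example.

```python
def yolo_to_pixel_box(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    # The center and size are fractions of the image; scale and shift to corners.
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max


# A box centered in a 640x480 image, a quarter of its width and half its height.
print(yolo_to_pixel_box("0 0.5 0.5 0.25 0.5", 640, 480))
# → (0, 240.0, 120.0, 400.0, 360.0)
```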

Object Detection

R-CNN, Fast R-CNN, Faster R-CNN

Object Detection

R-CNN (2013)

Fast R-CNN (2015)

Faster R-CNN (2015)

[Object Detection] S3: Introduction to Faster R-CNN (https://ivan-eng-murmur.medium.com/object-detection-s3-faster-rcnn-%E7%B0%A1%E4%BB%8B-5f37b13ccdd2)

Mask R-CNN (2017)

Code: https://github.com/matterport/Mask_RCNN/

SSD (2015)

RetinaNet

EfficientDet (2019): https://arxiv.org/abs/1911.09070

Kaggle: efficientdet-gwd

YOLO: You Only Look Once

Code: https://github.com/pjreddie/darknet

YOLOv1 (2015)

YOLOv2 (2016)

YOLOv3 (2018) : Darknet-53 + FPN

YOLOv4

Code: AlexeyAB/darknet

YOLOv4 = YOLOv3 + CSPDarknet53 + SPP + PAN + BoF + BoS

YOLOv5

Scaled-YOLOv4

YOLOR

YOLOX (2021)

Code: Megvii-BaseDetection/YOLOX

CSL-YOLO

YOLOv6 (2022)

YOLOv7 (2022)

YOLOv8 (2023)

Kaggle: rkuo2000/YOLOv8, rkuo2000/YOLOv8-Pothole-detection

UAV-YOLOv8

YOLOv8 Aerial Sheep Detection and Counting (https://github.com/monemati/YOLOv8-Sheep-Detection-Counting)

YOLOv8 Drone Surveillance

YOLOv9 (2024)

YOLOv10 (2024)

YOLOv11 (2024)

YOLOv12 (2025)

Kaggle: yolov12, yolov12-tank, yolov12-face

YOLOv13 (2025)

YOLO26 (2026)

Kaggle: https://www.kaggle.com/code/rkuo2000/yolo26

RF-DETR (2023)

RF-DETR: SOTA Real-Time Detection and Segmentation Model

Blog: How to Deploy RF-DETR to an NVIDIA Jetson

RF-DETRv2 (2024)

Datasets

Deep-sea Debris (2018)

TACO (2020)

UDD (2020)

Concretely, UDD consists of 3 categories (sea cucumber, sea urchin, and scallop) with 2,227 images.

DUO (2021)

Applications

Satellite Image Deep Learning

Swimming Pool Detection

Identify Military Vehicles in Satellite Imagery

Dataset: Moving and Stationary Target Acquisition and Recognition (MSTAR) Dataset

Code: Target Recognition in Synthetic Aperture Radar Imagery Using Deep Learning script.ipynb

BCCD Dataset

3 classes: RBC (Red Blood Cell), WBC (White Blood Cell), Platelets

YOLOv5 FaceMask Detection

YOLOv5 Traffic Analysis

EfficientDet - Global Wheat Detection

Mask R-CNN

Mask R-CNN transfer learning

Objectron

OpenCV-Python play GTA5

Pothole Detection using YOLOv4

Kaggle: YOLOv7 Pothole Detection

YOLOv7 Braking Detection

Steel Defect Detection

Dataset: Severstal: Steel Defect Detection

Steel Defect Detection using UNet

https://www.kaggle.com/code/jaysmit/u-net (Keras UNet)

https://www.kaggle.com/code/myominhtet/steel-defection (PyTorch UNet)

Steel-Defect Detection Using CNN

MSFT-YOLO (2022)

PCB Datasets

PCB Defect Detection

Paper: PCB Defect Detection Using Denoising Convolutional Autoencoders

PCB Defect Classification

Dataset: HRIPCB dataset (dropbox)

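A printed circuit board (PCB) defect dataset. It is a public synthetic PCB dataset containing 1,386 images with 6 kinds of defects (missing hole, mouse bite, open circuit, short, spur, spurious copper), intended for image detection, classification, and registration tasks.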

Paper: End-to-end deep learning framework for printed circuit board manufacturing defect classification