Problem
Tracking multiple objects across video frames requires balancing detection accuracy with tracking continuity. YOLO alone loses object identity across frames, while naive IoU tracking fails under occlusion.
Solution
The system combines YOLOv8 for per-frame detection with DeepSort for appearance-based re-identification, exposed through a FastAPI interface that accepts both batch video and live stream inputs.
Architecture
The pipeline runs in two stages:
- Detection: YOLOv8n processes each frame and returns bounding boxes with class scores.
- Tracking: DeepSort assigns consistent IDs by matching detections against existing tracks using Kalman filter prediction and cosine similarity on appearance embeddings.
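The appearance-matching part of the tracking stage can be illustrated with a minimal sketch. This is not the project's code or the DeepSort library's API: the function names, the `max_distance` gate of 0.3, and the plain-list embeddings are all assumptions made for illustration. It also uses greedy nearest-first matching in place of the Hungarian assignment, and it omits the Kalman filter motion gate entirely.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally uninformative
    return 1.0 - dot / (norm_a * norm_b)


def match_detections(track_embeddings, detection_embeddings, max_distance=0.3):
    """Greedily pair detections with existing tracks by appearance.

    track_embeddings: dict mapping track_id -> embedding vector
    detection_embeddings: list of embedding vectors, one per detection
    max_distance: hypothetical gate; pairs farther apart are never matched

    Returns (matches, unmatched) where matches maps detection index to
    track_id and unmatched lists detection indices left without a track.
    """
    candidates = []
    for track_id, track_emb in track_embeddings.items():
        for det_idx, det_emb in enumerate(detection_embeddings):
            dist = cosine_distance(track_emb, det_emb)
            if dist <= max_distance:
                candidates.append((dist, track_id, det_idx))

    # Closest pairs claim their track and detection first.
    candidates.sort()
    matches = {}
    used_tracks = set()
    for _dist, track_id, det_idx in candidates:
        if track_id in used_tracks or det_idx in matches:
            continue
        matches[det_idx] = track_id
        used_tracks.add(track_id)

    unmatched = [i for i in range(len(detection_embeddings)) if i not in matches]
    return matches, unmatched
```

Unmatched detections are the ones that would start new tentative tracks; matched ones keep the ID of the track they were paired with.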
API Design
Two endpoints:
- POST /track/video: processes an uploaded video and returns tracking JSON.
- GET /track/stream: SSE endpoint for live tracking data.
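As a rough illustration of what the stream endpoint might emit, the sketch below serializes per-frame tracking results as Server-Sent Events messages (a `data:` line followed by a blank line). The payload field names (`frame`, `tracks`, `id`, `bbox`) and both function names are assumptions; the actual response schema is not specified here.

```python
import json


def format_sse_event(frame_index, tracks):
    """Serialize one frame's tracks as a single SSE message.

    tracks: iterable of (track_id, (x1, y1, x2, y2)) pairs.
    The payload layout is hypothetical, chosen only for illustration.
    """
    payload = {
        "frame": frame_index,
        "tracks": [{"id": track_id, "bbox": list(bbox)} for track_id, bbox in tracks],
    }
    # An SSE message is one or more "data:" lines ended by a blank line.
    return f"data: {json.dumps(payload)}\n\n"


def sse_stream(frames):
    """Yield one SSE message per (frame_index, tracks) item."""
    for frame_index, tracks in frames:
        yield format_sse_event(frame_index, tracks)
```

In a FastAPI app, a generator like this could plausibly be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, though the original wiring may differ.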
Tuning Notes
Confidence threshold 0.45 and NMS IoU 0.5 gave the best balance on the traffic camera test set. Track confirmation requires 3 consecutive detections.
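The two rules above can be sketched in a few lines. This is an illustrative sketch only: the detection dictionary layout, the `TrackState` class, and the choice to reset the streak on a missed frame are assumptions (some trackers delete a tentative track on a miss instead). The NMS step is not shown.

```python
from dataclasses import dataclass

CONF_THRESHOLD = 0.45  # detection confidence cutoff from the tuning notes
CONFIRM_HITS = 3       # consecutive detections required to confirm a track


def filter_detections(detections, threshold=CONF_THRESHOLD):
    """Keep detections whose score meets the confidence threshold.

    detections: list of dicts with a "score" key (hypothetical layout).
    """
    return [d for d in detections if d["score"] >= threshold]


@dataclass
class TrackState:
    """Tracks how many frames in a row a track has been matched."""

    track_id: int
    consecutive_hits: int = 0
    confirmed: bool = False

    def update(self, matched: bool) -> None:
        if matched:
            self.consecutive_hits += 1
            if self.consecutive_hits >= CONFIRM_HITS:
                self.confirmed = True
        else:
            # Assumption: a miss resets the streak rather than deleting the track.
            self.consecutive_hits = 0
```

Requiring several consecutive hits before confirming a track keeps one-frame false positives from ever receiving an ID, at the cost of a short delay before a new object is reported.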