Researcher(s)
- Nova Fadadu, Mechanical Engineering, University of Delaware
Faculty Mentor(s)
- Gregory Chirikjian, Mechanical Engineering, University of Delaware
Abstract
Object detection is a core task in computer vision with broad applications, from autonomous vehicles to surveillance. Among the most widely adopted models for this task is YOLO (You Only Look Once). The YOLOv11 family includes five variants (n, s, m, l, and x) that differ in size, speed, and accuracy. While these models are commonly benchmarked on the full COCO2017 dataset, training all five variants is computationally expensive and time-consuming. This project compares the performance trade-offs across the YOLOv11 model variants using the much smaller COCO128 dataset, enabling practical experimentation and analysis.
The central question addressed is: how does model size in YOLOv11 affect accuracy and inference time when training on a constrained dataset such as COCO128? Previous research has shown that smaller models offer faster inference but lower detection accuracy, while larger models are more accurate but slower. This study uses a controlled training environment on Google Colab, applying a consistent image size, batch size, and number of epochs across all models. Metrics such as mean Average Precision (mAP) and inference speed are collected and analyzed.
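A minimal sketch of this benchmarking loop is shown below, assuming the Ultralytics Python API; the hyperparameter values used here (imgsz=640, batch=16, epochs=50) are illustrative placeholders rather than the study's exact settings.

```python
# Sketch of a controlled YOLOv11 comparison on COCO128, assuming the
# Ultralytics package (pip install ultralytics). Hyperparameters below
# are placeholders; the study's exact values may differ.
from ultralytics import YOLO

results = {}
for variant in ("n", "s", "m", "l", "x"):
    # Load pretrained weights for this YOLOv11 variant.
    model = YOLO(f"yolo11{variant}.pt")

    # Train every variant under identical conditions.
    model.train(data="coco128.yaml", imgsz=640, batch=16, epochs=50)

    # Validate to collect accuracy and speed metrics.
    metrics = model.val(data="coco128.yaml", imgsz=640)
    results[variant] = {
        "mAP50-95": metrics.box.map,                 # COCO-style mean AP
        "mAP50": metrics.box.map50,                  # AP at IoU 0.50
        "inference_ms": metrics.speed["inference"],  # ms per image
    }

for variant, stats in results.items():
    print(variant, stats)
```

Holding the dataset, image size, batch size, and epoch count fixed while varying only the model weights isolates model size as the sole experimental variable.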
Findings reveal predictable scaling behavior: the smaller models (n, s) are more efficient but less accurate, while the larger models (l, x) achieve higher mAP at the cost of increased computational demand. Although COCO128 is limited in complexity, the results offer practical insight into model selection for developers working with resource-constrained systems. This work highlights the importance of balancing accuracy and efficiency in real-world object detection deployments.