
RoLID-11K: The First Large-Scale Dashcam Dataset for Roadside Litter Detection

DATE: 3/14/2026 DIR: /Deep Learning
#Python #Deep Learning #WACV

This research was published at the WACV 2026 Workshop.


Roadside litter poses environmental, safety, and economic challenges, yet current monitoring relies heavily on labor-intensive surveys and public reporting, providing limited spatial coverage. To address this gap, we introduce RoLID-11K, the first large-scale dataset specifically designed for roadside litter detection from dashcams.

Why RoLID-11K?

Existing vision datasets for litter detection focus on street-level still images, aerial scenes, or aquatic environments. They do not reflect the unique characteristics of dashcam footage, where litter appears extremely small, sparse, and embedded in cluttered road-verge backgrounds.

In contrast, dash cameras are inexpensive and widely used. Their ubiquity presents a practical opportunity for passive roadside-litter monitoring using video that is already being recorded, offering a highly scalable and low-cost solution.

Figure 1: Overview of the RoLID-11K dataset. A vehicle-mounted dashcam serves as a mobile data acquisition platform, capturing roadside litter under diverse real-world driving conditions.

Dataset Highlights & Challenges

RoLID-11K comprises over 11,000 annotated images spanning diverse UK driving conditions (rural roads, suburban streets, dual carriageways, and urban settings) across various weather and lighting environments. It presents severe challenges for object detection models:

  1. Extreme Long-Tail Distribution: Most images contain only one to three instances of litter.
  2. Small-Object Dominance: Following COCO evaluation criteria, a staggering 86.8% of the annotated objects in the test set are classified as small (area < 32² px²).
  3. Spatial Distribution Bias: Since driving in the UK is on the left, litter tends to accumulate on the left verge due to driver behavior and wind-driven displacement, creating a strong spatial bias.
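The small/medium/large split above follows the standard COCO area thresholds (32² and 96² px²). The following minimal sketch (not the authors' code; the example boxes are hypothetical) shows how annotated boxes are bucketed by that convention:

```python
# COCO size-category thresholds used for the statistics above.
SMALL_MAX = 32 ** 2   # 1024 px^2: boxes below this are "small"
MEDIUM_MAX = 96 ** 2  # 9216 px^2: boxes below this (and >= SMALL_MAX) are "medium"

def size_bucket(box):
    """Classify a COCO-style [x, y, w, h] box by area into small/medium/large."""
    _, _, w, h = box
    area = w * h
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"

def bucket_counts(boxes):
    """Count how many boxes fall into each COCO size category."""
    counts = {"small": 0, "medium": 0, "large": 0}
    for box in boxes:
        counts[size_bucket(box)] += 1
    return counts

# Example: three hypothetical litter boxes from one frame.
boxes = [[100, 400, 12, 9], [620, 380, 40, 30], [50, 300, 120, 110]]
print(bucket_counts(boxes))  # {'small': 1, 'medium': 1, 'large': 1}
```

Running the same bucketing over the RoLID-11K test annotations is what yields the 86.8% small-object figure reported above.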

Benchmark Results: Accuracy vs. Efficiency

To evaluate performance under these demanding conditions, we benchmarked a broad spectrum of modern detectors, ranging from accuracy-oriented transformer architectures to real-time YOLO models.

Accuracy-Oriented Transformers

| Method | Backbone | AP50 | AP50:95 | AP50:95 (small) | AP50:95 (medium) | AP50:95 (large) |
|---|---|---|---|---|---|---|
| CO-DETR | ResNet-50 | 79.2 | 32.1 | 31.2 | 37.5 | 40.0 |
| DINO | ResNet-50 | 78.5 | 31.5 | 30.9 | 36.1 | 11.2 |
| DEIMv2 | ViT-Tiny | 74.3 | 27.8 | 27.4 | 30.3 | 21.7 |
| RT-DETR | ResNet-50 | 73.9 | 28.9 | 28.3 | 32.1 | 18.5 |
| DiffusionDet | ResNet-50 | 67.0 | 24.5 | 24.3 | 26.7 | 9.6 |

CO-DETR achieves the highest overall AP50:95, confirming that dense transformer-based assignment mechanisms provide the most reliable localization for extremely small and sparse targets. DINO performs competitively. However, DiffusionDet underperforms, suggesting its coarse denoising schedule struggles with tiny objects embedded in cluttered backgrounds.
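The AP50:95 column follows the COCO convention: the average of per-threshold AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. A small sketch of that aggregation (the per-threshold AP values below are illustrative, not numbers from the paper):

```python
def ap_50_95(ap_per_iou):
    """Average AP over the ten COCO IoU thresholds 0.50, 0.55, ..., 0.95."""
    assert len(ap_per_iou) == 10, "expected one AP value per IoU threshold"
    return sum(ap_per_iou) / len(ap_per_iou)

iou_thresholds = [0.50 + 0.05 * i for i in range(10)]
# Hypothetical per-threshold APs that fall off as the IoU requirement tightens,
# a typical pattern for small-object detection.
aps = [0.79, 0.72, 0.64, 0.55, 0.45, 0.35, 0.25, 0.16, 0.08, 0.02]
print(round(ap_50_95(aps), 3))  # 0.401
```

This is why a detector can post a strong AP50 while its AP50:95 stays low: tiny litter instances are found, but rarely localized tightly enough to survive the stricter IoU thresholds.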

Limitations of Real-Time Models (YOLO)

| Method | AP50 | AP50:95 | AP50:95 (small) | AP50:95 (medium) | AP50:95 (large) |
|---|---|---|---|---|---|
| YOLOv8 | 50.1 | 17.5 | 16.6 | 22.9 | 6.0 |
| YOLOv9 | 50.8 | 17.1 | 16.0 | 23.5 | 4.0 |
| YOLOv10 | 49.7 | 17.4 | 16.3 | 23.2 | 5.1 |
| YOLOv11 | 52.1 | 18.3 | 17.2 | 24.6 | 5.7 |
| YOLOv12 | 51.6 | 17.7 | 16.9 | 23.3 | 15.1 |

While YOLO models achieve sub-millisecond inference latency, they lag significantly behind transformer architectures in AP50:95, particularly missing the mark on medium objects (AP50:95 medium). This reinforces that lightweight detection heads and lower input resolutions limit fine-grained localization on very small targets.
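The latency side of this trade-off can be framed as a simple frame-budget check: a dashcam stream at 30 fps leaves roughly 33.3 ms per frame. A minimal sketch, with the latency figures being assumptions rather than measurements from the paper:

```python
def fits_realtime(latency_ms, fps=30.0):
    """True if one frame can be processed before the next one arrives."""
    budget_ms = 1000.0 / fps  # e.g. ~33.3 ms at 30 fps
    return latency_ms <= budget_ms

# Hypothetical latencies: a lightweight YOLO model vs. a heavy transformer detector.
print(fits_realtime(8.0))    # True  (well under the 33.3 ms budget)
print(fits_realtime(120.0))  # False (would drop frames)
```

This is the practical motivation for benchmarking both families: only detectors that fit the frame budget can run on-vehicle, while heavier models would need offline or server-side processing.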

Conclusion

Our benchmark reveals a clear trade-off: while accuracy-oriented transformer detectors provide the strongest localization performance, their computational cost limits real-time deployment on low-power platforms. Conversely, YOLO models provide extremely fast inference but struggle to capture the fine spatial details required for consistent small-object detection.

RoLID-11K establishes a challenging benchmark for extreme small-object detection in dynamic driving scenes, aiming to support the development of scalable, low-cost systems for roadside-litter monitoring.

📄 Looking for complete experimental data and technical details? The full WACVW 2026 paper PDF is available for download.