The Garbage Dataset (GD): A Multi-Class Image Benchmark for Automated Waste Segregation

The Garbage Dataset (GD) is a comprehensive benchmark containing 12,259 labeled images across ten household waste categories for developing automated waste segregation systems. Researchers demonstrated its utility by achieving 95.13% accuracy using EfficientNetV2S models while analyzing class imbalance and training carbon emissions. This dataset addresses critical challenges in environmental sustainability by providing standardized real-world data for recycling AI applications.

Introducing the Garbage Dataset: A New Benchmark for AI-Powered Waste Segregation

Researchers have publicly released a comprehensive image dataset designed to support the development of automated waste segregation systems. The Garbage Dataset (GD) provides a real-world benchmark for machine learning and computer vision models, addressing a key challenge in environmental sustainability. Comprising 12,259 labeled images across ten common household waste categories, this resource aims to accelerate research into practical, deployable AI solutions for recycling and waste management.

Dataset Composition and Rigorous Curation

The GD encompasses a diverse range of ten waste material categories: metal, glass, biological, paper, battery, trash, cardboard, shoes, clothes, and plastic. Its 12,259 images were meticulously collected through multiple channels, including the dedicated DWaste mobile app and carefully curated web sources. To ensure data integrity and reliability, the curation process implemented rigorous validation protocols, including checksum verification and sophisticated outlier detection methods.
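As an illustration of what checksum verification can look like during curation, here is a minimal sketch. It assumes SHA-256 digests recorded in a manifest at collection time; the article does not say which hash the GD authors used, and the function names, the manifest layout, and the exact-duplicate check are hypothetical additions for illustration only.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw image bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_corrupted(files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return names whose current digest differs from the recorded one.

    `files` maps a file name to its bytes; `manifest` maps a file name to the
    digest recorded when the image was collected (assumed layout).
    """
    return sorted(
        name for name, data in files.items()
        if manifest.get(name) != sha256_hex(data)
    )


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names that share identical bytes (same digest)."""
    groups: dict[str, list[str]] = {}
    for name, data in files.items():
        groups.setdefault(sha256_hex(data), []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

In practice the bytes would be read from disk in chunks; the in-memory version keeps the sketch short. A digest mismatch flags a file that changed or was corrupted after collection, while a shared digest flags byte-identical copies.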

Further analytical steps were taken to characterize the dataset's inherent properties. Researchers analyzed class imbalance and assessed visual separability using dimensionality reduction techniques such as PCA and t-SNE. They also evaluated background complexity, a significant factor for real-world model performance, through quantitative measures of entropy and visual saliency.
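Two of these measurements are simple enough to sketch directly. The example below assumes class imbalance is summarized as the ratio of the largest to the smallest class and that entropy means the Shannon entropy of an image's intensity histogram. Both are common choices, but the article does not state the exact definitions the authors used, and the function names are hypothetical.

```python
import math
from collections import Counter


def imbalance_ratio(labels: list[str]) -> float:
    """Ratio of the most frequent class count to the least frequent one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def shannon_entropy(pixels: list[int]) -> float:
    """Entropy, in bits, of the intensity histogram of a flat pixel list.

    Low values suggest a plain background; high values suggest clutter.
    """
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A ratio of 1.0 means perfectly balanced classes. For 8-bit grayscale images the entropy ranges from 0 bits (a single flat intensity) to 8 bits (all 256 intensities equally frequent).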

Benchmarking Performance and Environmental Impact

The dataset's utility was demonstrated by benchmarking several state-of-the-art deep learning architectures on it: EfficientNetV2M, EfficientNetV2S, MobileNet, ResNet50, and ResNet101. Performance was evaluated not only on standard metrics such as accuracy and F1-score but also on the operational carbon emissions associated with training, which adds a sustainability dimension to model selection.
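The evaluation criteria can be sketched as follows. This assumes a macro-averaged F1-score (the article does not say whether the reported F1 is macro or weighted) and a simple energy-times-grid-intensity estimate of operational emissions. The function names and the power, duration, and grid-intensity figures in the usage note are hypothetical and are not taken from the study.

```python
def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def training_emissions_kg(avg_power_watts: float, hours: float,
                          grid_kg_co2e_per_kwh: float) -> float:
    """Estimate operational emissions as energy used times grid carbon intensity."""
    energy_kwh = avg_power_watts / 1000 * hours
    return energy_kwh * grid_kg_co2e_per_kwh
```

For example, a hypothetical run drawing 300 W on average for 2 hours on a grid emitting 0.4 kg CO2e per kWh would be estimated at 0.6 kWh × 0.4 = 0.24 kg CO2e. Macro averaging weights every class equally, which matters on an imbalanced dataset because small classes count as much as large ones.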

EfficientNetV2S achieved the highest performance, with an accuracy of 95.13% and an F1-score of 0.95, at a moderate carbon cost. The analysis also surfaced dataset characteristics that researchers must account for: a notable class imbalance, a skew toward classes with many outliers (plastic, cardboard, and paper), and significant brightness variation across images.

Why This Matters for Sustainable AI

The release of the Garbage Dataset marks a significant step forward for applied AI in environmental technology. It provides a standardized, publicly available testbed that moves beyond controlled laboratory conditions, capturing the complexities of real-world waste streams. This enables more robust and generalizable model development.

  • Bridges a Critical Data Gap: High-quality, diverse, and well-labeled datasets are foundational for AI progress. GD fills a notable void in waste management research, offering a common ground for comparing different algorithmic approaches.
  • Highlights Practical Deployment Challenges: The study proactively identifies hurdles like class imbalance, background clutter, and brightness variance that directly impact the reliability of AI systems in practical settings like recycling facilities.
  • Introduces a Sustainability Metric: By benchmarking models on both performance and carbon emissions, the research frames an essential trade-off, pushing the field toward developing not just accurate but also environmentally efficient AI solutions.
  • Accelerates Circular Economy Goals: Improved automated sorting through advanced computer vision can significantly increase recycling rates and purity, reducing landfill waste and supporting global sustainability targets.

The main conclusion is that the GD gives the research community a valuable resource while also making clear the challenges, such as managing class imbalance and weighing environmental trade-offs, that must be solved to move from promising prototypes to scalable, real-world deployment. The dataset's public release is poised to catalyze further innovation in this domain.

Frequently Asked Questions