Timely detection of aphids is critical for effective pest control in agriculture, preventing the spread of various crop diseases. In response, this research investigates unsupervised learning techniques, particularly clustering, to assess their potential for alleviating the annotation workload in insect classification. We leverage Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) as underlying structures (backbones) for two clustering algorithms, K-means and agglomerative clustering. We compare several CNN and ViT architectures (ResNet-18, ResNet-101, ResNet-152, ViT-B/16, and ViT-L/32) under three approaches: fully supervised classification, fine-tuned clustering, and unsupervised clustering. Features for clustering are extracted from different layers: the Global Average Pooling output for CNNs and the self-attention layers for ViTs. In the context of this research, fine-tuning means that the model was first trained on similar insect images, and the newly learned features were then used for clustering.

Our findings demonstrate that while state-of-the-art Vision Transformers excel in supervised classification (ViT-L/32 achieving a 98.69% F1 score), they marginally underperform in clustering compared to ResNet-101, which achieved an 87.26% F1 score on that task. Notably, fine-tuning the models significantly improved clustering performance, with ViT-B/16 achieving an F1 score of 98.27%. Unsupervised classification, although less accurate, offers potential as an initial step in image classification, reducing the burden of expert annotation.

These results underscore the potential of automated classification methods in pest control and highlight the trade-offs between supervised and unsupervised techniques. While ViTs demonstrate superiority in supervised scenarios, the effectiveness of fine-tuning for clustering tasks suggests avenues for improving ViT performance in unsupervised settings. This research lays the groundwork for leveraging machine learning to streamline pest detection and control in agriculture, potentially reducing reliance on manual annotation efforts.
Link to the paper for this project.
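As a concrete starting point, below is a minimal sketch of the backbone-plus-clustering pipeline described above, using PyTorch and scikit-learn. It is an illustration under stated assumptions, not the paper's exact implementation: the dataset path, ImageNet-pretrained weights, batch size, and cluster count are placeholders, and for the ViT the class-token output of the self-attention encoder is assumed to serve as the clustering feature.

```python
# Minimal sketch of the feature-extraction-plus-clustering pipeline.
# Assumptions (not from the paper): images laid out for ImageFolder under
# "data/insects", ImageNet-pretrained weights, and 5 clusters; adjust to
# the actual dataset and number of insect classes.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms
from sklearn.cluster import KMeans, AgglomerativeClustering

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Standard ImageNet preprocessing so the pretrained backbones see inputs
# from the distribution they were trained on.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
dataset = datasets.ImageFolder("data/insects", transform=preprocess)
loader = DataLoader(dataset, batch_size=32, shuffle=False)

def build_backbone(name):
    """Return a pretrained model with its classification head removed,
    so the forward pass yields the embedding used for clustering."""
    if name == "resnet101":
        model = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
        model.fc = nn.Identity()     # output = Global Average Pooling features
    elif name == "vit_b_16":
        model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
        model.heads = nn.Identity()  # output = class token after self-attention encoder
    else:
        raise ValueError(f"unknown backbone: {name}")
    return model.eval().to(device)

@torch.no_grad()
def extract_features(model):
    feats = [model(images.to(device)).cpu() for images, _ in loader]
    return torch.cat(feats).numpy()

features = extract_features(build_backbone("resnet101"))

# Cluster the embeddings with both algorithms used in the study.
kmeans_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)
agglo_labels = AgglomerativeClustering(n_clusters=5).fit_predict(features)
```

Fine-tuning in the sense used above would correspond to first training the backbone, with its classification head attached, on similar insect images, and then extracting features the same way for clustering.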