Abstract:
Rice appearance quality inspection based on machine vision has been widely adopted. In practical scenarios, however, grain adhesion often leads to unstable and inaccurate segmentation, and existing methods tend to suffer from over-segmentation or blurred boundaries when processing images of naturally scattered, mutually adhering rice grains. To address this issue, this paper proposes GT-YOLOv8, an improved rice grain segmentation model based on YOLOv8. A novel backbone network, GTNet, is designed with a larger spatial receptive field and dynamic convolution to strengthen edge feature extraction and improve overall segmentation efficiency. In addition, a multi-scale attention mechanism is integrated and the WIoU loss function is introduced to further improve feature extraction and global information modeling. To validate the proposed model, a rice instance segmentation dataset named STR-900 was constructed, covering multiple rice varieties, lighting conditions, and adhesion levels. Experimental results show that the GTNet backbone achieves performance comparable to the ConvNeXt backbone with significantly fewer parameters, striking a favorable balance between lightweight design and efficiency. Notably, GT-YOLOv8, built upon GTNet, performs strongly on instance segmentation of adhering rice grains, improving Mask mAP by 5.93 percentage points over the original YOLOv8 and yielding markedly better segmentation accuracy and edge clarity for adhered grains. In summary, GT-YOLOv8 demonstrates high segmentation accuracy and strong generalization under complex adhesion conditions, indicating promising application potential.