Ingredient-guided Multimodal Self-distillation for Food Image Segmentation
Affiliation: (1. Shandong Normal University, Jinan 250358; 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190)

Abstract:

Objectives: With advances in computer vision, accurately identifying and segmenting the various components of food images has become essential for food nutrition analysis and healthier diet management. However, most existing image segmentation models rely solely on a single image input and therefore struggle to capture the subtle distinguishing features of food images with minimal visual differences, which ultimately limits segmentation accuracy. This paper addressed the limitations of single-modality approaches by incorporating textual information to provide richer contextual cues for the model, and leveraged self-distillation to guide the model toward effective food image segmentation.

Methods: This paper proposed a multimodal self-distillation segmentation model guided by ingredient information to improve food image segmentation. The model used the Contrastive Language-Image Pre-training (CLIP) model to capture ingredient information and fused it with image knowledge. By combining this with the strength of diffusion models in dense prediction, the model achieved accurate segmentation of food images.

Results: On the FoodSeg103 benchmark dataset, the model achieved an mIoU of 47.93%, surpassing the current best-performing FoodSAM model by 1.51 percentage points. On the UEC-FoodPIX Complete benchmark dataset, it reached an mIoU of 75.13%, outperforming FoodSAM by 8.99 percentage points.

Conclusions: The proposed multimodal self-distillation network demonstrated strong performance in food image segmentation, confirming the effective role of ingredient information in guiding the segmentation task. The approach significantly improves segmentation accuracy and offers a promising solution for food image analysis.
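To make the ingredient-guided fusion concrete, the sketch below shows one plausible way to encode ingredient names with CLIP's text tower and fuse them with dense image features via cross-attention. This is a minimal illustration under stated assumptions, not the authors' implementation: the ingredient list, the cross-attention fusion head, and the shape of the backbone feature map (e.g., projected activations from a diffusion U-Net) are all hypothetical choices.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Hypothetical sketch: ingredient-text encoding with CLIP, fused into a dense
# feature map by cross-attention. Names and shapes are illustrative assumptions.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

ingredients = ["rice", "broccoli", "grilled chicken"]        # assumed ingredient labels
text_inputs = processor(text=ingredients, return_tensors="pt", padding=True)
with torch.no_grad():
    text_emb = clip.get_text_features(**text_inputs)          # (num_ingredients, 512)

# Dummy dense image features standing in for a backbone (e.g., a diffusion
# U-Net's intermediate activations projected to CLIP's embedding width).
B, C, H, W = 1, 512, 32, 32
img_feat = torch.randn(B, C, H, W)

# Fusion: cross-attention with pixels as queries, ingredient embeddings as keys/values.
attn = torch.nn.MultiheadAttention(embed_dim=C, num_heads=8, batch_first=True)
pixels = img_feat.flatten(2).transpose(1, 2)                  # (B, H*W, C)
text_kv = text_emb.unsqueeze(0).expand(B, -1, -1)             # (B, num_ingredients, C)
fused, _ = attn(query=pixels, key=text_kv, value=text_kv)     # (B, H*W, C)
fused = fused.transpose(1, 2).reshape(B, C, H, W)             # ingredient-aware feature map
```

The ingredient-aware feature map would then feed a segmentation decoder; in a self-distillation setup, a teacher branch's predictions on such fused features could supervise a student branch, but that training loop is omitted here.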

History
  • Received: October 29, 2024
  • Online: December 25, 2024