Ingredient-guided Multimodal Self-distillation for Food Image Segmentation
Affiliation: (1. Shandong Normal University, Jinan 250358; 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190)

Abstract:

Objectives: With advances in computer vision, accurately identifying and segmenting the various components of food images has become essential for food nutrition analysis and healthier diet management. However, most existing image segmentation models rely solely on a single image input and therefore struggle to capture the subtle distinguishing features of food images with minimal visual differences, which ultimately limits segmentation accuracy. This paper addressed the limitations of single-modality approaches by incorporating textual information to provide richer contextual cues for the model, and leveraged self-distillation to guide the model toward effective food image segmentation.

Methods: This paper proposed a multimodal self-distillation segmentation model guided by ingredient information to improve food image segmentation. The model used the Contrastive Language-Image Pre-training (CLIP) model to capture ingredient information and fused it with image knowledge. By combining this with the strength of diffusion models in dense prediction, the model achieved accurate segmentation of food images.

Results: On the FoodSeg103 benchmark dataset, the model achieved an mIoU of 47.93%, surpassing the current best-performing FoodSAM model by 1.51 percentage points. On the UEC-FoodPIX Complete benchmark dataset, it reached an mIoU of 75.13%, outperforming FoodSAM by 8.99 percentage points.

Conclusions: The proposed multimodal self-distillation network demonstrated strong performance in food image segmentation, confirming the effective role of ingredient information in guiding the segmentation task. The approach significantly improves segmentation accuracy and offers a promising solution for food image analysis.
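To make the ingredient-guided fusion concrete, the sketch below shows one plausible way to encode ingredient names with CLIP's text tower and fuse them with dense image features via cross-attention. This is a minimal illustration under stated assumptions, not the authors' implementation: the ingredient list, the cross-attention fusion head, and the shape of the backbone feature map (e.g., projected activations from a diffusion U-Net) are all hypothetical choices.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Hypothetical sketch: ingredient-text encoding with CLIP, fused into a dense
# feature map by cross-attention. Names and shapes are illustrative assumptions.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

ingredients = ["rice", "broccoli", "grilled chicken"]        # assumed ingredient labels
text_inputs = processor(text=ingredients, return_tensors="pt", padding=True)
with torch.no_grad():
    text_emb = clip.get_text_features(**text_inputs)          # (num_ingredients, 512)

# Dummy dense image features standing in for a backbone (e.g., a diffusion
# U-Net's intermediate activations projected to CLIP's embedding width).
B, C, H, W = 1, 512, 32, 32
img_feat = torch.randn(B, C, H, W)

# Fusion: cross-attention with pixels as queries, ingredient embeddings as keys/values.
attn = torch.nn.MultiheadAttention(embed_dim=C, num_heads=8, batch_first=True)
pixels = img_feat.flatten(2).transpose(1, 2)                  # (B, H*W, C)
text_kv = text_emb.unsqueeze(0).expand(B, -1, -1)             # (B, num_ingredients, C)
fused, _ = attn(query=pixels, key=text_kv, value=text_kv)     # (B, H*W, C)
fused = fused.transpose(1, 2).reshape(B, C, H, W)             # ingredient-aware feature map
```

The ingredient-aware feature map would then feed a segmentation decoder; in a self-distillation setup, a teacher branch's predictions on such fused features could supervise a student branch, but that training loop is omitted here.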

History
  • Received: October 29, 2024
  • Online: December 25, 2024