Ingredient-guided Multimodal Self-distillation for Food Image Segmentation
Author: Hou Sujuan, Sun Yuejuan, Min Weiqing, Wang Ruiping, Jiang Shuqiang
Affiliation:

(1. Shandong Normal University, Jinan 250358; 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190)

Fund Project:

National Key R&D Program of China (2023YFF1105104); General Program of the National Natural Science Foundation of China (62072289, 62372278); Beijing Natural Science Foundation (JQ24021)

    Abstract:

    Objectives: Accurately identifying and segmenting the different ingredient regions in food images is essential for food nutrition analysis and dietary health management. However, most existing image segmentation models rely on image input alone and often fail to capture the subtle distinguishing features of foods that look visually similar, which reduces segmentation accuracy. This paper addresses the limitations of single-modality segmentation by using text information to give the model richer context and by applying self-distillation to guide the model toward effective food image segmentation. Methods: A multimodal self-distillation segmentation model guided by ingredient information is proposed. The model uses the Contrastive Language-Image Pre-training (CLIP) model to capture ingredient information, fuses it with image knowledge, and draws on the strengths of diffusion models in dense prediction to segment food images accurately. Results: On the FoodSeg103 benchmark, the proposed model achieves an mIoU of 47.93%, 1.51 percentage points higher than the current best-performing FoodSAM model. On the UEC-FoodPIX Complete benchmark, it achieves an mIoU of 75.13%, 8.99 percentage points higher than FoodSAM. Conclusions: The proposed multimodal self-distillation network performs well on food image segmentation, confirms that ingredient information effectively guides segmentation, improves segmentation accuracy, and offers a new solution for food image analysis.
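    The results above are reported as mIoU (mean intersection over union). As a reminder of what that metric measures, the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the function name `mean_iou`, the toy label maps, and the choice to average only over classes that occur in the prediction or the ground truth are assumptions made for illustration. Benchmark protocols usually accumulate the counts over a whole dataset and may treat background or ignored labels differently.

```python
from typing import Sequence


def mean_iou(pred: Sequence[Sequence[int]],
             target: Sequence[Sequence[int]],
             num_classes: int) -> float:
    """Mean IoU over the classes present in the prediction or the ground truth.

    `pred` and `target` are 2-D label maps of the same shape whose entries
    are class indices in the range [0, num_classes).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Correct pixel: counts toward both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Wrong pixel: enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Skip classes that never occur (an assumption of this sketch).
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 2x3 label maps with three ingredient classes.
pred = [[0, 0, 1],
        [2, 1, 1]]
target = [[0, 1, 1],
          [2, 2, 1]]
score = mean_iou(pred, target, num_classes=3)
```

    In this toy case each of the three classes has an IoU of 0.5, so the mean is 0.5. A gain such as 1.51 percentage points refers to an absolute difference between two such scores expressed in percent.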

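Cite this article

Hou Sujuan, Sun Yuejuan, Min Weiqing, Wang Ruiping, Jiang Shuqiang. Ingredient-guided Multimodal Self-distillation for Food Image Segmentation[J]. Journal of Chinese Institute of Food Science and Technology, 2024, 24(11): 10-21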

History
  • Received: 2024-10-29
  • Published online: 2024-12-25