Doctoral dissertation

Food and drink image detection and recognition using deep convolutional neural networks

Author(s): Simon Mezgec (Author), Barbara Koroušić Seljak (Supervisor)

Thesis defense date: 03.12.2021

Organization: MPŠ - Mednarodna podiplomska šola Jožefa Stefana

PID: 20.500.12556/ReVIS-13908

Abstract

A healthy diet is becoming increasingly relevant as recognizing dietary deficiencies often
leads to actionable results that can improve the individual’s overall health. However, to
identify areas of potential improvement, tracking food intake is necessary. Manual methods
have traditionally been used to perform this tracking, but these methods have a number of
downsides, such as inaccuracy and a high level of effort and motivation needed to manually
track intake. This is why novel solutions are required. Such solutions can efficiently
automate food tracking, thus facilitating dietary assessment. Due to the pervasiveness
of smartphones with built-in cameras, automating dietary assessment by recognizing food
and drink items from images that may not be of the best quality seems like a promising
approach to develop solutions that could reach a large portion of the population. There
have been multiple approaches presented for this problem, with deep learning, or more
specifically deep neural networks, achieving the state of the art in the field.

This doctoral dissertation presents three solutions for food and drink image detection,
recognition, and segmentation using deep convolutional neural networks, which are a type
of deep neural network mainly used for image processing. The first solution includes a
detection model to remove nonfood images from a self-acquired dataset, and an image
recognition model based on a novel deep neural network architecture, called NutriNet.
With it, a classification accuracy of 86.72% was achieved. The second solution is based on
fake food (food replicas), which is used in experimental research in behavioral nutrition.
Using an existing deep neural network architecture, an image segmentation model was
trained on a fake-food image dataset and it achieved an accuracy of 92.18%. The third
solution is based on the second one and it was submitted to a worldwide competition for
food image recognition, the Food Recognition Challenge. In the scope of this challenge,
an image segmentation model was trained and it achieved a precision of 59.2% on the
challenging competition dataset of real-world food and drink images, a result that ranked
second in the second round of the competition.

These solutions and results contributed to the development of the food image recognition
field in recent years and they further validate the use of deep convolutional neural
networks for this problem, as well as present a novel architecture and approach to input
data collection in the deep learning field. To the best of the author’s knowledge, they also
achieved multiple firsts: the NutriNet solution was the first to recognize images of drinks,
while the fake-food solution was the first to automatically recognize food replicas and also
the first to include a single deep neural network architecture for the joint segmentation
and classification of food images.
