Doctoral dissertation

Exploiting domain knowledge in predictive learning from food and nutrition data

Author(s): Gordana Ispirova (Author), Barbara Koroušić Seljak (Supervisor), Tome Eftimov (Co-Supervisor)

Thesis defense date: 14.12.2022

Organization: MPŠ - Mednarodna podiplomska šola Jožefa Stefana

PID: 20.500.12556/ReVIS-13852

Abstract

Human knowledge about food and nutrition has evolved drastically over time. With food-
and nutrition-related data being mass-produced and easily accessible, the next step is to
use Artificial Intelligence (AI) to translate data into knowledge. The majority of AI research
is model-driven, and classical Machine Learning (ML) pipelines follow a
model-centric approach: they prioritize training the best model for a specific task and
focus on improving model parameters, overlooking the importance of data.
We propose a novel ML pipeline that fuses data and domain-driven knowledge for a predictive
task from the Food and Nutrition domain: fast prediction of nutrient values from
unstructured recipe text. The proposed pipeline consists of three parts: representation
learning (RL), unsupervised ML, and supervised ML. In the RL part, word and paragraph
embeddings are learned for short text descriptions of foods (recipe titles). In the unsupervised
ML part, the recipes are separated into clusters based on a domain-specific coding
(FoodEx2 classification) from an external domain resource. In the supervised ML part,
the two parts are combined: separate predictive models are trained for each cluster and each
nutrient, using the learned embeddings as input features. The pipeline is evaluated
with criteria defined using domain knowledge (nutrient tolerance levels) and compared
to baselines evaluated with the same criteria.
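The supervised part and the tolerance-based evaluation described above can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the 1-nearest-neighbour lookup stands in for whatever predictive models were actually trained, the cluster labels "A" and "B" stand in for FoodEx2-based clusters, and the function names, the toy embeddings, and the 20% relative tolerance are all assumptions made for the example.

```python
from collections import defaultdict
from math import dist


def fit_per_cluster(samples):
    """Group (cluster, embedding, nutrient_value) triples by cluster.

    Each cluster's "model" is simply its stored training points, queried by
    nearest neighbour. This is a stand-in for a real per-cluster regressor.
    """
    models = defaultdict(list)
    for cluster, embedding, value in samples:
        models[cluster].append((embedding, value))
    return dict(models)


def predict(models, cluster, embedding):
    """Predict a nutrient value using only the model of the recipe's cluster."""
    points = models[cluster]
    _, value = min(points, key=lambda p: dist(p[0], embedding))
    return value


def tolerance_accuracy(y_true, y_pred, rel_tol):
    """Fraction of predictions within rel_tol * |true value| of the true value.

    rel_tol is a hypothetical tolerance level; the nutrient-specific
    tolerance levels used in the thesis are not given in the abstract.
    """
    hits = sum(abs(p - t) <= rel_tol * abs(t) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy data: (cluster label, title embedding, nutrient value per 100 g).
train = [
    ("A", (0.0, 0.0), 10.0),
    ("A", (1.0, 1.0), 20.0),
    ("B", (0.0, 0.0), 100.0),
]
models = fit_per_cluster(train)

preds = [
    predict(models, "A", (0.1, 0.0)),
    predict(models, "B", (0.9, 0.9)),
]
accuracy = tolerance_accuracy([11.0, 150.0], preds, rel_tol=0.2)
print(preds, accuracy)
```

Keeping one model per cluster means a prediction only ever draws on recipes from the same domain-defined group, which is the point at which the clustering and the embeddings are combined.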
Because the evaluation showed that including domain knowledge in the unsupervised
ML part improved the results compared to the baseline, we propose an alteration of the
ML pipeline. We include two different external sources of domain knowledge for clustering
in the unsupervised ML part, to explore the domain bias for the same prediction task.
To further improve the ML pipeline, we include domain knowledge in the RL part of the
pipeline. Instead of obtaining recipe title embeddings, we introduce a domain heuristic
for merging the embeddings of the recipe's ingredients. This proved to be a successful
way to train well-performing models for predicting nutrient values, as the
accuracies obtained were significantly higher than the baseline.
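One way to picture merging ingredient embeddings into a single recipe embedding is a quantity-weighted average, sketched below. The abstract does not specify the domain heuristic, so the weighting by ingredient mass, the function name `merge_embeddings`, and the toy vectors are assumptions for illustration only.

```python
def merge_embeddings(ingredients):
    """Merge ingredient embeddings into one recipe embedding.

    ingredients: list of (embedding, grams) pairs.
    Each embedding is weighted by the ingredient's share of the total mass
    (an assumed heuristic, not necessarily the one used in the thesis).
    """
    total = sum(grams for _, grams in ingredients)
    if total <= 0:
        raise ValueError("total ingredient mass must be positive")
    dim = len(ingredients[0][0])
    merged = [0.0] * dim
    for embedding, grams in ingredients:
        weight = grams / total
        for i, x in enumerate(embedding):
            merged[i] += weight * x
    return merged


# Toy recipe: 300 g of one ingredient, 100 g of another, 2-d embeddings.
recipe = [([1.0, 0.0], 300.0), ([0.0, 1.0], 100.0)]
recipe_embedding = merge_embeddings(recipe)
print(recipe_embedding)
```

The resulting vector can then be used as the input feature in place of a recipe title embedding.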
As the domain-specific embeddings proved to perform well, we composed two corpora of
predefined embeddings, ingredient embeddings and recipe embeddings, from six heterogeneous
multilingual recipe datasets. The data were normalized using dictionary- and rule-based
Named Entity Recognition and mapped to a Food Composition Database.
Training embeddings tailored for a specific task is very time-consuming;
these corpora of predefined embeddings can therefore be used for research purposes as well
as transferred to other tasks for application purposes.
To explore the major impact data has on model performance, we focused on the generalization
of predictive models by defining a generalizability index that indicates how far a predictive
model learned on one dataset can be trusted when transferred to another. Going a step further to
show the importance of data in predictive modeling, we present different ways of selecting a
representative training dataset; the results show that different selections of the training
dataset produce different outcomes. The training data should be representative of the data
expected in deployment, covering all variations that deployment data will present.

Metadata

Work Type	Doctoral dissertation
Language	English
Organization	MPŠ - Mednarodna podiplomska šola Jožefa Stefana
PID	20.500.12556/ReVIS-13852
COBISS ID	134449155
UDK	004.85:004.6:613.2(043.3)
Thesis defense date	14.12.2022

Attachments

Attachment - academic_work_attachments/Gordana_Ispir… (4.6 MB) MD5: 821c181a0d7c6f0a4a08082cb5e33e96
