Feature construction is a fundamental aspect of machine learning. It encompasses both feature engineering, the manual design of features by domain experts, and representation learning, the automated discovery of useful data representations during model construction. Its goal is to transform raw data into a more suitable …
This dissertation examines the ecological dynamics of phytoplankton communities in the northern Adriatic Sea, focusing on phenology, environmental drivers, and trophodynamics. The complexity of the region, characterized by the richness of phytoplankton communities, environmental variability and intensive human activities, is not seen as an obstacle but as an opportunity to …
The rapid advancements in Machine Learning (ML) and Black-Box Optimization (BBO) have led to an increased reliance on benchmarking data for evaluating and comparing algorithms across diverse domain tasks. However, the effective exploitation of this data is hindered by challenges such as syntactic variability, semantic ambiguity, and lack of standardization. …
Automatic terminology extraction, also known as automatic term extraction (ATE), is a natural language processing (NLP) task that identifies specialized terminology from domain-specific corpora. ATE is often used for terminographic tasks (e.g., the creation of specialized dictionaries) and contributes to several complex downstream tasks (e.g., machine translation and information retrieval). …
Contaminants of emerging concern (CECs) are a subgroup of organic compounds of natural or synthetic origin. Together with their degradation and transformation products (TPs), they have potentially harmful effects on humans, biota, and the environment, and are the eco-exposome (EE) constituents of utmost importance. Their identification, quantification, and continued investigation into their environmental …
With the resurgence of neural network-based learning in the last decade, machine learning methods are becoming critical components of many real-life intelligent systems. However, while being able to learn effectively and at scale, such systems are often non-interpretable and unable to exploit existing symbolic background knowledge. The paradigm that offers …
Most machine learning, data mining and statistical methods rely on the assumption that the analyzed data are independent and identically distributed (i.i.d.). More specifically, the individual examples included in the training data are assumed to be drawn independently of one another from the same probability distribution. However, cases where this …
This thesis addresses the task of formalizing and implementing the process of semi-automatic ontology construction. We propose a theoretical framework for formalizing the ontology construction process. The process is described as a sequence of operators applied to the ontology. Several types of common operators are identified and each type is …
In this thesis, we address the task of learning models for predicting structured outputs, which take as input a tuple of attribute values and produce as output a structured object. In contrast to classification and regression, where the output is a single scalar value, in our case the output is …