The rapid advancements in Machine Learning (ML) and Black-Box Optimization (BBO) have led to an increased reliance on benchmarking data for evaluating and comparing algorithms across diverse domain tasks. However, the effective exploitation of this data is hindered by challenges such as syntactic variability, semantic ambiguity, and lack of standardization. …
In this thesis, we introduce novel methods for equation discovery (ED), based on the use of probabilistic grammars. ED and symbolic regression address the task of finding a symbolic mathematical model that best describes observed data. Models can be as simple as an algebraic equation or as complex as a …
Contaminants of emerging concern (CECs), representing a subgroup of organic compounds of natural or synthetic origin, and their degradation and transformation products (TPs), with potentially harmful effects on humans, biota, and the environment, are the eco-exposome (EE) constituents of utmost importance. Their identification, quantification, and continued investigation into their environmental …
In this thesis, we integrate complex nodes into predictive clustering trees (PCTs). PCTs are well-established machine learning models that are very flexible in terms of the machine learning tasks that they can address, including structured output prediction and semisupervised learning. Like standard decision trees, they are learned with a greedy …
Most machine learning, data mining and statistical methods rely on the assumption that the analyzed data are independent and identically distributed (i.i.d.). More specifically, the individual examples included in the training data are assumed to be drawn independently of each other from the same probability distribution. However, cases where this …
In the thesis, we address the task of polynomial regression, i.e., inducing regression models based on polynomial equations, from data. We aim at improving and extending the existing approaches to learning polynomial regression models in several directions. First, we improve the existing methods for addressing the issue of over-fitting and …
In this thesis, we address the problem of learning various types of decision trees from time-changing data streams. In particular, we study online machine learning algorithms for learning regression trees, linear model trees, option trees for regression, multi-target model trees, and ensembles of model trees from data streams. These are …
The domain of data mining (DM) deals with analyzing different types of data. The data typically used in data mining is in the format of a single table, with primitive datatypes as attributes. However, structured (complex) data, such as graphs, sequences, networks, text, image, multimedia and relational data, are receiving …
Feature ranking is the machine learning task of inducing an ordering of features in a given dataset according to some notion of relevance. We consider the feature ranking task in the context of supervised learning, where the notion of feature relevance is defined with respect to a target concept. Feature …
The task of mathematical modeling of dynamic systems from observed system behavior, widely known under the name of system identification, breaks down into two subtasks. The first task, referred to as structure identification, is to specify the model structure, i.e., the functional form of the model. In practice, the model …