Latest Academic Works

Representing and exploiting benchmarking data for optimisation and learning

Author(s): Ana Kostovska (Author), Panče Panov (Supervisor), Sašo Džeroski (Co-Supervisor), Tome Eftimov (Co-Supervisor)

Year: 2025

Type: Doctoral dissertation

The rapid advancements in Machine Learning (ML) and Black-Box Optimization (BBO) have led to an increased reliance on benchmarking data for evaluating and comparing algorithms across diverse domain tasks. However, the effective exploitation of this data is hindered by challenges such as syntactic variability, semantic ambiguity, and lack of standardization. …

Probabilistic grammar-based equation discovery

Author(s): Jure Brence (Author), Sašo Džeroski (Supervisor), Ljupčo Todorovski (Co-Supervisor)

Year: 2024

Type: Doctoral dissertation

In this thesis, we introduce novel methods for equation discovery (ED), based on the use of probabilistic grammars. ED and symbolic regression address the task of finding a symbolic mathematical model that best describes observed data. Models can be as simple as an algebraic equation or as complex as a …

Annotation of semi-polar organic contaminants by using gas chromatography coupled to mass spectrometry and machine learning

Author(s): Milka Ljoncheva (Author), Tina Kosjek (Supervisor), Sašo Džeroski (Co-Supervisor)

Year: 2022

Type: Doctoral dissertation

Contaminants of emerging concern (CECs), representing a subgroup of organic compounds of natural or synthetic origin, and their degradation and transformation products (TPs), with potentially harmful effects on humans, biota, and the environment, are the eco-exposome (EE) constituents of utmost importance. Their identification, quantification, and continued investigation into their environmental …

Complex nodes in trees for structured output prediction

Author(s): Tomaž Stepišnik (Author), Dragi Kocev (Supervisor), Sašo Džeroski (Co-Supervisor)

Year: 2021

Type: Doctoral dissertation

In this thesis, we integrate complex nodes into predictive clustering trees (PCTs). PCTs are well-established machine learning models that are very flexible in terms of the machine learning tasks that they can address, including structured output prediction and semisupervised learning. Like standard decision trees, they are learned with a greedy …

Considering autocorrelation in predictive models

Author(s): Daniela Stojanova (Author), Sašo Džeroski (Supervisor)

Year: 2012

Type: Doctoral dissertation

Most machine learning, data mining and statistical methods rely on the assumption that the analyzed data are independent and identically distributed (i.i.d.). More specifically, the individual examples included in the training data are assumed to be drawn independently of one another from the same probability distribution. However, cases where this …

A Machine Learning Approach to Polynomial Regression

Author(s): Aleksandar Pečkov (Author), Sašo Džeroski (Supervisor), Ljupčo Todorovski (Co-Supervisor)

Year: 2012

Type: Doctoral dissertation

In the thesis, we address the task of polynomial regression, i.e., inducing regression models based on polynomial equations, from data. We aim at improving and extending the existing approaches to learning polynomial regression models in several directions. First, we improve the existing methods for addressing the issue of over-fitting and …

Algorithms for Learning Regression Trees and Ensembles on Evolving Data Streams

Author(s): Elena Ikonomovska (Author), Sašo Džeroski (Supervisor), João Gama (Co-Supervisor)

Year: 2012

Type: Doctoral dissertation

In this thesis, we address the problem of learning various types of decision trees from time-changing data streams. In particular, we study online machine learning algorithms for learning regression trees, linear model trees, option trees for regression, multi-target model trees, and ensembles of model trees from data streams. These are …

A Modular Ontology of Data Mining

Author(s): Panče Panov (Author), Sašo Džeroski (Supervisor)

Year: 2012

Type: Doctoral dissertation

The domain of data mining (DM) deals with analyzing different types of data. The data typically used in data mining is in the format of a single table, with primitive datatypes as attributes. However, structured (complex) data, such as graphs, sequences, networks, text, image, multimedia and relational data, are receiving …

An Evaluation Method for Feature Rankings

Author(s): Ivica Slavkov (Author), Sašo Džeroski (Supervisor)

Year: 2012

Type: Doctoral dissertation

Feature ranking is the machine learning task of inducing an ordering of features in a given dataset according to some notion of relevance. We consider the feature ranking task in the context of supervised learning, where the notion of feature relevance is defined with respect to a target concept. Feature …

Parameter Identification in Nonlinear Dynamic Systems with Meta-heuristic Approaches

Author(s): Katerina Tashkova (Author), Sašo Džeroski (Supervisor), Jurij Šilc (Co-Supervisor)

Year: 2012

Type: Doctoral dissertation

The task of mathematical modeling of dynamic systems from observed system behavior, widely known under the name of system identification, breaks down into two subtasks. The first task, referred to as structure identification, is to specify the model structure, i.e., the functional form of the model. In practice, the model …