The rapid advancements in Machine Learning (ML) and Black-Box Optimization (BBO)
have led to an increased reliance on benchmarking data for evaluating and comparing
algorithms across diverse domain tasks. However, the effective exploitation of this data
is hindered by challenges such as syntactic variability, semantic ambiguity, and lack of
standardization. In this dissertation, we address these challenges by advocating for formal
semantic representation of benchmarking data through the use of ontologies. By providing
standardized vocabularies and ontologies, we improve knowledge sharing and promote data
interoperability across studies in ML and BBO.
In the ML domain, focusing on multi-label classification (MLC), we design an ontology-based
framework for semantic annotation of benchmarking data, facilitating the creation
of MLCBench – a semantic catalog that enhances data accessibility and reusability. In the
BBO domain, we introduce OPTION (OPTImization algorithm benchmarking ONtology),
an ontology that formally represents benchmarking data, including performance data,
algorithm metadata, and problem landscapes. This ontology enables the automatic integration
and interoperability of knowledge and data from diverse benchmarking studies.
Building upon the semantically annotated benchmarking data, we conduct empirical
studies on tasks such as algorithm performance prediction and automated
algorithm selection (AAS). In the MLC domain, a data-driven AAS pipeline is proposed to
exploit this MLC benchmarking data. We evaluate the predictive power of dataset meta-features
for AAS and explore various ML approaches – including regression, classification,
and pairwise methods – to identify the most effective one.
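
As a rough illustration of the regression-style variant of AAS mentioned above, the sketch below predicts each algorithm's score for a new dataset from its meta-features and then selects the algorithm with the highest prediction. It is a minimal sketch, not the pipeline used in the dissertation: the k-nearest-neighbour regressor, the two meta-features, the algorithm names, and all scores are assumptions made up for the example.

```python
from math import dist

# Hypothetical benchmarking records: (dataset meta-features, score per algorithm).
# The names and numbers are invented for illustration only.
TRAIN = [
    ((0.10, 5.0), {"algo_a": 0.82, "algo_b": 0.74}),
    ((0.15, 6.0), {"algo_a": 0.80, "algo_b": 0.77}),
    ((0.90, 40.0), {"algo_a": 0.55, "algo_b": 0.71}),
    ((0.85, 38.0), {"algo_a": 0.58, "algo_b": 0.69}),
]


def predict_scores(meta_features, train=TRAIN, k=2):
    """k-NN regression: average each algorithm's score over the k closest datasets.

    Meta-features are used unscaled here; a real pipeline would normalize them first.
    """
    neighbours = sorted(train, key=lambda rec: dist(rec[0], meta_features))[:k]
    algorithms = neighbours[0][1].keys()
    return {a: sum(scores[a] for _, scores in neighbours) / k for a in algorithms}


def select_algorithm(meta_features, train=TRAIN, k=2):
    """Regression-style AAS: pick the algorithm with the highest predicted score."""
    predicted = predict_scores(meta_features, train, k)
    return max(predicted, key=predicted.get)
```

A classification-style variant would instead learn to predict the label of the best algorithm directly, and a pairwise variant would learn one model per pair of algorithms and aggregate their votes.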
In the BBO domain, we exploit benchmarking data about modular BBO algorithms to
conduct a comprehensive analysis of how individual algorithm modules influence overall
performance. We develop algorithm representations derived from performance and feature
importance values, effectively linking algorithm behavior to problem landscape features.
Using these representations, we also relate module configurations and performance, providing
deeper insights into the impact of different modules on algorithm performance.
Furthermore, the semantically annotated benchmarking data on modular BBO algorithms
is used as a backbone for creating various knowledge graphs (KGs).
The KGs are then examined for their predictive power in algorithm performance prediction.
By applying scoring-based KG embedding methods and graph neural networks, we
predict algorithm performance in transductive and inductive setups, respectively.
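
To make the idea of a scoring-based KG embedding concrete, the sketch below uses the TransE scoring function, which rates a triple (head, relation, tail) by the negative distance between head + relation and tail. The abstract does not name the specific methods used, so TransE is chosen here only as a well-known representative of the family. Framing performance prediction as link prediction over performance classes, the entity and relation names, and the hand-set 2-d vectors are likewise assumptions for illustration; in practice the embeddings would be learned from the KG.

```python
from math import sqrt

# Toy, hand-set 2-d embeddings (not learned). All names are hypothetical.
ENTITY = {
    "algorithm_1": (0.0, 0.0),
    "good_performance": (1.0, 0.0),
    "poor_performance": (0.0, 1.0),
}
RELATION = {"achieves_on_problem_1": (0.9, 0.1)}


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 norm of (head + relation - tail).

    A higher (less negative) score means the triple is considered more plausible.
    """
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return -sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def predict_tail(head, relation, candidates):
    """Link prediction: return the candidate tail entity with the highest score."""
    return max(candidates, key=lambda tail: transe_score(head, relation, tail))
```

Because every entity needs its own embedding vector, a model of this kind can only score entities seen during training, which is what makes the setup transductive. Graph neural networks compute representations from node features and neighbourhoods, so they can also handle unseen nodes in an inductive setup.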
Overall, the contributions of this dissertation include the development of ontology-based
frameworks for managing benchmarking data in the ML and BBO domains, the
creation of semantic data catalogs, and novel methodologies for algorithm selection and
performance prediction. By addressing challenges in the representation and exploitation of
benchmarking data, this work advances both ML and BBO. It provides tools for improved data management and
algorithm selection, as well as insights into algorithm behavior.