Doctoral dissertation

Searching for Credible Relations in Machine Learning

Author(s): Vedrana Vidulin (Author), Matjaž Gams (Supervisor), Bogdan Filipič (Co-Supervisor)

Thesis defense date: 03.02.2012

Organization: MPŠ - Mednarodna podiplomska šola Jožefa Stefana (Jožef Stefan International Postgraduate School)

PID: 20.500.12556/ReVIS-13585

Abstract

Can a model constructed by machine learning or data mining programs be trusted? For
example, it is known that a decision tree model can contain less credible parts caused by
pathologies in induction algorithms, noise and missing values in data, or simply because
of the complexity of a domain. Such models typically contain relations that are statistically
significant, but in reality meaningless. Meaningless relations are problematic since
they undermine the user's trust in the data mining system and can also lead to wrong
conclusions about the most important relations in the domain.
In this thesis we propose an interactive method for the construction of credible relations
in complex domains, named Human-Machine Data Mining (HMDM). The basic idea
of our approach is to construct a large number of models to extract the credible relations,
i.e., relations that are meaningful and of high quality. The task is computationally very
demanding, and in all but the simplest cases humans cannot analyze a meaningful share of
the hypothesized models on their own. However, the proposed combination of human
understanding and raw computing power enables a targeted examination of those parts of
the huge search space that contain the most credible models. While data mining
methods perform the search, humans examine and evaluate the results, draw conclusions
and redo the search in a way that seems to be the most promising based on the previous
attempts. In this way, humans guide the data mining toward the subspaces with
the most credible models and finally construct the overall conclusions from
the various most interesting solutions.
HMDM defines a toolbox composed of semi-automated data mining procedures
and a set of scenarios for the human to guide the analysis towards credible models. Furthermore,
it defines a scheme for the extraction of credible relations from multiple models,
which supports the human analyst in drawing correct conclusions
about the domain.
The proposed approach is demonstrated in two complex domains that show how the
higher education and the research and development sectors are related to economic welfare.
In addition, we show in the domain of automatic web genre identification that HMDM
can also be used successfully for learning predictive models.
A user study justified the HMDM method by showing that users are frequently
unable to detect meaningless relations by observing a single model constructed by a
machine learning algorithm. However, by observing interesting variations, i.e., candidate
solutions suggested by the HMDM method, the participants recognized the weaknesses of
the default model and created better domain models.
