REPOSITORY > RESULTS

Doctoral dissertation

An Evaluation Method for Feature Rankings

Author(s): Ivica Slavkov (Author), Sašo Džeroski (Supervisor)

Thesis defense date: 26.07.2012

Organization: MPŠ - Mednarodna podiplomska šola Jožefa Stefana

PID: 20.500.12556/ReVIS-13604

Views: 9 | Downloads: 9

Abstract

Feature ranking is the machine learning task of inducing an ordering of features in a
given dataset according to some notion of relevance. We consider the feature ranking task
in the context of supervised learning, where the notion of feature relevance is defined with
respect to a target concept. Feature ranking is rarely considered as a standalone task, since
it usually precedes some other machine learning task, such as learning predictive models.
Because of that, the evaluation of feature ranking algorithms, in a real-world setting, is
always performed in a utility oriented manner.
The main focus of the work presented in this thesis is the evaluation of feature rankings.
More specifically, three goals related to the evaluation of feature rankings are addressed. The
first is to formally define and quantify a ground truth feature ranking. As a result we obtain
a synthetic controlled setting that can be used to evaluate feature rankings. The second
goal is to propose an evaluation method for feature rankings. This method should be able
to estimate and compare the quality of feature rankings produced by different approaches.
The third goal is to identify practical scenarios where feature ranking evaluation is needed
and to demonstrate the usefulness of the proposed method.
We first formally define the feature ranking task. Based on definitions given in the
literature, in the context of feature selection, the purpose of feature ranking is to solve the
all-relevant feature selection problem and provide a correct ordering of the relevant features.
The output of the feature ranking task is an ordered set of features, called a feature ranking.
Feature ranking is based on the definition of relevance. We thus provide a survey of
the various definitions and measures of feature relevance that exist in the literature. A
common deficiency of most of these feature relevance measures is that they treat each feature
independently. To overcome this, we propose to quantify feature relevance by a measure
based of feature interactions, grounded in information theory. By using this definition of
relevance, we are able to produce a ground truth ranking and define a synthetic controlled
setting for evaluating feature rankings.
Next, we propose an evaluation method for feature rankings. Intuitively, the method
is based on the utility of feature ranking as a filter method for feature selection. It is a
formalised algorithmic procedure based on stepwise construction of k-ranked feature subsets
and subsequent construction of predictive models. The output of the evaluation method
are the so-called error curves that show how the relevant features are distributed across a
feature ranking. These curves can be further used to comparatively evaluate the quality of
different feature ranking methods.
The proposed ranking evaluation method is tested in a controlled setting, where the
definition of ground truth ranking is used. The experiments are based on adding different
levels of noise to the known ground truth ranking and evaluating them with the proposed
method. Both uniform and non-uniform noise addition to the ground truth ranking is
considered. The results show that the evaluation method can successfully detect the decrease
in quality of the feature rankings as higher and higher levels of noise are considered.
We then perform three empirical studies, covering different domains. The first investigates
the behaviour of different feature ranking algorithms on both synthetic and various
real datasets. The results reveal that ReliefF is the best performing method on the synthetic
data. On real data, there is no statistically significant difference in quality of the rankings
produced by ReliefF and the information gain.
The second study investigates the usefulness of feature ranking ensembles (FREs). The
aim of the analysis is to find under what conditions FREs produce rankings with better
quality, as compared to the rankings produced by individual ranking methods. The results
show that constructing FREs has a profound effect only when considering unstable base
ranking methods, such as the ones provided by Random Forests.
The last study is from the domain of cancer research, more specifically the area of
embryonal tumours (ETs). The aim of the analysis is to identify key genes involved in
tumour aggressiveness. Our method helped in this task by identifying the feature ranking
method that produces the best ranked list of candidate genes. This was also confirmed by
a subsequent construction of gene networks for the top-ranked genes.
Finally, we conclude the thesis by discussing several possibilities for further work in
this area. The first and the simplest extension would be in the experimental evaluation
performed. Namely, in our work we considered only classification and regression target concepts.
We would like to extend this to also include experiments performed on structured
targets. The second line of further development would be to combine several aspects of
feature rankings, including stability, when evaluating feature ranking methods. The final
direction for further work is to evaluate feature rankings by considering different utilities, besides
their predictive performance. An example for this was in the last case study conducted,
when the feature rankings were used to construct gene networks.

Attachments

Cite this work