### Course Description

# Methodology and Evaluation in Knowledge Technologies

## Program

Information and Communication Technologies, third-level study programme

## Lecturers:

prof. dr. Marko Robnik Šikonja

## Goals:

The goal of the course is to present a comprehensive overview of statistical learning methodology and the evaluation of learned models, using natural language processing use cases. The emphasis is on integrating theoretical knowledge of methodology and evaluation with practical data analytics skills, i.e., the use of analytical tools for statistical learning, evaluation, model selection, visualization, and interpretation of models.

The competencies of students completing this course include:

• understanding basic methodological approaches for statistical learning,

• knowledge of measures of learning success and their properties,

• knowledge of generalization error estimation,

• practical use of statistical tests for comparison of learned models,

• knowledge of combining different models,

• competences in visualization of predictive models and their decisions,

• capability to apply open-source statistical learning tools to natural language processing tasks.

## Content:

Learning as modeling and optimization:

learning goals, learning tasks, classification, regression, probability estimation, density estimation, ranking, clustering, generalizations of basic tasks, learning as optimization

Validation of learning:

validation measures for different tasks, confidence intervals, bootstrap and permutation approaches, probability calibration

Error estimation:

data overfitting, regularization, bias-variance error decomposition, margin, cross-validation, VC dimension, minimum description length principle

Model comparison:

no free lunch theorem, statistical tests for model comparison

Combining models:

weak learning and principles of ensemble learning, error and diversity in ensembles

Visualization of predictive models:

additive models, explaining decisions, visualization techniques for some complex models, VIPER toolbox

Use cases from natural language processing.

## Course literature:

Selected chapters from the following books:

• G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning with Applications in R. Springer, 2013. ISBN 978-1-4614-7137-0

• T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning, 2nd edition. Springer, 2009. ISBN 978-0-387-84857-0

• P. K. Janert, Data analysis with open source tools. O'Reilly Media. 2010. ISBN 978-0-596-80235-6

• S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python, O'Reilly Media. 2009. ISBN 978-0-596-51649-9

## Significant publications and references:

• M. Robnik Šikonja, and K. Vanhoof, Evaluation of ordinal attributes at value level. Data mining and knowledge discovery, 2007, vol. 14, no. 2, pp. 225-243.

• M. Robnik Šikonja, and I. Kononenko, Theoretical and empirical analysis of ReliefF and RReliefF. Machine learning, 2003, 53:23-69.

• M. Robnik Šikonja, and I. Kononenko, Explaining classifications for individual instances. IEEE Transactions on Knowledge and Data Engineering, 2008, 20(5):589-600.

• M. Pičulin, and M. Robnik Šikonja, Handling numeric attributes with ant colony based classifier for medical decision making. Expert systems with applications, 2014, 41(16):7524-7535.

• M. Robnik Šikonja, I. Kononenko, and E. Štrumbelj, Quality of classification explanations with PRBF. Neurocomputing, 2012, 96:37-46.

## Examination:

Written or oral exam (50%)

Seminar work with public presentation (50%)

## Student obligations:

Written or oral exam

Seminar work with public presentation