### Course Description

# Data Mining and Knowledge Discovery

## Program

Information and Communication Technologies, third-level study programme

## Lecturers:

prof. dr. Nada Lavrač

doc. dr. Martin Žnidaršič

prof. dr. Bojan Cestnik

dr. Petra Kralj Novak

## Goals:

Knowledge discovery in databases is the process of discovering patterns and models, described by rules or other human-understandable representation formalisms. The most important step in this process is data mining, which uses methods, techniques and tools for the automated construction of patterns and models from data.

The course objectives are to (a) introduce the basics of data mining, the process of knowledge discovery in databases and the CRISP-DM methodology, (b) present selected data mining methods and techniques, and (c) present the methodology for evaluating results.

The students will master the basics of data preprocessing, data mining and knowledge discovery and will be capable of using selected data mining tools and results evaluation methods in practice.

## Content:

Introduction:

introduction to data mining and knowledge discovery in databases, relation with machine learning, visualization of data and models, presentation of the CRISP-DM knowledge discovery methodology

Data preparation and preprocessing:

tabular data and relational databases,

handling of missing and noisy values,

attribute/feature subset selection

Data mining techniques:

presentation of specific data mining techniques: decision, regression and model tree learning, learning classification and association rules, clustering, nearest neighbors approach, Naive Bayesian classifier, support vector machines, artificial neural networks, subgroup discovery, ensemble classifiers

Heuristics and results evaluation:

presentation of search heuristics, heuristics for estimating the quality of induced patterns and models, methodology for results evaluation

Advanced data mining methods:

semi-supervised learning, active learning, relational data mining, propositionalization, semantic data mining

Practical training:

practical use of selected data mining techniques and tools

## Course literature:

Selected chapters from the following books:

• I.H. Witten, E. Frank, and M.A. Hall, Data Mining: Practical Machine Learning Tools and Techniques (Third Edition). Morgan Kaufmann, 2011. ISBN 978-0-12-374856-0

• T. Mitchell, Machine Learning. McGraw Hill, 1997. ISBN 0070428077

• M. Berthold, and D.J. Hand, Eds. Intelligent Data Analysis: An Introduction. Springer, 2003. ISBN 978-3-540-43060-5

• S. Džeroski, and N. Lavrač, Eds. Relational Data Mining. Springer, 2001. ISBN 3-540-42289-7

• J. Fürnkranz, D. Gamberger, and N. Lavrač, Foundations of Rule Learning. Springer, 2012. ISBN 978-3-540-75196-0

• M. Bramer, Principles of Data Mining. Springer, 2007. ISBN 978-1-84628-765-7

## Significant publications and references:

• J. Fürnkranz, D. Gamberger, and N. Lavrač, Foundations of Rule Learning. Springer, 2012.

• A. Vavpetič, V. Podpečan, and N. Lavrač, Semantic subgroup explanations. J. Intell. Inf. Syst. 42(2): 233-254, 2014.

• Petrič, B. Cestnik, N. Lavrač, and T. Urbančič, Outlier detection in cross-context link discovery for creative literature mining. The Computer Journal 55(1): 47-61, 2012.

• B. Sluban, D. Gamberger, and N. Lavrač, Ensemble-based noise detection: noise ranking and visual performance evaluation. Data Min. Knowl. Discov. 28(2): 265-303, 2014.

• M. Grčar, N. Trdin, and N. Lavrač, A methodology for mining document-enriched heterogeneous information networks. The Computer Journal 56(3): 321-335, 2013.

## Examination:

Written or oral exam (40%)

Seminar work with oral defense (60%)

## Student obligations:

Written or oral exam

Seminar work with oral defense