MPŠ

Jožef Stefan
International
Postgraduate School

Jamova 39
SI-1000 Ljubljana
Slovenia

Phone: +386 1 477 31 00
Fax: +386 1 477 31 10
Email: info@mps.si

Course Description

Data and Text Mining

Program

Information and Communication Technologies, second-level study programme

Lecturers:

prof. dr. Nada Lavrač
doc. dr. Martin Žnidaršič
prof. dr. Bojan Cestnik
prof. dr. Dunja Mladenić
dr. Petra Kralj Novak

Goals:

Knowledge discovery in databases is the process of discovering patterns and models, described by rules or other human-understandable representation formalisms. The most important step in this process is data mining, which uses methods, techniques and tools for the automated discovery of patterns and construction of models from data. The course objectives are to:

• introduce the basics of data mining, the process of knowledge discovery in databases, the CRISP-DM methodology and the basics of knowledge management
• present standard data formats and train students to manipulate tabular data, databases and data warehouses, as well as text, web and multimedia data
• present selected methods and techniques for mining of tabular data
• present selected methods and techniques for text, web and multimedia mining
• train students in the practical use of selected data mining techniques and evaluation methods

Content:

Introduction:
introduction to data mining and knowledge discovery in databases, relation to machine learning, visualization of data, patterns and models, presentation of the CRISP-DM knowledge discovery methodology, and the basics of knowledge management

Data representation and manipulation:
presentation of standard data formats, creation and manipulation of tabular data, databases and data warehouses, as well as handling of text, web and multimedia data

Techniques for mining of tabular data:
presentation of specific data mining techniques: search heuristics, decision tree learning, learning of classification and association rules, clustering, subgroup discovery, regression tree learning, and relational data mining

Techniques for mining of text, web and multimedia data:
presentation of specific techniques for text, web and multimedia mining, and data visualization

Evaluation:
presentation of methods for estimating the quality of induced patterns and models, and methodology for result evaluation

Practical training:
practical use of selected data manipulation and data mining tools

Course literature:

Selected chapters from the following books:

• I. Witten, and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 1999. ISBN 978-1-558-60552-7
• D. Mladenić, N. Lavrač, M. Bohanec, and S. Moyle, Eds. Data Mining and Decision Support: Integration and Collaboration. Kluwer, 2003. ISBN 1-4020-7388-7
• I. Kononenko, and M. Kukar, Machine Learning and Data Mining. Horwood Publishing, 2007. ISBN 978-1-904-27521-3
• T. Mitchell, Machine Learning. McGraw Hill, 1997. ISBN 978-0-070-42807-2
• M. Berthold, and D. J. Hand, Eds. Intelligent Data Analysis: An Introduction. Springer, Berlin-Heidelberg, 1999. ISBN 978-3-540-65808-5
• S. Džeroski, and N. Lavrač, Eds. Relational Data Mining. Springer 2001. ISBN 3-540-42289-7
• J. Fürnkranz, D. Gamberger, and N. Lavrač, Foundations of Rule Learning. Springer 2012. ISBN 978-3-540-75196-0
• S. Chakrabarti, Mining the Web: Analysis of Hypertext and Semi Structured Data, Morgan Kaufmann, 2002. ISBN 1-55860-754-4
• U. Fayyad, G.G. Grinstein, and A. Wierse, Eds. Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, 2001. ISBN 978-1-558-60689-0
• M. Bramer, Principles of Data Mining. Springer, 2007. ISBN 978-1-84628-765-7

Significant publications and references:

• J. Fürnkranz, D. Gamberger, and N. Lavrač, Foundations of Rule Learning. Springer 2012.
• B. Sluban, D. Gamberger, and N. Lavrač, Ensemble-based noise detection: noise ranking and visual performance evaluation. Data Min. Knowl. Discov. 28(2): 265-303, 2014.
• A. Vavpetič, V. Podpečan, and N. Lavrač, Semantic subgroup explanations. J. Intell. Inf. Syst. 42(2): 233-254, 2014.
• M. Grčar, N. Trdin, and N. Lavrač. A methodology for mining document-enriched heterogeneous information networks. The Computer Journal, 56(3): 321-335, 2013.
• I. Petrič, B. Cestnik, N. Lavrač, and T. Urbančič, Outlier detection in cross-context link discovery for creative literature mining. The Computer Journal 55(1): 47-61, 2012.
• D. Mladenić, and M. Grobelnik, Machine learning on text. In: GOLUB, Koraljka (ed.). Subject access to information : an interdisciplinary approach. Santa Barbara; Denver; Oxford: Libraries Unlimited, 2015, pp. 132-148.
• D. Mladenić, and M. Grobelnik, Automatic text analysis by artificial intelligence. Informatica, ISSN 0350-5596, 2013, 37:1, pp. 27-33.
• D. Mladenić, Text mining. In: SAMMUT, Claude (ed.), WEBB, G.I. (ed.). Encyclopedia of Machine Learning. New York: Springer, 2011, pp. 962-963.
• D. Mladenić, Feature selection in text mining. In: SAMMUT, Claude (ed.), WEBB, G.I. (ed.). Encyclopedia of Machine Learning. New York: Springer, 2011, pp. 406-410.
• I. Petrič, and B. Cestnik, Predicting future discoveries from current scientific literature. In: KUMAR, Vinod D. (ed.). Biomedical Literature Mining (Methods in Molecular Biology, ISSN 1064-3745, vol. 1159). New York [etc.]: Humana Press, cop. 2014, pp. 159-168.

Examination:

Seminar and (written or oral) exam

Student obligations:

Seminar and (written or oral) exam
