
Jožef Stefan
Postgraduate School

Jamova 39
SI-1000 Ljubljana

Phone: +386 1 477 31 00
Fax: +386 1 477 31 10


Course Description

Text/Multimedia Mining and Semantic Technologies


Information and Communication Technologies, third-level study programme


prof. dr. Dunja Mladenić


The goal of this course is to provide the knowledge necessary for research and development in text and multimedia mining as well as in semantic technologies.
Students who complete this course successfully will understand the basic concepts, methods and techniques of text mining, cross-modal data analysis and semantic technologies, and will be able to use the relevant tools needed for research and development in the area.


Basics of working with text and multimedia data:
finding regularities in data
data processing
statistical artifacts vs. evidence

Representation of data:
lexical, syntactic, semantic

Example tasks in complex data analytics:
extracting information from text
user modeling
communication analysis
multi-lingual and cross-lingual data

unsupervised learning

Handling data size:
atypical operators
storing big data

Course literature:

Selected chapters from the following sources:

• C. Sammut, and G. I. Webb, Eds. Encyclopedia of Machine Learning, Springer 2010. ISBN 0387307680 (selected entries)
• D. Mladenić, N. Lavrač, M. Bohanec, and S. Moyle, Eds. Data Mining and Decision Support: Integration and Collaboration. Kluwer 2003. ISBN 1402073887 (selected chapters)
• J. Davies, M. Grobelnik, and D. Mladenić, Eds. Semantic Web: Integrating Ontology Management, Knowledge Discovery and Human Language Technologies, Springer, 2008. ISBN 3642100287 (selected chapters)
• P. Warren, J. Davies, and E. Simperl, Eds. Context and Semantics for Knowledge Management: Technologies for Personal Productivity. Heidelberg: Springer, cop. 2011. ISBN 3642195091 (selected chapters)

Significant publications and references:

• D. Mladenić, and M. Grobelnik, Machine learning on text. In: K. Golub, Ed. Subject access to information: an interdisciplinary approach. Santa Barbara; Denver; Oxford: Libraries Unlimited, 2015, pp. 132-148.
• D. Mladenić, and M. Grobelnik, Automatic text analysis by artificial intelligence. Informatica, ISSN 0350-5596, 2013, 37:1, pp. 27-33.
• D. Mladenić, J. Brank, and M. Grobelnik, Document classification. In: Sammut, C. and Webb, G.I. (eds.). Encyclopedia of Machine Learning. New York: Springer, 2011, pp. 289-293.
• D. Mladenić, Text mining. In: Sammut, C. and Webb, G.I. (eds.). Encyclopedia of Machine Learning. New York: Springer, 2011, pp. 962-963.
• D. Mladenić, Feature selection in text mining. In: Sammut, C. and Webb, G.I. (eds.). Encyclopedia of Machine Learning. New York: Springer, 2011, pp. 406-410.
• L. Bradeško, and D. Mladenić, A survey of chatbot system through a Loebner prize competition. In: Erjavec, T. and Žganec Gros, J. (eds.) Proceedings of the Eighth Language Technologies Conference: proceedings of the 15th International Multiconference Information Society - IS 2012, volume C, Ljubljana: Institut Jožef Stefan, 2012, pp. 34-37.


Oral exam (50%)
Seminar work with oral defense (50%)

Students' obligations:

Oral exam
Seminar work with oral defense