Učni načrt predmeta / Course syllabus

Predmet:
Analiza tekstovnih in večpredstavnih podatkov ter semantične tehnologije
Course:
Text/Multimedia Mining and Semantic Technologies
Študijski program in stopnja / Study programme and level:
Informacijske in komunikacijske tehnologije, 3. stopnja / Information and Communication Technologies, 3rd level
Študijska smer / Study field:
Tehnologije znanja / Knowledge Technologies
Letnik / Academic year:
1
Semester / Semester:
1
Vrsta predmeta / Course type
Izbirni / Elective
Univerzitetna koda predmeta / University course code:
IKT3-716
Predavanja / Lectures: 10
Seminar / Seminar: 15
Vaje / Tutorial:
Klinične vaje / Clinical work:
Druge oblike študija / Other forms of study:
Samost. delo / Individ. work: 125
ECTS: 5

*Navedena porazdelitev ur velja, če je vpisanih vsaj 15 študentov. Drugače se obseg izvedbe kontaktnih ur sorazmerno zmanjša in prenese v samostojno delo. / This distribution of hours applies if at least 15 students are enrolled. Otherwise, the contact hours are reduced proportionally and transferred to individual work.

Nosilec predmeta / Course leader:
prof. dr. Dunja Mladenić
Sodelavci / Lecturers:
Jeziki / Languages:
Predavanja / Lectures:
slovenščina, angleščina / Slovenian, English
Vaje / Tutorial:
Pogoji za vključitev v delo oz. za opravljanje študijskih obveznosti:
Prerequisites:

Praviloma morajo biti izpolnjeni pogoji za vpis na doktorski študij: zaključena druga stopnja bolonjskega študija ali diploma univerzitetnega študijskega programa. Potrebna so tudi osnovna znanja matematike, računalništva in informatike.

Students should normally meet the formal requirements for enrolment in the doctoral study programme: a completed Bologna second-level study programme or an equivalent pre-Bologna university study programme. Basic knowledge of mathematics, computer science, and informatics is also required.

Vsebina:
Content (Syllabus outline):

Osnove dela s tekstovnimi in drugimi večpredstavnimi podatki:
- iskanje zakonitosti v podatkih
- osnovno procesiranje podatkov
- statistična dejstva in evidenca

Predstavitev podatkov:
- leksikalna, sintaktična, semantična

Poglobljeni vpogled v izbrane scenarije zahtevnejše uporabe osnovnih metod:
- ekstrakcija znanja iz podatkov
- modeliranje uporabnikov
- analiza komunikacije v omrežjih
- večjezični in prekojezični podatki

Tehnike:
- nadzorovano učenje
- pol-nadzorovano učenje
- nenadzorovano učenje

Izzivi analize velikih količin podatkov:
- netipične operacije nad podatki
- shranjevanje velikih količin podatkov

Basics of working with text and multimedia data:
- finding regularities in data
- data processing
- statistical artifacts vs. evidence

Representation of data:
- lexical, syntactic, semantic

Example tasks in complex data analytics:
- extracting information from text
- user modeling
- communication analysis
- multi-lingual and cross-lingual data

Techniques:
- supervised learning
- semi-supervised learning
- unsupervised learning

Handling data size:
- atypical operators
- storing big data

Temeljna literatura in viri / Readings:

Izbrana poglavja iz naslednjih virov: / Selected chapters from the following sources:
- C. Sammut and G. I. Webb, Eds. Encyclopedia of Machine Learning and Data Mining, Springer, 2017 (selected entries).
- C. C. Aggarwal. Machine Learning for Text, Springer, 2018.
- G. S. Ingersoll, T. S. Morton, and A. L. Farris. Taming Text: How to Find, Organize, and Manipulate It, Manning Publications Co., 2013.
- C. C. Aggarwal. Data Mining: The Textbook, Springer, 2015 (selected chapters).

Cilji in kompetence:
Objectives and competences:

Cilj predmeta je usposobiti študente za raziskovalno in razvojno delo na področju analize tekstovnih in večpredstavnih podatkov ter semantičnih tehnologij.

Kompetence študenta z uspešno zaključenim predmetom bodo vključevale razumevanje metod in tehnik analize tekstovnih in večpredstavnih podatkov ter semantičnih tehnologij in sposobnost za uporabo ustreznih orodij. Študent bo pridobil tudi znanja, potrebna za samostojno raziskovalno delo in razvoj.

The goal of this course is to provide the knowledge necessary for research and development in text and multimedia mining and in semantic technologies.

Students who successfully complete this course will understand the basic concepts, methods, and techniques of text mining, cross-modal data analysis, and semantic technologies, and will be able to use the relevant tools needed for research and development in the area.

Predvideni študijski rezultati:
Intended learning outcomes:

Študenti bodo pridobili znanje in razvojne veščine na naslednjih področjih:
- predstavitev tekstovnih podatkov
- osnovni prijemi pri predprocesiranju tekstovnih in večpredstavnih podatkov
- metode za analizo tekstovnih in večpredstavnih podatkov
- vizualizacija podatkov
- osnovni pojmi iz področja semantičnih tehnologij
- izbrani scenariji uporabe predstavljenih metod
- osnove analize velikih količin podatkov

The students will gain knowledge and skills in the following areas:
- data representation
- pre-processing text and multimedia
- text and multimedia mining methods
- data visualization
- basic concepts in semantic technologies
- ontologies and their usage in semantic technologies
- selected scenarios of using the presented methods
- basic issues in big data analytics

Metode poučevanja in učenja:
Learning and teaching methods:

Predavanja, seminar, konzultacije, individualno delo

Lectures, seminar, consultations, individual work

Načini ocenjevanja / Assessment:
Delež v % / Weight in %
Ustni izpit / Oral exam: 50 %
Seminarska naloga / Seminar work: 25 %
Ustni zagovor / Oral defense: 25 %
Reference nosilca / Lecturer's references:
1. SITTAR, Abdul, GROBELNIK, Marko, MLADENIĆ, Dunja. Profiling the barriers to the spreading of news using news headlines. Frontiers in artificial intelligence. 2023, vol. 6, str. 1-22, ilustr. ISSN 2624-8212. https://www.frontiersin.org/articles/10.3389/frai.2023.1225213/full, DOI: 10.3389/frai.2023.1225213.
2. SWATI, Swati, MLADENIĆ, Dunja, GROBELNIK, Marko. An inferential commonsense-driven framework for predicting political bias in news headlines. IEEE access. 2023, vol. 11, str. 1-17, ilustr. ISSN 2169-3536. https://ieeexplore.ieee.org/document/10193773/authors#authors, DOI: 10.1109/ACCESS.2023.3298877.
3. ALBA, Ester, GAITÁN, Mar, LEÓN, Arabella, MLADENIĆ, Dunja, BRANK, Janez. Weaving words for textile museums : the development of the linked SILKNOW thesaurus. Heritage science. 2022, vol. 10, str. 59-1-59-14. ISSN 2050-7445. DOI: 10.1186/s40494-022-00681-x.
4. NOVAK, Erik, BIZJAK, Luka, MLADENIĆ, Dunja, GROBELNIK, Marko. Why is a document relevant? Understanding the relevance scores in cross-lingual document retrieval. Knowledge-based systems. [Print ed.]. 2022, vol. 244, art. 108545, 41 str. ISSN 0950-7051. DOI: 10.1016/j.knosys.2022.108545.
5. PITA COSTA, João, REI, Luis, STOPAR, Luka, GROBELNIK, Marko, MLADENIĆ, Dunja, NOVALIJA, Inna, et al. NewsMeSH : a new classifier designed to annotate health news with MeSH headings. Artificial intelligence in medicine. [Print ed.]. 2021, vol. 114, str. 102053-1-102053-11, graf. prikazi, tabele. ISSN 0933-3657. DOI: 10.1016/j.artmed.2021.102053.