Course Description

Language Technologies


Information and Communication Technologies, second-level study programme


prof. dr. Tomaž Erjavec


The goal of this course is to introduce language technologies, i.e. methods and applications of the computer processing of natural language. The course covers the history and basic concepts of linguistics, the main applications of language technologies, and the computational methods they use. Particular attention is given to language corpora, i.e. large datasets of annotated texts that serve as the basic infrastructure for research on and processing of individual languages. The analysis of language corpora with machine learning methods is also discussed. The course focuses on the processing of the Slovene language.

Students will gain a basic theoretical understanding and practical knowledge of language technologies and of computational and corpus linguistics, which are prerequisites for effective work on the computer processing of language data.


Development of linguistics and computational linguistics, complexity of language, levels of linguistic analysis, overview of applications and methods.

Language corpora:
Purpose, history and typology, annotation, use cases, computer coding, examples.

Methods of computer processing:
Regular expressions and finite state automata, phrase-structure grammars, statistical methods, machine learning.

Corpus analysis with machine learning methods:
Relevant machine learning methods; use cases: automatic morphological, syntactic and semantic annotation.

Encoding standards:
History of standardisation, coding of characters, XML, Text Encoding Initiative, MULTEXT, ISO, evaluation methods.

Information retrieval and extraction, machine translation, speech technologies, digital libraries, etc.

Seminar and oral exam (100%)

Students' obligations:

Seminar and oral exam