This PhD dissertation focuses on improving terminology extraction and alignment for applications in the translation industry. It explores three key use cases where these techniques benefit language professionals: creating client-specific terminology lists from large parallel corpora (i.e., translation memories), building domain-specific terminology resources from comparable corpora, and identifying important domain-specific …
Automatic terminology extraction, also known as automatic term extraction (ATE), is a natural language processing (NLP) task that identifies specialized terminology from domain-specific corpora. ATE is often used for terminographic tasks (e.g., the creation of specialized dictionaries) and contributes to several complex downstream tasks (e.g., machine translation and information retrieval). …
The thesis presents a novel representation learning framework that combines neural and symbolic text representations, and demonstrates its utility for tackling diverse natural language processing problems. The proposed approach avoids the deficiencies of purely symbolic and purely neural methods and can be used to generate efficient text representations. Its usefulness …
In language technologies, syntactic parsing is one of the possible intermediate steps of text analysis in applications such as machine translation, information extraction, and question answering. Syntactic trees are often used to represent the structure of text. In recent decades, the dependency framework has become a popular syntactic representation, …