In computer science, ontologies enable formalized knowledge representation. The goal of
ontology extension is to correctly augment the existing ontology with new formalized
knowledge (e.g., concepts and relationships).
This thesis addresses the ontology extension process based on text mining methods.
The extended ontology is applied to news analysis. A novel methodology, OntoPlus,
is proposed for semi-automatic ontology extension; it uses the ontology content,
the ontology structure and co-occurrence information. The OntoPlus
methodology transforms textual information into a structured, conceptualized
form. It is applicable across different domains and
different information sources, and it enables the extension of very large multi-domain
ontologies.
The proposed OntoPlus methodology is evaluated using the well-known Cyc ontology
and textual material from two domains: the financial domain and the fisheries & aquaculture
domain. We have found that the best results are achieved by combining content, structure
and co-occurrence information, where the combination of weights depends on the
domain. In our case, the ontology content and structure are more important than
co-occurrence for data in the financial domain, whereas the ontology content and
co-occurrence are more important for data in the fisheries & aquaculture domain.
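
As an illustration of the weighted combination described above, the following minimal sketch ranks candidate concepts by a normalized weighted sum of three similarity scores. It is not the OntoPlus implementation: the names `Weights`, `combined_score` and `rank_candidates`, the normalized linear form of the combination, and all numeric values are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical per-domain weights for the three information sources."""
    content: float
    structure: float
    cooccurrence: float


def combined_score(content, structure, cooccurrence, w):
    """Normalized weighted sum of three similarity scores in [0, 1].

    The linear form is an assumption; the abstract only states that the
    three sources are combined with domain-dependent weights.
    """
    total = w.content + w.structure + w.cooccurrence
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    weighted = (w.content * content
                + w.structure * structure
                + w.cooccurrence * cooccurrence)
    return weighted / total


def rank_candidates(candidates, w):
    """Sort candidate concepts by combined score, best first.

    `candidates` maps a concept name to a tuple
    (content_similarity, structure_similarity, cooccurrence_similarity).
    """
    scored = [(name, combined_score(*sims, w)) for name, sims in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative values only, not results from the thesis.
candidates = {"A": (0.8, 0.6, 0.0), "B": (0.2, 0.2, 1.0)}
content_structure = Weights(content=1.0, structure=1.0, cooccurrence=0.0)
content_cooccurrence = Weights(content=1.0, structure=0.0, cooccurrence=1.0)
```

With these made-up scores, emphasizing content and structure ranks candidate `A` first, while emphasizing content and co-occurrence ranks `B` first, which mirrors how the preferred weighting can differ between domains.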
The thesis also addresses the process of business news analysis through (1) ontology
extension with relevant concepts, (2) ontology population with entities, facts and events
extracted from text and (3) reasoning based on the obtained ontology. We introduce a
pipeline for business news analysis, which utilizes an entity, event and fact extraction
service, the OntoPlus methodology and the Cyc ontology. Furthermore, we populate the
Cyc ontology with a set of entities, events and facts extracted from a collection of
financial news. We use structural and lexical features of the ontology to match
existing ontology instances with new instances extracted from the Web. The
pipeline constitutes a complete strategy for business news
analysis and question answering based on ontology reasoning and information from
the news.
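
The matching of new instances against existing ontology instances by lexical and structural features can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the thesis: lexical similarity is approximated with `difflib.SequenceMatcher` from the Python standard library, structural similarity with the Jaccard overlap of the instances' type sets, and the function names, the `threshold` and `alpha` parameters and the example instances are all hypothetical.

```python
from difflib import SequenceMatcher


def lexical_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (assumed lexical feature)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def structural_similarity(types_a, types_b):
    """Jaccard overlap of two type sets (assumed structural feature)."""
    if not types_a and not types_b:
        return 0.0
    return len(types_a & types_b) / len(types_a | types_b)


def best_match(new_name, new_types, existing, threshold=0.7, alpha=0.5):
    """Return the best-matching existing instance name, or None.

    `existing` maps instance names to their sets of types. `alpha` balances
    lexical against structural similarity; both `alpha` and `threshold` are
    illustrative values, not taken from the thesis.
    """
    best, best_score = None, 0.0
    for name, types in existing.items():
        score = (alpha * lexical_similarity(new_name, name)
                 + (1 - alpha) * structural_similarity(new_types, types))
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= threshold else None


# Made-up instances for illustration.
existing = {"Acme Corporation": {"Company"}, "Acme River": {"River"}}
match = best_match("Acme Corp", {"Company"}, existing)
```

A new instance that falls below the threshold for every existing instance yields `None`, in which case it would be treated as a candidate for a new ontology instance rather than merged with an existing one.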
The experimental results demonstrate that the proposed OntoPlus methodology,
based on the combination of the ontology content, structure and co-occurrence
information, together with the proposed pipeline for business news analysis, has the
potential to aggregate new knowledge into the existing ontology. The user thereby obtains
support in the analysis of financial texts and business information.