Automatic Construction of Persian ICT WordNet using Princeton WordNet
Authors
Abstract:
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept, and synsets are interlinked by both semantic and lexical relations. WordNet is mainly used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose several automatic methods for extracting Information and Communication Technology (ICT)-related data from Princeton WordNet. We then add the extracted data to our Persian WordNet. Automated methods reduce human involvement and accelerate the development of our bilingual ICT WordNet. Our first method starts from a small subset of ICT words and uses the definition of each synset to decide whether that synset is ICT-related. The second method extracts synsets that are in a semantic relation with ICT synsets. We also use two similarity criteria, LCS and S3M, to measure the similarity between a synset definition in WordNet and the definition of any word in the Microsoft dictionary. Our last method verifies the coordinates of ICT synsets. Results show that the proposed methods extract ICT data from Princeton WordNet with a good level of accuracy.
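The definition-matching step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how LCS or S3M are computed or normalized, so everything here is an assumption: LCS is taken to be the longest common subsequence over whitespace tokens, normalized by the longer definition, and S3M is taken to be a weighted mix of that LCS score and the Jaccard overlap of the two token sets. The function names, the tokenizer, and the weight `p` are hypothetical and are not taken from the paper.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the paper's preprocessing is not given)."""
    return text.lower().split()


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(def_a, def_b):
    """LCS length divided by the length of the longer definition (assumed normalization)."""
    a, b = tokenize(def_a), tokenize(def_b)
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def s3m_similarity(def_a, def_b, p=0.5):
    """Assumed S3M form: p * sequence similarity + (1 - p) * Jaccard set similarity."""
    a, b = tokenize(def_a), tokenize(def_b)
    if not a or not b:
        return 0.0
    sequence_part = lcs_length(a, b) / max(len(a), len(b))
    set_a, set_b = set(a), set(b)
    set_part = len(set_a & set_b) / len(set_a | set_b)
    return p * sequence_part + (1 - p) * set_part


if __name__ == "__main__":
    gloss = "a machine for computing"                    # made-up synset definition
    entry = "a machine for performing calculations"      # made-up dictionary definition
    print(lcs_similarity(gloss, entry))
    print(s3m_similarity(gloss, entry))
```

Under these assumptions, a synset would be flagged as ICT-related when its definition scores above some threshold against a dictionary entry; the threshold itself would have to be tuned and is not stated in the abstract.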
similar resources
Automatic Persian WordNet Construction
In this paper, an automatic method for Persian WordNet construction based on Princeton WordNet 2.1 (PWN) is introduced. The proposed approach uses Persian and English corpora as well as a bilingual dictionary in order to make a mapping between PWN synsets and Persian words. Our method calculates a score for each candidate synset of a given Persian word and for each of its translations, it select...
Persian Wordnet Construction using Supervised Learning
This paper presents an automated supervised method for Persian wordnet construction. Using a Persian corpus and a bi-lingual dictionary, the initial links between Persian words and Princeton WordNet synsets have been generated. These links will be discriminated later as correct or incorrect by employing seven features in a trained classification system. The whole method is just a classification...
Unsupervised Learning for Persian WordNet Construction
In this paper we introduce an unsupervised learning approach for WordNet construction. The whole construction method is an Expectation Maximization (EM) approach which uses Princeton WordNet 3.0 (PWN) and a corpus as the data source for unsupervised learning. The proposed method can be used to construct WordNet in any language. Links between PWN synsets and target language words are extracted u...
Enhancing Automatic Wordnet Construction Using Word Embeddings
Researchers have shown that a wordnet for a new language, possibly resource-poor, can be constructed automatically by translating wordnets of resource-rich languages. The quality of these constructed wordnets is affected by the quality of the resources used such as dictionaries and translation methods in the construction process. Recent work shows that vector representation of words (word embed...
Automatic Construction of Japanese WordNet
Although WordNets have been developed for a number of languages, no attempts to construct a Japanese WordNet have been known to exist. Taking this into account, we launched a project to automatically translate the Princeton WordNet into Japanese by a method of unsupervised word-sense disambiguation using bilingual comparable corpora. The method we propose aligns English word associations with t...
A Strategy of Mapping Polish WordNet onto Princeton WordNet
We present a strategy and the early results of the mapping of plWordNet – one of the largest such language resources in existence – onto Princeton WordNet. The fundamental structural premise of plWordNet differs from those of most other wordnets: lexical units rather than synsets are the basic building blocks. The addition of new material to plWordNet is consistently informed by semantic relati...
Journal title
volume 7 issue 1
pages 109-119
publication date 2019-03-01