Unsupervised word sense disambiguation in dynamic semantic spaces
Abstract
In this paper, we are mainly concerned with the ability to quickly and automatically distinguish word senses in dynamic semantic spaces in which new terms and new senses appear frequently. Such spaces are built “on the fly” from constantly evolving data sets such as Wikipedia, repositories of patent grants and applications, or large sets of legal documents for Technology Assisted Review and e-discovery. This immediacy rules out supervision as well as the use of a priori training sets. We show that the various senses of a term can be automatically made apparent with a simple clustering algorithm, each sense being a vector in the semantic space. While we only consider here semantic spaces built by using random vectors, this algorithm should work with any kind of embedding, provided meaningful similarities between terms can be computed and fulfill at least two basic conditions: terms with close meanings have high similarities, and terms with unrelated meanings have near-zero similarities.
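The abstract does not spell out the clustering algorithm, so the following is only a toy sketch of the general idea, not the authors' method. It assumes Gaussian random vectors as a stand-in for a semantic space, cosine similarity as the similarity measure, and a hypothetical greedy threshold clustering (`greedy_sense_clusters`, `threshold=0.5`) over the neighbors of an ambiguous term. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def greedy_sense_clusters(neighbor_vecs, threshold=0.5):
    """Hypothetical greedy clustering: each neighbor joins the first
    cluster whose running sum it resembles, else it starts a new cluster.
    Returns index clusters and one summed 'sense vector' per cluster."""
    clusters, sense_vectors = [], []
    for i, v in enumerate(neighbor_vecs):
        for k, c in enumerate(sense_vectors):
            if cosine(v, c) >= threshold:
                clusters[k].append(i)
                sense_vectors[k] = c + v
                break
        else:  # no existing cluster was similar enough
            clusters.append([i])
            sense_vectors.append(v.copy())
    return clusters, sense_vectors

rng = np.random.default_rng(0)
dim = 2000  # high dimension: unrelated random vectors are nearly orthogonal

# Two unrelated contexts of one ambiguous term (assumed for illustration,
# e.g. a "river" sense and a "finance" sense of the word "bank").
topic_a = rng.standard_normal(dim)
topic_b = rng.standard_normal(dim)

# Neighbors of the ambiguous term: noisy copies of each context vector.
neighbors = [topic_a + 0.3 * rng.standard_normal(dim) for _ in range(5)]
neighbors += [topic_b + 0.3 * rng.standard_normal(dim) for _ in range(5)]

clusters, sense_vectors = greedy_sense_clusters(neighbors)
print(len(clusters), "senses found")
print("similarity between sense vectors:",
      round(cosine(sense_vectors[0], sense_vectors[1]), 3))
```

The sketch relies on the two conditions stated in the abstract: neighbors that share a meaning have high similarity and fall into the same cluster, while neighbors from unrelated meanings have near-zero similarity and start a separate cluster, whose summed vector then plays the role of a sense vector.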
Similar resources
Unsupervised Word Sense Induction from Multiple Semantic Spaces with Locality Sensitive Hashing
Word Sense Disambiguation is the task dedicated to the problem of finding out the sense of a word in context, from all of its many possible senses. Solving this problem requires to know the set of possible senses for a given word, which can be acquired from human knowledge, or from automatic discovery, called Word Sense Induction. In this article, we adapt two existing meta-methods of Word Sens...
Distributional Semantics Approach to Thai Word Sense Disambiguation
Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are: knowledge-based, corpus-based, and hybrid. This paper pays attention to the corpus-based strategy...
Kim, Su Nam and Timothy Baldwin (to appear) Word Sense Disambiguation and Noun Compounds, ACM Transactions on Speech and Language Processing
In this paper, we investigate word sense distributions in noun compounds (NCs). Our primary goal is to disambiguate the word sense of component words in NCs, based on investigation of “semantic collocation” between them. We use sense collocation and lexical substitution to build supervised and unsupervised word sense disambiguation (WSD) classifiers, and show our unsupervised learner to be supe...
Utilizing the One-Sense-per-Discourse Constraint for Fully Unsupervised Word Sense Induction and Disambiguation
Recent advances in word sense induction rely on clustering related words. In this paper, instead of using a clustering algorithm, we suggest to perform a Singular Value Decomposition (SVD) which can be guaranteed to always find a global optimum. However, in order to apply this method to the problem of word sense induction, a semantic interpretation of the dimensions computed by the SVD is requi...
UMND1: Unsupervised Word Sense Disambiguation Using Contextual Semantic Relatedness
In this paper we describe an unsupervised WordNet-based Word Sense Disambiguation system, which participated (as UMND1) in the SemEval-2007 Coarse-grained English Lexical Sample task. The system disambiguates a target word by using WordNet-based measures of semantic relatedness to find the sense of the word that is semantically most strongly related to the senses of the words in the context of t...
Journal: CoRR
Volume: abs/1802.02605
Publication date: 2018