Explicit Neural Word Representation
Authors
Navid Rekabsaz, Mihai Lupu, Allan Hanbury, Bhaskar Mitra
Abstract
Recent advances in word embedding provide significant benefits to various information processing tasks. Yet these dense representations and the word-to-word relatedness they estimate remain difficult to interpret and hard to analyze. As an alternative, explicit word representations propose vectors whose dimensions are easily interpretable, and recent methods show performance competitive with the dense vectors. We introduce a neural-based explicit representation, rooted in the conceptual ideas of the word2vec Skip-Gram model. The method provides interpretable explicit vectors while retaining the effectiveness of the Skip-Gram model. An evaluation of various explicit representations on word association collections shows that the newly proposed method outperforms the state-of-the-art explicit representations when tasked with ranking highly similar terms. As a case study on the use of our explicit representation, we examine the degree of gender bias in the English language (as used in Wikipedia) with regard to various occupations. By measuring the bias towards explicit Female and Male factors, the study quantifies a general tendency of the majority of occupations toward male, and a strong bias of a few specific occupations (e.g. nurse) toward female.

ACM Reference format: Navid Rekabsaz, Mihai Lupu, Allan Hanbury and Bhaskar Mitra. 2017. Explicit Neural Word Representation. In Proceedings of the ACM International Conference on Information and Knowledge Management, Singapore, 2017.
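To make the case study concrete, the snippet below is a minimal sketch of measuring occupation bias against explicit gender dimensions. It assumes an explicit representation is available as a mapping from words to interpretable dimension weights; the `explicit_vecs` data, the `gender_bias` helper, and the score definition (female weight minus male weight) are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: gender bias of occupation terms under an explicit representation.
# All vectors and the bias score below are toy/illustrative, not the paper's
# actual data or exact method.

explicit_vecs = {
    # word -> {interpretable dimension: weight}
    "nurse":    {"female": 0.62, "male": 0.11, "hospital": 0.58},
    "engineer": {"female": 0.09, "male": 0.47, "machine": 0.51},
    "teacher":  {"female": 0.33, "male": 0.28, "school": 0.66},
}

def gender_bias(word, vecs):
    """Positive score leans female, negative leans male."""
    v = vecs[word]
    return v.get("female", 0.0) - v.get("male", 0.0)

for occupation in explicit_vecs:
    print(f"{occupation}: {gender_bias(occupation, explicit_vecs):+.2f}")
```

Because the dimensions are explicit, the bias score can be read off directly from two named coordinates, which is exactly the kind of interpretability that dense vectors lack.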
Similar Resources
Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective
Recently, significant advances have been witnessed in the area of distributed word representations based on neural networks, also known as word embeddings. Among the new word embedding models, skip-gram negative sampling (SGNS) in the word2vec toolbox has attracted much attention due to its simplicity and effectiveness. However, the principles of SGNS remain not well understood, except...
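For reference, SGNS is straightforward to train with an off-the-shelf toolkit. The sketch below uses gensim's Word2Vec with skip-gram and negative sampling enabled; the two-sentence corpus is a toy placeholder, and the hyperparameter values are arbitrary illustrative choices.

```python
# Training a skip-gram negative-sampling (SGNS) model with gensim.
# The corpus is a toy placeholder; real use needs a large corpus.
from gensim.models import Word2Vec

sentences = [
    ["word", "embedding", "models", "learn", "dense", "vectors"],
    ["skip", "gram", "predicts", "context", "words", "from", "a", "target"],
]

model = Word2Vec(
    sentences,
    vector_size=50,  # dimensionality of the dense vectors
    window=5,        # context window size
    sg=1,            # 1 = skip-gram (0 would be CBOW)
    negative=5,      # number of negative samples per positive pair
    min_count=1,     # keep every word in this tiny corpus
)

print(model.wv.most_similar("embedding", topn=3))
```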
Text Embedding with Advanced Recurrent Neural Model
Embedding methods have become a popular way to handle unstructured data, such as words and text. Word embedding, providing computationally friendly representations for word similarity, has almost become one of the standard solutions for various text mining tasks. Many recent studies on word embedding try to generate a more comprehensive representation for each word by incorporating task-spe...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text have motivated researchers to use...
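As a concrete reference for the word-based features mentioned above, TF-IDF vectors can be built with scikit-learn; the two documents below are toy placeholders.

```python
# Building TF-IDF vectors for a toy document collection with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "text clustering groups similar documents",
    "text classification assigns documents to labels",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)  # sparse (n_docs x n_terms) matrix

# Inspect the learned vocabulary and the weights of the first document.
print(vectorizer.get_feature_names_out())
print(tfidf[0].toarray())
```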
Toward Incorporation of Relevant Documents in word2vec
Recent advances in neural word embedding provide significant benefits to various information retrieval tasks. However, as shown by recent studies, adapting the embedding models to the needs of IR tasks can bring considerable further improvements. The embedding models in general define term relatedness by exploiting the terms' co-occurrences in short-window contexts. An alternative (and well-...
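The short-window co-occurrence statistic that these models exploit can be illustrated directly. In the sketch below, the window size and the toy corpus are arbitrary illustrative choices.

```python
# Counting term co-occurrences in a symmetric short window, the statistic
# that window-based embedding models rely on.
from collections import Counter

corpus = ["neural", "word", "embedding", "provides", "word", "relatedness"]
WINDOW = 2  # symmetric context window of +/- 2 tokens

pairs = Counter()
for i, target in enumerate(corpus):
    lo, hi = max(0, i - WINDOW), min(len(corpus), i + WINDOW + 1)
    for j in range(lo, hi):
        if j != i:
            pairs[(target, corpus[j])] += 1

for (w1, w2), count in pairs.most_common(5):
    print(w1, w2, count)
```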
Large Vocabulary Recognition of On-Line Handwritten Cursive Words
This paper presents a writer-independent system for large vocabulary recognition of on-line handwritten cursive words. The system first uses a filtering module, based on simple letter features, to quickly reduce a large reference dictionary (lexicon) to a more manageable size; the reduced lexicon is subsequently fed to a recognition module. The recognition module uses a temporal representation of...
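The filter-then-recognize pipeline described above can be sketched as follows. The "simple letter feature" used here, an estimated word length, is a hypothetical stand-in, since the paper's actual features are not specified in this snippet, and `recognize` is a placeholder for the expensive recognition module.

```python
# Illustrative two-stage pipeline: a cheap filter shrinks the lexicon
# before an expensive recognizer runs. The length-based feature is a
# hypothetical stand-in for the paper's actual letter features.

def filter_lexicon(lexicon, est_length, tolerance=1):
    """Keep only words whose length is close to the estimated length."""
    return [w for w in lexicon if abs(len(w) - est_length) <= tolerance]

def recognize(candidates):
    """Placeholder for the recognition module run on the reduced lexicon."""
    return candidates[0] if candidates else None

lexicon = ["cat", "cart", "card", "chart", "carton"]
reduced = filter_lexicon(lexicon, est_length=4)  # "carton" is filtered out
print(reduced)
print(recognize(reduced))
```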