Is deep learning really necessary for word embeddings?

Authors

  • Rémi Lebret
  • Joël Legrand
Abstract

Word embeddings resulting from neural language models have been shown to be successful for a large variety of NLP tasks. However, such architectures might be difficult and time-consuming to train. Instead, we propose to drastically simplify the word embeddings computation through a Hellinger PCA of the word co-occurrence matrix. We compare these new word embeddings with some well-known embeddings on NER and movie review tasks and show that we can reach similar or even better performance. Although deep learning is not really necessary for generating good word embeddings, we show that it can provide an easy way to adapt embeddings to specific tasks.
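The computation described above is simple enough to sketch in a few lines. The following is a minimal, illustrative sketch, not the authors' implementation; the toy corpus, window size, centering step, and output dimensionality are all assumptions. It builds a word co-occurrence matrix, row-normalizes it into context distributions, takes an element-wise square root so that Euclidean distance between rows corresponds to Hellinger distance between distributions, and reduces dimensionality with PCA via an SVD.

```python
import numpy as np

# Toy corpus; the paper uses large corpora and vocabularies.
# Everything below is illustrative, not the authors' actual setup.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]

# Vocabulary and symmetric co-occurrence counts within a fixed window.
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
window = 2  # assumed context window size
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                counts[idx[w], idx[sent[j]]] += 1

# Row-normalize into context distributions P(context | word), then take
# the element-wise square root: the Euclidean distance between rows of
# sqrt(P) equals (up to a constant) the Hellinger distance between the
# underlying distributions, which is what makes this a Hellinger PCA.
P = counts / counts.sum(axis=1, keepdims=True)
H = np.sqrt(P)

# PCA via SVD of the centered matrix; each row of `embeddings` is a
# low-dimensional word vector.
H -= H.mean(axis=0)
U, S, _ = np.linalg.svd(H, full_matrices=False)
dim = 2  # assumed target dimensionality; real embeddings use more
embeddings = U[:, :dim] * S[:dim]
print({w: embeddings[idx[w]].round(3) for w in vocab})
```

Nothing beyond counting, a square root, and an SVD is required, which is the point of the abstract's contrast with neural language model training.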


Similar articles

Word Embeddings through Hellinger PCA

Word embeddings resulting from neural language models have been shown to be a great asset for a large variety of NLP tasks. However, such architectures might be difficult and time-consuming to train. Instead, we propose to drastically simplify the word embeddings computation through a Hellinger PCA of the word co-occurrence matrix. We compare those new word embeddings with some well-known embeddin...

Deep Learning Embeddings for Discontinuous Linguistic Units

Deep learning embeddings have been successfully used for many natural language processing problems. Embeddings are mostly computed for word forms although a number of recent papers have extended this to other linguistic units like morphemes and phrases. In this paper, we argue that learning embeddings for discontinuous linguistic units should also be considered. In an experimental evaluation on...

An Exploration of Embeddings for Generalized Phrases

Deep learning embeddings have been successfully used for many natural language processing problems. Embeddings are mostly computed for word forms, although many recent papers have extended this to other linguistic units like morphemes and word sequences. In this paper, we define the concept of a generalized phrase, which includes conventional linguistic phrases as well as skip-bigrams. We compute...
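Only part of the definition is visible in this excerpt, but the skip-bigram component can be made concrete with a short sketch (the function name and `max_gap` parameterization are assumptions for illustration, not the paper's exact construction):

```python
def skip_bigrams(tokens, max_gap=2):
    """Return ordered word pairs separated by at most `max_gap` tokens.

    A gap of 0 yields conventional bigrams; larger gaps yield the
    skip-bigrams mentioned in the abstract (illustrative sketch only).
    """
    pairs = []
    for i, left in enumerate(tokens):
        for gap in range(max_gap + 1):
            j = i + 1 + gap
            if j < len(tokens):
                pairs.append((left, tokens[j]))
    return pairs

print(skip_bigrams("the cat sat on the mat".split(), max_gap=1))
# [('the', 'cat'), ('the', 'sat'), ('cat', 'sat'), ('cat', 'on'), ...]
```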

Distributional Models and Deep Learning Embeddings: Combining the Best of Both Worlds

There are two main approaches to the distributed representation of words: low-dimensional deep learning embeddings and high-dimensional distributional models, in which each dimension corresponds to a context word. In this paper, we combine these two approaches by learning embeddings based on distributional-model vectors, as opposed to the one-hot vectors standardly used in deep learning. We sh...
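The contrast the abstract draws can be sketched directly. This is an illustrative reading of the truncated excerpt, not the paper's architecture; all dimensions and matrices below are made-up assumptions. A standard embedding layer multiplies a one-hot vector by a learned matrix, which just selects a row, whereas here the input would be the word's high-dimensional distributional vector.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, ctx_dim, emb_dim = 1000, 1000, 50  # assumed sizes

# Standard deep-learning input: a one-hot vector times the embedding
# matrix, which is equivalent to looking up one row (W[42] here).
W = rng.normal(size=(vocab_size, emb_dim))
one_hot = np.zeros(vocab_size)
one_hot[42] = 1.0
emb_from_one_hot = one_hot @ W

# Combined setup sketched from the abstract: the input is instead a
# distributional-model vector (one weight per context word), so the
# learned projection mixes distributional evidence rather than
# selecting a single row.
distributional = rng.random(ctx_dim)      # stand-in for real statistics
M = rng.normal(size=(ctx_dim, emb_dim))   # learned projection (assumed)
emb_from_distributional = distributional @ M
```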

Learning Word Meta-Embeddings by Using Ensembles of Embedding Sets

Word embeddings, distributed representations of words used in deep learning, are beneficial for many tasks in natural language processing (NLP). However, different embedding sets vary greatly in quality and in the characteristics of the semantics they capture. Instead of relying on a more advanced algorithm for embedding learning, this paper proposes an ensemble approach that combines different public embeddi...
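The excerpt does not show which combination methods the paper evaluates; as one simple ensemble baseline (an assumption for illustration, not necessarily the paper's method), the sketch below L2-normalizes each source embedding and concatenates the results for words covered by all sets.

```python
import numpy as np

def concat_meta_embeddings(embedding_sets):
    """Combine embedding sets by normalize-and-concatenate.

    `embedding_sets` is a list of dicts mapping word -> np.ndarray.
    Only words present in every set are kept (a simplifying assumption;
    handling out-of-vocabulary words is a separate problem).
    """
    shared = set.intersection(*(set(e) for e in embedding_sets))
    meta = {}
    for word in shared:
        parts = [e[word] / np.linalg.norm(e[word]) for e in embedding_sets]
        meta[word] = np.concatenate(parts)
    return meta

# Toy usage with two hand-made "embedding sets" of different dimensions.
set_a = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0])}
set_b = {"cat": np.array([0.5, 0.5, 0.0]), "dog": np.array([0.0, 0.5, 0.5])}
print(concat_meta_embeddings([set_a, set_b])["cat"])
```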



Publication date: 2013