Linguistic Geometries for Unsupervised Dimensionality Reduction

Authors

  • Yi Mao
  • Krishnakumar Balasubramanian
  • Guy Lebanon
Abstract

Text documents are complex, high-dimensional objects. To visualize such data effectively, it is important to reduce their dimensionality and display the low-dimensional embedding as a 2-D or 3-D scatter plot. In this paper we explore dimensionality reduction methods that draw upon domain knowledge in order to achieve a better low-dimensional embedding and visualization of documents. We consider the use of geometries specified manually by an expert, geometries derived automatically from corpus statistics, and geometries computed from linguistic resources.
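The abstract does not spell out how a geometry enters the reduction step, so the following is only a minimal sketch under stated assumptions. It assumes the domain knowledge is encoded as a symmetric positive semidefinite word-similarity matrix `T` over the vocabulary, which induces the document distance sqrt((x - y)^T T (x - y)). It then uses plain PCA as a stand-in reduction method. The function names, the toy vocabulary, and the values in `T` are hypothetical and are not taken from the paper.

```python
import numpy as np


def geometry_distance(x, y, T):
    """Distance between two term vectors under an assumed word geometry T."""
    diff = x - y
    return float(np.sqrt(diff @ T @ diff))


def embed_documents(X, T, n_components=2):
    """Embed the rows of X (documents) in n_components dimensions.

    Assumption: T is symmetric positive semidefinite, so it factors as
    T = H^T H. Euclidean distance between H x and H y then equals the
    distance induced by T, and ordinary PCA can be applied to the
    transformed documents.
    """
    # Symmetric square root of T via eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(T)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    H = (eigvecs * np.sqrt(eigvals)) @ eigvecs.T

    # Transform the documents, center them, and project with PCA (via SVD).
    Z = X @ H.T
    Z = Z - Z.mean(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Toy vocabulary (hypothetical): ["car", "auto", "apple", "fruit"].
# Off-diagonal entries mark word pairs that an expert considers related.
T = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
])

# Four one-word documents, one per vocabulary term.
X = np.eye(4)

embedding = embed_documents(X, T)
```

In this toy setup the "car" and "auto" documents share no terms, so they are as far apart as "car" and "apple" under the ordinary Euclidean distance. Under the assumed `T` they move closer together, which is the kind of effect a domain-informed geometry is meant to produce in the final scatter plot.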

Similar resources

Dimensionality Reduction for Text using Domain Knowledge

Text documents are complex high dimensional objects. To effectively visualize such data it is important to reduce its dimensionality and visualize the low dimensional embedding as a 2-D or 3-D scatter plot. In this paper we explore dimensionality reduction methods that draw upon domain knowledge in order to achieve a better low dimensional embedding and visualization of documents. We consider t...

Statement for Irina Matveeva

My research interest is to improve natural language applications by developing efficient unsupervised and semi-supervised machine learning approaches. My approach is to design machine learning solutions tailored to specific natural language problems based on an in-depth analysis of their components. I believe that machine learning algorithms are most efficient for language applications if they ...

Discriminative Unsupervised Dimensionality Reduction

As an important machine learning topic, dimensionality reduction has been widely studied and utilized in various kinds of areas. A multitude of dimensionality reduction methods have been developed, among which unsupervised dimensionality reduction is more desirable when obtaining label information requires onerous work. However, most previous unsupervised dimensionality reduction methods call f...

Articulatory Gesture Rich Representation Learning of Phonological Units in Low Resource Settings

Recent literature presents evidence that both linguistic (phonemic) and non linguistic (speaker identity, emotional content) information resides at a lower dimensional manifold embedded richly inside the higher-dimensional spectral features like MFCC and PLP. Linguistic or phonetic units of speech can be broken down to a legal inventory of articulatory gestures shared across several phonemes ba...

Unsupervised dimensionality reduction: the challenges of big data visualisation

Dimensionality reduction is an unsupervised task that allows high-dimensional data to be processed or visualised in lower-dimensional spaces. This tutorial reviews the basic principles of dimensionality reduction and discusses some of the approaches that were published over the past years from the perspective of their application to big data. The tutorial ends with a short review of papers abou...

Journal:
  • CoRR

Volume: abs/1003.0628

Publication date: 2010