Dimensionality Reduction using Similarity-induced Embeddings
Authors
Abstract
The vast majority of dimensionality reduction (DR) techniques rely on second-order statistics to define their optimization objective. Even though this provides adequate results in most cases, it comes with several shortcomings: the methods require carefully designed regularizers and are usually sensitive to outliers. In this paper, a new DR framework is introduced that directly models the target distribution using the notion of similarity instead of distance. The proposed framework, called the similarity embedding framework (SEF), can overcome the aforementioned limitations and provides a conceptually simpler way to express optimization targets similar to those of existing DR techniques. Deriving a new DR technique using the SEF becomes simply a matter of choosing an appropriate target similarity matrix. A variety of classical tasks, such as performing supervised DR and providing out-of-sample extensions, as well as novel techniques, such as providing fast linear embeddings for complex techniques, are demonstrated using the proposed framework. Six data sets from a diverse range of domains are used to evaluate the proposed method, and it is demonstrated that it can outperform many existing DR techniques.
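The idea of fitting an embedding to a chosen target similarity matrix can be illustrated with a small sketch. The abstract does not specify the similarity function, the loss, or the optimizer, so everything below is an assumption made for illustration: a Gaussian similarity in the embedded space, a mean squared error between the embedded and target similarity matrices, a linear projection `W` trained by plain gradient descent, and a supervised target that is 1 for same-class pairs and 0 otherwise. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_similarity(Y, sigma=1.0):
    """Pairwise similarity exp(-||y_i - y_j||^2 / sigma), values in (0, 1]."""
    sq = np.sum(Y ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Y @ Y.T), 0.0)
    return np.exp(-d2 / sigma)

def supervised_target(labels):
    """Assumed target matrix: 1 for same-class pairs, 0 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def similarity_loss(W, X, T, sigma=1.0):
    """Mean squared difference between embedded and target similarities."""
    P = gaussian_similarity(X @ W, sigma)
    return float(np.mean((P - T) ** 2))

def fit_linear_embedding(X, T, dim=2, sigma=1.0, lr=0.01, steps=300, seed=0):
    """Gradient descent on similarity_loss; T is assumed symmetric."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], dim))
    n = X.shape[0]
    for _ in range(steps):
        Y = X @ W
        P = gaussian_similarity(Y, sigma)
        # Coefficient multiplying (y_i - y_j) in the derivative of the loss.
        G = (2.0 / n ** 2) * (P - T) * P * (-2.0 / sigma)
        # Each pair contributes twice (as ij and ji) because G is symmetric.
        grad_Y = 2.0 * (G.sum(axis=1)[:, None] * Y - G @ Y)
        W -= lr * (X.T @ grad_Y)
    return W

# Toy usage: two classes separated along the first input dimension.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 5))
X[labels == 1, 0] += 4.0
T = supervised_target(labels)
W = fit_linear_embedding(X, T)
embedded = X @ W  # 40 x 2 embedding
```

Under these assumptions, swapping `supervised_target` for a different target matrix changes what the embedding preserves without touching the optimization code, and because the mapping is linear, unseen samples can be embedded with the same `X_new @ W` product.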
Related articles
Simple and Effective Dimensionality Reduction for Word Embeddings
Word embeddings have become the basic building blocks for several natural language processing and information retrieval tasks. Recently, there has been an emphasis on further improving the pre-trained word vectors through post-processing algorithms. One such area of improvement is the dimensionality reduction of word embeddings. Reducing the size of word embeddings through dimensionality reduct...
Word Re-Embedding via Manifold Dimensionality Retention
Word embeddings seek to recover a Euclidean metric space by mapping words into vectors, starting from words cooccurrences in a corpus. Word embeddings may underestimate the similarity between nearby words, and overestimate it between distant words in the Euclidean metric space. In this paper, we re-embed pre-trained word embeddings with a stage of manifold learning which retains dimensionality....
Scalable Ordinal Embedding to Model Text Similarity
Practitioners of Machine Learning and related fields commonly seek out embeddings of object collections into some Euclidean space. These embeddings are useful for dimensionality reduction, for data visualization, as concrete representations of abstract notions of similarity for similarity search, or as features for some downstream learning task such as web search or sentiment analysis. A wide a...
On Low Distortion Embeddings of Statistical Distance Measures into Low Dimensional Spaces
Statistical distance measures have found wide applicability in information retrieval tasks that typically involve high dimensional datasets. In order to reduce the storage space and ensure efficient performance of queries, dimensionality reduction while preserving the inter-point similarity is highly desirable. In this paper, we investigate various statistical distance measures from the point o...
Learning similarity preserving representations with neural similarity encoders
Many dimensionality reduction or manifold learning algorithms optimize for retaining the pairwise similarities, distances, or local neighborhoods of data points. Spectral methods like Kernel PCA (kPCA) or isomap achieve this by computing the singular value decomposition (SVD) of some similarity matrix to obtain a low dimensional representation of the original data. However, this is computationa...
Journal title:
- IEEE Transactions on Neural Networks and Learning Systems
Publication date: 2017