Zero-Inflated Exponential Family Embeddings
Authors
Abstract
Word embeddings are a widely used tool to analyze language, and exponential family embeddings (Rudolph et al., 2016) generalize the technique to other types of data. One challenge in fitting embedding methods is sparse data, such as a document/term matrix that contains many zeros. To address this issue, practitioners typically downweight or subsample the zeros, thus focusing learning on the non-zero entries. In this paper, we develop zero-inflated embeddings, a new embedding method that is designed to learn from sparse observations. In a zero-inflated embedding (ZIE), a zero in the data can come from an interaction with other data (i.e., an embedding) or from a separate process by which many observations are equal to zero (i.e., a probability mass at zero). Fitting a ZIE naturally downweights the zeros and dampens their influence on the model. Across many types of data (language, movie ratings, shopping histories, and bird-watching logs), we found that zero-inflated embeddings provide improved predictive performance over standard approaches and find better vector representations of items.
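The two-source view of zeros described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a Poisson base distribution, a fixed zero-inflation probability `pi_zero`, and a `rate` that would in practice be produced by the embedding model. All function and parameter names are hypothetical.

```python
import math


def poisson_log_pmf(x, rate):
    """Log-probability of count x under a Poisson with the given rate."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def zero_inflated_log_pmf(x, rate, pi_zero):
    """Log-probability of x under a zero-inflated Poisson (assumed base model).

    With probability pi_zero the observation is a structural zero;
    otherwise it is drawn from Poisson(rate), which can also yield zero.
    """
    if x == 0:
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(-rate))
    return math.log(1.0 - pi_zero) + poisson_log_pmf(x, rate)


def embedding_responsibility(rate, pi_zero):
    """Posterior probability that an observed zero came from the Poisson
    (embedding) component rather than from the point mass at zero."""
    from_embedding = (1.0 - pi_zero) * math.exp(-rate)
    return from_embedding / (pi_zero + from_embedding)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(zero_inflated_log_pmf(0, rate=2.0, pi_zero=0.3))
    print(embedding_responsibility(rate=2.0, pi_zero=0.3))
```

Under these assumptions, the derivative of the zero-entry log-likelihood with respect to `rate` equals minus `embedding_responsibility(rate, pi_zero)`, whereas a plain Poisson gives exactly minus one. Because the responsibility is below one whenever `pi_zero` is positive, a zero pulls less strongly on the embedding parameters, which is one way to read the abstract's statement that fitting the model downweights the zeros.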
Similar resources
Introducing COZIGAM: An R Package for Constrained Zero-Inflated Generalized Additive Model Analysis
The zero-inflation problem is very common in ecological studies as well as other areas. Nonparametric regression with zero-inflated data may be studied via the zero-inflated generalized additive model (ZIGAM), which assumes that the zero-inflated responses come from a probabilistic mixture of zero and a regular component whose distribution belongs to the 1-parameter exponential family. With the fur...
Constrained Generalized Additive Model with Zero-Inflated Data
The zero-inflation problem is very common in ecological studies as well as other areas. We propose the COnstrained Zero-Inflated Generalized Additive Model (COZIGAM) for analyzing zero-inflated data. Our approach assumes that the response follows some distribution from the zero-inflated 1-parameter exponential family, with the further assumption that the probability of zero inflation is some monoto...
Constrained Generalized Additive Models for Zero-Inflated Data
Zero-inflated data abound in ecological studies as well as in other scientific and quantitative fields. Nonparametric regression with zero-inflated response may be studied via the Zero-Inflated Generalized Additive Model (ZIGAM). ZIGAM assumes that the response variable follows a probabilistic mixture distribution of a zero atom and a regular component whose distribution belongs to some 1-param...
Modeling the Number of Attacks in Multiple Sclerosis Patients Using Zero-Inflated Negative Binomial Model
Background and aims: Multiple sclerosis (MS) is an inflammatory disease of the central nervous system. The impact of the number of attacks on the disease is undeniable. The aim of this study was to analyze the number of attacks in these patients. Methods: In this descriptive-analytical study, the registered data of 1840 MS patients referred to the MS clinic of Ayatollah Kash...
Dynamic Embeddings for Language Evolution
Word embeddings are a powerful approach for unsupervised analysis of language. Recently, Rudolph et al. [35] developed exponential family embeddings, which cast word embeddings in a probabilistic framework. Here, we develop dynamic embeddings, building on exponential family embeddings to capture how the meanings of words change over time. We use dynamic embeddings to analyze three large collect...