Random Projections for Anchor-based Topic Inference
Abstract
Recent spectral topic discovery methods are extremely fast at processing large document corpora, but scale poorly with the size of the input vocabulary. Random projections are vital to ensure speed and limit memory usage. We empirically evaluate several methods for generating random projections and measure the effect of parameters such as sparsity and dimensionality. We find that methods with structured sparsity are faster than Gaussian random projections and more accurate than standard sparse random projections.
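The three families of projection contrasted above can be illustrated with a minimal sketch. It assumes a dense Gaussian matrix, a standard sparse matrix with independently chosen nonzeros, and one possible reading of "structured sparsity" in which every row has exactly the same number of nonzero entries. The function names, the `density` and `nonzeros` parameters, and the scaling choices are illustrative assumptions, not the construction evaluated in the paper.

```python
import math
import random


def gaussian_projection(vocab_size, dim, rng):
    """Dense projection: every entry is drawn from N(0, 1/dim)."""
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
            for _ in range(vocab_size)]


def sparse_projection(vocab_size, dim, rng, density=1.0 / 3.0):
    """Standard sparse projection: each entry is independently nonzero
    with probability `density` and carries a random sign."""
    scale = 1.0 / math.sqrt(density * dim)
    rows = []
    for _ in range(vocab_size):
        row = []
        for _ in range(dim):
            if rng.random() < density:
                row.append(scale if rng.random() < 0.5 else -scale)
            else:
                row.append(0.0)
        rows.append(row)
    return rows


def fixed_nonzero_projection(vocab_size, dim, rng, nonzeros=4):
    """Structured sparse projection (assumed form): every row has exactly
    `nonzeros` entries of +/- 1/sqrt(nonzeros), so each row has unit norm."""
    scale = 1.0 / math.sqrt(nonzeros)
    rows = []
    for _ in range(vocab_size):
        row = [0.0] * dim
        # Pick distinct columns so the nonzero count is exact.
        for col in rng.sample(range(dim), nonzeros):
            row[col] = scale if rng.random() < 0.5 else -scale
        rows.append(row)
    return rows


def project(matrix, projection):
    """Multiply an (N x V) matrix by a (V x dim) projection, skipping zeros."""
    dim = len(projection[0])
    out = []
    for row in matrix:
        acc = [0.0] * dim
        for j, value in enumerate(row):
            if value != 0.0:
                for k, weight in enumerate(projection[j]):
                    if weight != 0.0:
                        acc[k] += value * weight
        out.append(acc)
    return out
```

In this sketch all three matrices have entry variance 1/dim, so squared norms are preserved in expectation. The sparse variants skip most multiplications in `project`, which is where any speed advantage over the dense Gaussian matrix would come from.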
Similar resources
Low-dimensional Embeddings for Interpretable Anchor-based Topic Inference
The anchor words algorithm performs provably efficient topic model inference by finding an approximate convex hull in a high-dimensional word co-occurrence space. However, the existing greedy algorithm often selects poor anchor words, reducing topic quality and interpretability. Rather than finding an approximate convex hull in a high-dimensional space, we propose to find an exact convex hull i...
Evaluating Regularized Anchor Words
We perform a comprehensive examination of the recently proposed anchor method for topic model inference using topic interpretability and held-out likelihood measures. After measuring the sensitivity to the anchor selection process, we incorporate L2 and Beta regularization into the optimization objective in the recovery step. Preliminary results show that L2 improves held-out likelihood, and Bet...
Is Your Anchor Going Up or Down? Fast and Accurate Supervised Topic Models
Topic models provide insights into document collections, and their supervised extensions also capture associated document-level metadata such as sentiment. However, inferring such models from data is often slow and cannot scale to big data. We build upon the “anchor” method for learning topic models to capture the relationship between metadata and latent topics by extending the vector-space rep...
A Hybrid Approach for Probabilistic Inference using Random Projections
We introduce a new meta-algorithm for probabilistic inference in graphical models based on random projections. The key idea is to use approximate inference algorithms for an (exponentially) large number of samples, obtained by randomly projecting the original statistical model using universal hash functions. In the case where the approximate inference algorithm is a variational approximation, t...
A Graph-Based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields
This paper presents a method for categorizing named entities in Wikipedia. In Wikipedia, an anchor text is glossed in a linked HTML text. We formalize named entity categorization as a task of categorizing anchor texts with linked HTML texts that gloss a named entity. Using this representation, we introduce a graph structure in which anchor texts are regarded as nodes. In order to incorporate...