Streaming Embeddings with Slack
Abstract
We study the problem of computing low-distortion embeddings in the streaming model. We present streaming algorithms that, given an n-point metric space M, compute an embedding of M into an n-point metric space M′ that preserves a (1−σ)-fraction of the distances with small distortion (σ is called the slack). Our algorithms use space polylogarithmic in n and in the spread of the metric. Within such space limitations, it is impossible to store the embedding explicitly. We bypass this obstacle by computing a compact representation of M′, without storing the actual bijection from M into M′.
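To make the notion of slack concrete, the following is a minimal offline sketch, not the streaming algorithm of the paper. It assumes one common convention for distortion (the embedding never shrinks a distance and expands it by at most a factor D), and it assumes the embedded distance is available as a function of the original points. The names `preserved_fraction` and `has_slack_distortion` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def preserved_fraction(points, dist, dist_embedded, distortion):
    """Fraction of pairs whose embedded distance lies in [d, distortion * d].

    Assumed convention: the embedding is non-contracting on a preserved
    pair and expands its distance by at most `distortion`.
    """
    pairs = list(combinations(points, 2))
    good = sum(
        1
        for x, y in pairs
        if dist(x, y) <= dist_embedded(x, y) <= distortion * dist(x, y)
    )
    return good / len(pairs)


def has_slack_distortion(points, dist, dist_embedded, distortion, sigma):
    """True if at least a (1 - sigma)-fraction of pairs is preserved."""
    return preserved_fraction(points, dist, dist_embedded, distortion) >= 1 - sigma


# Toy example (illustrative data): four points on a line, where the
# embedding moves the last point far away from the other three.
points = [0, 1, 2, 3]
image = {0: 0, 1: 1, 2: 2, 3: 10}  # hypothetical embedding into the line


def dist(x, y):
    return abs(x - y)


def dist_embedded(x, y):
    return abs(image[x] - image[y])


fraction = preserved_fraction(points, dist, dist_embedded, distortion=2)
```

In this toy example the three pairs that avoid the moved point keep their distances exactly, while the three pairs that involve it are stretched by more than a factor of 2, so the embedding has distortion 2 with slack σ = 1/2 but not with slack σ = 1/4. The check enumerates all pairs, which is exactly what a polylogarithmic-space streaming algorithm cannot afford.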
Similar resources
Spanners with Slack
Given a metric (V, d), a spanner is a sparse graph whose shortest-path metric approximates the distance d to within a small multiplicative distortion. In this paper, we study the problem of spanners with slack: e.g., can we find sparse spanners where we are allowed to incur an arbitrarily large distortion on a small constant fraction of the distances, but are then required to incur only a cons...
ESTEEM: A Novel Framework for Qualitatively Evaluating and Visualizing Spatiotemporal Embeddings in Social Media
Analyzing and visualizing large amounts of social media communications and contrasting short-term conversation changes over time and geolocations is extremely important for commercial and government applications. Earlier approaches for large-scale text stream summarization used dynamic topic models and trending words. Instead, we rely on text embeddings – low-dimensional word representations in ...
Embedding, Distance Estimation and Object Location in Networks
Concurrent with numerous theoretical results on metric embeddings, a growing body of research in the networking community has studied the distance matrix defined by node-to-node latencies in the Internet, resulting in a number of recent approaches that approximately embed this distance matrix into low-dimensional Euclidean space. A fundamental distinction between the theoretical approaches to e...
Distributed Non-Parametric Representations for Vital Filtering: UW at TREC KBA 2014
Identifying documents that contain timely and vital information for an entity of interest, a task known as vital filtering, has become increasingly important with the availability of large document collections. To efficiently filter such large text corpora in a streaming manner, we need to compactly represent previously observed entity contexts, and quickly estimate whether a new document conta...
Online Learning of Interpretable Word Embeddings
Word embeddings encode semantic meanings of words into low-dimensional word vectors. In most word embeddings, one cannot interpret the meanings of specific dimensions of those word vectors. Nonnegative matrix factorization (NMF) has been proposed to learn interpretable word embeddings via non-negative constraints. However, NMF methods suffer from scale and memory issues because they have to mainta...