Scaling similarity joins over tree-structured data
نویسندگان
چکیده
منابع مشابه
Scaling Similarity Joins over Tree-Structured Data
Given a large collection of tree-structured objects (e.g., XML documents), the similarity join finds the pairs of objects that are similar to each other, based on a similarity threshold and a tree edit distance measure. The state-ofthe-art similarity join methods compare simpler approximations of the objects (e.g., strings), in order to prune pairs that cannot be part of the similarity join res...
متن کاملEfficient Similarity Search for Tree-Structured Data
Tree-structured data are becoming ubiquitous nowadays and manipulating them based on similarity is essential for many applications. Although similarity search on textual data has been extensively studied, searching for similar trees is still an open problem due to the high complexity of computing the similarity between trees, especially for large numbers of tress. In this paper, we propose to t...
متن کاملMonadic Queries over Tree-Structured Data
Monadic query languages over trees currently receive considerable interest in the database community, as the problem of selecting nodes from a tree is the most basic and widespread database query problem in the context of XML. Partly a survey of recent work done by the authors and their group on logical query languages for this problem and their expressiveness, this paper provides a number of n...
متن کاملPrefix Tree Indexing for Similarity Search and Similarity Joins on Genomic Data
Similarity search and similarity join on strings are important for applications such as duplicate detection, error detection, data cleansing, or comparison of biological sequences. Especially DNA sequencing produces large collections of erroneous strings which need to be searched, compared, and merged. However, current RDBMS offer similarity operations only in a very limited and inefficient for...
متن کاملA Similarity Measure on Tree Structured Business Data
In many business situations, products or user profile data are so complex that they need to be described by use of tree structures. Evaluating the similarity between tree-structured data is essential in many applications, such as recommender systems. To evaluate the similarity between two trees, concept corresponding nodes should be identified by constructing an edit distance mapping between th...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Proceedings of the VLDB Endowment
سال: 2015
ISSN: 2150-8097
DOI: 10.14778/2809974.2809976