An Improved Algorithm for Tree Edit Distance Incorporating Structural Linearity
Abstract
An ordered labeled tree is a tree in which the nodes are labeled and the left-to-right order among siblings is significant. The edit distance between two ordered labeled trees is the minimum cost of transforming one tree into the other by a sequence of edit operations. Most of the best-known tree edit distance algorithms can be categorized within a framework named cover strategy. In this paper, we investigate how certain locally linear features may be utilized to improve the time complexity of computing the tree edit distance. We define structural linearity and present a method incorporating linearity that can work with existing cover-strategy-based tree edit distance algorithms. We show that with this method the time complexity for an input of size n becomes O(n² + φ(A, ñ)), where φ(A, ñ) is the time complexity of any cover-strategy algorithm A applied to an input of size ñ, with ñ ≤ n, and the magnitude of ñ is inversely related to the degree of linearity. This result improves on previous results when ñ < n and would be useful in situations where ñ is in general substantially smaller than n, such as RNA secondary structure comparisons in computational biology.
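To make the definition of edit distance between ordered labeled trees concrete, the following is a minimal Python sketch of the textbook rightmost-root recursion with memoization. It is not the method of this paper and does not use cover strategies or structural linearity. The unit costs for deletion, insertion, and relabeling, the `(label, children)` tuple representation, and the names `tree`, `size`, `forest_distance`, and `tree_edit_distance` are all assumptions made for illustration.

```python
from functools import lru_cache

# Assumed representation: a tree is (label, children), where children is a
# tuple of trees in left-to-right order. A forest is a tuple of trees.
def tree(label, *children):
    return (label, tuple(children))

def size(t):
    """Number of nodes in tree t."""
    return 1 + sum(size(c) for c in t[1])

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests f and g."""
    if not f and not g:
        return 0
    if not f:
        # Every node of g has to be inserted.
        return sum(size(t) for t in g)
    if not g:
        # Every node of f has to be deleted.
        return sum(size(t) for t in f)
    v_label, v_children = f[-1]
    w_label, w_children = g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert the rightmost root of g; its children take its place.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Map the two rightmost roots onto each other (relabel if they differ).
    match = (forest_distance(v_children, w_children)
             + forest_distance(f[:-1], g[:-1])
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)

def tree_edit_distance(a, b):
    """Unit-cost edit distance between two ordered labeled trees."""
    return forest_distance((a,), (b,))

if __name__ == "__main__":
    t1 = tree("a", tree("b", tree("c")))
    t2 = tree("a", tree("c"))
    print(tree_edit_distance(t1, t2))
```

In this sketch, deleting an inner node promotes its children into its position, which is why the chain a(b(c)) is one deletion away from a(c). The recursion is only meant to illustrate the problem being solved; it makes no attempt at the complexity bounds discussed in the abstract.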
Related resources
An improved algorithm for tree edit distance with applications for RNA secondary structure comparison
An ordered labeled tree is a tree in which the nodes are labeled and the left-to-right order among siblings is relevant. The edit distance between two ordered labeled trees is the minimum cost of transforming one tree into the other through a sequence of edit operations. We present techniques for speeding up the tree edit distance computation which are applicable to a family of algorithms based...
Efficient XML Structural Similarity Detection using Sub-tree Commonalities
Developing efficient techniques for comparing XML-based documents becomes essential in the database and information retrieval communities. Various algorithms for comparing hierarchically structured data, e.g. XML documents, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being modeled as ordered label...
Partial Tree-Edit Distance: A Solution to the Default Class Problem in Pattern-Based Tree Classification
Pattern-based tree classifiers are capable of producing high-quality results; however, they are prone to the problem of default class overuse. In this paper, we propose a measure designed to address this issue, called partial tree-edit distance (PTED), which allows for assessing the degree of containment of one tree in another. Furthermore, we propose an algorithm which calculates the measu...
A Clique-Based Method Using Dynamic Programming for Computing Edit Distance Between Unordered Trees
Many kinds of tree-structured data, such as RNA secondary structures, have become available due to the progress of techniques in the field of molecular biology. To analyze the tree-structured data, various measures for computing the similarity between them have been developed and applied. Among them, tree edit distance is one of the most widely used measures. However, the tree edit distance pro...
A Fine-Grained XML Structural Comparison Approach
As the Web continues to grow and evolve, more and more information is being placed in structurally rich documents, XML documents in particular, so as to improve the efficiency of similarity clustering, information retrieval and data management applications. Various algorithms for comparing hierarchically structured data, e.g., XML documents, have been proposed in the literature. Most of them ma...
Publication date: 2007