A note on the triangle inequality for the Jaccard distance

Author

  • Sven Kosub

Abstract

Two simple proofs of the triangle inequality for the Jaccard distance in terms of nonnegative, monotone, submodular functions are given and discussed.

The Jaccard index [8] is a classical similarity measure on sets with many practical applications in information retrieval, data mining, machine learning, and many other areas (cf., e.g., [7]). The Jaccard index $J$ measures the relative size of the overlap of two finite sets $A$ and $B$. It and the associated Jaccard distance $J_\delta$ are formally defined as:

$$J(A,B) \;=_{\mathrm{def}}\; \frac{|A \cap B|}{|A \cup B|}, \qquad J_\delta(A,B) \;=_{\mathrm{def}}\; 1 - J(A,B) \;=\; 1 - \frac{|A \cap B|}{|A \cup B|} \;=\; \frac{|A \triangle B|}{|A \cup B|},$$

where $J(\emptyset,\emptyset) =_{\mathrm{def}} 1$.

The Jaccard distance $J_\delta$ is known to satisfy all properties of a metric, most notably the triangle inequality. This fact has been observed many times, e.g., via metric transforms [12, 13, 4], embeddings in vector spaces (e.g., [15, 11, 4]), minwise independent permutations [1], or sometimes cumbersome arithmetic [10, 3]. A very simple, elementary proof of the triangle inequality was given in [5] using an appropriate partitioning of sets.

Here, we give two more simple, direct proofs of the triangle inequality. One proof uses neither set differences nor disjointness of sets. It relies only on the fundamental equation

$$|A \cup B| + |A \cap B| = |A| + |B|.$$

As such, the proof is generic and leads to (sub)modular versions of the Jaccard distance (as defined below). The second proof reveals a subtle difference between the two possible versions. Although the original motivation was to give a proof of the triangle inequality that is as simple as possible, the link with submodular functions is interesting in itself (as also recently suggested in [6]).

Let $X$ be a finite, non-empty ground set. A set function $f : \mathcal{P}(X) \to \mathbb{R}$ is said to be submodular on $X$ if

$$f(A \cup B) + f(A \cap B) \le f(A) + f(B) \quad \text{for all } A, B \subseteq X.$$

If equality holds for all $A, B \subseteq X$, then $f$ is called modular on $X$.
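The definitions of $J$ and $J_\delta$ can be checked mechanically on a small ground set. The following Python sketch is not taken from the note, and all function names in it are hypothetical. It computes the index and distance with exact rational arithmetic (`fractions.Fraction`) and applies the convention $J(\emptyset,\emptyset) = 1$. It then enumerates every subset of a tiny ground set to confirm the symmetric-difference form and the triangle inequality. This is an exhaustive sanity check for one small $X$, not a proof.

```python
from fractions import Fraction
from itertools import chain, combinations


def jaccard_index(a: frozenset, b: frozenset) -> Fraction:
    """J(A, B) = |A ∩ B| / |A ∪ B|, with the convention J(∅, ∅) = 1."""
    union = a | b
    if not union:
        return Fraction(1)
    return Fraction(len(a & b), len(union))


def jaccard_distance(a: frozenset, b: frozenset) -> Fraction:
    """J_δ(A, B) = 1 - J(A, B)."""
    return 1 - jaccard_index(a, b)


def all_subsets(ground) -> list:
    """Every subset of a (small, finite) ground set, as frozensets."""
    items = list(ground)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def symmetric_difference_form_holds(ground) -> bool:
    """Check 1 - |A ∩ B| / |A ∪ B| == |A △ B| / |A ∪ B| whenever A ∪ B is non-empty."""
    subsets = all_subsets(ground)
    return all(jaccard_distance(a, b) == Fraction(len(a ^ b), len(a | b))
               for a in subsets for b in subsets if a | b)


def triangle_inequality_holds(ground) -> bool:
    """Exhaustively check J_δ(A, C) <= J_δ(A, B) + J_δ(B, C) over all triples of subsets."""
    subsets = all_subsets(ground)
    return all(jaccard_distance(a, c) <= jaccard_distance(a, b) + jaccard_distance(b, c)
               for a in subsets for b in subsets for c in subsets)


if __name__ == "__main__":
    A, B = frozenset({1, 2}), frozenset({2, 3})
    print(jaccard_distance(A, B))                     # 2/3, since |A ∩ B| = 1 and |A ∪ B| = 3
    print(symmetric_difference_form_holds(range(4)))  # True
    print(triangle_inequality_holds(range(4)))        # True (16**3 = 4096 triples)
```

Exact fractions avoid needing a floating-point tolerance in the comparisons. The cost grows as $8^{|X|}$ triples, so only very small ground sets are practical.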
It is known that $f$ is submodular on $X$ if and only if the following condition holds (cf., e.g., [14]):

$$f(A \cup \{x\}) - f(A) \;\ge\; f(B \cup \{x\}) - f(B) \quad \text{for all } A \subseteq B \subseteq X,\ x \in X \setminus B. \tag{1}$$

A set function $f$ is monotone if $f(A) \le f(B)$ for all $A \subseteq B \subseteq X$; $f$ is nonnegative if $f(A) \ge 0$ for all $A \subseteq X$. Each nonnegative, monotone, modular function $f$ on $X$ can be
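The properties defined above (submodular, modular, monotone, nonnegative) and condition (1), with $x$ ranging over $X \setminus B$, can be tested by brute force on a small ground set. The Python sketch below is an illustration only, not code from the note. Its function names and its four example functions are hypothetical.

- Cardinality is modular, which is exactly the fundamental equation $|A \cup B| + |A \cap B| = |A| + |B|$.
- $\min(|S|, 2)$ is monotone and submodular but not modular.
- $|S|^2$ is monotone but not submodular.
- $|S|\,(4 - |S|)$ is submodular but not monotone. It shows why $x$ must range over $X \setminus B$ in condition (1).

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, Iterable, List

SetFunction = Callable[[FrozenSet[int]], int]


def all_subsets(ground: Iterable[int]) -> List[FrozenSet[int]]:
    """Every subset of the (small, finite) ground set X, as frozensets."""
    items = list(ground)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def is_submodular(f: SetFunction, ground: Iterable[int]) -> bool:
    """Definition: f(A ∪ B) + f(A ∩ B) <= f(A) + f(B) for all A, B ⊆ X."""
    subsets = all_subsets(ground)
    return all(f(a | b) + f(a & b) <= f(a) + f(b) for a in subsets for b in subsets)


def is_modular(f: SetFunction, ground: Iterable[int]) -> bool:
    """Equality f(A ∪ B) + f(A ∩ B) == f(A) + f(B) for all A, B ⊆ X."""
    subsets = all_subsets(ground)
    return all(f(a | b) + f(a & b) == f(a) + f(b) for a in subsets for b in subsets)


def has_diminishing_returns(f: SetFunction, ground: Iterable[int]) -> bool:
    """Condition (1): f(A ∪ {x}) - f(A) >= f(B ∪ {x}) - f(B) for A ⊆ B ⊆ X and x ∉ B."""
    x_set = frozenset(ground)
    subsets = all_subsets(x_set)
    return all(f(a | {x}) - f(a) >= f(b | {x}) - f(b)
               for b in subsets for a in subsets if a <= b
               for x in x_set - b)


def is_monotone(f: SetFunction, ground: Iterable[int]) -> bool:
    """f(A) <= f(B) whenever A ⊆ B ⊆ X."""
    subsets = all_subsets(ground)
    return all(f(a) <= f(b) for a in subsets for b in subsets if a <= b)


def is_nonnegative(f: SetFunction, ground: Iterable[int]) -> bool:
    """f(A) >= 0 for all A ⊆ X."""
    return all(f(a) >= 0 for a in all_subsets(ground))


# Hypothetical example set functions on X = {0, 1, 2, 3}.
X = range(4)


def cardinality(s: FrozenSet[int]) -> int:
    return len(s)                  # modular: the fundamental equation


def capped(s: FrozenSet[int]) -> int:
    return min(len(s), 2)          # monotone, submodular, not modular


def squared(s: FrozenSet[int]) -> int:
    return len(s) ** 2             # monotone, not submodular


def parabola(s: FrozenSet[int]) -> int:
    return len(s) * (4 - len(s))   # submodular, not monotone


if __name__ == "__main__":
    for f in (cardinality, capped, squared, parabola):
        print(f.__name__, is_submodular(f, X), has_diminishing_returns(f, X),
              is_modular(f, X), is_monotone(f, X))
    # cardinality True True True True
    # capped True True False True
    # squared False False False True
    # parabola True True False False
```

In every row the definition and condition (1) agree, as the equivalence predicts. If $x$ were allowed to range over $B$ as well, the non-monotone `parabola` would fail condition (1) even though it is submodular.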

Similar resources

On the metric triangle inequality

A non-contradictible axiomatic theory is constructed under the local reversibility of the metric triangle inequality. The obtained notion includes the metric spaces as particular cases and the generated metric topology is T$_{1}$-separated and generally, non-Hausdorff.

The Triangle Inequality and Its Applications in the Relative Metric Space

Let C be a plane convex body. For arbitrary points $a, b \in E^n$, denote by $|ab|$ the Euclidean length of the line segment $ab$. The relative distance between the points $a$ and $b$ is the ratio of the Euclidean distance between $a$ and $b$ to half the length of a longest chord of C parallel to the line segment $ab$. In this note we prove the triangle inequality with the relative ...

Further results on the subspace distance

In previous papers [1, 2], we proposed a subspace distance. However, whether the subspace distance satisfies the triangle inequality was left open. In this note, we give a positive answer to this open problem and prove our assertion.

A note on quickly finding the nearest neighbour

and D-dimensional vectors, it takes O(D) operations to compute this distance. For a set of N vectors, computing the nearest neighbour to q would then take O(DN) operations. For large datasets this can be prohibitively expensive. Is there a way to avoid calculating all the distances? This is a large research area (see [2] for a review), and we will focus here on first methods that make use of t...

Properties of distance spaces with power triangle inequalities

Metric spaces provide a framework for analysis and have several very useful properties. Many of these properties follow in part from the triangle inequality. However, there are several applications in which the triangle inequality does not hold but in which we may still like to perform analysis. This paper investigates what happens if the triangle inequality is removed altogether, leaving wha...

Journal:
  • CoRR

Volume: abs/1612.02696

Publication date: 2016