Learning to rank with combinatorial Hodge theory
Authors
Abstract
We propose a number of techniques for learning a global ranking from data that may be incomplete and imbalanced, characteristics that are almost universal to modern datasets coming from e-commerce and internet applications. We are primarily interested in cardinal data based on scores or ratings, though our methods also give specific insights on ordinal data. From raw ranking data, we construct pairwise rankings, represented as edge flows on an appropriate graph. Our rank learning method exploits the graph Helmholtzian, which is the graph theoretic analogue of the Helmholtz operator or vector Laplacian, in much the same way the graph Laplacian is an analogue of the Laplace operator or scalar Laplacian. We study the graph Helmholtzian using combinatorial Hodge theory, which provides a way to unravel ranking information from edge flows. In particular, we show that every edge flow representing pairwise ranking can be resolved into two orthogonal components: a gradient flow that represents the l2-optimal global ranking, and a cyclic (divergence-free) flow that measures the inconsistency of the global ranking obtained. If the cyclic component is large, the data does not have a good global ranking. This cyclic flow can be further decomposed orthogonally into a triangular cyclic flow (curl) and a ‘harmonic’ flow that is globally cyclic but locally acyclic; these provide information on whether inconsistency in the ranking data arises locally or globally. When applied to the problem of rank learning, Hodge decomposition sheds light on whether a given dataset may be globally ranked in a meaningful way or whether the data is inherently inconsistent and thus cannot have any reasonable global ranking; in the latter case it provides information on the nature of the inconsistency.
An obvious advantage over Kemeny optimization, which is NP-hard and primarily intended for ordinal ranking data, is that the discrete Hodge decomposition may be easily computed via a linear least squares regression. We also investigate the l1-projection of edge flows, showing that this has a dual given by correlation maximization over bounded divergence-free flows, and the l1-approximate sparse cyclic ranking, showing that this has a dual given by correlation maximization over bounded curl-free flows. We discuss connections with well-known ordinal ranking techniques such as Kemeny optimization and Borda count from social choice theory.
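As a rough illustration of the least squares step mentioned in the abstract, the sketch below splits an edge flow into a gradient part and a cyclic residual on a small graph. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `hodge_rank`, the sign convention (a flow on edge (i, j) is the amount by which item j is preferred over item i), and the use of unweighted least squares are all choices made here for illustration. The further split of the cyclic part into curl and harmonic flows is not shown.

```python
import numpy as np

def hodge_rank(n_items, edges, flows):
    """Split an edge flow into gradient and cyclic parts (illustrative sketch).

    edges : list of (i, j) pairs, one per compared pair of items
    flows : flows[k] is the pairwise score on edges[k], read here as
            "how much j beats i" (an assumed sign convention)
    """
    # Gradient operator: (grad s)(i, j) = s_j - s_i for each edge (i, j).
    grad = np.zeros((len(edges), n_items))
    for k, (i, j) in enumerate(edges):
        grad[k, i] = -1.0
        grad[k, j] = 1.0

    y = np.asarray(flows, dtype=float)

    # Unweighted least squares: find scores s minimizing ||grad s - y||_2.
    # Scores are only defined up to an additive constant, so center them.
    s, *_ = np.linalg.lstsq(grad, y, rcond=None)
    s = s - s.mean()

    gradient_flow = grad @ s          # part explained by a global ranking
    cyclic_flow = y - gradient_flow   # divergence-free residual (inconsistency)
    return s, gradient_flow, cyclic_flow


# A triangle whose pairwise scores are perfectly consistent with scores 0, 1, 3.
triangle = [(0, 1), (1, 2), (0, 2)]
scores, grad_part, cyc_part = hodge_rank(3, triangle, [1.0, 2.0, 3.0])
print("scores:", scores)
print("cyclic residual norm:", np.linalg.norm(cyc_part))
```

The norm of the cyclic residual relative to the norm of the input flow gives a simple indicator of how far the data is from admitting a consistent global ranking; a purely cyclic input such as `[1, 1, -1]` on the same triangle yields all-zero scores.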
Similar resources
Hodge Theory for Combinatorial Geometries
The matroid is called loopless if the empty subset of E is closed, and is called a combinatorial geometry if in addition all single element subsets of E are closed. A closed subset of E is called a flat of M, and every subset of E has a well-defined rank and corank in the poset of all flats of M. The notion of matroid played a fundamental role in graph theory, coding theory, combinatorial optim...
Hodge Polynomials of the Moduli Spaces of Pairs
Let X be a smooth projective curve of genus g ≥ 2 over the complex numbers. A holomorphic pair on X is a couple (E,φ), where E is a holomorphic bundle over X of rank n and degree d, and φ ∈ H^0(E) is a holomorphic section. In this paper, we determine the Hodge polynomials of the moduli spaces of rank 2 pairs, using the theory of mixed Hodge structures. We also deal with the case in which E has fi...
Combinatorics of binomial decompositions of the simplest Hodge integrals
We reduce the calculation of the simplest Hodge integrals to some sums over decorated trees. Since Hodge integrals are already calculated, this gives a proof of a rather interesting combinatorial theorem and a new representation of Bernoulli numbers.
Hodge Spectrum of Hyperplane Arrangements
In this article there are two main results. The first result gives a formula, in terms of a log resolution, for the graded pieces of the Hodge filtration on the cohomology of a unitary local system of rank one on the complement of an arbitrary divisor in a smooth projective complex variety. The second result is an application of the first. We give a combinatorial formula for the spectrum of a h...
The Approximate Rank of a Matrix and its Algorithmic Applications
We study the ε-rank of a real matrix A, defined for any ε > 0 as the minimum rank over matrices that approximate every entry of A to within an additive ε. This parameter is connected to other notions of approximate rank and is motivated by problems from various topics including communication complexity, combinatorial optimization, game theory, computational geometry and learning theory. Here we giv...
Journal title: CoRR
Volume: abs/0811.1067
Publication date: 2008