Multi-field Categorical Data
Abstract
This paper presents a method for learning distributed representations of multi-field categorical data, a common data format with applications such as recommender systems, social link prediction, and computational advertising. The success of non-linear models such as factorisation machines and boosted trees has demonstrated the value of exploring interactions among categories from different fields. Inspired by Word2Vec, a distributed representation model for natural language, we propose the Cat2Vec (categories to vectors) model. In Cat2Vec, a low-dimensional continuous vector is learned automatically for each category in each field. Interactions among inter-field categories are then explored by different neural gates, and the most informative interactions are selected by pooling layers. In our experiments, by exploring interactions between pairs of categories across layers, the model achieves a substantial improvement over state-of-the-art models on a supervised learning task such as click prediction, while capturing the most significant interactions in the data.
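The abstract describes the pipeline only at a high level: one embedding per category per field, pairwise interactions through neural gates, and pooling that keeps the most informative interactions. The following is a minimal forward-pass sketch under loudly stated assumptions. The gate functions (`product_gate`, `sum_gate`), the pooling rule (keep the `k` interaction vectors with the largest L2 norm), the toy fields, and all helper names are hypothetical illustrations, not the paper's exact formulation. The embeddings are randomly initialised rather than learned.

```python
import math
import random
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Gate = Callable[[Vector, Vector], Vector]


def build_embeddings(field_vocab: Dict[str, List[str]], dim: int,
                     seed: int = 0) -> Dict[Tuple[str, str], Vector]:
    """Give every (field, category) pair its own low-dimensional vector.

    Cat2Vec learns these vectors; here they are only randomly initialised.
    """
    rng = random.Random(seed)
    return {
        (field, cat): [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        for field, cats in field_vocab.items()
        for cat in cats
    }


def product_gate(a: Vector, b: Vector) -> Vector:
    """Hypothetical gate: element-wise product of two vectors."""
    return [x * y for x, y in zip(a, b)]


def sum_gate(a: Vector, b: Vector) -> Vector:
    """Hypothetical gate: element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def interaction_layer(vectors: List[Vector], gate: Gate, k: int) -> List[Vector]:
    """Combine every pair of inputs with `gate`, then pool.

    Assumed pooling rule: keep the k interaction vectors with the largest L2 norm.
    """
    interactions = [gate(a, b) for a, b in combinations(vectors, 2)]
    if len(interactions) < k:
        raise ValueError(f"only {len(interactions)} pairs available, need k={k}")
    interactions.sort(key=l2_norm, reverse=True)
    return interactions[:k]


def encode_record(record: Dict[str, str],
                  embeddings: Dict[Tuple[str, str], Vector],
                  gate: Gate, k: int, layers: int) -> Vector:
    """Embed one record (one category per field) and stack interaction layers."""
    vectors = [embeddings[(field, cat)] for field, cat in record.items()]
    for _ in range(layers):
        vectors = interaction_layer(vectors, gate, k)
    # Flatten the surviving vectors into one fixed-length feature vector
    # that a downstream classifier (e.g. for click prediction) could consume.
    return [x for v in vectors for x in v]


# Toy example with four hypothetical fields.
field_vocab = {
    "weekday": ["mon", "tue"],
    "site": ["news", "sports"],
    "device": ["mobile", "desktop"],
    "ad_size": ["300x250", "728x90"],
}
embeddings = build_embeddings(field_vocab, dim=4)
record = {"weekday": "mon", "site": "news", "device": "mobile", "ad_size": "300x250"}
features = encode_record(record, embeddings, product_gate, k=3, layers=2)
print(len(features))  # prints 12
```

With four fields, the first layer forms 6 pairs and keeps 3; the second layer forms 3 pairs from those and keeps 3, so each surviving vector can combine up to four original categories. This is one way to read "interactions between pairwise categories over layers." Training the embeddings and a classifier on top is omitted.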
Similar resources
Categorical fracture orientation modeling: applied to an Iranian oil field
Fracture orientation is a prominent factor in determining the reservoir fluid flow direction in a formation because fractures are the major paths through which fluid flow occurs. Hence, a true modeling of orientation leads to a reliable prediction of fluid flow. Traditionally, various distributions are used for orientation modeling in fracture networks. Although they offer a fairly suitable est...
Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction
Predicting user responses, such as click-through rate and conversion rate, is critical in many web applications including web search, personalised recommendation, and online advertising. Unlike the continuous raw features we usually find in the image and audio domains, the input features in web space are always multi-field and are mostly discrete and categorical while their dependen...
Implementing SASL using Categorical Multi-combinators
Categorical multi-combinators form a rewriting system developed with the aim of providing efficient implementations of lazy functional languages. The core of the system of categorical multi-combinators consists of only two rewriting laws with a very low pattern-matching complexity. This system allows the equivalent of several β-reductions to be performed at once, and avoids the generation of tri...
On Multi-dimensional Markov Chain Models
Markov chain models are commonly used to model categorical data sequences. In this paper, we propose a multi-dimensional Markov chain model for modeling high dimensional categorical data sequences. In particular, the models are practical when there are limited data available. We then test the model with some practical sales demand data. Numerical results indicate the proposed model when compare...
Random Ordinality Ensembles: A Novel Ensemble Method for Multi-valued Categorical Data
Data with multi-valued categorical attributes can cause major problems for decision trees. The high branching factor can lead to data fragmentation, where decisions have little or no statistical support. In this paper, we propose a new ensemble method, Random Ordinality Ensembles (ROE), that circumvents this problem, and provides significantly improved accuracies over other popular ensemble met...