LIPN: Introducing a new Geographical Context Similarity Measure and a Statistical Similarity Measure based on the Bhattacharyya coefficient
Authors
Abstract
This paper describes the system used by the LIPN team in Task 10, Multilingual Semantic Textual Similarity, at SemEval 2014, in both the English and Spanish sub-tasks. The system uses a support vector regression model that combines different text similarity measures as features. Compared with our 2013 participation, we added a new feature that takes the geographical context into account and a new semantic distance based on the Bhattacharyya distance, calculated on co-occurrence distributions derived from the Spanish Google Books n-grams dataset.
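As a rough illustration of the statistical measure named in the abstract, the sketch below computes the Bhattacharyya coefficient, BC(p, q) = sum over x of sqrt(p(x) q(x)), and the corresponding distance, -ln BC, between two discrete distributions built from co-occurrence counts. It is not the LIPN implementation: the function names, the count dictionaries, and the toy Spanish words and numbers are all assumptions made for the example, and the abstract does not say how the counts are extracted from the Google Books n-grams or normalized.

```python
import math
from collections import Counter


def bhattacharyya_coefficient(p_counts, q_counts):
    """Return BC(p, q) = sum_x sqrt(p(x) * q(x)).

    p_counts and q_counts map context words to raw co-occurrence counts;
    each is normalized to a probability distribution before comparison.
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        return 0.0  # an empty distribution overlaps with nothing
    # Only context words present in both distributions contribute.
    shared = set(p_counts) & set(q_counts)
    bc = sum(
        math.sqrt((p_counts[x] / p_total) * (q_counts[x] / q_total))
        for x in shared
    )
    return min(1.0, bc)  # guard against floating-point overshoot


def bhattacharyya_distance(p_counts, q_counts):
    """Return -ln(BC); infinite when the distributions do not overlap."""
    bc = bhattacharyya_coefficient(p_counts, q_counts)
    return math.inf if bc == 0.0 else -math.log(bc)


# Hypothetical co-occurrence counts for two target words (made-up numbers).
cooc_rio = Counter({"agua": 4, "puente": 1})
cooc_lago = Counter({"agua": 1, "pesca": 4})

similarity = bhattacharyya_coefficient(cooc_rio, cooc_lago)
distance = bhattacharyya_distance(cooc_rio, cooc_lago)
```

The coefficient ranges from 0 (no shared context words) to 1 (identical distributions), so it could be used directly as one similarity feature among others in a regression model of the kind the abstract describes.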
Similar resources
A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation
Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative filtering based recommender systems. In order to improve accuracy of traditional user based collaborative filtering techniques under new user cold-start problem a...
A new vector valued similarity measure for intuitionistic fuzzy sets based on OWA operators
Much research has been carried out on measures of distance, similarity, and correlation between intuitionistic fuzzy sets (IFSs). However, most of them are single-valued measures and lack potential for efficiency validation. In this paper, a new vector valued similarity measure for IFSs is proposed based on OWA operators. The vector is defined as a two-tuple consisting of ...
Robust statistical registration of 3D ultrasound images using texture information
We investigate a new registration method for ultrasound volumes relying on a statistical texture-based similarity measure. Texture information is given by spatial Gabor filters and represented by statistical kernel-based distributions. The registration similarity measure is then defined as a probabilistic distance, derived from the Bhattacharyya coefficient, between two statistical distributions...
An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of MTS analysis. Much of the research in this context has focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...
A new similarity measure between type-2 fuzzy numbers and fuzzy risk analysis
In this paper, we present a revised similarity measure based on Chen-and-Chen's similarity measure for fuzzy risk analysis. The revised similarity measure uses the corrected formulae to calculate the centre of gravity points and is therefore more effective than Chen-and-Chen's method. The revised similarity measure can overcome the drawbacks of the existing methods. We have also proposed a new ...