Numeric-attribute-powered Sentence Embedding

نویسندگان

  • Sicong Kuang
  • Brian D. Davison
چکیده

Modern embedding methods focus only on the words in the text. The word or sentence embeddings are trained to represent the semantic meaning of the raw texts. However, many quantified attributes associated with the text, such as numeric attributes associated with Yelp review text, are ignored in the vector representation learning process. Those quantified numeric attributes can provide important information to complement the text. For example, review stars, business stars and number of likes, etc., have great influence on interpreting the semantic meaning of text. Numeric attributes associated with the text often reveal the quantity or the significance of the object that the number is modifying. We propose an algorithm using vector projection to generate numeric-attribute-powered sentence embeddings for multi-label text classification. We evaluate our algorithm on a public Yelp dataset, showing that classification performance improves significantly when numeric attributes are incorporated well.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fuzzy multilevel graph embedding

Structural pattern recognition approaches offer the most expressive, convenient, powerful but computational expensive representations of underlying relational information. To benefit from mature, less expensive and efficient state-of-the-art machine learning models of statistical pattern recognition they must be mapped to a low-dimensional vector space. Our method of explicit graph embedding br...

متن کامل

Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting is to make searchable unindexed image documents by locating word/words in a doc-ument image, given a query word. This problem is challenging, mainly due to the large numberof word classes with very small inter-class and substantial intra-class distances. In this paper, asegmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...

متن کامل

Character-aware Attention Residual Net- Work for Sentence Representation

Text classification in general is a well studied area. However, classifying short and noisy text remains challenging. Feature sparsity is a major issue. The quality of document representation here has a great impact on the classification accuracy. Existing methods represent text using bag-of-word model, with TFIDF or other weighting schemes. Recently word embedding and even document embedding a...

متن کامل

Integrated dimensionality reduction technique for mixed-type data involving categorical values

An extension to the recent dimensionality-reduction technique t-SNE is proposed. The extension facilitates t-SNE to handle mixed-type datasets. Each attribute of the data is associated with a distance hierarchy which allows the distance between numeric values and between categorical values be measured in a unified manner. More importantly, domain knowledge regarding semantic distance between ca...

متن کامل

Enhancing Sentence Relation Modeling with Auxiliary Character-level Embedding

Neural network based approaches for sentence relation modeling automatically generate hidden matching features from raw sentence pairs. However, the quality of matching feature representation may not be satisfied due to complex semantic relations such as entailment or contradiction. To address this challenge, we propose a new deep neural network architecture that jointly leverage pre-trained wo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017