An Unsupervised Approach to Product Attribute Extraction

Authors

  • Santosh Raju
  • Prasad Pingali
  • Vasudeva Varma
Abstract

Product Attribute Extraction is the task of automatically discovering attributes of products from text descriptions. In this paper, we propose a new approach to extracting attributes that is both unsupervised and domain independent. With our approach, we achieve 92% precision and 62% recall in our experiments. Our experiments with varying dataset sizes show the robustness of our algorithm. We also show that as few as 5 descriptions provide enough information to identify attributes.

Similar resources

DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web

The web is a rich resource of structured data. There has been an increasing interest in using web structured data for many applications such as data integration, web search and question answering. In this paper, we present DEXTER, a system to find product sites on the web, and detect and extract product specifications from them. Since product specifications exist in multiple product sites, our ...

An Unsupervised Approach for Product Record Normalization across Different Web Sites

An unsupervised probabilistic learning framework for normalizing product records across different retailer Web sites is presented. Our framework decomposes the problem into two tasks to achieve the goal. The first task aims at extracting attribute values of products from different sites and normalizing them into appropriate reference attributes. This task is challenging because the set of refer...

Entity Attribute Extraction from Unstructured Text with Deep Belief Network

Entity attribute extraction is an extremely challenging research area with broad application prospects. In this paper, we propose a new approach to extracting entities’ attributes from an unstructured text corpus gathered from the Web. The proposed method is an unsupervised machine learning method that extracts entity attributes using a DBN. To test the proposed method, we use it to ext...

Generalizing Syntactic Structures for Product Attribute Candidate Extraction

Noun phrases (NP) in a product review are always considered as the product attribute candidates in previous work. However, this method limits the recall of the product attribute extraction. We therefore propose a novel approach by generalizing syntactic structures of the product attributes with two strategies: intuitive heuristics and syntactic structure similarity. Experiments show that the pr...

OPINE: Extracting Product Features and Opinions from Reviews

Consumers often have to wade through a large number of on-line reviews in order to make an informed product choice. We introduce OPINE, an unsupervised, high-precision information extraction system which mines product reviews in order to build a model of product features and their evaluation by reviewers.

Publication date: 2009