EasyChoose: A Continuous Feature Extraction and Review Highlighting Scheme on Hadoop YARN
Abstract
The Internet today offers a massive number of reviews and user experiences covering products from many manufacturers, ranging from smartphones, automobiles, and home appliances to Internet services such as hotel and flight booking. Even for a careful customer, making a good purchasing decision is time-consuming: there are many similar products, each product has many reviews, and those reviews are scattered across the Internet. To alleviate this situation, this paper proposes EasyChoose, a distributed scheme based on Hadoop YARN that continuously collects product reviews from the Internet, extracts representative product features from previous customers’ reviews, and highlights the main points of the reviews. We use online hotel booking as an example to demonstrate the effectiveness of EasyChoose. The results show that EasyChoose can automatically extract representative product features and highlight reviews without losing their original meaning. Furthermore, EasyChoose can provide this service continuously, keeping up with changes in recent customers’ reviews.
Keywords: feature extraction; review highlight; customer reviews; recommendation; distributed system; Hadoop YARN
Related resources
Survey on Hadoop and Introduction to YARN
Big Data, the analysis of large quantities of data to gain new insight, has become a ubiquitous phrase in recent years. Data is growing at a staggering rate day by day. One of the efficient technologies for dealing with Big Data is Hadoop, which is discussed in this paper. To process jobs over large data volumes, Hadoop uses the MapReduce programming model. Hadoop makes use of different sch...
ABS-YARN: A Formal Framework for Modeling Hadoop YARN Clusters
In cloud computing, software that does not adapt flexibly to deployment decisions either wastes operational resources or requires reengineering, both of which may significantly increase costs. This can be avoided by analyzing deployment decisions as early as the design phase of software development. Real-Time ABS is a formal language for executable modeling of deployed virtua...
A review on EEG based brain computer interface systems feature extraction methods
The brain-computer interface (BCI) provides a communication channel between human and machine. Most of these systems are based on brain activity. Brain-computer interfacing is a methodology that provides a way to communicate with the outside environment using thoughts. The success of this methodology depends on the selection of methods to process the brain signals in each pha...
Hadoop neural network for parallel and distributed feature selection
In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various ...