Real-time News Recommendations using Apache Spark
Abstract
Recommending news articles is a challenging task due to the continuous changes in the set of available news articles and the context-dependent preferences of users. Traditional recommender approaches are optimized for analyzing static data sets. News recommendation scenarios, characterized by continuous changes, a high volume of messages, and tight time constraints, require alternative approaches. In this work we present a highly scalable recommender system optimized for processing streams. We evaluate the system in the CLEF NewsREEL challenge. Our system is built on Apache Spark, which enables distributed processing of recommendation requests and ensures the scalability of our approach. The evaluation of the implemented system shows that our approach is suitable for the news recommendation scenario and provides high-quality results while satisfying the tight time constraints.
Similar resources
Document Classification Using Distributed Machine Learning
In this paper, we investigate the performance and success rates of the Naïve Bayes classification algorithm for automatic classification of Turkish news into predetermined categories such as economy, life, and health. We use Apache Big Data technologies such as Hadoop, HDFS, Spark and Mahout, and apply these distributed technologies to machine learning. Keywords—news classification, distributed machin...
Development of a News Recommender System based on Apache Flink
The amount of data on the web is constantly growing. The separation of relevant from less important information is a challenging task. Due to the huge amount of data available in the World Wide Web, the processing cannot be done manually. Software components are needed that learn the user preferences and support users in finding the relevant information. In this work we present our recommender ...
Real-time Text Analytics Pipeline Using Open-source Big Data Tools
Real-time text processing systems are required in many domains to quickly identify patterns, trends, sentiments, and insights. Nowadays, social networks, e-commerce stores, blogs, scientific experiments, and server logs are the main sources of huge volumes of text data. However, processing huge text data in real time requires building a data processing pipeline. The main challenge in building such pip...
A System for Online News Recommendations in Real-Time with Apache Mahout
With the ubiquitous access to the internet, news portals have become heavily consumed online services. The huge amount of published news makes it difficult for users to find relevant articles. Recommender systems have been developed for supporting users in finding the most interesting items in vast collections of available items. In contrast to traditional recommender systems, news recommender ...
Real-Time Analysis of Students’ Activities on an
Real-time analytics is the capacity to extract valuable insights from data that arrives continuously from activities on the web or from network sensors. It is widely used in web-based business to drive decisions based on users’ experiences, such as dynamic pricing and personalized advertising. Many universities have adopted web-based learning in their learning process. They use data-mining techniques t...