Automatic Moderation of Comments in a Large On-line Journalistic Environment

Authors

  • Adriano Veloso
  • Wagner Meira
  • Tiago Alves Macambira
  • Dorgival O. Guedes
  • Hélio Marcos Paz de Almeida
Abstract

On-line journalistic sites publish many news items and stories every day. Readers of these sites may comment on a story, and as a consequence a single story might receive thousands of comments. The quality of these comments varies widely, from spam and trolling to truly useful information. Separating good comments from bad ones is an important task and the primary goal of comment moderation. Moderators usually classify and score the comments, promoting high-quality ones and discouraging low-quality ones. However, moderators usually face a very large number of comments, so moderation may require a huge amount of time. In this paper we address the problem of automatic moderation of comments on a large journalistic Web site. Participants of the site may engage in discussions and interact with each other (e.g., as friends, fans, or enemies), constituting a large social network. We propose a data mining technique that combines underlying patterns implicit in the content of the comments with patterns hidden in the social network, and then uses the result for classification and scoring purposes. We evaluate the proposed technique using a real collection of comments from the Slashdot forum. The proposed technique is effective, outperforming traditional approaches, such as decision trees and SVMs, in terms of accuracy. Further, the technique proves to be extremely fast, being able to moderate hundreds of comments per minute.

Similar resources

Deep Learning for User Comment Moderation

Experimenting with a new dataset of 1.6M user comments from a Greek news portal and existing datasets of English Wikipedia comments, we show that an RNN outperforms the previous state of the art in moderation. A deep, classification-specific attention mechanism further improves the overall performance of the RNN. We also compare against a CNN and a word-list baseline, considering both fully aut...

Deeper Attention to Abusive User Content Moderation

Experimenting with a new dataset of 1.6M user comments from a news portal and an existing dataset of 115K Wikipedia talk page comments, we show that an RNN operating on word embeddings outperforms the previous state of the art in moderation, which used logistic regression or an MLP classifier with character or word n-grams. We also compare against a CNN operating on word embeddings, and a word-l...

Evaluation of Genotype × Environment Interaction and Grain Yield Stability of Advanced Bread Wheat Cross-bred lines by GGE Biplot Method

Extended Abstract Introduction and Objective: Investigation of the genotype × environment interaction and identification of stable and high-yielding cultivars in different environmental conditions is of great importance in plant breeding. The objectives of this study are to investigate the genotype × environment interaction using the GGE biplot graphic method in advanced cross-breeding lines...

Fuzzy Neighbor Voting for Automatic Image Annotation

With the rapid development of digital images and the availability of imaging tools, massive numbers of images are created. Therefore, efficient management and suitable retrieval, especially by computers, is one of the most challenging fields in image processing. Automatic image annotation (AIA) refers to attaching words, keywords or comments to an image or to a selected part of it. In this paper,...

Analysis of Wind Speed Forecasting Error Effects on Automatic Generation Control Performance

The main goal of this paper is to study statistical indices and evaluate AGC indices in a power system that has a large penetration of WTGs. The increasing penetration of wind turbine generation calls for further study of its impact on power system frequency control. Frequency control is affected by imbalance between real-time system generation and load. Also wind turbine generations have more flu...

Publication date: 2007