Automatically Detecting Corresponding Edit-Turn-Pairs in Wikipedia

Authors

  • Johannes Daxenberger
  • Iryna Gurevych
Abstract

In this study, we analyze links between edits in Wikipedia articles and turns from their discussion pages. Our motivation is to better understand implicit details about the writing process and knowledge flow in collaboratively created resources. Based on properties of the involved edit and turn, we have defined constraints for corresponding edit-turn-pairs. We manually annotated a corpus of 636 corresponding and non-corresponding edit-turn-pairs. Furthermore, we show how our data can be used to automatically identify corresponding edit-turn-pairs. With the help of supervised machine learning, we achieve an accuracy of 0.87 for this task.

Similar resources

Mining Transliterations from Wikipedia using Dynamic Bayesian Networks

Transliteration mining is aimed at building high quality multi-lingual named entity (NE) lexicons for improving performance in various Natural Language Processing (NLP) tasks including Machine Translation (MT) and Cross Language Information Retrieval (CLIR). In this paper, we apply two Dynamic Bayesian network (DBN)-based edit distance (ED) approaches in mining transliteration pairs from Wikipe...

Edit Categories and Editor Role Identification in Wikipedia

In this work, we introduced a corpus for categorizing edit types in Wikipedia. This fine-grained taxonomy of edit types enables us to differentiate editing actions and find editor roles in Wikipedia based on their low-level edit types. To do this, we first created an annotated corpus based on 1,996 edits obtained from 953 article revisions and built machine-learning models to automatically iden...

Automatically Classifying Edit Categories in Wikipedia Revisions

In this paper, we analyze a novel set of features for the task of automatic edit category classification. Edit category classification assigns categories such as spelling error correction, paraphrase or vandalism to edits in a document. Our features are based on differences between two versions of a document including meta data, textual and language properties and markup. In a supervised machin...

Novel Balanced Feature Representation for Wikipedia Vandalism Detection Task - Lab Report for PAN at CLEF 2010

In online communities like Wikipedia, where content editing is available to every visitor, users who deliberately make incorrect, vandal comments are sure to turn up. In this paper we propose a strong feature set and a method that can handle this problem and automatically decide whether an edit is a vandal contribution or not. We present a new feature set that is a balanced and extended versio...

Learning to Predict the Quality of Contributions to Wikipedia

Although some have argued that Wikipedia’s open edit policy is one of the primary reasons for its success, it also raises concerns about quality: vandalism, bias, and errors can be problems. Despite these challenges, Wikipedia articles are often (perhaps surprisingly) of high quality, which many attribute to both the dedicated Wikipedia community and “good Samaritan” users. As Wikipedia continu...


Publication date: 2014