Building Automated Vandalism Detection Tools for Wikidata
Abstract
Wikidata, like Wikipedia, is a knowledge base that anyone can edit. This open collaboration model is powerful in that it lowers barriers to participation and allows a large number of people to contribute. However, it also exposes the knowledge base to vandalism and low-quality contributions. In this work, we build on prior research on vandalism detection in Wikipedia to detect vandalism in Wikidata. The work is novel in that identifying damaging changes in a structured knowledge base requires substantially different feature engineering than in a text-based wiki such as Wikipedia. We also discuss how useful these classifiers are for reducing the overall workload of Wikidata's vandalism patrollers. We describe a machine classification strategy that catches 89% of vandalism while reducing patrollers' workload by 98%, drawing lightly on contextual features of the edit and heavily on characteristics of the user who made it.
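The abstract's headline result relates two quantities: the share of vandalism a classifier catches (recall) and the share of edits patrollers no longer need to review (workload reduction). The sketch below is an illustration only, assuming each edit already has a classifier score in [0, 1] and a ground-truth label; the function name, scores, and thresholds are hypothetical and are not the authors' classifier, features, or operating point.

```python
from typing import List, Tuple


def recall_and_workload_reduction(
    scored_edits: List[Tuple[float, bool]], threshold: float
) -> Tuple[float, float]:
    """Hypothetical helper: flag edits with score >= threshold for human review.

    Returns (recall, workload_reduction):
      recall             = share of vandalism edits that are flagged
      workload_reduction = share of all edits that are NOT flagged,
                           i.e. edits patrollers can skip
    """
    total = len(scored_edits)
    vandalism = sum(1 for _, is_vandalism in scored_edits if is_vandalism)
    flagged = [(s, v) for s, v in scored_edits if s >= threshold]
    caught = sum(1 for _, is_vandalism in flagged if is_vandalism)
    recall = caught / vandalism if vandalism else 0.0
    workload_reduction = 1 - len(flagged) / total if total else 0.0
    return recall, workload_reduction


if __name__ == "__main__":
    # Made-up toy data: 10 edits, 2 of which are vandalism (True).
    edits = [
        (0.95, True), (0.40, True),
        (0.70, False), (0.20, False), (0.10, False), (0.05, False),
        (0.03, False), (0.02, False), (0.01, False), (0.01, False),
    ]
    # Lowering the threshold catches more vandalism but sends more edits to review.
    for t in (0.9, 0.5, 0.3):
        r, w = recall_and_workload_reduction(edits, t)
        print(f"threshold={t:.1f} recall={r:.0%} workload reduction={w:.0%}")
```

With this toy data, a threshold of 0.3 flags both vandalism edits while leaving 70% of edits unreviewed; a real deployment would pick the threshold on held-out data to reach a target recall.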
Similar resources
Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017
We report on the Wikidata vandalism detection task at the WSDM Cup 2017. The task received five submissions; this paper describes their evaluation and compares them to state-of-the-art baselines. Unlike previous work, we recast Wikidata vandalism detection as an online learning problem, requiring participant software to predict vandalism in near real-time. The best-performing approach a...
Wikidata Vandalism Detection - The Loganberry Vandalism Detector at WSDM Cup 2017
Wikidata is the new, large-scale knowledge base of the Wikimedia Foundation. As it can be edited by anyone, entries frequently get vandalized, which could spread falsified information if such edits are not detected. The WSDM 2017 Wiki Vandalism Detection Challenge requires us to solve this problem by computing a vandalism score denoting the likelihood that a revi...
Ensemble Models for Detecting Wikidata Vandalism with Stacking - Team Honeyberry Vandalism Detector at WSDM Cup 2017
The WSDM Cup 2017 is a binary classification task for classifying Wikidata revisions into vandalism and non-vandalism. This paper describes our method, which uses machine learning techniques such as under-sampling, feature selection, stacking, and model ensembles. We confirm the validity of each technique by comparing the AUC-ROC of models with and without it. Additionally,...
A Production Oriented Approach for Vandalism Detection in Wikidata - The Buffaloberry Vandalism Detector at WSDM Cup 2017
Wikidata is a free and open knowledge base from the Wikimedia Foundation that acts as central storage of structured data not only for the organization's other projects but also for a growing array of information systems, including search engines. Like Wikipedia, Wikidata's content can be created and edited by anyone, which is the main source of its strength but also allows for malicious u...
Towards Automatic Vandalism Detection in OpenStreetMap
The OpenStreetMap (OSM) project, a well-known source of freely available worldwide geodata collected by volunteers, has experienced a consistent increase in popularity in recent years. One of the main caveats closely related to this growth in popularity is the various types of vandalism that occur in the project's database. Since the applicability and reliability of crowd-sourced geodata, a...
Publication date: 2017