Data expansion using back translation and paraphrasing for hate speech detection
Authors
Abstract
With the proliferation of user-generated content on social media platforms, establishing mechanisms to automatically identify toxic and abusive content has become a prime concern for regulators, researchers, and society. Keeping the balance between freedom of speech and respecting each other's dignity is a major concern of platform regulators. Although automatic detection of offensive content using deep learning approaches seems to provide encouraging results, training deep learning-based models requires large amounts of high-quality labeled data, which is often missing. In this regard, we present in this paper a new method that fuses a Back Translation method and a Paraphrasing technique for data augmentation. Our pipeline investigates different word-embedding-based architectures for the classification of hate speech. The back translation relies on an encoder–decoder architecture pre-trained on a corpus and mostly used for machine translation. In addition, paraphrasing exploits the transformer model and a mixture of experts to generate diverse paraphrases. Finally, LSTM and CNN are compared to seek enhanced results. We evaluate our proposal on five publicly available datasets; namely, the AskFm corpus, the Formspring dataset, the Warner and Waseem dataset, Olid, and the Wikipedia comments dataset. The performance of the proposal, together with a comparison to some related state-of-the-art results, demonstrates the effectiveness and soundness of our proposal.
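The augmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `expand_dataset`, `back_translate`, and the toy translators and paraphraser are hypothetical stand-ins, used only so the example runs without any model download. In practice the `to_pivot`, `from_pivot`, and `paraphrase` callables would wrap pre-trained translation and paraphrasing models.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Translator = Callable[[str], str]
Paraphraser = Callable[[str], List[str]]
Sample = Tuple[str, int]  # (text, label)


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate into a pivot language and back to obtain a reworded variant."""
    return from_pivot(to_pivot(text))


def expand_dataset(
    samples: Iterable[Sample],
    to_pivot: Translator,
    from_pivot: Translator,
    paraphrase: Optional[Paraphraser] = None,
) -> List[Sample]:
    """Return the originals plus label-preserving variants, without duplicates."""
    expanded: List[Sample] = []
    seen = set()
    for text, label in samples:
        candidates = [text, back_translate(text, to_pivot, from_pivot)]
        if paraphrase is not None:
            candidates.extend(paraphrase(text))
        for cand in candidates:
            cand = cand.strip()
            key = (cand.lower(), label)
            if cand and key not in seen:  # skip empty or repeated variants
                seen.add(key)
                expanded.append((cand, label))
    return expanded


# Toy stand-ins (assumptions for illustration only, not real translation models).
_EN_FR = {"you": "tu", "are": "es", "awful": "affreux"}
_FR_EN = {"tu": "you", "es": "are", "affreux": "dreadful"}


def toy_en_to_fr(text: str) -> str:
    return " ".join(_EN_FR.get(w, w) for w in text.lower().split())


def toy_fr_to_en(text: str) -> str:
    return " ".join(_FR_EN.get(w, w) for w in text.lower().split())


def toy_paraphrase(text: str) -> List[str]:
    return [text.replace("you are", "you're")]


data = [("you are awful", 1), ("have a nice day", 0)]
augmented = expand_dataset(data, toy_en_to_fr, toy_fr_to_en, toy_paraphrase)
for text, label in augmented:
    print(label, text)
```

Each variant inherits the label of the sentence it was derived from, and variants identical to an existing sample are dropped, so the expanded set only grows when the round trip or the paraphraser actually changes the wording. The expanded samples would then feed whichever classifier is being compared.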
Similar resources
Hate Me, Hate Me Not: Hate Speech Detection on Facebook
While favouring communications and easing information sharing, Social Network Sites are also used to launch harmful campaigns against specific groups and individuals. Cyberbullism, incitement to self-harm practices, sexual predation are just some of the severe effects of massive online offensives. Moreover, attacks can be carried out against groups of victims and can degenerate in physical viol...
The Circle of Meaning: from Translation to Paraphrasing and Back
Title of dissertation: The Circle of Meaning: From Translation to Paraphrasing and Back. Nitin Madnani, Doctor of Philosophy, 2010. Dissertation directed by: Professor Bonnie Dorr, Department of Computer Science. The preservation of meaning between inputs and outputs is perhaps the most ambitious and, often, the most elusive goal of systems that attempt to process natural language. Nowhere is this ...
A Survey on Hate Speech Detection using Natural Language Processing
This paper presents a survey on hate speech detection. Given the steadily growing body of social media content, the amount of online hate speech is also increasing. Due to the massive scale of the web, methods that automatically detect hate speech are required. Our survey describes key areas that have been explored to automatically recognize these types of utterances using natural language proc...
Metrics for the detection of changed buildings in 3D old vector maps using ALS data (case study: Isfahan city)
The aim of this research is to evaluate and improve existing metrics for verifying the accuracy of old 3D vector maps using a point cloud obtained from a recent laser scan of the city of Isfahan. To this end, a laser-scanner point cloud with a density of about three points per square meter was used to identify changed features in the old 3D maps. Our focus in this research is on buildings, as one of the main urban features. ...
Paraphrasing and Translation
Usefulness of paraphrases: Paraphrases are alternative ways of conveying the same information. They are useful in NLP applications such as: generation, where producing paraphrases allows for the creation of more varied and fluent text; multidocument summarization, where identifying paraphrases allows information repeated across documents to be condensed; question answering, where paraphrasing is important when going be...
Journal
Journal title: Online Social Networks and Media
Year: 2021
ISSN: 2468-6964
DOI: https://doi.org/10.1016/j.osnem.2021.100153