Development of computational linguistic resources for automated detection of textual cyberbullying threats in Roman Urdu language

Abstract

Automatic cyberbullying detection remains a very challenging task because social media content and conversations are usually posted as unstructured free text that ignores language norms. A major gap in formulating strategies to address cyberbullying is the scarcity of available linguistic resources, particularly for newly evolved languages. Roman Urdu has emerged only recently and is therefore a resource-poor language. Urdu is widely known as the national language of Pakistan. However, because of socio-cultural and multilingual aspects, Roman Urdu is used on the Internet by Asians, and more specifically by Pakistanis. To fill this gap, this research work presents guidelines for the data annotation process and develops two resources. (i) An annotated Roman Urdu corpus for cyberaggression and offensive language detection. The annotation involved bilingual annotators instead of crowdsourcing, which has the benefit of correctly annotating instances that constitute clear cases of cyberbullying without compromising quality. The developed corpus is highly balanced (with almost negligible skew), unlike most existing corpora, even those for mature languages. (ii) A domain-specific stop-word list. Processing textual information for NLP tasks involves stop-word elimination as a sub-phase. Stop words carry the least semantic information and increase the feature space compared to other tokens or index terms in corpora. We have developed domain-specific stop words that consider all lexical variants in the context of cyberbullying and aggression in the collected data. The work has been carried out using the Python programming language and the PyCharm IDE.
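The stop-word elimination sub-phase and the notion of corpus skew described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the variant groups in `STOP_WORD_VARIANTS`, the function names, and the sample sentence are all hypothetical and chosen only to show how a list that covers several Roman Urdu spellings of the same word might be applied.

```python
import re
from collections import Counter

# Hypothetical variant groups: a canonical form mapped to assumed Roman Urdu
# spellings. These entries are illustrative and are NOT the paper's list.
STOP_WORD_VARIANTS = {
    "hai": {"hai", "hy", "hay"},
    "aur": {"aur", "or"},
    "ka": {"ka", "kaa"},
    "ki": {"ki", "kee"},
    "ko": {"ko"},
}

# Flatten every spelling variant into one lookup set.
STOP_WORDS = frozenset(
    variant for variants in STOP_WORD_VARIANTS.values() for variant in variants
)

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the tokens of `text` that are not in the stop-word set."""
    return [token for token in tokenize(text) if token not in stop_words]


def class_skew(labels):
    """Ratio of the largest class count to the smallest (1.0 means balanced)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    print(remove_stop_words("Tum bohat bura insaan hy aur jhoota"))
    print(class_skew(["bully", "neutral", "bully", "neutral"]))
```

Storing the variants per canonical form keeps the list readable while still allowing a single set lookup per token; a skew ratio close to 1.0 corresponds to the "almost negligible skew" the abstract mentions.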

Similar resources

Resources for Urdu Language Processing

Urdu is spoken by more than 100 million speakers. This paper summarizes the corpus and lexical resources being developed for Urdu by the CRULP, in Pakistan.

Development of feminist poetics in Adrienne Rich

Rich's poems, as poems that are always changing and transforming, embody human growth and transformation. Focusing on the stages of Rich's poetic development, from A Change of World to Snapshots of a Daughter-in-Law, then to Diving into the Wreck and finally to A Wild Patience Has Taken Me This Far, this thesis examines these stages in terms of Showalter's view of the three phases of women's literary development, namely the feminine, feminist and female phases...

Development and implementation of an optimized control strategy for induction machine in an electric vehicle

In the area of automotive engineering there is a tendency toward more electrification of the power train. In this work, control of an induction machine for electric vehicle applications is investigated. Because the operating point of the machine changes, adapting the rotor magnetization current seems to be useful for increasing the machine's efficiency. In the literature there are many approaches wh...

Modeling the Detection of Textual Cyberbullying

The scourge of cyberbullying has assumed alarming proportions with an ever-increasing number of adolescents admitting to having dealt with it either as a victim or as a bystander. Anonymity and the lack of meaningful supervision in the electronic medium are two factors that have exacerbated this social menace. Comments or posts involving sensitive topics that are personal to an individual are m...

Interpersonal function of language in subtitling

Translation as a communicative process is always said to be associated with various aspects of meaning loss or gain. Subtitling as a mode of translating, due to the special discoursal and textual conditions imposed upon it, is believed to be an obvious case of this loss or gain. Presenting the spoken sound track of a film in writing and synchronizing the perception of this text by the viewers with...

Journal

Journal title: 3C TIC

Year: 2021

ISSN: 2254-6529

DOI: https://doi.org/10.17993/3ctic.2021.102.101-121