Three language political leaning text classification using natural language processing methods

Authors

Abstract

In this article, the problem of political leaning classification of text resources is solved. First, a detailed analysis of ten studies on the work's topic was performed in the form of comparative characteristics of the methodologies used. The literary sources were compared according to the problem-solving methods, the learning that was carried out, the evaluation metrics, and the vectorizations. It was thus determined that machine learning algorithms and neural networks, together with the TF-IDF and Word2Vec vectorization methods, are most often used to solve the problem. Next, various models for classifying whether textual information is pro-Ukrainian or pro-Russian were built based on a dataset containing messages from social media users about the events of the large-scale Russian invasion of Ukraine of February 24, 2022. The problem was solved with the help of the Support Vector Machines, Decision Tree, Random Forest, Naïve Bayes classifier, eXtreme Gradient Boosting, and Logistic Regression (LR) machine learning algorithms; the Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and BERT neural networks; the Random Oversampling, Random Undersampling, SMOTE, and SMOTETomek techniques for working with unbalanced data; and stacking ensembles of models. Among the machine learning algorithms, LR performed best, showing a macro F1-score value of 0.7966 when features were transformed by TF-IDF vectorization and 0.7933 with BoW. Among the neural networks, the best macro F1-score of 0.76 was obtained using CNN and LSTM. Applying the data balancing techniques failed to improve the results of the machine learning algorithms. Next, stacking ensembles were constructed, and two of them achieved the same macro F1-score as LR. The ensembles that were able to do so used TF-IDF vectorization and a B-NBC meta-model, with base models drawn from SVC, NuSVC, and LR. Thus, three classifiers, the LR machine learning algorithm and two ensembles defined as combinations of existing methods for solving the problem, demonstrated the largest macro F1-score value of 0.7966. The obtained models can be used to review news publications by the political leaning characteristic and to help people identify whether they are isolated in a filter bubble.
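The abstract reports every result as a macro F1-score, which is the unweighted mean of the per-class F1-scores, so a minority class counts as much as the majority class. The following is a minimal sketch of that metric in plain Python, not the authors' code; the function name `macro_f1` and the labels `pro_ua` and `pro_ru` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one label")

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (not taken from the paper's dataset).
y_true = ["pro_ua", "pro_ua", "pro_ua", "pro_ru"]
always_majority = ["pro_ua"] * 4
print(round(macro_f1(y_true, always_majority), 4))
```

In this toy case a classifier that always predicts the majority class reaches 75% accuracy, but its macro F1-score is far lower because the F1 of the missed class is zero. This is why the metric suits an unbalanced dataset such as the one described in the abstract.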

Similar resources

Text Classification using Language-independent Pre-processing

A number of language-independent text pre-processing techniques, to support multi-class single-label text classification, are described and compared. A simple but effective statistical keyword identification approach is proposed, coupled with a number of phrase identification mechanisms. Experimental results are presented.

Natural Language Processing Methods for Automatic Illustration of Text

The thesis describes methods for automatic creation of illustrations of natural-language text. The main focus of the work is to convert texts that describe sequences of events in a physical world into animated images. This is what we call text-to-scene conversion. The first part of the thesis describes Carsim, a system that automatically illustrates traffic accident newspaper reports written in...

Web Text Corpus for Natural Language Processing

Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpus by downloading web pages to create a topic-diverse collection of 10 billion words of English. We show that for context-sensitive spelling correction the Web Corpus results are better than using a search engine. For t...

Investigating classification for natural language processing tasks

This thesis investigates the application of classification techniques to four natural language processing (NLP) tasks. The classification paradigm falls within the family of statistical and machine learning (ML) methods and consists of a framework within which a mechanical ‘learner’ induces a functional mapping between elements drawn from a particular sample space and a set of designated target...

Journal

Journal title: Applied aspects of information technologies

Year: 2022

ISSN: 2617-4316, 2663-7723

DOI: https://doi.org/10.15276/aait.05.2022.24