Three language political leaning text classification using natural language processing methods
Authors
Abstract
In this article, the problem of classifying the political leaning of a text resource is solved. First, ten studies on the topic of the work were analyzed in detail by comparing the methodologies they used. The literature sources were compared by problem-solving method, the training that was carried out, evaluation metrics, and vectorization. This showed that machine learning algorithms and neural networks, together with the TF-IDF and Word2Vec vectorization methods, are most often used to solve this problem. Next, various models were built to classify whether textual information is pro-Ukrainian or pro-Russian. They were trained on a dataset of social media users' messages about the events of the large-scale Russian invasion of Ukraine on February 24, 2022. The problem was solved with several groups of methods:
- the Support Vector Machines, Decision Tree, Random Forest, Naïve Bayes classifier, eXtreme Gradient Boosting and Logistic Regression (LR) machine learning algorithms;
- Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) and BERT neural networks;
- the Random Oversampling, Undersampling, SMOTE and SMOTETomek techniques for working with unbalanced data;
- stacking ensembles of models.
Among the machine learning algorithms, LR performed best, with macro F1-scores of 0.7966 and 0.7933 (the latter with BoW features). Among the neural networks, the best macro F1-score, 0.76, was obtained with CNN and LSTM. Applying data balancing failed to improve the results of the algorithms. Next, stacking ensembles were built, and two of them achieved the same macro F1-score as LR. These two ensembles combined a vectorization step, a B-NBC meta-model, and SVC, NuSVC and LR base models. Thus, three classifiers achieved the highest macro F1-score, 0.7966: the LR algorithm and two ensembles defined as combinations of existing methods for solving the problem. The resulting models can be used to review news publications for political leaning. They can also help people recognize that they are in an isolated filter bubble.
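All results above are reported as macro F1-scores. The minimal sketch below is not the authors' code. It illustrates how a macro F1-score is computed: the F1-score is calculated separately for each class and then averaged without weighting, so a minority class counts as much as the majority class. The labels and predictions in the example are hypothetical. scikit-learn's `f1_score(..., average="macro")` computes the same quantity.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1-scores (macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p predicted, but it was wrong
            fn[t] += 1          # true class t was missed
    f1_scores = []
    for label in labels:
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        if precision + recall == 0:
            f1_scores.append(0.0)
        else:
            f1_scores.append(2 * precision * recall / (precision + recall))
    # Each class contributes equally, regardless of how many samples it has
    return sum(f1_scores) / len(f1_scores)


# Hypothetical, imbalanced toy data: 6 pro-Ukrainian vs 4 pro-Russian messages
y_true = ["pro-Ukrainian"] * 6 + ["pro-Russian"] * 4
y_pred = (["pro-Ukrainian"] * 5 + ["pro-Russian"]
          + ["pro-Russian"] * 3 + ["pro-Ukrainian"])
score = macro_f1(y_true, y_pred)
print(round(score, 4))  # → 0.7917
```

Here accuracy is 0.8, while the per-class F1-scores are 5/6 and 3/4. Their unweighted mean, 19/24 ≈ 0.7917, is the macro F1-score. That score is lower because the smaller class is classified less well.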
Similar sources
Text Classification using Language-independent Pre-processing
A number of language-independent text pre-processing techniques, to support multi-class single-label text classification, are described and compared. A simple but effective statistical keyword identification approach is proposed, coupled with a number of phrase identification mechanisms. Experimental results are presented.
Natural Language Processing Methods for Automatic Illustration of Text
The thesis describes methods for automatic creation of illustrations of natural-language text. The main focus of the work is to convert texts that describe sequences of events in a physical world into animated images. This is what we call text-to-scene conversion. The first part of the thesis describes Carsim, a system that automatically illustrates traffic accident newspaper reports written in...
Web Text Corpus for Natural Language Processing
Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpus by downloading web pages to create a topic-diverse collection of 10 billion words of English. We show that for context-sensitive spelling correction the Web Corpus results are better than using a search engine. For t...
Investigating classification for natural language processing tasks
This thesis investigates the application of classification techniques to four natural language processing (NLP) tasks. The classification paradigm falls within the family of statistical and machine learning (ML) methods and consists of a framework within which a mechanical ‘learner’ induces a functional mapping between elements drawn from a particular sample space and a set of designated target...
Journal
Journal title: Applied aspects of information technologies
Year: 2022
ISSN: 2617-4316, 2663-7723
DOI: https://doi.org/10.15276/aait.05.2022.24