A General Investigation on the Combination of Local and Global Feature Selection Methods for Request Identification in Telegram

Authors

Abstract:

Nowadays, the use of messaging services is expanding worldwide with the rapid development of Internet technologies. Telegram is a cloud-based instant messaging service with open-source client applications. According to the US Securities and Exchange Commission, statistics reported from October 2019 onward show that about 300 million people worldwide use Telegram every month. Telegram users are concentrated mainly in countries such as Iran, Venezuela, Nigeria, Kenya, Russia, and Ukraine. The messenger has become popular and widely used because it supports many languages and provides diverse services, such as groups and channels with large numbers of members. Telegram groups contain a large amount of textual data holding hidden knowledge, and extracting this knowledge can be valuable. Requests in Telegram users' messages are one example of such data. Identifying these requests makes it possible to respond to users' needs and help them fulfill their wishes immediately, which in turn drives users' business development. The authors identified these requests in a Telegram search engine, the Idekav system of Yazd University, and then created opportunities to earn money by forwarding the requests to business owners able to respond to them. Because the feature space of textual data is high-dimensional, the number of attributes must be reduced through feature selection (FS). In the present study, appropriate features were selected for Persian text classification and request identification. Among FS methods, filter-based methods of both kinds, local and global, were chosen. By investigating and combining the most widely used filter-based FS methods, an optimal subset of important features was obtained.
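The abstract does not say which local and global filters were combined or how they were merged. The sketch below is one plausible instance under stated assumptions: a local filter that scores each (term, class) pair with the correlation coefficient (the signed square root of chi-square), a global filter that scores each term once with information gain, and a hybrid that takes the union of the top local terms for every class and the top global terms. The function names, the union rule, and the English toy tokens are hypothetical stand-ins, not the authors' method or data.

```python
from math import log2, sqrt


def correlation_coefficient(n11, n10, n01, n00):
    """Signed square root of chi-square for one term/class 2x2 table.

    n11: docs in the class that contain the term
    n10: docs in other classes that contain the term
    n01: docs in the class without the term
    n00: docs in other classes without the term
    Positive values mean the term is associated with the class.
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return sqrt(n) * (n11 * n00 - n10 * n01) / sqrt(denom)


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * log2(c / n) for c in counts.values())


def local_scores(docs, vocab, classes):
    """Local filter: one score per (term, class) pair."""
    scores = {}
    for c in classes:
        inside = [terms for terms, y in docs if y == c]
        outside = [terms for terms, y in docs if y != c]
        for t in vocab:
            n11 = sum(t in d for d in inside)
            n10 = sum(t in d for d in outside)
            scores[(t, c)] = correlation_coefficient(
                n11, n10, len(inside) - n11, len(outside) - n10)
    return scores


def global_scores(docs, vocab):
    """Global filter: one information-gain score per term."""
    labels = [y for _, y in docs]
    base, n = entropy(labels), len(docs)
    scores = {}
    for t in vocab:
        with_t = [y for terms, y in docs if t in terms]
        without_t = [y for terms, y in docs if t not in terms]
        cond = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
        scores[t] = base - cond
    return scores


def hybrid_select(docs, k_local, k_global):
    """Union of the k_local best terms for each class (local score) and
    the k_global best terms overall (global score); ties broken alphabetically."""
    vocab = sorted(set().union(*(terms for terms, _ in docs)))
    classes = sorted({y for _, y in docs})
    local = local_scores(docs, vocab, classes)
    glob = global_scores(docs, vocab)
    selected = set()
    for c in classes:
        ranked = sorted(vocab, key=lambda t: (-local[(t, c)], t))
        selected.update(ranked[:k_local])
    selected.update(sorted(vocab, key=lambda t: (-glob[t], t))[:k_global])
    return selected


# Toy data: hypothetical English tokens standing in for Persian messages.
docs = [
    ({"need", "laptop", "price"}, "request"),
    ({"need", "phone"}, "request"),
    ({"looking", "for", "laptop"}, "request"),
    ({"hello", "everyone"}, "other"),
    ({"good", "morning", "everyone"}, "other"),
    ({"price", "news"}, "other"),
]
selected = hybrid_select(docs, k_local=2, k_global=1)
print(sorted(selected))
```

On this toy data the sketch prints `['everyone', 'good', 'laptop', 'need']` ("good" wins an alphabetical tie with "hello", "morning", and "news"), while "price", which occurs once in each class, scores zero under both filters and is dropped. The per-class local step is what guarantees each class contributes its own indicative terms even when a single global ranking would favour one class.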
This hybrid feature selection method increased request identification accuracy, improved Persian text classification efficiency, and reduced training time and computation by optimizing feature reduction. It should be noted that classification accuracy decreases with some methods; however, this decrease is negligible compared with the reduction in the number of features. Incorporating opinion mining into the analysis of emotions and questions can be one way to identify positive or negative demand in social networks; therefore, requests in Persian Telegram messages can be identified using opinion-mining research. The experiments in the present article use a dataset called Persian, which is extracted from the Idekav system. Selecting suitable features that increase model accuracy in request identification is an important part of this research. A support vector machine (SVM) classifier was used to measure accuracy, and, given its acceptable results, the SVM was also evaluated with various kernels. Micro-averaged and macro-averaged criteria were also used for evaluation. The model inputs are multiple optimal feature subsets, and feature selection methods are proposed to produce suitable features for each model so as to increase its accuracy. Among all the features investigated, appropriate features were then selected for each of the applied feature selection models. More precisely, the main innovations of the present study are as follows: (1) use of the most common filter-based local and global feature selection methods to find the optimal feature set; (2) use of hybrid methods to create suitable features for accurate predictive models in Persian text classification, and application of these models to identifying requests in Persian Telegram messages; (3) selection of suitable features that increase accuracy and reduce computation time for each model under consideration, where, in addition to choosing an efficient algorithm, the study attempts to provide a method for making more appropriate feature choices; and (4) evaluation and testing of the proposed models on a large set of Persian data with many different features.
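Micro- and macro-averaging can tell different stories when classes are imbalanced, as they would be if requests are a minority of messages. The sketch below computes both F1 variants from scratch so the difference is explicit: macro-averaging averages per-class F1 (each class weighs equally), while micro-averaging pools true positives, false positives, and false negatives first (each message weighs equally). The labels are made-up placeholders, not results from the study; in practice they would be the SVM's predictions under each kernel. scikit-learn's `f1_score(..., average="micro")` and `average="macro"` compute the same quantities.

```python
def per_class_counts(y_true, y_pred, classes):
    """Return {class: (tp, fp, fn)} for single-label predictions."""
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        counts[c] = (tp, fp, fn)
    return counts


def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when precision and recall are both 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    counts = per_class_counts(y_true, y_pred, classes)
    # Macro: average the per-class F1 scores, so each class weighs equally.
    macro = sum(f1(*counts[c]) for c in classes) / len(classes)
    # Micro: pool TP/FP/FN over all classes first, so each message weighs equally.
    tp = sum(v[0] for v in counts.values())
    fp = sum(v[1] for v in counts.values())
    fn = sum(v[2] for v in counts.values())
    micro = f1(tp, fp, fn)
    return micro, macro


# Placeholder labels: 2 requests among 10 messages.
y_true = ["request", "request"] + ["other"] * 8
y_pred = ["request", "other"] + ["other"] * 7 + ["request"]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro-F1 = {micro:.2f}, macro-F1 = {macro:.4f}")
```

This prints `micro-F1 = 0.80, macro-F1 = 0.6875`. Macro-F1 is lower because the minority "request" class (F1 = 0.5) counts as much as the majority class (F1 = 0.875); for single-label data, micro-F1 equals plain accuracy (8 of 10 correct).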

similar resources

Reflections on Taught Courses of the Iranian MA Program in English Translation: A Mixed-Methods Study

The issue of curriculum and syllabus evaluation and revision has been at the center of attention ever since curricula first drew the attention of educational institutions. Thus, educational institutions everywhere in the world evaluate and revise their curricula and syllabi based on the goals, the needs, existing content, etc. In Iran, any curriculum is designed in a committee of specialists and...

The Test for Adverse Selection in the Life Insurance Market: The Case of Mellat Insurance Company

Adverse selection is one of the fundamental problems in the insurance industry. It was first discussed and studied by Rothschild and Stiglitz in 1960, and since then many researchers have developed various models to analyze demand in the life insurance industry, all of which stem from uncertainty in this industry. The aim is to find the conditions under which selecting or excluding a policyholder benefits or harms the insurance company...

Combination of Feature Selection and Learning Methods for IoT Data Fusion

In this paper, we propose five data fusion schemes for the Internet of Things (IoT) scenario, which are Relief and Perceptron (Re-P), Relief and Genetic Algorithm Particle Swarm Optimization (Re-GAPSO), Genetic Algorithm and Artificial Neural Network (GA-ANN), Rough and Perceptron (Ro-P) and Rough and GAPSO (Ro-GAPSO). All the schemes consist of four stages, including preprocessing the data set ba...

The Relationship between Academic Self-Concept and Academic Achievement in English and General Subjects of the Students of High School

According to research, academic self-concept and academic achievement are mutually interdependent. The aim of the present study was to determine the relationship between the academic self-concept and the academic achievement of students in English as a foreign language and in general subjects. The participants were 320 students in the 4th grade of high school in three cities of Noor, Nowshah...

Journal title

Volume 19, Issue 2

Pages 175-196

Publication date: 2022-09
