Text Data Labelling using Transformer based Sentence Embeddings and Text Similarity for Text Classification

Authors

Abstract

This paper demonstrates that a lot of the time, cost, and complexity that would otherwise be spent on labelling text data for classification purposes can be saved and avoided. The AI world realizes the importance of labelled data and its use in various NLP applications. Here, we have categorized close to 6,000 unlabelled text samples into five distinct classes. The labelled dataset was further used for multi-class text classification. Performing the data labelling task with transformer-based sentence embeddings and a cosine-based text similarity threshold saved close to 20-30 days of human effort and multiple human validations, with 98.4% of classes labelled correctly as per business validation. Text classification results obtained using this AI-labelled data fetched an accuracy score and an F1 score of 90%.
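The labelling step described in the abstract (sentence embeddings compared by cosine similarity against a threshold) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the embedding model, the threshold value, or the five classes, so the toy vectors, the class names `billing` and `delivery`, the helper `label_by_similarity`, and the threshold of 0.7 are all assumptions. In practice the vectors would come from a transformer sentence encoder rather than being written by hand.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def label_by_similarity(sample_vec, class_vecs, threshold=0.7):
    """Return (label, score) for the most similar class reference vector.

    The label is None when the best score falls below the threshold,
    so that the sample can be routed to manual review instead.
    The threshold of 0.7 is an illustrative assumption.
    """
    best_label, best_score = None, -1.0
    for label, ref_vec in class_vecs.items():
        score = cosine_similarity(sample_vec, ref_vec)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Hypothetical reference embeddings, one per class (toy 3-d vectors).
class_vecs = {
    "billing": [1.0, 0.0, 0.0],
    "delivery": [0.0, 1.0, 0.0],
}

# Hypothetical embeddings of two unlabelled samples.
samples = {
    "s1": [0.9, 0.1, 0.0],  # close to the "billing" reference
    "s2": [0.5, 0.5, 0.7],  # not close to either reference
}

labels = {name: label_by_similarity(vec, class_vecs)[0]
          for name, vec in samples.items()}
```

Here `s1` receives the label `billing`, while `s2` stays unlabelled because its best similarity is below the threshold. Leaving low-similarity samples unlabelled is one possible design choice; it trades coverage for label precision.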

Similar resources

Unsupervised Text Summarization Using Sentence Embeddings

Dense vector representations of words, and more recently, sentences, have been shown to improve performance in a number of NLP tasks. We propose a method to perform unsupervised extractive and abstractive text summarization using sentence embeddings. We compare multiple variants of our systems on two datasets, show substantially improved performance over a simple baseline, and performance appro...

Topic Modeling and Classification of Cyberspace Papers Using Text Mining

The global cyberspace networks provide individuals with platforms where they can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...

EXTRACTION-BASED TEXT SUMMARIZATION USING FUZZY ANALYSIS

Due to the explosive growth of the world-wide web, automatic text summarization has become an essential tool for web users. In this paper we present a novel approach for creating text summaries. Using fuzzy logic and word-net, our model extracts the most relevant sentences from an original document. The approach utilizes fuzzy measures and inference on the extracted textual information from the docu...

Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling

In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on previous work by Azmi that uses conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low-quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...

Journal

Journal title: International Journal on Natural Language Computing

Year: 2022

ISSN: 2278-1307, 2319-4111

DOI: https://doi.org/10.5121/ijnlc.2022.11201