Multiple weak supervision for short text classification
Authors
Abstract
For short text classification, insufficient labeled data, data sparsity, and imbalanced classification have become three major challenges. To address this, we propose multiple weak supervision, which can label unlabeled data automatically. Unlike prior work, the method generates probabilistic labels through a conditionally independent model. Experiments were conducted to verify the effectiveness of the supervision. According to experimental results on public datasets, real datasets, and synthetic datasets, the problem can be solved effectively by multiple weak supervision. Notably, without reducing precision, recall and F1-score are improved by adding distant supervision clustering, which can be used to meet different application needs.
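The abstract does not spell out the conditionally independent model, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a binary task, several weak sources that each vote 0, 1, or abstain, and source accuracies that are already known. The function name `probabilistic_label`, the `ABSTAIN` marker, and all numeric values are hypothetical. Under the assumption that votes are independent given the true class, the votes combine in naive Bayes fashion into a probabilistic label.

```python
import math

ABSTAIN = -1  # hypothetical marker: a weak source declines to vote


def probabilistic_label(votes, accuracies, prior_pos=0.5):
    """Return P(y = 1 | votes), assuming votes are conditionally independent given y.

    votes:      one entry per weak source, each 0, 1, or ABSTAIN
    accuracies: assumed P(vote == y) per source, strictly between 0 and 1
    prior_pos:  assumed prior P(y = 1); set below 0.5 for a rare positive class
    """
    log_pos = math.log(prior_pos)
    log_neg = math.log(1.0 - prior_pos)
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            # Assumption: abstaining is unrelated to the class, so it adds no evidence.
            continue
        # Likelihood of this vote if the true class is 1, and if it is 0.
        log_pos += math.log(acc if vote == 1 else 1.0 - acc)
        log_neg += math.log(acc if vote == 0 else 1.0 - acc)
    # Normalize in log space for numerical stability.
    shift = max(log_pos, log_neg)
    pos = math.exp(log_pos - shift)
    neg = math.exp(log_neg - shift)
    return pos / (pos + neg)


# Three hypothetical weak sources vote on one short text.
p = probabilistic_label([1, 1, 0], [0.8, 0.7, 0.6])
```

In this sketch the `prior_pos` argument is where class imbalance would enter: a smaller prior for the minority class means more agreeing votes are needed before the label leans positive. In practice the source accuracies would have to be estimated rather than supplied by hand.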
Similar resources
Active Learning Based Weak Supervision for Textual Survey Response Classification
Analysing textual responses to open-ended survey questions has been one of the challenging applications for NLP. Such unstructured text data is a rich data source of subjective opinions about a specific topic or entity; but it is not amenable to quick and comprehensive analysis. Survey coding is the process of categorizing such text responses using a pre-specified hierarchy of classes (often ca...
Towards Understanding Situated Text: Concept Labeling and Weak Supervision
Much of the focus of the natural language processing community lies in solving syntactic or semantic tasks with the aid of sophisticated machine learning algorithms and the encoding of linguistic prior knowledge. One of the most important features of natural languages is that their real-world use (as a tool for humans) is to communicate something about our physical reality or metaphysical consi...
Transductive LSI for Short Text Classification Problems
This paper presents work that uses Transductive Latent Semantic Indexing (LSI) for text classification. In addition to relying on labeled training data, we improve classification accuracy by incorporating the set of test examples in the classification process. Rather than performing LSI’s singular value decomposition (SVD) process solely on the training data, we instead use an expanded term-by-...
Harvesting Parallel Text in Multiple Languages with Limited Supervision
The Web is an ever increasing, dynamically changing, multilingual repository of text. There have been several approaches to harvest this repository for bootstrapping, supplementing and adapting data needed for training models in speech and language applications. In this paper, we present semi-supervised and unsupervised approaches to harvesting multilingual text that rely on a key observation o...
New Method for Sentiment Classification for Short Text
With the rapid development of the Internet, the microblog platform, BBS, e-Commerce etc. gathered a lot of short messages/text, which contained subjective sentences. These sentences often had obvious inclination which reflected the sentiment of the author. By mining the author’s sentiment, such as like, angry, indignation, averseness, etc., we can analyze people’s opinion for some policy, peopl...
Journal
Journal title: Applied Intelligence
Year: 2022
ISSN: 0924-669X, 1573-7497
DOI: https://doi.org/10.1007/s10489-021-02958-3