Aligning Comments to News Articles on a Budget

Abstract

Disagreement among text annotators as part of the human (expert) labeling process produces noisy labels, which affect the performance of supervised learning algorithms for natural language processing. Using only high-agreement annotations introduces another challenge: the data imbalance problem. We study this challenge within the problem of relating user comments to the content of a news article. We show that traditional techniques for learning from imbalanced data, such as oversampling, using weighted loss functions, or assigning weak labels using crowdsourcing, may not be sufficient for modeling the complex temporal relationships between articles and comments. In this study, we propose a framework for aligning comments to news articles from (1) imbalanced data characterized with (2) different degrees of annotator agreement, under (3) a constrained budget of computing resources. Within the framework, we propose a Semi-Automatic Labeling solution based on Human-AI collaboration. We compare our proposed technique with techniques for handling imbalanced data and synthetic data generation on the article-comment alignment problem, where the goal is to determine the category of an article-comment pair that represents how relevant the comment is to the article. Finding an effective and efficient solution is essential because it is time-consuming and prohibitively costly to manually label a sufficiently large amount of article-comment pairs based on the semantic understanding of the article and its comments. We discover that Human-AI collaboration outperforms all alternative techniques by 17% in accuracy. When there is no time or budget for re-labeling some article-comment pairs, we found that synonym augmentation is a reasonable alternative. We also provide a detailed analysis of the effect of humans in the loop and the use of unlabeled data.
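Two of the baseline techniques named in the abstract, oversampling and weighted loss functions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the label strings, and the choice of inverse-frequency weighting are assumptions made purely for illustration.

```python
import random
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1, so a
    weighted loss penalizes mistakes on rare classes more heavily.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def random_oversample(pairs, labels, seed=0):
    """Duplicate randomly chosen minority-class examples until every
    class has as many examples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(pairs, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_pairs, out_labels = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_pairs.extend(xs + extra)
        out_labels.extend([y] * target)
    return out_pairs, out_labels
```

Both techniques only rebalance the labels that already exist; neither adds new information about an article-comment pair, which is consistent with the abstract's observation that they may not be sufficient on their own.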

Similar resources

Diversifying User Comments on News Articles

In this paper we present an approach for diversifying user comments on news articles. In our proposed framework, we analyse user comments w.r.t. four different criteria in order to extract the respective diversification dimensions in the form of feature vectors. These criteria involve content similarity, sentiment expressed within comments, article’s named entities also found within comments an...

Detecting Comments on News Articles in Microblogs

A reader of a news article would often be interested in the comments of other readers on an article, because comments give insight into popular opinions or feelings toward a given piece of news. In recent years, social media platforms, such as Twitter, have become a social hub for users to communicate and express their thoughts. This includes sharing news articles and commenting on them. In this...

Reliable Measures for Aligning Japanese-English News Articles and Sentences

We have aligned Japanese and English news articles and sentences to make a large parallel corpus. We first used a method based on cross-language information retrieval (CLIR) to align the Japanese and English articles and then used a method based on dynamic programming (DP) matching to align the Japanese and English sentences in these articles. However, the results included many incorrect alignm...

Understanding Public Perceptions of the HPV Vaccination Based on Online Comments to Canadian News Articles

BACKGROUND Given the variation in human papillomavirus (HPV) vaccine coverage across Canada, and debate regarding delivery of HPV vaccines in Catholic schools, we studied online comments on Canadian news websites to understand public perceptions of HPV and HPV vaccine. METHODS We searched English- and French-language Canadian news websites for 2012 articles that contained the terms "HPV" or "...

Responding to reviewers' comments on submitted articles.

The letter from the editor generally comes in one of 4 flavors. First, a manuscript may be accepted without any changes. If this happens to you, count yourself lucky; such an editorial response is rare. In our experience, this has happened only once for each of us. Second, the manuscript may be accepted with suggestions for minor revisions. Again, count your blessings, quickly make the suggeste...

Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3247948