A Semi-Supervised Bayesian Network Model for Microblog Topic Classification

Authors

  • Yan Chen
  • Zhoujun Li
  • Liqiang Nie
  • Xia Hu
  • Xiangyu Wang
  • Tat-Seng Chua
  • Xiaoming Zhang
Abstract

Microblogging services have brought users to a new era of knowledge dissemination and information seeking. However, the large volume and multi-aspect nature of messages hinder users' ability to conveniently locate the specific messages they are interested in. While many researchers wish to employ traditional text classification approaches to understand messages on microblogging services, the limited length of the messages prevents these approaches from being used to their full potential. To tackle this problem, we propose a novel semi-supervised learning scheme that seamlessly integrates external web resources to compensate for the limited message length. Our approach first trains a classifier on the available labeled data together with auxiliary cues mined from the web, and probabilistically predicts the categories of all unlabeled data. It then trains a new classifier using the labels for all messages and the auxiliary cues, and iterates the process to convergence. Our approach not only greatly reduces the time-consuming and labor-intensive labeling process, but also deeply exploits the hidden information in unlabeled data and related text resources. We conducted extensive experiments on two real-world microblogging datasets. The results demonstrate the effectiveness of the proposed approach, which produces promising performance compared to state-of-the-art methods.
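The train, predict, retrain loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' Bayesian network model: it substitutes a plain multinomial naive Bayes classifier with EM-style soft labels, and it omits the auxiliary web cues entirely. All function names, parameters (`alpha`, `max_iter`, `tol`), and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter

def train_nb(docs, soft_labels, classes, vocab, alpha=1.0):
    """Fit multinomial naive Bayes from (possibly fractional) class weights."""
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for tokens, weights in zip(docs, soft_labels):
        for c in classes:
            w = weights.get(c, 0.0)
            prior[c] += w
            for t in tokens:
                counts[c][t] += w
    total_prior = sum(prior.values())
    model = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_lik = {t: math.log((counts[c][t] + alpha) / denom) for t in vocab}
        model[c] = (math.log(prior[c] / total_prior), log_lik)
    return model

def predict_proba(model, tokens):
    """Return normalized class probabilities for one tokenized message."""
    scores = {c: lp + sum(ll[t] for t in tokens if t in ll)
              for c, (lp, ll) in model.items()}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def semi_supervised_em(labeled, unlabeled, classes, max_iter=20, tol=1e-4):
    """Train on labeled data, soft-label the unlabeled data, retrain, repeat."""
    docs_l = [tokens for tokens, _ in labeled]
    hard = [{y: 1.0} for _, y in labeled]
    vocab = {t for d in docs_l + unlabeled for t in d}
    model = train_nb(docs_l, hard, classes, vocab)  # initial classifier
    prev = None
    for _ in range(max_iter):
        # Probabilistically predict categories for all unlabeled messages.
        soft = [predict_proba(model, d) for d in unlabeled]
        # Retrain on labeled (hard) plus unlabeled (soft) messages.
        model = train_nb(docs_l + unlabeled, hard + soft, classes, vocab)
        # Stop once the soft labels no longer change noticeably.
        if prev is not None and all(abs(p[c] - q[c]) < tol
                                    for p, q in zip(soft, prev) for c in classes):
            break
        prev = soft
    return model

# Toy data (hypothetical): two labeled and four unlabeled messages.
classes = ["sports", "politics"]
labeled = [(["goal", "match", "team"], "sports"),
           (["election", "vote", "senate"], "politics")]
unlabeled = [["team", "win", "match"], ["vote", "poll", "election"],
             ["goal", "win"], ["poll", "senate"]]
model = semi_supervised_em(labeled, unlabeled, classes)
print(predict_proba(model, ["win"]))
```

In this toy setup the word "win" never appears in a labeled message, so any class preference the final model shows for it comes only from the soft-labeled unlabeled messages, which is the kind of hidden information the abstract refers to.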

Similar resources

A Study on Music Genre Recognition and Classification Techniques

Automatic classification of music genre is widely studied topic in music information retrieval (MIR) as it is an efficient method to structure and organize the large numbers of music files available on the Internet. Generally, the genre classification process of music has two main steps: feature extraction and classification. The first step obtains audio signal information, while the second one...

Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk

This study explores a semi-supervised classification approach using random forest as a base classifier to classify the low-back disorders (LBDs) risk associated with the industrial jobs. Semi-supervised classification approach uses unlabeled data together with the small number of labelled data to create a better classifier. The results obtained by the proposed approach are compared with those o...

Automatic Classification of Unstructured Blog Text

Automatic classification of blog entries is generally treated as a semi-supervised machine learning task, in which the blog entries are automatically assigned to one of a set of pre-defined classes based on the features extracted from their textual content. This paper attempts automatic classification of unstructured blog entries by following pre-processing steps like tokenization, stop-word el...

IRIT at TREC Microblog 2012: adhoc Task

This paper describes the participation of the IRIT lab, university of Toulouse, France, to the Microblog Track of TREC 2012. Two different models are experimented by our team for the adhoc task: (i) a Bayesian network retrieval model for tweet search and (ii) a feature learning model for relevance classification. Experimental results show that Bayesian network retrieval model improves the perfo...

Classification of encrypted traffic for applications based on statistical features

Traffic classification plays an important role in many aspects of network management such as identifying type of the transferred data, detection of malware applications, applying policies to restrict network accesses and so on. Basic methods in this field were using some obvious traffic features like port number and protocol type to classify the traffic type. However, recent changes in applicat...

Publication date: 2012