Transfer Latent Semantic Learning: Microblog Mining with Less Supervision

نویسندگان

  • Dan Zhang
  • Yan Liu
  • Richard D. Lawrence
  • Vijil Chenthamarakshan
چکیده

The increasing volume of information generated on microblogging sites such as Twitter raises several challenges to traditional text mining techniques. First, most texts from those sites are abbreviated due to the constraints of limited characters in one post; second, the input usually comes in streams of large-volumes. Therefore, it is of significant importance to develop effective and efficient representations of abbreviated texts for better filtering and mining. In this paper, we introduce a novel transfer learning approach, namely transfer latent semantic learning, that utilizes a large number of related tagged documents with rich information from other sources (source domain) to help build a robust latent semantic space for the abbreviated texts (target domain). This is achieved by simultaneously minimizing the document reconstruction error and the classification error of the labeled examples from the source domain by building a classifier with hinge loss in the latent semantic space. We demonstrate the effectiveness of our method by applying them to the task of classifying and tagging abbreviated texts. Experimental results on both synthetic datasets and real application datasets, including Reuters-21578 and Twitter data, suggest substantial improvements using our approach over existing ones.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Chinese Microblog Topic Detection Based on the Latent Semantic Analysis and Structural Property

traditional topic detection method can not be applied to the microblog topic detection directly, because the microblog text is a kind of the short, fractional and grass-roots text. In order to detect the hot topic in the microblog text effectively, we propose a microblog topic detection method based on the combination of the latent semantic analysis and the structural property. According to the...

متن کامل

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are one of the most popular tools on the Internet which are widely-used by expert and novice users. Constructing an adequate query which represents the best specification of users’ information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...

متن کامل

Deep learning for constructing microblog behavior representation to identify social media user's personality

Due to the rapid development of information technology, the Internet has gradually become a part of everyday life. People would like to communicate with friends to share their opinions on social networks. The diverse behavior on socials networks is an ideal reflection of users’ personality traits. Existing behavior analysis methods for personality prediction mostly extract behavior attributes w...

متن کامل

Biomedical Text Mining: State-of-the-Art, Open Problems and Future Challenges

Text is a very important type of data within the biomedical domain. For example, patient records contain large amounts of text which has been entered in a non-standardized format, consequently posing a lot of challenges to processing of such data. For the clinical doctor the written text in the medical findings is still the basis for decision making – neither images nor multimedia data. However...

متن کامل

Comment Data Mining to Estimate Student Performance Considering Consecutive Lessons

The purpose of this study is to examine different formats of comment data to predict student performance. Having students write comment data after every lesson can reflect students’ learning attitudes, tendencies and learning activities involved with the lesson. In this research, Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (pLSA) are employed to predict student ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011