Unsupervised Solution Post Identification from Discussion Forums

Authors

  • Deepak P
  • Karthik Visweswariah

Abstract

Discussion forums have evolved into a dependable source of knowledge for solving common problems. However, only a minority of the posts in discussion forums are solution posts. Identifying solution posts in discussion forums is therefore an important research problem. In this paper, we present a technique for unsupervised solution post identification that leverages a previously unexplored textual feature: lexical correlations between problems and solutions. We use translation models to exploit these lexical correlations and language models to capture the character of solution posts. Our technique is designed not to rely heavily on structural features such as post metadata, since such features are often not uniformly available across forums. Our clustering-based, iterative solution identification approach, built on an EM formulation, performs favorably in an empirical evaluation, beating the only unsupervised solution identification technique in the literature by a very large margin. We also show that our unsupervised technique is competitive with methods that require supervision, comfortably outperforming one such technique.
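The abstract describes the approach only at a high level, so the following is a hypothetical sketch of the general idea, not the authors' formulation. It assumes each thread is a problem post plus a list of candidate replies, and it makes several labelled substitutions: a smoothed co-occurrence table stands in for a trained translation model, add-one smoothed unigram models stand in for the language models, a fixed mixture weight `lam` combines them, the longest reply seeds the first assignment, and a hard (classification) EM loop replaces the paper's EM formulation. All names (`identify_solutions`, `solution_score`, `CooccurrenceTranslation`, etc.) are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalise further.
    return text.lower().split()


class UnigramLM:
    """Add-one smoothed unigram language model over a fixed vocabulary size."""

    def __init__(self, docs, vocab_size):
        self.counts = Counter(w for d in docs for w in d)
        self.total = sum(self.counts.values())
        self.vocab_size = vocab_size

    def prob(self, word):
        return (self.counts[word] + 1) / (self.total + self.vocab_size)


class CooccurrenceTranslation:
    """Hypothetical stand-in for a translation table t(s | p): add-one smoothed
    counts of solution word s co-occurring with problem word p.
    This is NOT trained with IBM Model 1 EM."""

    def __init__(self, pairs, vocab_size):
        self.co = defaultdict(Counter)
        for problem, solution in pairs:
            for p in set(problem):
                self.co[p].update(set(solution))
        self.vocab_size = vocab_size

    def prob(self, s, p):
        row = self.co.get(p, Counter())
        return (row[s] + 1) / (sum(row.values()) + self.vocab_size)


def solution_score(reply, problem, trans, sol_lm, other_lm, lam=0.5):
    """Log-likelihood ratio: reply words drawn from a mixture of the
    translation model (averaged over problem words, IBM Model 1 style) and
    the solution LM, versus the non-solution LM."""
    score = 0.0
    for s in reply:
        t = sum(trans.prob(s, p) for p in problem) / len(problem)
        score += math.log(lam * t + (1 - lam) * sol_lm.prob(s))
        score -= math.log(other_lm.prob(s))
    return score


def identify_solutions(threads, iterations=10, lam=0.5):
    """threads: list of (problem_text, [reply_texts]); each thread must have a
    non-empty problem and at least one reply. Returns one reply index per thread."""
    data = [(tokenize(p), [tokenize(r) for r in rs]) for p, rs in threads]
    vocab = {w for p, rs in data for doc in [p, *rs] for w in doc}
    v = len(vocab)
    # Hypothetical initialisation: treat the longest reply as the solution.
    chosen = [max(range(len(rs)), key=lambda i: len(rs[i])) for _, rs in data]
    for _ in range(iterations):
        # M-step: re-estimate all three models from the current assignment.
        sols = [rs[c] for (_, rs), c in zip(data, chosen)]
        others = [r for (_, rs), c in zip(data, chosen)
                  for i, r in enumerate(rs) if i != c]
        sol_lm, other_lm = UnigramLM(sols, v), UnigramLM(others, v)
        trans = CooccurrenceTranslation(
            [(p, rs[c]) for (p, rs), c in zip(data, chosen)], v)
        # Hard E-step: keep only the best-scoring reply in each thread.
        new = []
        for p, rs in data:
            scores = [solution_score(r, p, trans, sol_lm, other_lm, lam)
                      for r in rs]
            new.append(max(range(len(rs)), key=scores.__getitem__))
        if new == chosen:  # assignment is stable, so stop early
            break
        chosen = new
    return chosen
```

Scoring each reply as a likelihood ratio against the non-solution model keeps long and short replies comparable; without that denominator, the product of per-word probabilities would systematically favour short replies such as "thanks". Like any hard-EM procedure, the result depends on the initial assignment.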

Similar resources

Unsupervised deep semantic and logical analysis for identification of solution posts from community answers

These days, discussion forums provide dependable solutions to problems across many domains and areas. However, due to the presence of a huge number of less-informative/inappropriate posts, identifying appropriate problem-solution pairs has become a challenging task. The emergence of a variety of topics, domains and areas has made the task of manual labelling of the probl...

Semi-supervised and Unsupervised Methods for Categorizing Posts in Web Discussion Forums

Krish Perumal, Master of Science, Graduate Department of Computer Science, University of Toronto, 2016. Web discussion forums are used by millions of people worldwide to share information belonging to a variety of domains such as automotive vehicles, pets, sports, etc. They typically contain posts that fall into...

Semi-supervised and unsupervised categorization of posts in Web discussion forums using part-of-speech information and minimal features

Web discussion forums typically contain posts that fall into different categories such as question, solution, feedback, spam, etc. Automatic identification of these categories can aid information retrieval that is tailored for specific user requirements. Previously, a number of supervised methods have attempted to solve this problem; however, these depend on the availability of abundant trainin...

Semi-automatic Information Extraction from Discussion Boards with Applications for Anti-Spam Technology

Forums (or discussion boards) represent a huge information collection structured under different boards, threads and posts. The actual information entity of a forum is a post, which has the information about authors, date and time of post, actual content etc. This information is significant for a number of applications like gathering market intelligence, analyzing customer perceptions etc. Howe...

A Probabilistic Model for MOOC Discussion Forums

Students learning socially is a critical aspect of scaling up instruction in online education. In many cases, such as in massive open online courses (MOOCs), social learning is facilitated through discussion forums hosted by course providers. In this paper, we propose a probabilistic model for the process of learners posting on such forums, using point processes. Different from existing works, ...

Publication date: 2014