Duplicate Question Pair Detection with Deep Learning

Author

  • Travis Addair
Abstract

Determining whether two questions ask the same thing can be challenging, because word choice and sentence structure can vary significantly. Traditional natural language processing techniques such as shingling have had limited success in separating related questions from duplicate questions. Using a dataset of 400,000 labeled question pairs provided by the question-and-answer forum Quora, we explore a series of deep learning methodologies for detecting duplicate question pairs: convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and a hybrid model. All three models are built atop a siamese network architecture, with multilayer perceptron concatenation for the final inference. Our empirical results show that LSTMs outperform CNNs in terms of accuracy, and that combining the two techniques provides no additional improvement. Moreover, all three deep learning techniques significantly outperform traditional NLP methods and simple multilayer perceptron baselines.
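
For illustration only, the kind of shingling baseline the abstract refers to can be sketched as below. This is not the author's implementation: the word-level shingles, the shingle size k=2, the Jaccard overlap measure, and the function names are all assumptions made for this example.

```python
def shingles(text, k=2):
    """Return the set of k-word shingles (contiguous word k-grams) of a text.

    Tokenization is a plain lowercase whitespace split (an assumption),
    so punctuation stays attached to words.
    """
    words = text.lower().split()
    if len(words) < k:
        # Texts shorter than k words yield one short shingle, or none if empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def shingle_similarity(q1, q2, k=2):
    """Score two questions by the overlap of their word shingles."""
    return jaccard(shingles(q1, k), shingles(q2, k))


if __name__ == "__main__":
    # Two paraphrases with little surface overlap (hypothetical example pair).
    print(shingle_similarity(
        "How can I learn Python quickly?",
        "What is the fastest way to pick up Python?",
    ))
```

Because the score depends only on shared word sequences, paraphrases with different wording score low, which is consistent with the limitation the abstract describes.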

Similar resources

Detecting Duplicate Posts in Programming QA Communities via Latent Semantics and Association Rules

Programming community-based question-answering (PCQA) websites such as Stack Overflow enable programmers to find working solutions to their questions. Despite detailed posting guidelines, duplicate questions that have been answered are frequently created. To tackle this problem, Stack Overflow provides a mechanism for reputable users to manually mark duplicate questions. This is a laborious eff...

Semantic Duplicate Identification with Parsing and Machine Learning

Identifying duplicate texts is important in many areas like plagiarism detection, information retrieval, text summarization, and question answering. Current approaches are mostly surface-oriented (or use only shallow syntactic representations) and see each text only as a token list. In this work however, we describe a deep, semantically oriented method based on semantic networks which are deriv...

Detection of children's activities in smart home based on deep learning approach

Monitoring the behavior of children in the home is extremely important for avoiding possible injuries. Researchers have therefore considered automated systems for monitoring children's behavior. The first step in designing and implementing an automated monitoring system for children's behavior in closed spaces is to recognize their activity using the sensors in the e...

ECNU at SemEval-2016 Task 3: Exploring Traditional Method and Deep Learning Method for Question Retrieval and Answer Ranking in Community Question Answering

This paper describes the system we submitted to Task 3 (Community Question Answering) of SemEval 2016, which contains three subtasks: Question-Comment Similarity (subtask A), Question-Question Similarity (subtask B), and Question-External Comment Similarity (subtask C). For subtask A, we employed three different methods to rank question-comment pairs, i.e., supervised model using tradi...

Publication date: 2017