Graph-Community Detection for Cross-Document Topic Segment Relationship Identification

نویسندگان

  • Pedro Mota
  • Maxine Eskénazi
  • Luísa Coheur
چکیده

In this paper we propose a graph-community detection approach to identify cross-document relationships at the topic segment level. Given a set of related documents, we automatically find these relationships by clustering segments with similar content (topics). In this context, we study how different weighting mechanisms influence the discovery of word communities that relate to the different topics found in the documents. Finally, we test different mapping functions to assign topic segments to word communities, determining which topic segments are considered equivalent. By performing this task it is possible to enable efficient multi-document browsing, since when a user finds relevant content in one document we can provide access to similar topics in other documents. We deploy our approach in two different scenarios. One is an educational scenario where equivalence relationships between learning materials need to be found. The other consists of a series of dialogs in a social context where students discuss commonplace topics. Results show that our proposed approach better discovered equivalence relationships in learning material documents and obtained close results in the social speech domain, where the best performing approach was a clustering technique.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Topic Identification

In this chapter we discuss the problem of identifying the underlying topics beings discussed in spoken audio recordings. We focus primarily on the issues related to supervised topic classification or detection tasks using labeled training data, but we also discuss approaches for other related tasks including novel topic detection and unsupervised topic clustering. The chapter provides an overvi...

متن کامل

A New Document Embedding Method for News Classification

Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is lots of news spreading on the web. A text classifier can categorize news automatically and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

متن کامل

Graph-based Visual Saliency Model using Background Color

Visual saliency is a cognitive psychology concept that makes some stimuli of a scene stand out relative to their neighbors and attract our attention. Computing visual saliency is a topic of recent interest. Here, we propose a graph-based method for saliency detection, which contains three stages: pre-processing, initial saliency detection and final saliency detection. The initial saliency map i...

متن کامل

Semantic-based topic detection using Markov decision processes

In the field of text mining, topic modeling and detection are fundamental problems in public opinion monitoring, information retrieval, social media analysis, and other activities. Document clustering has been used for topic detection at the document level. Probabilistic topic models treat topics as a distribution over the term space, but this approach overlooks the semantic information hidden ...

متن کامل

An Optimized Firefly Algorithm based on Cellular Learning Automata for Community Detection in Social Networks

The structure of the community is one of the important features of social networks. A community is a sub graph which nodes have a lot of connections to nodes of inside the community and have very few connections to nodes of outside the community. The objective of community detection is to separate groups or communities that are linked more closely. In fact, community detection is the clustering...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1606.04081  شماره 

صفحات  -

تاریخ انتشار 2016