Leveraging Collection Structure in Information Retrieval With Applications to Search in Conversational Social Media
Abstract
Social media collections are becoming increasingly important in the everyday life of Internet users. Recent statistics show that sites hosting social media and community-generated content account for five of the top ten most visited websites in the United States [4], are visited regularly by a broad cross-section of Internet users [61, 67, 115] and host an enormous quantity of information [119, 48, 9]. The increasing importance and size of these collections require that information retrieval systems pay special attention to them, and in particular to those aspects of social media collections that set them apart from the general web. Social media collections are interesting and challenging from the perspective of information retrieval systems. These collections are dynamic, with content being constantly added, removed and modified. These collections are time-sensitive, with the most recently added content often viewed as the most significant. These collections are richly structured, with authorship information, often with threading structure and higher-level topical classifications. Although this type of collection structure is frequently critical for comprehension, it is rarely exploited in retrieval algorithms.
This thesis investigates the hypothesis that we can improve retrieval performance in these collections by leveraging this type of structure. To evaluate this hypothesis, we present an exploration of search in several social media collections: blogs and online forums. We demonstrate the utility of leveraging collection structure in three different retrieval tasks: blog post search, blog feed search, and forum thread search. The techniques explored throughout these experiments include evaluating the representation granularity of collections of documents, and methods to incorporate content an author has written throughout the collection.
Our results show that, although the retrieval tasks and techniques to leverage this type of collection structure are varied, in many cases substantial and significant retrieval quality improvements can be realized by leveraging this collection structure.
Related resources
Collaborative Information Access: A Conversational Search Approach
Knowledge and user generated content is proliferating on the web in scientific publications, information portals and online social media. This knowledge explosion has continued to outpace technological innovation in efficient information access technologies. In this paper, we describe the methods and technologies for ‘Conversational Search’ as an innovative solution to facilitate easier informa...
Search in Conversational Social Media Collections
Community generated content has become increasingly important over the past several years: blogs, Wikipedia, online forums, twitter, Yahoo! Answers, Facebook and many other online communities that foster social interaction have flourished. However, studying “Search in Social Media” as a distinct sub-field of information retrieval poses some questions. Although there is a loose consensus of the ...
Intellectual Structure of Knowledge in Information Behavior: A Co-Word Analysis
Background and Aim: The intellectual structure of knowledge and its research front can be identified by co-word analysis. This research attempts to reveal the intellectual structure of knowledge in information behavior inquiries, via co-word, network analysis, and science visualization tools. Methods: Bibliometric methodology and social network analysis are used. Population comprises 2146 recor...
Invited Talk: Lessons from the MALACH Project: Applying New Technologies to Improve Intellectual Access to Large Oral History Collections
In this talk I will describe the goals of the MALACH project (Multilingual Access to Large Spoken Archives) and our research results. I'll begin by describing the unique characteristics of the oral history collection that we used, in which Holocaust survivors, witnesses and rescuers were interviewed in several languages. Each interview has been digitized and extensively catalogued by subject ma...
Describing and Selecting Collections of Georeferenced Media Items in Peer-to-Peer Information Retrieval Systems
The ever-increasing amount of media items on the World Wide Web and on private devices leads to a strong need for adequate indexing and search techniques. Trends such as personal media archives, social networks, mobile devices with huge storage space, and networks with high bandwidth capacities make distributed solutions and in particular Peer-to-Peer (P2P) Information Retrieval (IR) systems at...