Kosmix: High-Performance Topic Exploration using the Deep Web
Author
Abstract
Kosmix lies at the intersection of two important trends: topic exploration and the Deep Web. Topic exploration is a new approach to information discovery on the web that satisfies certain use cases not served well by conventional web search. The Deep Web, an inhospitable region for web crawlers, is emerging as a significant information resource. We describe the anatomy of Kosmix, the first general-purpose topic exploration engine to harness the Deep Web using a federated search approach. We focus in particular on the Kosmix approach to query transformation and caching, which is essential to ensure reasonable performance.
Similar resources
Kosmix: Exploring the Deep Web using Taxonomies and Categorization
We introduce topic exploration, a new approach to information discovery on the web that differs significantly from conventional web search. We then explain why the Deep Web, an inhospitable region for web crawlers, is emerging as a significant information resource. Finally, we describe the anatomy of Kosmix, the first general-purpose topic exploration engine to harness the Deep Web. The Kosmix ...
Social Media Analytics: The Kosmix Story
Kosmix was a Silicon Valley startup founded in 2005 by Anand Rajaraman and Venky Harinarayan. Initially targeting Deep Web search, in early 2010 Kosmix shifted its main focus to social media, and built a large infrastructure to perform social media analytics, for a variety of real-world applications. In 2011 Kosmix was acquired by Walmart and converted into @WalmartLabs, the advanced research a...
From Focused Crawling to Expert Information: an Application Framework for Web Exploration and Portal Generation
Focused crawling is a relatively new, promising approach to improving the recall of expert search on the Web. It typically starts from a user- or community-specific tree of topics along with a few training documents for each tree node, and then crawls the Web with focus on these topics of interest. This process can efficiently build a theme-specific, hierarchical directory whose nodes are populate...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) that has been intensified by the growing volume and heterogeneity of information and by its unstructured form. One of the core information extraction tasks is relation extraction wh...
CRFA-CRBM: a hybrid technique for anomaly recognition in regional geochemical exploration; case study: Dehsalm area, east of Iran
Identification of geochemical anomalies is a significant step during regional geochemical exploration. In this matter, new techniques have been developed based on deep learning networks. These simple-structure-networks act like our brains on processing the data by simulating deep layers of thinking. In this paper, a hybrid compositional-deep learning technique was applied to identify the anomal...
Journal title:
- PVLDB
Volume 2, Issue
Pages -
Publication date 2009