Kosmix: High-Performance Topic Exploration using the Deep Web

Author

Anand Rajaraman

Abstract

Kosmix lies at the intersection of two important trends: topic exploration and the Deep Web. Topic exploration is a new approach to information discovery on the web that serves certain use cases poorly handled by conventional web search. The Deep Web, a region inhospitable to web crawlers, is emerging as a significant information resource. We describe the anatomy of Kosmix, the first general-purpose topic exploration engine to harness the Deep Web using a federated search approach. We focus in particular on the Kosmix approach to query transformation and caching, which is essential to ensure reasonable performance.
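The abstract names a federated search approach built on query transformation and caching but gives no implementation details. The following is a minimal, hypothetical sketch of that general pattern, not the Kosmix implementation: the `Source`, `TTLCache`, and `federated_search` names, the per-source rewrite functions, the fake fetchers, and the time-to-live eviction policy are all illustrative assumptions.

```python
# Hypothetical sketch of federated search with per-source query
# transformation and result caching. NOT the Kosmix implementation.
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (source name, transformed query)


@dataclass
class Source:
    name: str
    # Rewrites the user's topic query into this source's own query syntax.
    transform: Callable[[str], str]
    # Fetches results for an already-transformed query (stand-in for a Deep Web call).
    fetch: Callable[[str], List[str]]


class TTLCache:
    """Caches results per (source, transformed query) for `ttl` seconds (assumed policy)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[Key, Tuple[float, List[str]]] = {}
        self._lock = threading.Lock()

    def get(self, key: Key) -> Optional[List[str]]:
        with self._lock:
            entry = self._store.get(key)
            if entry is None:
                return None
            stored_at, value = entry
            if self.clock() - stored_at > self.ttl:
                del self._store[key]  # expired: evict and report a miss
                return None
            return value

    def put(self, key: Key, value: List[str]) -> None:
        with self._lock:
            self._store[key] = (self.clock(), value)


def federated_search(query: str, sources: List[Source], cache: TTLCache,
                     max_workers: int = 4) -> Dict[str, List[str]]:
    """Fans the query out to all sources in parallel, consulting the cache first."""

    def query_one(src: Source) -> Tuple[str, List[str]]:
        q = src.transform(query)
        key = (src.name, q)
        cached = cache.get(key)
        if cached is not None:
            return src.name, cached
        results = src.fetch(q)
        cache.put(key, results)
        return src.name, results

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so the result dict follows `sources`.
        return dict(pool.map(query_one, sources))


# Usage with two hypothetical sources and a fake fetcher that records calls.
calls: List[str] = []


def fake_fetch(q: str) -> List[str]:
    calls.append(q)
    return [f"result for {q}"]


sources = [
    Source("videos", lambda q: q.lower(), fake_fetch),
    Source("events", lambda q: q.lower().replace(" ", "+"), fake_fetch),
]
cache = TTLCache(ttl=60)
r1 = federated_search("Deep Web", sources, cache)
r2 = federated_search("deep web", sources, cache)  # served entirely from cache
print(r1)  # {'videos': ['result for deep web'], 'events': ['result for deep+web']}
print(len(calls))  # 2
```

Keying the cache on the transformed query rather than the raw user query lets different phrasings that map to the same source query share one cached entry, which is why the second call in the example triggers no new fetches.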

Similar sources

Kosmix: Exploring the Deep Web using Taxonomies and Categorization

We introduce topic exploration, a new approach to information discovery on the web that differs significantly from conventional web search. We then explain why the Deep Web, an inhospitable region for web crawlers, is emerging as a significant information resource. Finally, we describe the anatomy of Kosmix, the first general-purpose topic exploration engine to harness the Deep Web. The Kosmix ...

Social Media Analytics: The Kosmix Story

Kosmix was a Silicon Valley startup founded in 2005 by Anand Rajaraman and Venky Harinarayan. Initially targeting Deep Web search, in early 2010 Kosmix shifted its main focus to social media, and built a large infrastructure to perform social media analytics, for a variety of real-world applications. In 2011 Kosmix was acquired by Walmart and converted into @WalmartLabs, the advanced research a...

From Focused Crawling to Expert Information: an Application Framework for Web Exploration and Portal Generation

Focused crawling is a relatively new, promising approach to improving the recall of expert search on the Web. It typically starts from a user- or community-specific tree of topics along with a few training documents for each tree node, and then crawls the Web with focus on these topics of interest. This process can efficiently build a theme-specific, hierarchical directory whose nodes are populate...

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is the process of automatically producing a structured representation of unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) that has been intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...

CRFA-CRBM: a hybrid technique for anomaly recognition in regional geochemical exploration; case study: Dehsalm area, east of Iran

Identification of geochemical anomalies is a significant step in regional geochemical exploration. To this end, new techniques based on deep learning networks have been developed. These simple-structured networks act like our brains in processing data, by simulating deep layers of thinking. In this paper, a hybrid compositional-deep learning technique was applied to identify the anomal...

Journal title:

PVLDB

Volume 2

Publication date: 2009