A Framework for Bridging the Gap Between Open Source Search Tools

Authors

  • Madian Khabsa
  • Stephen Carman
  • Sagnik Ray Choudhury
  • C. Lee Giles
Abstract

Building a search engine that can scale to billions of documents while satisfying the needs of its users presents serious challenges. Few success stories have been reported so far [36]. Here, we report our experience in building YouSeer, a complete open source search engine tool that includes both an open source crawler and an open source indexer. Our approach takes other open source components that have been proven to scale and combines them to create a comprehensive search engine. YouSeer employs Heritrix as a web crawler and Apache Lucene/Solr for indexing. We describe the design and architecture, as well as the additional components that need to be implemented to build such a search engine. The results of experimenting with our framework in building vertical search engines are competitive when compared against complete open source search engines. Our approach is not specific to the components we use; instead, it can be used as a generic method for integrating search engine components.

Similar resources

Reinforcement Learning in Neural Networks: A Survey

In recent years, research on reinforcement learning (RL) has focused on bridging the gap between adaptive optimal control and bio-inspired learning techniques. Neural network reinforcement learning (NNRL) is among the most popular algorithms in the RL framework. The advantage of using neural networks enables the RL to search for optimal policies more efficiently in several real-life applicat...

Integrating ROS into Educational Robotics: Bridging the Gap between Grade School and Grad School

Robots and robot competitions have been used throughout the entire STEM (science, technology, engineering, and math) education pipeline [8]. However, a gap has been identified in the focus and reinforcement of educational robotics concepts between K-12 and higher education [3]; for example, K-12 robot competitions tend to focus more on open-loop, reactive, and/or rule-based solutions with limit...

Causes of the Gap between Junior High School Intended, Implemented, and Attained Curricula and Ways of Bridging It

M.A. Jamaalifar, S. Sh. HaashemiMoghadam, Ph.D., Z. Aabedi Karajibaan, Ph.D., A.R. Faghihi, Ph.D. To identify the causes of the perceived gap between junior high school intended, implemented, and attained curricula, a group of 30 curriculum planners, 50 educationa...

Cross border E-Science and Research Partnership: Bridging the Gap Between Science and Media

E-Science is a tool that helps scientists to store, interpret, analyze and make a network of their data, and it can play a critical role in different aspects of the scientific goals and research. This commentary, under the topic of Cross Border E-Science and Research Partnership: Bridging the Gap between Science and Media [1], attempts to shed light on E-Science with emphasis on three importa...

Publication date: 2012