To Construct Search Engine Analyzer for Electrical Enterprises Based on Lucene
Abstract
Electrical enterprises use a large professional vocabulary, and existing analyzers do not handle it well enough to support a search engine for these enterprises. Taking the operation system of electrical enterprises as the background, this article proposes a vocabulary-based word segmentation algorithm and uses it to design a search engine analyzer suited to electrical enterprises. The analyzer is built on an electrical professional dictionary and addresses many shortcomings of existing analyzers. It also uses a word tree: when the vocabulary is loaded, a tree of words and expressions is first constructed in memory, and at segmentation time a word is found simply by traversing the tree. This removes the limitation of the usual maximum matching algorithm, which requires a maximum word length to be fixed in advance, and it greatly improves segmentation efficiency by avoiding meaningless matching attempts. Finally, we compare the analyzer with two of Lucene's built-in analyzers. The results indicate that, for the application system of an electrical enterprise, it outperforms the built-in analyzers in both time and word segmentation efficiency, which shows that it can meet the requirements for constructing a search engine for electrical enterprises.
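The word-tree idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use Lucene's API: it is plain Python, the names `TrieNode`, `build_trie`, and `segment` are hypothetical, and it assumes forward longest-match segmentation with a single-character fallback for text not covered by the vocabulary. The sample vocabulary is invented for illustration.

```python
from typing import Dict, Iterable, List


class TrieNode:
    """One node of the in-memory word tree (hypothetical name)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if the path from the root spells a vocabulary entry


def build_trie(vocabulary: Iterable[str]) -> TrieNode:
    """Build the word tree once, when the vocabulary is loaded."""
    root = TrieNode()
    for word in vocabulary:
        if not word:
            continue  # ignore empty entries
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def segment(text: str, root: TrieNode) -> List[str]:
    """Forward longest-match segmentation by traversing the tree.

    No maximum word length is configured: the walk stops as soon as
    the text no longer follows any branch of the tree.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        node = root
        longest_end = -1  # end index of the longest vocabulary word starting at i
        j = i
        while j < len(text) and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
            if node.is_word:
                longest_end = j
        if longest_end == -1:
            # Assumption: emit a single character when no entry starts here.
            tokens.append(text[i])
            i += 1
        else:
            tokens.append(text[i:longest_end])
            i = longest_end
    return tokens


if __name__ == "__main__":
    # Invented sample vocabulary, written without spaces to mimic unsegmented text.
    root = build_trie(["relay", "relayprotection", "transformer", "breaker"])
    print(segment("relayprotectiontransformerx", root))
```

Because each lookup follows only the branches that the text actually matches, the cost per starting position is bounded by the longest matching path in the tree rather than by a preset window size, which is the property the abstract attributes to the word-tree approach.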
Similar resources
Application of Full Text Search Engine Based on Lucene
This paper introduces a full-text search engine based on Lucene and full-text retrieval technology, including indexing and system architecture. It compares the response time of Lucene's full-text search with that of String search retrieval; the experimental results show that Lucene's full-text search has faster retrieval speed.
The Study on Lucene Based IETM Information Retrieval
With the intensive and large-scale application of IETM in equipment integrated support, information retrieval technology has become one of the key technologies. This article discusses full-text search technology and the Lucene full-text retrieval engine, and combines them to develop a high-performance, scalable IETM full-text retrieval system. This system can effectively deal with IETM unstruct...
Lucene and Juru at TREC 2007: 1-Million Queries Track
Lucene is an increasingly popular open source search library. However, our experiments on search quality for TREC data and our evaluations of out-of-the-box Lucene indicated inferior quality compared to other systems participating in TREC. In this work we investigate the differences in measured search quality between Lucene and Juru, our home-brewed search engine, and show how Lucene scoring can ...
N-Gram vs. Keyword-Based Passage Retrieval for Question Answering
In this paper we describe the participation of the Universidad Politécnica of Valencia in the 2006 edition, which was focused on the comparison between a Passage Retrieval engine (JIRS) specifically aimed at the Question Answering task and a standard, general-use search engine such as Lucene. JIRS is based on n-grams, Lucene on keywords. We participated in three monolingual tasks: Spanish, Ital...
Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned
For the past few years, we used Apache Lucene as a recommendation framework in our scholarly-literature recommender system of the reference-management software Docear. In this paper, we share three lessons learned from our work with Lucene. First, recommendations with relevance scores below 0.025 tend to have significantly lower click-through rates than recommendations with relevance scores above...
Journal title:
- Computer and Information Science
Volume 2, Issue
Pages -
Publication date 2009