Massive Parallelism on the Hybrid Text-Retrieval Machine
Author
Abstract
The design of a high-performance, cost-effective machine for retrieving textual data is discussed in this paper. High performance and cost-effectiveness are achieved by a combination of low-cost hard disks, software filtering techniques, and a large amount of main memory. The discussion focuses on the signature processor, which is based on the partitioned signature file technique, and the mass storage system, which is based on a disk array. A performance evaluation of the individual system components, namely the signature processor and the mass storage system, as well as of the entire system, is presented.
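As a rough illustration of the software filtering idea behind signature files, the sketch below uses generic superimposed coding: each document gets a fixed-width bit signature, a query is screened against the signatures first, and only the surviving candidates are checked against the full text. This is not the paper's signature processor and it does not show partitioning or the disk array. The signature width, the number of bits per word, the SHA-256 hash, and all function names are assumptions chosen for the example.

```python
import hashlib

SIG_BITS = 64       # assumed signature width in bits
BITS_PER_WORD = 3   # assumed number of hash positions drawn per word


def word_bits(word):
    """Return the set of bit positions that a word turns on."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    # Take BITS_PER_WORD two-byte slices of the digest as bit positions.
    return {
        int.from_bytes(digest[2 * i:2 * i + 2], "big") % SIG_BITS
        for i in range(BITS_PER_WORD)
    }


def signature(words):
    """Superimpose (bitwise OR) the bit patterns of all words."""
    sig = 0
    for word in words:
        for bit in word_bits(word):
            sig |= 1 << bit
    return sig


def candidates(query_words, doc_signatures):
    """Filter step: keep documents whose signature covers the query signature.

    This can return false drops (documents that do not contain the words),
    but it never misses a document that does contain them.
    """
    query_sig = signature(query_words)
    return [
        doc_id
        for doc_id, doc_sig in doc_signatures.items()
        if doc_sig & query_sig == query_sig
    ]


def search(query_words, docs):
    """Filter by signature, then verify candidates against the full text."""
    doc_signatures = {doc_id: signature(text.split()) for doc_id, text in docs.items()}
    hits = []
    for doc_id in candidates(query_words, doc_signatures):
        doc_words = {w.lower() for w in docs[doc_id].split()}
        if all(q.lower() in doc_words for q in query_words):
            hits.append(doc_id)
    return hits
```

In a real system the signatures would be computed once at indexing time and stored separately from the text, so that most documents are rejected without reading them from disk; here they are rebuilt on each call only to keep the example short.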
Similar resources
AAAI 1993 Spring Symposium Series Reports
The American Association for Artificial Intelligence (AAAI) held its 1993 Spring Symposium Series on March 23–25 at Stanford University. This article contains summaries of the eight symposia that were conducted: AI and Creativity, AI and NP-Hard Problems, Building Lexicons for Machine Translation, Case-Based Reasoning and Information Retrieval, Foundations of Automatic Planning, Innovative Applications of Massive Parallelis...
A Hybrid Approach for Machine Translation Based on Cross-language Information Retrieval
This paper presents a hybrid approach for Machine Translation (MT) based on Cross-language Information Retrieval (CLIR). This approach uses linguistic and statistical processing and does not need parallel corpora as linguistic resources. A first experimental evaluation of this approach has been done on the CESTA corpus and the obtained results seem good and encouraging. The next step is the TAL...
A New Hybrid Machine Translation Approach Using Cross-Language Information Retrieval and Only Target Text Corpora
Parallel corpora play a vital role in Statistical Machine Translation. The unavailability of these corpora is a major barrier to adding new language pairs. In this paper, we propose a new hybrid approach for English-French machine translation combining a cross-language search engine and a statistical language model trained from a monolingual corpus. The cross-language search engine returns the t...
Determining the Boundary and Type of Syntactic Phrases in Persian Texts
Text tokenization is the process of splitting text into meaningful tokens such as words, phrases, and sentences. Tokenization into syntactic phrases, known as chunking, is an important preprocessing step in many applications such as machine translation, information retrieval, and text to speech. In this paper chunking of Farsi texts is done using statistical and learning methods and the grammat...
Image retrieval using the combination of text-based and content-based algorithms
Image retrieval is an important research field which has received great attention in the last decades. In this paper, we present an approach for the image retrieval based on the combination of text-based and content-based features. For text-based features, keywords and for content-based features, color and texture features have been used. Query in this system contains some keywords and an input...
Journal title:
- Inf. Process. Manage.
Volume 31, Issue
Pages -
Publication date 1995