Categorization And Standardizing Proper Nouns For Efficient Information Retrieval
Authors
Abstract
In this paper, we describe the most recent implementation and evaluation of the proper noun categorization and standardization module of the DR-LINK document detection system being developed at Syracuse University, under the auspices of ARPA's TIPSTER program. We also discuss the expansion of group common nouns and group proper nouns to enhance retrieval recall. Successful proper noun boundary identification within the part of speech tagger is essential for successful categorization. The proper noun classification module is designed to assign a category code to each proper noun entity, using 30 categories generated from corpus analysis. Standardization of variant proper nouns occurs at three levels of processing. Expansion of group proper nouns and group common nouns is performed on queries. Standardization and categorization are performed on queries and documents. DR-LINK's overall precision for proper noun categorization was 93%, based on 589 proper nouns occurring in the evaluation set.
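The three operations named in the abstract (standardizing variants, assigning a category code, and expanding group nouns on queries) can be illustrated with a minimal Python sketch. It is not the DR-LINK implementation: the lookup tables, category labels, and function names below are hypothetical, and simple dictionary lookups stand in for the system's 30 corpus-derived categories and its three levels of standardization.

```python
# Hypothetical lookup tables for illustration only; DR-LINK derives its
# 30 categories from corpus analysis, which is not reproduced here.
VARIANTS = {
    "u.s.": "united states",
    "usa": "united states",
    "u.s.a.": "united states",
}
CATEGORY_LEXICON = {
    "united states": "COUNTRY",
    "japan": "COUNTRY",
    "syracuse university": "ORGANIZATION",
}
GROUP_EXPANSIONS = {
    "benelux": ["belgium", "netherlands", "luxembourg"],
}


def standardize(name: str) -> str:
    """Map a variant proper noun to one canonical, lowercased form."""
    key = " ".join(name.lower().split())  # lowercase and collapse whitespace
    return VARIANTS.get(key, key)


def categorize(name: str) -> str:
    """Return a category code for a proper noun, or UNKNOWN if unlisted."""
    return CATEGORY_LEXICON.get(standardize(name), "UNKNOWN")


def expand_query_terms(terms: list[str]) -> list[str]:
    """Standardize query terms and append the members of any group noun."""
    expanded = []
    for term in terms:
        std = standardize(term)
        expanded.append(std)
        expanded.extend(GROUP_EXPANSIONS.get(std, []))
    return expanded
```

In this sketch, expansion is applied only to query terms, while `standardize` and `categorize` could be applied to both queries and documents, mirroring the division of labor the abstract describes.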
Similar resources
Interpretation of Proper Nouns for Information Retrieval
In information retrieval, proper nouns in queries frequently serve as the most important key terms for identifying relevant documents in a database. Furthermore, common nouns (e.g. 'developing countries') or group proper nouns (e.g. 'U.S. government') in queries sometimes need to be expanded to their constituent set of proper nouns in order to serve as useful retrieval terms. We have implement...
QEA: A New Systematic and Comprehensive Classification of Query Expansion Approaches
A major problem in information retrieval is the difficulty of defining the user's information needs; on the other hand, when the user submits a query, there is a vast amount of information to retrieve. Different methods have therefore been suggested for query expansion, which is concerned with reconfiguring the query by increasing efficiency and improving the criterion accuracy in the information...
Cross-Language Information Retrieval of Proper Nouns using Context Information
Translating news articles frequently involves finding foreign language equivalents for proper nouns occurring for the first time in an original article, a time-consuming and labor-intensive task. We propose an Internet-based technique for efficiently finding foreign language equivalents for proper nouns via Cross-Language Information Retrieval (CLIR). In this technique, the CLIR of proper nouns...
A Technique for Proper Feature Selection with Automated Text Categorization in the Vector Space Model
Efficient and effective text categorization and information retrieval techniques are very important and play a major role in managing the ever increasing amount of data and textual information available in digital form. Text categorization has important applications like information retrieval, bad information identification, document and web resource filtering. Before the application of various...
Use of Query Concepts and Information Extraction to Improve Information Retrieval Effectiveness
In TREC-7, we participated in both the automatic and manual tracks for category A. For the automatic runs, we included a baseline run and an experimental run that filtered relevance feedback using proper nouns. The baseline run used the short query versions and term thresholding to focus on the most meaningful terms. The experimental run used the long queries (title, description and narrative) ...