Mining : Origins of Database Tomography and Multi - Word Phrase Clustering

نویسنده

  • Ronald N. Kostoff
چکیده

This report initially describes the motivations for co-word analysis in support of research policy formulation and research implementation evaluation. It places coword analysis in perspective to other co-occurrence techniques such as cocitation and co-nomination analyses. It then traces the origins of co-word analysis in computational linguistics, describes in detail the development of co-word analysis for research evaluation, and concludes by presenting a new approach to co-word analysis for research evaluation (Database Tomography). The report shows that this new approach to co-word analysis, which requires no index or key words but deals with text directly, is a useful tool for scanning large bodies of text. It can identify pervasive thrust areas and their interrelationships, and serve as a starting point for further in-depth analysis of the text. Its value increases as: 1) the size of text increases and 2) the breadth of topical areas covered by the text increases beyond the expertise of a moderate number of panels of experts. A single link clustering example is shown that represents the first use of multi-word technical phrases in modern clustering.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

1 Science and Technology Text Mining : near - Earth Space

Database Tomography is a patented system for analyzing large amounts of textual computerized material. It includes algorithms for extracting multi-word phrase frequencies and performing phrase proximity analyses. This report shows how Database Tomography can be used to derive technical intelligence from the published literature. One potential application of Database Tomography is to obtain the ...

متن کامل

Feature Engineering in Persian Dependency Parser

Dependency parser is one of the most important fundamental tools in the natural language processing, which extracts structure of sentences and determines the relations between words based on the dependency grammar. The dependency parser is proper for free order languages, such as Persian. In this paper, data-driven dependency parser has been developed with the help of phrase-structure parser fo...

متن کامل

(the Views in This Paper Are Solely Those of the Authors and Do Not Represent the Views of the Department of the Navy, Any of Its

Database Tomography (DT) is a textual database analysis system consisting of two major components: 1) algorithms for extracting multi-word phrase frequencies and phrase proximities (physical closeness of the multi-word technical phrases) from any type of large textual database, to augment 2) interpretative capabilities of the expert human analyst. DT was used to derive technical intelligence fr...

متن کامل

Multi-Output Adaptive Neuro-Fuzzy Inference System for Prediction of Dissolved Metal Levels in Acid Rock Drainage: a Case Study

Pyrite oxidation, Acid Rock Drainage (ARD) generation, and associated release and transport of toxic metals are a major environmental concern for the mining industry. Estimation of the metal loading in ARD is a major task in developing an appropriate remediation strategy. In this study, an expert system, the Multi-Output Adaptive Neuro-Fuzzy Inference System (MANFIS), was used for estimation of...

متن کامل

A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm

Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003