Semi-automatic Term Extraction with Simplified Term-clips Method: Development and Applications of a JavaScript Component

Author

  • Hsieh-chang Tu

Abstract

Humanists often have digitized texts of their own, from which they sometimes want to extract as many terms of a specific type as possible. Term extraction methods are computational algorithms that extract meaningful terms from a large corpus of digitized texts. The term-clips method is a semi-automatic term extraction algorithm that relies heavily on human-computer interaction to obtain satisfactory results. The term-clips algorithm was developed several years ago and proved powerful in extracting categorical terms from classical Chinese novels. However, perhaps because the original algorithm is complicated, or perhaps because it lacks a user-friendly interface, the method has not been widely used by humanists. In this paper, we propose a simplified term-clips method that emphasizes the importance of human interaction in selecting effective clips and desirable terms. We use this algorithm to implement a JavaScript component, and use the component to build a working system that lets people extract meaningful terms from their own texts. Encapsulating algorithms in components helps in developing more complex systems, because system developers do not need to implement the algorithms themselves; they can simply include the components and reuse the functions the components provide. The term-clips component has been used by MARKUS, an open-source system for marking up classical Chinese texts. We also discuss considerations in the design and implementation of the JavaScript component, show how to use it to build a simple application, and illustrate the complete system by extracting personal names and clothing terms from the classical Chinese novel Dream of the Red Chamber.
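The abstract describes the method only at a high level. As a rough illustration, the following is a minimal sketch of what one interactive round might look like, assuming that a clip is a pair of fixed-length left and right context strings observed around a known term, and that the user's choices of clips and terms arrive through callbacks. The names `findClips`, `applyClips`, `extractionRound`, `clipLength`, and `maxTermLength` are hypothetical; they are not the API of the component described in the paper, and the sample sentence is a made-up toy string that merely borrows character names from the novel.

```javascript
// Hypothetical sketch of one round of a simplified term-clips extraction.
// ASSUMPTION: a "clip" is a pair of fixed-length left/right context strings
// observed around a known term; this definition is not taken from the paper.
// All function names and options here are illustrative, not the component's API.

// Collect (left, right) context pairs around every occurrence of each seed term.
function findClips(text, seedTerms, clipLength = 1) {
  const counts = new Map(); // JSON.stringify([left, right]) -> occurrences
  for (const term of seedTerms) {
    if (!term) continue; // an empty term would match at every position
    let pos = text.indexOf(term);
    while (pos !== -1) {
      const start = pos - clipLength;
      const end = pos + term.length + clipLength;
      if (clipLength > 0 && start >= 0 && end <= text.length) {
        const key = JSON.stringify([text.slice(start, pos), text.slice(pos + term.length, end)]);
        counts.set(key, (counts.get(key) || 0) + 1);
      }
      pos = text.indexOf(term, pos + 1);
    }
  }
  // Most frequent clips first, so the user reviews the most productive ones first.
  return [...counts.entries()]
    .map(([key, count]) => {
      const [left, right] = JSON.parse(key);
      return { left, right, count };
    })
    .sort((a, b) => b.count - a.count);
}

// Return candidate strings (1..maxTermLength chars) enclosed by a selected clip,
// ordered by how often they were matched.
function applyClips(text, clips, maxTermLength = 4) {
  const counts = new Map(); // candidate -> number of matches
  for (const { left, right } of clips) {
    if (!left || !right) continue; // empty contexts would match everywhere
    let pos = text.indexOf(left);
    while (pos !== -1) {
      const from = pos + left.length;
      const to = text.indexOf(right, from + 1); // nearest right context after >= 1 char
      if (to !== -1 && to - from <= maxTermLength) {
        const term = text.slice(from, to);
        counts.set(term, (counts.get(term) || 0) + 1);
      }
      pos = text.indexOf(left, pos + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).map(([term]) => term);
}

// One human-in-the-loop round: the caller's UI decides which clips and terms to keep.
function extractionRound(text, seedTerms, chooseClips, chooseTerms, options = {}) {
  const clips = findClips(text, seedTerms, options.clipLength);
  const candidates = applyClips(text, chooseClips(clips), options.maxTermLength)
    .filter((term) => !seedTerms.includes(term));
  return [...seedTerms, ...chooseTerms(candidates)];
}

// Toy usage: seed one personal name and let a (simulated) user reject a non-name.
const text = "見寶玉笑道。見黛玉笑道。見寶釵笑道。只見眾人笑道。";
const names = extractionRound(
  text,
  ["寶玉"],
  (clips) => clips, // keep every proposed clip (here only 見…笑)
  (terms) => terms.filter((t) => t !== "眾人"), // "眾人" ("everyone") is not a name
);
console.log(names); // [ '寶玉', '黛玉', '寶釵' ]
```

Passing the user's decisions in as callbacks keeps the extraction logic independent of any particular interface, which is one way to get the kind of reusable component the abstract describes. Under the same assumptions, a further round could be run with the accepted names as the new seeds.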

Publication date: 2015