mining lexicon

Large lexicon construction for TTS system

2002

Ben-Feng CHEN Guo-Ping HU Ren-Hua WANG

A lexicon is an essential part of Chinese information processing. In particular, compared with a basic lexicon, a large, high-quality lexicon can effectively reduce the complexity and improve the precision of text parsing in a TTS system. However, such a lexicon is hard to construct either by hand or by computer. This paper presents an approach to construct a large lexicon combining com...

BioLemmatizer: a lemmatization tool for morphological processing of biomedical text

2012

Haibin Liu Tom Christiansen William A. Baumgartner Karin M. Verspoor

BACKGROUND: The wide variety of morphological variants of domain-specific technical terms contributes to the complexity of performing natural language processing of the scientific literature related to molecular biology. For morphological analysis of these texts, lemmatization has been actively applied in recent biomedical research. RESULTS: In this work, we developed a domain-specific lemm...

Mental Representation of Cognates/Noncognates in Persian-Speaking EFL Learners

Journal: Journal of Teaching Language Skills 1391

Ferdos Taleb, Zahra Fotovatnia

The purpose of this study was to investigate the mental representation of cognate and noncognate translation pairs in languages with different scripts, in order to test the prediction of the dual lexicon model (Gollan, Forster, & Frost, 1997). Two groups of Persian-speaking English language learners were tested on cognate and noncognate translation pairs in Persian-English and English-Persian directions with...

A toolkit for detecting fallacious calls for papers from potential predatory journals

Journal: Advanced Pharmaceutical Bulletin 2023

Purpose: Flattering emails are crucial in tempting authors to submit papers to predatory journals. Although there is ample literature regarding the questionable practices of such journals, the nature and detection of spam need more attention. The current research provides insight into fallacious calls for papers from potential predatory journals and develops a toolkit in this regard. Methods: In this study, we analyzed three datasets legitim...

Improving Opinion Retrieval Based on Query-Specific Sentiment Lexicon

2009

Seung-Hoon Na Yeha Lee Sang-Hyob Nam Jong-Hyeok Lee

Lexicon-based approaches have been widely used for opinion retrieval due to their simplicity. However, no previous work has focused on the domain-dependency problem in opinion lexicon construction. This paper proposes simple feedback-style learning of a query-specific opinion lexicon using the set of top-retrieved documents in response to a query. The proposed learning starts from the initial do...

Developing Chinese TAK for Computer Directly

2002

Guo-Ping HU Ben-Feng CHEN Ren-Hua WANG

With the development of text analysis, the quality of the knowledge used by the computer is increasingly crucial to analysis accuracy, and text analysis knowledge (TAK) has been developed by many researchers. But so far, except for the lexicon, TAK for computers (such as phrase structure grammar, unregistered word recognition rules, etc.) has been built only on a small scale. Although large scale corpus with wo...

Opinion Mining for Biomedical Text Data: Feature Space Design and Feature Selection

2010

Rajesh Swaminathan Abhishek Sharma Hui Yang

Unstructured text (e.g., journal articles) remains the primary means of publishing biomedical research results. To extract and integrate knowledge from such data, text mining has been routinely applied. One important task is extracting relationships between bio-entities such as foods and diseases. Most existing studies, however, stop short of further analyzing the extracted relationships such...

Sentence Alignment in Parallel, Comparable, and Quasi-comparable Corpora

2006

Percy Cheung Pascale Fung

We explore the usability of different bilingual corpora for multilingual and cross-lingual natural language processing. The usability of a bilingual corpus is evaluated by the lexical alignment score calculated for the bi-lexicon pairs distributed in the aligned bilingual sentence pairs. We compare and contrast a number of bilingual corpora, ranging from parallel, to comparable, and...

An Affix Removal Stemmer for Natural Language

2014

Abhijit Paul Arindam Dey Bipul Syam Purkayastha

Stemming is a prerequisite step in text mining and spell-checking applications, as well as a basic requirement for natural language processing (NLP) tasks. It is also very important in most information retrieval (IR) systems. This paper describes an affix-stripping technique for finding stems from context-free text in the Nepali language using lexical-lookup-based and rule-based appr...

Statistical Machine Translation without Parallel Data

2012

Maryam Siahbani

We examine approaches to statistical machine translation (SMT) without parallel data. SMT has achieved impressive performance by leveraging large amounts of parallel data in the source and target languages. But such data is available only for a few language pairs and domains. Using human annotation to create new parallel corpora sufficient for building a good translation system is too expensive...
