Compiling French-Japanese Terminologies from the Web
Abstract
We propose a method for compiling bilingual terminologies of multi-word terms (MWTs) for given translation pairs of seed terms. Traditional methods for bilingual terminology compilation exploit parallel texts, while more recent ones have focused on comparable corpora. We use bilingual corpora collected from the web and tailor-made for the seed terms. For each language, we extract from the corpus a set of MWTs pertaining to the seed's semantic domain, and use a compositional method to align MWTs from both sets. We increase the coverage of our system by using thesauri and by applying a bootstrap method. Experimental results show high precision and indicate promising prospects for future developments.
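The abstract does not spell out the compositional alignment step, so the following is only a minimal sketch of one common reading of it: translate each component word of a source MWT with a bilingual dictionary, recombine the translations in every order, and keep the candidates that actually occur in the target-language MWT set. The dictionary `TOY_DICT`, the function `align_compositionally`, and the sample terms are all hypothetical and are not taken from the paper.

```python
from itertools import permutations, product

# Hypothetical toy dictionary: French word -> possible Japanese translations.
# A real system would use a full bilingual dictionary (and, per the abstract,
# thesauri to widen coverage).
TOY_DICT = {
    "système": ["系", "システム"],
    "nerveux": ["神経"],
    "énergie": ["エネルギー"],
    "solaire": ["太陽"],
}


def align_compositionally(source_terms, target_terms, dictionary):
    """Pair each source MWT with the target MWTs that can be built by
    translating its component words and concatenating them in any order."""
    target_set = set(target_terms)
    alignments = {}
    for term in source_terms:
        # Look up the translation options for every component word.
        options = [dictionary.get(word) for word in term.split()]
        if not all(options):
            continue  # at least one component has no known translation
        matches = set()
        for combo in product(*options):       # one translation per component
            for order in permutations(combo):  # word order may differ
                candidate = "".join(order)     # Japanese is written unspaced
                if candidate in target_set:
                    matches.add(candidate)
        if matches:
            alignments[term] = sorted(matches)
    return alignments


french = ["système nerveux", "énergie solaire", "panneau solaire"]
japanese = ["神経系", "太陽エネルギー", "太陽電池"]
result = align_compositionally(french, japanese, TOY_DICT)
for fr, ja in result.items():
    print(fr, "->", ja)
```

In this sketch "panneau solaire" stays unaligned because "panneau" is missing from the toy dictionary; that kind of gap is what the thesaurus and bootstrap steps mentioned in the abstract are meant to reduce.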
Related resources
Improving information retrieval with multiple health terminologies in a quality-controlled gateway
BACKGROUND: The Catalog and Index of French-language Health Internet resources (CISMeF) is a quality-controlled health gateway, primarily for Web resources in French (n=89,751). Recently, we achieved a major improvement in the structure of the catalogue by setting up multiple terminologies, based on twelve health terminologies available in French, to overcome the potential weakness of the MeSH t...
Multiple Terminologies in a Health Portal: Automatic Indexing and Information Retrieval
Background: In the specific context of developing quality-controlled health gateways, several standards must be respected (e.g. Dublin Core for metadata element set; thesaurus MeSH as the controlled vocabulary to index Internet resources; HON code to accredit quality of health Web sites). These standards were applied to create the CISMeF Web site (French acronym for Catalog & Index of Health Inte...
Terminology-driven Augmentation of Bilingual Terminologies
This paper proposes a way of augmenting bilingual terminologies by using a “generate and validate” method. Using existing bilingual terminologies, the method generates “potential” bilingual multi-word term pairs and validates their status by searching web documents to check whether such terms actually exist in each language. Unlike most existing bilingual term extraction methods, which use para...
MuEVo, a Breast Cancer Consumer Health Vocabulary Built Out of Web Forums
Semantically analyzing patient-generated text from a biomedical perspective is challenging because of the vocabulary gap between patients and health professionals. Medical expertise and vocabulary are well formalized in standard terminologies and ontologies, which enable semantic analysis of expert-generated text; however, resources which formalize the vocabulary of health consumers (patients a...
BIOTEX: A system for Biomedical Terminology Extraction, Ranking, and Validation
Term extraction is an essential task in domain knowledge acquisition. Although hundreds of terminologies and ontologies exist in the biomedical domain, the language evolves faster than our ability to formalize and catalog it. We may be interested in the terms and words explicitly used in our corpus in order to index or mine this corpus or just to enrich currently available terminologies and ont...