Reducing Lexical Ambiguity in Serbo-Croatian
Abstract
This paper presents an approach to acquiring certain lexical and grammatical constraints from large corpora using genetic algorithms. The main aim is to use these constraints to automatically define local grammars that can reduce the lexical ambiguity typically present in an initially tagged text. A genetic algorithm for computing the minimal representation of the grammatical features of textual constituents is proposed. The algorithm incorporates two types of genes, dominant and recessive, which are specific to the features being analysed. The resulting genetic structure describes the constraints that must be satisfied to form a correct utterance. As a case study, the algorithm is applied to the contexts of prepositional phrases, and the features of the corresponding noun phrases are obtained. The results coincide with the (theoretical) grammars that define the constraints for such noun phrases.
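The abstract gives no details of the chromosome encoding, fitness function, or genetic operators, so the following Python sketch is only an illustration under stated assumptions, not the authors' algorithm. It assumes one bit per (feature, value) pair, reads a feature's gene as recessive when every value is allowed (the feature is left unconstrained) and as dominant when it is restricted to a proper subset, and scores a candidate constraint first by how many ambiguously tagged noun phrases it admits and then by how few values it allows. The feature inventory, the toy observations after the preposition "u", and all function names (`evolve`, `fitness`, `gene_kinds`, etc.) are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical feature inventory for noun phrases (not taken from the paper).
FEATURES: Dict[str, List[str]] = {
    "case": ["nom", "gen", "dat", "acc", "ins", "loc"],
    "number": ["sg", "pl"],
}
# One chromosome bit per (feature, value) pair; 1 means "value allowed".
POSITIONS: List[Tuple[str, str]] = [(f, v) for f, vals in FEATURES.items() for v in vals]

Reading = Dict[str, str]      # one candidate analysis of a noun phrase
Observation = List[Reading]   # all analyses left by the initial tagger


def allowed(chrom: List[int]) -> Dict[str, FrozenSet[str]]:
    """Decode a bit vector into the set of allowed values per feature."""
    out: Dict[str, Set[str]] = {f: set() for f in FEATURES}
    for bit, (f, v) in zip(chrom, POSITIONS):
        if bit:
            out[f].add(v)
    return {f: frozenset(vs) for f, vs in out.items()}


def gene_kinds(chrom: List[int]) -> Dict[str, str]:
    """Assumed reading: recessive = feature unconstrained, dominant = restricted."""
    allow = allowed(chrom)
    return {f: "recessive" if len(allow[f]) == len(FEATURES[f]) else "dominant"
            for f in FEATURES}


def covered(chrom: List[int], observations: List[Observation]) -> int:
    """Count observations with at least one reading that satisfies the constraint."""
    allow = allowed(chrom)
    return sum(
        any(all(r[f] in allow[f] for f in FEATURES) for r in readings)
        for readings in observations
    )


def fitness(chrom: List[int], observations: List[Observation]) -> int:
    # Each admitted observation outweighs any size difference (sum(chrom) <= len(POSITIONS)),
    # so coverage is maximised first and, among equals, fewer allowed values wins.
    return covered(chrom, observations) * (len(POSITIONS) + 1) - sum(chrom)


def evolve(observations: List[Observation], pop_size: int = 40,
           generations: int = 100, p_mut: float = 0.125, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    n = len(POSITIONS)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1] * n  # fully unconstrained chromosome admits every observation

    def score(c: List[int]) -> int:
        return fitness(c, observations)

    for _ in range(generations):
        nxt = [max(pop, key=score)[:]]  # elitism: carry the best chromosome over
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)


def R(case: str, number: str) -> Reading:
    return {"case": case, "number": number}


# Toy, hypothetical tagger output for noun phrases following the preposition "u".
OBS: List[Observation] = [
    [R("dat", "sg"), R("loc", "sg")],                  # kući
    [R("acc", "sg")],                                  # kuću
    [R("dat", "sg"), R("loc", "sg")],                  # gradu
    [R("acc", "pl")],                                  # gradove
    [R("dat", "pl"), R("ins", "pl"), R("loc", "pl")],  # kućama
]

if __name__ == "__main__":
    best = evolve(OBS)
    print({f: sorted(vs) for f, vs in allowed(best).items()})
    print(gene_kinds(best))
```

On this toy data the search should leave `number` recessive and make `case` dominant, allowing `acc` plus one of `dat`/`loc`. Because the dative and locative forms in these observations are syncretic, the data alone cannot tell which of the two the constraint should keep; a real corpus and a richer fitness function would be needed for that.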
Similar resources
Visual Word Recognition in Serbo-Croatian Is Necessarily Phonological
In a naming task conducted with bi-alphabetic readers of Serbo-Croatian, it was shown that letter strings that can be assigned both a Roman and a Cyrillic alphabet reading incur longer latencies than the unique alphabet transcription of the same word, and that the magnitude of the difference depended on the number of ambiguous characters in the ambiguous letter string. While this within-word p...
The contribution of morphology to word recognition
Evidence of morphological processing was investigated in three word recognition tasks. In the first study, phonological ambiguity of the base morpheme in morphologically complex words of Serbo-Croatian was exploited in order to evaluate the claim that the base morpheme serves as the unit by which entries in the lexicon are accessed. An interaction of base morpheme ambiguity and affix characteri...
Strategies for visual word recognition and orthographical depth: a multilingual comparison
We investigated the psychological reality of the concept of orthographical depth and its influence on visual word recognition by examining naming performance in Hebrew, English, and Serbo-Croatian. We ran three sets of experiments in which we used native speakers and identical experimental methods in each language. Experiment 1 revealed that the lexical status of the stimulus (high-frequency wo...
Transcribing Multilingual Broadcast News Using Hypothesis Driven Lexical Adaptation
This paper describes the first results of our DARPA-sponsored efforts toward recognizing and browsing foreign-language, more specifically Serbo-Croatian, broadcast news. For Serbo-Croatian, as for many languages other than the most common, well-studied ones, the problems of broadcast-quality recognition are complicated by (1) the lack of available acoustic and language data and (2) the excessive voca...
Phonological ambiguity and lexical ambiguity: effects on visual and auditory word recognition
Three experiments in Serbo-Croatian were conducted on the effects of phonological ambiguity and lexical ambiguity on printed word recognition. Subjects decided rapidly if a printed and a spoken word matched or not. Printed words were either phonologically ambiguous (two possible pronunciations) or unambiguous. If phonologically ambiguous, either both pronunciations were real words or only one w...