Stress assignment in Spanish proper names
Abstract
In this paper, we propose an approach to stress assignment in Spanish proper names based on a Multi-Layer Perceptron (MLP). To assign stress to a word, we first analyse each vowel in the word and then use an MLP to calculate a Stress-Confidence Measure for it. The system assigns the stress to the vowel with the highest stress-confidence measure. We present and analyse different alternatives for the inputs to the MLP. In all cases, we consider the number of vowels in the name and the position of the vowel in the word (counting only the vowels of the analysed word). For the remaining inputs, we consider a window of letters. These letters are taken from the context of the vowel under consideration and from the word ending, in a similar way to [1]. We propose a Discrimination Measure to analyse the discrimination power of the different input configurations, validate this measure, and present the results obtained in each case. With the best configuration, 94.9% of proper names are stressed correctly (5.1% error rate). These results are compared with similar experiments using a memory-based learning approach (k-Nearest Neighbours).
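The decision procedure described in the abstract (score every vowel, then stress the one with the highest confidence) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the window size, the ending length, the padding symbol, and the names `features`, `assign_stress`, and `toy_score` are all hypothetical. In particular, `toy_score` is only a stand-in for the trained MLP; it encodes the default Spanish orthographic rule and treats every vowel letter as a separate candidate, ignoring diphthongs.

```python
from typing import Callable, List, Tuple

VOWELS = set("aeiouáéíóú")
Features = Tuple[int, int, str, str]


def vowel_positions(word: str) -> List[int]:
    """Character indices of the vowel letters in the word."""
    return [i for i, ch in enumerate(word.lower()) if ch in VOWELS]


def features(word: str, vowel_rank: int, window: int = 2, ending: int = 3) -> Features:
    """Inputs for one candidate vowel (window and ending sizes are assumptions).

    Returns (number of vowels, rank of this vowel among the vowels,
    letter window centred on the vowel, word ending).
    """
    w = word.lower()
    positions = vowel_positions(w)
    i = positions[vowel_rank]
    padded = "_" * window + w + "_" * window  # "_" pads beyond the word edges
    context = padded[i : i + 2 * window + 1]
    tail = w[-ending:].rjust(ending, "_")
    return (len(positions), vowel_rank, context, tail)


def assign_stress(word: str, score: Callable[[Features], float]) -> int:
    """Return the character index of the vowel with the highest confidence."""
    positions = vowel_positions(word)
    if not positions:
        raise ValueError("word contains no vowel")
    confidences = [score(features(word, k)) for k in range(len(positions))]
    best = max(range(len(positions)), key=confidences.__getitem__)
    return positions[best]


def toy_score(feat: Features) -> float:
    """Hypothetical stand-in for the MLP output, NOT a trained model.

    Default rule: words ending in a vowel, 'n' or 's' stress the
    penultimate vowel; all other words stress the last vowel.
    """
    n_vowels, rank, _context, tail = feat
    if tail[-1] in "aeiouns" and n_vowels >= 2:
        target = n_vowels - 2
    else:
        target = n_vowels - 1
    return 1.0 if rank == target else 0.0


for name in ("Madrid", "Toledo", "Carmen"):
    idx = assign_stress(name, toy_score)
    print(name, "-> stressed vowel", repr(name[idx]), "at index", idx)
```

In a real system the tuple returned by `features` would be encoded numerically (for example, one-hot letters) and passed to a trained network in place of `toy_score`; the argmax step in `assign_stress` would stay the same.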
Similar resources
Recognition of Named Entities in Spanish Texts
Proper name recognition is a subtask of Named Entity Recognition in the Message Understanding Conference. For our corpus annotation, proper name recognition is a crucial task, since proper names appear in more than approximately 50% of the sentences of the electronic texts that we collected for this purpose. Our work is focused on composite proper names (names with coordinated constituents, names wi...
Sex, Syntax, and Semantics
Many languages have a grammatical gender system whereby all nouns are assigned a gender (most commonly feminine, masculine, or neuter). Two studies examined (1) whether the assignment of genders to nouns is truly arbitrary (as has been claimed), and (2) whether the grammatical genders assigned to nouns have semantic consequences. In the first study, English speakers’ intuitions about the gender...
Linguistic-prosodic processing for text-to-speech synthesis in Italian
The linguistic-prosodic processing applied to text-to-speech synthesis in Italian is described. It proceeds in 5 steps: tokenisation and normalisation of abbreviations, numbers, etc.; part-of-speech tagging, based on function words, terminations and contextual heuristics; shallow parsing, based on a chunk grammar; grapheme-to-phoneme conversion, lexical stress assignment and syllabification by ...
Identification of Composite Named Entities in a Spanish Textual Database
Named entities (NE) mentioned in textual databases constitute an important part of their semantics. Lists of those NE are an important knowledge source for diverse tasks. We present a method for NE identification focused on composite proper names (names with coordinated constituents and names with several prepositional phrases). We describe a method based on heterogeneous knowledge and simple r...
Web-Based Sources for an Annotated Corpus Building and Composite Proper Name Identification
Nowadays, collections of texts with annotations on several levels are useful resources. Huge efforts are required to develop this resource for languages like Spanish. In this work, we present the initial step, lexical level annotation, for the compilation of an annotated Mexican corpus using Web-based sources. We also describe a method based on heterogeneous knowledge and simple Web-based sourc...