A model to predict lexical complexity and to grade words (Un modèle pour prédire la complexité lexicale et graduer les mots) [in French]
Abstract
Analysing lexical complexity is a task that has mainly attracted the attention of psycholinguists and language teachers. More recently, the issue has attracted growing interest in Natural Language Processing (NLP), particularly in automatic text simplification. The aim of that task is to identify words and structures that may be difficult for a target audience to understand and to provide automated tools that simplify such content. This article focuses on the lexical side of the problem: it identifies a set of predictors of lexical complexity whose effectiveness is assessed through a correlational analysis. The best of these variables are integrated into a model able to predict the difficulty of words for learners of French.
Keywords: lexical complexity, morphological analysis, graded words, lexical resources.
Similar resources
Studying frequency-based approaches to process lexical simplification (Approches à base de fréquences pour la simplification lexicale) [in French]
ABSTRACT Lexical simplification consists of replacing words or phrases with a simpler equivalent. In this article, we present three lexical simplification models, based on different criteria that make one word simpler to read and understand than another. We tested different context sizes around the word under study: no context, with a model...
Learning Domain-Specific, L1-Specific Measures of Word Readability
Improved readability ratings for second-language readers could have a huge impact in areas such as education, advertising, and information retrieval. We propose ways to adapt readability measures for users who (a) are proficient in a particular domain, and (b) have a particular native language (L1). Specifically, we predict the readability of individual words. Our learned models use a range of ...
Using distributed word representations for robust semantic role labeling (Utilisation de représentations de mots pour l'étiquetage de rôles sémantiques suivant FrameNet) [in French]
Abstract. According to Fillmore's frame semantics, words take their meaning from the event or situational context in which they occur. FrameNet, a lexical resource for English, defines about 1000 conceptual frames covering most possible contexts. Within a conceptual frame, a predicate calls for arguments to fill the various semantic roles...
Active Data: A Programming Model for Managing Big Data Life Cycle
The Big Data challenge consists in managing, storing, analyzing and visualizing these ever growing huge datasets to extract sense and knowledge. As the volume of data grows exponentially, the management of these data becomes more complex in proportion. A key point is to handle the complexity of the data life cycle, i.e. the various operations performed on data: transfer, archiving, replication,...
A Model of Vocabulary Partition
The model proposed here is used to describe the vocabulary of a corpus. It is divided into two groups: general vocabulary which is used whatever the circumstances and several local (or 'specialized') vocabularies, each of which is used in only one part of the corpus. General words may appear everywhere in the text and their increase with corpus length can be estimated with Muller's formula. In ...