Invited Talk: Domain-adaptation of Natural Language Processing Tools for RE
Abstract
Natural language processing (NLP) tools such as part-of-speech taggers and parsers are used in a variety of applications involving natural language, including requirements engineering (RE). Such tools are based on statistical models of language that are learnt from human-annotated data via supervised machine learning algorithms. Because annotated data is limited in size and genre, these models perform worse on words and constructions not encountered in the annotated data, and on genres or domains of language that differ from the supervised training data. This talk will present Tejaswini Deoskar’s work on semi-supervised learning, in which a model initially trained on supervised data is further improved using unannotated data, which is available in much larger quantities. Such semi-supervised training improves performance on low-frequency words and constructions, i.e. those in the long tail of language use, and may also be used to adapt supervised NLP models to perform better on new domains of text, such as those found in RE documents.
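The abstract does not say which semi-supervised algorithm the talk uses. As a rough illustration of the general idea only, the sketch below uses plain self-training, a simple stand-in technique and not necessarily the method presented in the talk: a toy tagger is trained on a few annotated words, labels unannotated words it is confident about, and is then retrained on the enlarged set. The feature set, the `threshold` and `rounds` parameters, and all function names are assumptions made for this example.

```python
from collections import Counter, defaultdict


def features(word):
    """Hypothetical feature set: the lowercased word and its last 3 characters."""
    w = word.lower()
    return [("word", w), ("suffix", w[-3:])]


def train(examples):
    """Count how often each feature co-occurs with each tag."""
    counts = defaultdict(Counter)
    for word, tag in examples:
        for f in features(word):
            counts[f][tag] += 1
    return counts


def predict(counts, word):
    """Return (tag, confidence); confidence is the winning tag's share of votes."""
    votes = Counter()
    for f in features(word):
        votes.update(counts.get(f, {}))
    if not votes:
        return None, 0.0  # no evidence at all for this word
    tag, n = votes.most_common(1)[0]
    return tag, n / sum(votes.values())


def self_train(labeled, unlabeled, threshold=0.9, rounds=3):
    """Self-training: add confidently tagged unannotated words, then retrain."""
    examples = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train(examples)
        confident, remaining = [], []
        for word in pool:
            tag, conf = predict(model, word)
            (confident if conf >= threshold else remaining).append((word, tag))
        if not confident:
            break  # nothing new to learn from the unannotated pool
        examples.extend(confident)
        pool = [w for w, _ in remaining]
    return train(examples)


# Toy annotated data (assumed for illustration only).
labeled = [("running", "VERB"), ("walking", "VERB"),
           ("the", "DET"), ("system", "NOUN")]
model = self_train(labeled, ["parsing", "xyz"])
```

Here the word "parsing" never appears in the annotated data, but its suffix does, so it is pseudo-labeled and becomes a known word in the retrained model. The word "xyz" shares no features with the annotated data and is left unlabeled.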
Similar resources
EFL Classroom Discourse in Iranian Context: Investigating Teacher Talk Adaptation to Students’ Proficiency Level
How language teachers talk is a key factor in organizing and facilitating learning specifically in language classrooms where the medium of instruction is also the subject matter. This study aimed to examine the extent and ways of teacher talk adaptation to students’ proficiency levels in the Iranian EFL context. Two EFL teachers who were teaching three different proficiency levels were observed...
Keynote: Evaluation of NLP Tools for Hairy RE Tasks
Natural language processing (NLP) has been used since the 1980s to construct tools for performing natural language (NL) requirements engineering (RE) tasks. While these NL RE tasks are not inherently difficult for humans, on the scale of the collection of NL artifacts for the development of a typical large-scale computer-based system (CBS), these tasks become unmanageable, i.e., hairy. Because ...
Proceedings of the 10th European Workshop on Natural Language Generation (ENLG-05)
Probabilistic finite-state methods have been very successful for natural language processing (NLP) problems like tagging, entity identification, and transliteration. These methods have also been packaged in very useful software toolkits. However, they are not so good for attacking problems with large-scale reordering (translation, generation, paraphrasing, question answering, etc.) and sensitiv...
Annotation Adaptation and Language Adaptation in NLP
Adaptation technologies are useful in NLP whenever there is a discrepancy between the training scenario and the use scenario. They are also effective in alleviating the data scarcity problem. Domain adaptation is the most popular kind of adaptation technology and is intensively researched. In this talk we will introduce two other kinds of adaptation technologies: annotation adaptation and langua...
Cross-Domain and Cross-Language Porting of Shallow Parsing
English was the main focus of attention of the Natural Language Processing (NLP) community for years. As a result, there are significantly more annotated linguistic resources in English than in any other language. Consequently, data-driven tools for automatic text or speech processing are developed mainly for English. Developing similar corpora and tools for other languages is an important issu...
Publication date: 2018