Linguistically Annotated Learner Corpora: Aspects of a Layered Linguistic Encoding and Standardized Representation
Abstract
Linguistically annotated corpora stored in a standardized digital form can be a valuable source of empirical insight. They can help verify linguistic generalizations and support the formulation of new hypotheses. Linguistic annotation is often crucial for exploring such corpora effectively from a linguistic perspective: it essentially serves as an index to the linguistic classes and phenomena realized in the corpus (cf., e.g., Meurers 2005).
Similar resources
Encoding Linguistic Corpora
This paper describes the motivation and design of the Corpus Encoding Standard (CES) (Ide et al., 1996; Ide, 1998), an encoding standard intended to meet the need for standardized encoding practices for linguistic corpora. The CES identifies a minimal encoding level that corpora must achieve to be considered standardized in terms of descriptive repre...
Corpus Encoding Standard: SGML Guidelines for Encoding Linguistic Corpora
The Corpus Encoding Standard (CES) is an application of SGML (ISO 8879:1986, Information Processing--Text and Office Systems--Standard Generalized Markup Language), conformant to the TEI Guidelines for Electronic Text Encoding and Interchange (Sperberg-McQueen and Burnard, 1994). It provides encoding conventions for linguistic corpora designed to be optimally suited for use in language engineer...
SusTEInability of linguistic resources through feature structures
This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora. The majority of corpora are annotated using markup languages that are based on the Annotation Graph framework, on the upcoming Linguistic Annotation Format ISO standard, or on tag sets defined by or based upon the TEI guidelines. A unified representation co...
EULIA: a graphical web interface for creating, browsing and editing linguistically annotated corpora
In this paper we present EULIA, a tool designed for working with the linguistically annotated corpora generated by a set of different linguistic processing tools. The objective of EULIA is to provide a flexible and extensible environment for creating, consulting, visualizing, and modifying documents generated by existing linguistic tools. The documents used as input and output of the...
Combining Part of Speech Induction and Morphological Induction
Linguistic information is useful in natural language processing, information retrieval and a multitude of sub-tasks involving language analysis. Two types of linguistic information in all languages are part of speech and morphology. Part of speech information reflects syntactic structure and can assist in tasks such as speech recognition, machine translation and word sense disambiguation. Morph...