TGermaCorp - A (Digital) Humanities Resource for (Computational) Linguistics

Authors

  • Andy Lücking
  • Armin Hoenen
  • Alexander Mehler
Abstract

TGermaCorp is a German text corpus whose primary sources are German literary texts dating from the sixteenth century to the present. The corpus is intended to represent its target language (German) in its syntactic, lexical, stylistic and chronological diversity. For this purpose, it is hand-annotated on several linguistic layers, including POS, lemma, named entities, multiword expressions, clauses, sentences and paragraphs. To situate TGermaCorp relative to more homogeneous corpora of contemporary everyday language, quantitative assessments of syntactic and lexical diversity are provided. In this respect, TGermaCorp contributes to establishing characteristic features for resource descriptions, which are needed for meaningful comparison of the ever-growing number of natural language resources. The assessments confirm the special role of proper names, whose propagation in text may influence lexical and syntactic diversity measures in rather trivial ways. TGermaCorp will be made available via hucompute.org.
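The point about proper names can be illustrated with a small sketch. The abstract does not say which diversity measures the paper uses, so the type-token ratio below is a stand-in chosen for illustration, not the authors' method. The `(lemma, tag)` token format, the function names, and the toy sentence are all hypothetical; the tag `NE` for proper names is assumed here by analogy with the STTS tagset and may differ from the corpus's actual annotation.

```python
from typing import Iterable, List, Tuple

# Hypothetical token format: (lemma, POS tag). Not TGermaCorp's actual schema.
Token = Tuple[str, str]


def type_token_ratio(lemmas: Iterable[str]) -> float:
    """Number of distinct lemmas divided by the total number of lemmas."""
    items: List[str] = list(lemmas)
    if not items:
        return 0.0
    return len(set(items)) / len(items)


def ttr_with_and_without_names(
    tokens: Iterable[Token], name_tag: str = "NE"
) -> Tuple[float, float]:
    """Return (TTR over all tokens, TTR with proper names filtered out).

    `name_tag` is an assumed label for proper names.
    """
    toks = list(tokens)
    all_lemmas = [lemma for lemma, _ in toks]
    no_names = [lemma for lemma, tag in toks if tag != name_tag]
    return type_token_ratio(all_lemmas), type_token_ratio(no_names)


# Toy, made-up sample: a few distinct names inflate the ratio.
sample: List[Token] = [
    ("Anna", "NE"), ("sehen", "VVFIN"), ("Berta", "NE"), ("und", "KON"),
    ("Berta", "NE"), ("sehen", "VVFIN"), ("Clara", "NE"),
]
ttr_all, ttr_no_names = ttr_with_and_without_names(sample)
```

In this toy sample the ratio is higher with names included than without them, which is the kind of trivial effect the abstract warns about: many distinct names raise a lexical diversity score without reflecting richer vocabulary.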

Similar resources

GutenTag: an NLP-driven Tool for Digital Humanities Research in the Project Gutenberg Corpus

This paper introduces a software tool, GutenTag, which is aimed at giving literary researchers direct access to NLP techniques for the analysis of texts in the Project Gutenberg corpus. We discuss several facets of the tool, including the handling of formatting and structure, the use and expansion of metadata which is used to identify relevant subcorpora of interest, and a general tagging frame...

Enhancing Access to Media Collections and Archives Using Computational Linguistic Tools

In this paper, we outline the strategies, methodology, and infrastructure needed to bring advanced computational linguistic tools to researchers and archivists in the humanities. We discuss three use cases involving the application of the Language Application Grid (LAPPS), an open, web-based infrastructure providing interoperable access to hundreds of computational linguistic (CL) component web...

Lessons from a Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities

In this paper, we present the concept, content and experience with an actively running Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities. This video-based course is held in German, does not require any programming skills, and serves as an introduction to automatic text analysis. The target audience is anyone who is interested in applying basic language tech...

Integration of Linguistic Markup into Semantic Models of Folk Narratives: The Fairy Tale Use Case

Propp’s influential structural analysis of fairy tales created a powerful schema for representing storylines in terms of character functions, which is straightforward to exploit in computational semantic analysis and procedural generation of stories of this genre. We tackle two resources that draw on the Proppian model – one formalizes it as a semantic markup scheme and the other as an ontology...

Publication date: 2016