gdbank: The beginnings of a corpus of dependency structures and type-logical grammar in Scottish Gaelic

نویسنده

  • Colin Batchelor
چکیده

We present gdbank, a small handbuilt corpus of 32 sentences with dependency structures and categorial grammar type assignments. The sentences have been chosen to illustrate as broad a range of the unusual features of Scottish Gaelic as possible, particularly nouns being used to represent psychological states where more thoroughly-studied languages such as English and French would prefer a verb, and prepositions marking aspect, as is also seen in Welsh and, for example, Irish Gaelic. We provide hand-built dependency trees, building on previous work on Irish Gaelic and using the Universal Dependency Scheme. We also provide a tentative categorial grammar account of the words in the sentences, based largely on previous work on English.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

Using the Spoken Dutch Corpus for type-logical grammar induction

Abstract The dependency-based annotation format employed within the Spoken Dutch Corpus (CGN) project (van der Wouden et al., 2002) has been designed in such a way as to enable a transparent mapping to the derivational structures of current ‘lexicalized’ grammar formalisms. Through such translations, the CGN tree bank can be used to train and evaluate computational grammars within these framewo...

متن کامل

Developing an Automatic Part-of-Speech Tagger for Scottish Gaelic

This paper describes an on-going project that seeks to develop the first automatic PoS tagger for Scottish Gaelic. Adapting the PAROLE tagset for Irish, we manually re-tagged a preexisting 86k token corpus of Scottish Gaelic. A double-verified subset of 13.5k tokens was used to instantiate eight statistical taggers and verify their accuracy, via a randomly assigned hold-out sample. An accuracy ...

متن کامل

Transforming Dependency Structures to Logical Forms for Semantic Parsing

The strongly typed syntax of grammar formalisms such as CCG, TAG, LFG and HPSG offers a synchronous framework for deriving syntactic structures and semantic logical forms. In contrast—partly due to the lack of a strong type system—dependency structures are easy to annotate and have become a widely used form of syntactic analysis for many languages. However, the lack of a type system makes a for...

متن کامل

Microvariation in Celtic Relatives

1. Introduction: The syntactic effects of relativisation in Welsh, Irish and Scottish Gaelic vary in a complex and potentially baffling way between the languages and across their dialects. The crucial components of this variation are resumptivity, " wh-agreeing " complementizers, a differential treatment of locality domains, and variation in the availability of pied-piping. In this paper we arg...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014