Dependency direction as a means of word-order typology: A method based on dependency treebanks

Author

  • Haitao Liu
Abstract

Word-order typology often classifies a language by the linear order of binary grammatical pairs in its sentences. This paper proposes a typological method based on dependency treebanks. It investigates 20 languages using treebanks ranging in size from 16K to 1 million dependencies. The results show that some languages are more head-initial or head-final than others, but all contain both head-initial and head-final elements. The 20 languages can be arranged on a continuum whose two ends are the completely head-initial and the completely head-final patterns. Data on subject–verb, object–verb, and adjective–noun order are extracted from the treebanks and compared with typological studies based on traditional means; the results are similar. The investigation demonstrates that the proposed method is valid for positioning a language on the typological continuum, and that resources from computational linguistics can also be used in language typology.

© 2009 Elsevier B.V. All rights reserved.

1 According to Lehmann (2005) and Tesnière (1959: 32), Schmidt (1926) was the first to use the basic components of the sentence and their interrelationships as a pointer to language typology.


Similar resources

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...


Automatic conversion of a Persian dependency treebank into a constituency treebank

There are two major types of treebanks: dependency-based and constituency-based. Both of them have applications in natural language processing and computational linguistics. Several dependency treebanks have been developed for Persian. However, there is no available big size constituency treebank for this language. In this paper, we aim to propose an algorithm for automatic conversion of a depe...


Multi-lingual Dependency Parsing Evaluation: a Large-scale Analysis of Word Order Properties using Artificial Data

The growing work in multi-lingual parsing faces the challenge of fair comparative evaluation and performance analysis across languages and their treebanks. The difficulty lies in teasing apart the properties of treebanks, such as their size or average sentence length, from those of the annotation scheme, and from the linguistic properties of languages. We propose a method to evaluate the effect...


Classifying Languages by Dependency Structure. Typologies of Delexicalized Universal Dependency Treebanks

This paper shows how the current Universal Dependency treebanks can be used for clustering structural global linguistic features of the treebanks to reveal a purely structural syntactic typology of languages. Different uni- and multi-dimensional data extraction methods are explored and tested in order to assess both the coherence of the underlying syntactic data and the quality of the clustering ...


Querying Diverse Treebanks in a Uniform Way

This paper presents a system for querying treebanks in a uniform way. The system is able to work with both dependency and constituency based treebanks in any language. We demonstrate its abilities on 11 different treebanks. The query language used by the system provides many features not available in other existing systems while still keeping the performance efficient. The paper also describes ...




Publication date: 2010