Chunk and Clause Identification for Basque by Filtering and Ranking with Perceptrons

Authors

  • Iñaki Alegria
  • Bertol Arrieta
  • Xavier Carreras
  • Arantza Díaz de Ilarraza
  • Larraitz Uria

Abstract

This paper presents systems for syntactic chunking and clause identification for Basque that combine rule-based grammars with machine-learning techniques. Specifically, we used Filtering-Ranking with Perceptrons (Carreras, Màrquez and Castro, 2005), a learning model that recognizes partial syntactic structures in sentences and obtains state-of-the-art performance on these tasks for English. The model can incorporate a rich set of features to represent syntactic phrases, which makes it possible to use information from different sources. We exploited this property to include more linguistic features in the learning model, and the chunking results improved considerably. In this way, we compensated for the relatively small amount of training data available for learning a Basque chunking model. For clause identification, our preliminary results are low; we attribute this to the free word order of Basque and to the small size of the available corpus.
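The filtering-ranking idea can be illustrated with a minimal Python sketch. It is not the authors' system, and everything in it is an assumption for illustration: a single chunk type, (word, POS tag) input tokens, a hypothetical feature set (`word_feats`, `phrase_feats`), a hypothetical maximum chunk length `max_len`, and a simplified perceptron feedback rule. Word-level start and end perceptrons filter candidate spans, a phrase-level perceptron scores the surviving candidates, and a dynamic program selects the highest-scoring set of non-overlapping chunks. Clause identification would need inference over nested rather than non-overlapping spans, which this sketch omits.

```python
from typing import Dict, List, Optional, Set, Tuple

Token = Tuple[str, str]  # (word form, POS tag): hypothetical input format
Span = Tuple[int, int]   # inclusive (start, end) word positions of a chunk


class Perceptron:
    """Sparse linear scorer: the score is the sum of the weights of active features."""
    def __init__(self) -> None:
        self.w: Dict[str, float] = {}

    def score(self, feats: List[str]) -> float:
        return sum(self.w.get(f, 0.0) for f in feats)

    def update(self, feats: List[str], delta: float) -> None:
        for f in feats:
            self.w[f] = self.w.get(f, 0.0) + delta


def _tag(sent: List[Token], j: int) -> str:
    return sent[j][1] if 0 <= j < len(sent) else "<pad>"


def word_feats(sent: List[Token], i: int) -> List[str]:
    # Hypothetical word-level features used by the filtering layer.
    return ["bias", "w=" + sent[i][0].lower(), "t=" + _tag(sent, i),
            "t-1=" + _tag(sent, i - 1), "t+1=" + _tag(sent, i + 1)]


def phrase_feats(sent: List[Token], s: int, e: int) -> List[str]:
    # Hypothetical phrase-level features used by the ranking layer.
    tags = [t for _, t in sent[s:e + 1]]
    return ["bias", f"len={min(e - s + 1, 5)}", "first=" + tags[0], "last=" + tags[-1],
            "pattern=" + "_".join(tags), "prev=" + _tag(sent, s - 1), "next=" + _tag(sent, e + 1)]


class FilterRanker:
    def __init__(self, max_len: int = 6) -> None:
        self.start, self.end, self.rank = Perceptron(), Perceptron(), Perceptron()
        self.max_len = max_len  # hypothetical cap on chunk length

    def candidates(self, sent: List[Token]) -> List[Span]:
        # Filtering layer: keep (s, e) only if the start perceptron fires at s and the end one at e.
        starts = [i for i in range(len(sent)) if self.start.score(word_feats(sent, i)) > 0]
        ends = {i for i in range(len(sent)) if self.end.score(word_feats(sent, i)) > 0}
        return [(s, e) for s in starts
                for e in range(s, min(len(sent), s + self.max_len)) if e in ends]

    def predict(self, sent: List[Token]) -> Set[Span]:
        # Ranking layer + DP inference: best-scoring set of non-overlapping candidates.
        scored = [(s, e, self.rank.score(phrase_feats(sent, s, e))) for s, e in self.candidates(sent)]
        n = len(sent)
        best = [0.0] * (n + 1)  # best[i]: highest total score using words 0..i-1
        back: List[Optional[Span]] = [None] * (n + 1)
        for i in range(1, n + 1):
            best[i] = best[i - 1]
            for s, e, sc in scored:
                if e == i - 1 and sc > 0 and best[s] + sc > best[i]:
                    best[i], back[i] = best[s] + sc, (s, e)
        spans: Set[Span] = set()
        i = n
        while i > 0:  # follow back-pointers to recover the chosen chunks
            span = back[i]
            if span is None:
                i -= 1
            else:
                spans.add(span)
                i = span[0]
        return spans

    def train(self, data: List[Tuple[List[Token], Set[Span]]], epochs: int = 10) -> None:
        for _ in range(epochs):
            for sent, gold in data:
                cands, pred = set(self.candidates(sent)), self.predict(sent)
                # Ranking feedback: promote missed gold chunks that passed the filter, demote wrong ones.
                for s, e in gold - pred:
                    if (s, e) in cands:
                        self.rank.update(phrase_feats(sent, s, e), 1.0)
                for s, e in pred - gold:
                    self.rank.update(phrase_feats(sent, s, e), -1.0)
                # Filtering feedback: plain perceptron updates on word-level start/end errors.
                gold_starts, gold_ends = {s for s, _ in gold}, {e for _, e in gold}
                for i in range(len(sent)):
                    f = word_feats(sent, i)
                    for p, targets in ((self.start, gold_starts), (self.end, gold_ends)):
                        if (p.score(f) > 0) != (i in targets):
                            p.update(f, 1.0 if i in targets else -1.0)


if __name__ == "__main__":
    toy = [([("the", "DT"), ("dog", "NN"), ("barks", "VB")], {(0, 1)}),
           ([("a", "DT"), ("cat", "NN"), ("sees", "VB"), ("the", "DT"), ("bird", "NN")], {(0, 1), (3, 4)}),
           ([("dogs", "NN"), ("run", "VB")], {(0, 0)})]
    model = FilterRanker()
    model.train(toy, epochs=10)
    print(model.predict([("the", "DT"), ("cat", "NN"), ("runs", "VB")]))  # {(0, 1)}
```

The toy English data above is invented purely to exercise the code. Splitting the work this way means the ranking perceptron only scores spans that the filter already considers plausible, which keeps inference cheap; the price is that a correct chunk rejected by the filter cannot be recovered by the ranker, which is why the sketch also trains the filter from its own mistakes.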

Similar resources

Online Learning via Global Feedback for Phrase Recognition

We present a system to recognize phrases based on perceptrons, and a global online learning algorithm to train them together. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which discriminatively builds the optimal phrase structure. We provide a recognition-based feedback...

Basque Functional Heads

Bill Haddican NYU 6/18/2001 0. Introduction This paper makes three claims about Basque grammar. First, it argues that Cinque’s (1999) hierarchy of functional heads largely holds for Basque. In a typical pattern, lower morphemes in the hierarchy appear in the reverse order, while higher morphemes appear in Cinque’s order. This is explained through roll-up—iterative XP movement through specifier ...

MultiGranCNN: An Architecture for General Matching of Text Chunks on Multiple Levels of Granularity

We present MultiGranCNN, a general deep learning architecture for matching text chunks. MultiGranCNN supports multigranular comparability of representations: shorter sequences in one chunk can be directly compared to longer sequences in the other chunk. MultiGranCNN also contains a flexible and modularized match feature component that is easily adaptable to different types of chunk matching. We...

Identification and ranking risks of horizontal directional drilling for oil & gas wells by using fuzzy analytic network process, a case study for Gachsaran oil field wells

Risk ranking of Horizontal Directional Drilling (HDD) for gas and oil wells is a key criterion in the project feasibility, pricing and for introducing a risk management strategy that aims to reduce the number of failures in the installation phase and its negative consequences. HDD is currently widely used in drilling wells in Iran, but research in the area of identification and risks ranking of...

Phrase recognition by filtering and ranking with perceptrons

We present a phrase recognition system based on perceptrons, and an online learning algorithm to train them together. The recognition strategy applies learning in two layers, first at word level, to filter words and form phrase candidates, second at phrase level, to rank phrases and select the optimal ones. We provide a global feedback rule which reflects the dependencies among perceptrons and ...

Journal:
  • Procesamiento del Lenguaje Natural

Volume 41

Publication date: 2008