Implementing Urdu Grammar as Open Source Software

Authors

  • Muhammad Humayoun
  • Harald Hammarström
  • Aarne Ranta
Abstract

Urdu is a challenging language for three reasons: first, its Perso-Arabic script; second, its morphological system, which incorporates grammatical forms and vocabulary from Arabic, Persian and the native languages of South Asia; and third, its pragmatically neutral constituent order, SOV (Subject Object Verb). Today, the state-of-the-art technology for writing grammars (morphology + syntax) is to use special-purpose languages based on finite-state technology. These languages are mostly based on regular expressions. In our opinion, these languages are still close to machine code. Therefore, we emphasize using a higher-level language to capture the linguistic abstraction. That higher-level code should then be translated into finite-state code by some tool if required.

Similar resources

A Computational Classification of Urdu Dynamic Copula Verb

In this paper, a lexical functional grammar for the automatic classification of the Urdu copula verb hO (be/become) is presented according to linguistic theories. A test suite of sentences containing almost all conjugation forms of the copula verb is extracted from a raw corpus. An attempt is made to keep only the cases of copular construction, because the copula verb hO is highly dynamic in nature...

An Open Source Urdu Resource Grammar

We develop a grammar for Urdu in Grammatical Framework (GF). GF is a programming language for defining multilingual grammar applications. GF resource grammar library currently supports 16 languages. These grammars follow an Interlingua approach and consist of morphology and syntax modules that cover a wide range of features of a language. In this paper we explore different syntactic features of...

Urdu Summary Corpus

Language resources, such as corpora, are important for various natural language processing tasks. Urdu has millions of speakers around the world but it is under-resourced in terms of standard evaluation resources. This paper reports the construction of a benchmark corpus for Urdu summaries (abstracts) to facilitate the development and evaluation of single document summarization systems for Urdu...

Using An Open-Source Unification-Based System For CL/NLP Teaching

We demonstrate the open-source LKB system which has been used to teach the fundamentals of constraint-based grammar development to several groups of students. 1 Overview of the LKB system The LKB system is a grammar development environment that is distributed as part of the open source LinGO tools (http://www-csli.stanford.edu/~aac/lkb.html and http://lingo.stanford.edu, see also Copestake and F...

Development of an Open Source Urdu Screen Reader for Visually Impaired People

Speech technology has enabled computer accessibility for users with visual impairments but the language barrier poses a great challenge. This project is an effort to overcome the hurdles faced by visually impaired people, in terms of language barrier, by providing them access to digital information through software which can communicate with them in Urdu. A survey was conducted in schools for b...

Publication date: 2007