Sanskrit Linguistics Web Services

Authors

  • Gérard P. Huet
  • Amba P. Kulkarni

Abstract

We propose to demonstrate a collection of tools for Sanskrit Computational Linguistics, developed by cooperating teams in the general setting of Web services. These services offer a systematic architecture integrating multilingual lexicons, morphological generation and analysis, segmentation and parsing, and interlinking with the Sanskrit Library digital repository. They may be used as distributed Internet services or installed as local tools on individual users' workstations.

1 Community building

Sanskrit is the primary culture-bearing language of India, with a continuous production of literature in all fields of human endeavour over the course of four millennia. It benefited from a strong linguistic tradition established in early times, notably the grammar composed by Pāṇini around the fourth century B.C.E., which has since been commented upon in innumerable grammatical treatises. This fairly complete descriptive apparatus took on a prescriptive character, constraining the evolution of the language within its official grammar and leading to its stability as a semi-formal language. On the other hand, multiple styles of writing treatises, commentaries, and even poetry led to a variety of specific dialects, in both prose and verse.

Efforts towards developing tools for the computational treatment of Sanskrit have been progressing steadily at both the national and the international level. A Sanskrit Computational Linguistics consortium funded by the Indian Government coordinates the development of consistent tools across seven research institutes. In 2007, the first of a series of International Sanskrit Computational Linguistics Symposia was organized in Paris, with the aim of gathering a community of teams that share ideas and linguistic resources and develop interoperable software.
These symposia have allowed computer scientists to benefit from the grammatical expertise of traditional scholars, while traditional scholars could see practical applications of theories that are thousands of years old. Within this general effort, specific tools for the analysis of Sanskrit texts were developed at Inria in Paris and at the University of Hyderabad, designed as inter-communicating Web services. A dedicated human-machine interface was developed, allowing annotation experts to produce tagged treebanks for the Sanskrit Library, a TEI-conformant digital repository of the Sanskrit corpus. This joint work was presented at COLING-2012 (Goyal et al., 2012). Here we propose to demonstrate the current functionalities of this software platform.

2 Architecture of components

It was deemed counter-productive to attempt to build a rigid monolithic system; instead, we turned to developing independent components at various sites, communicating with each other [...]

This work is licensed under a Creative Commons Attribution 4.0 International License. Page numbers and proceedings footer are added by the organizers. License details: http://creativecommons.org/licenses/by/4.0/




Publication date: 2014