Creating an Annotated Tamil Corpus as a Discourse Resource

نویسندگان

  • Ravi Teja Rachakonda
  • Dipti Misra Sharma
چکیده

We describe our efforts to apply the Penn Discourse Treebank guidelines on a Tamil corpus to create an annotated corpus of discourse relations in Tamil. After conducting a preliminary exploratory study on Tamil discourse connectives, we show our observations and results of a pilot experiment that we conducted by annotating a small portion of our corpus. Our ultimate goal is to develop a Tamil Discourse Relation Bank that will be useful as a resource for further research in Tamil discourse. Furthermore, a study of the behavior of discourse connectives in Tamil will also help in furthering the cross-linguistic understanding of discourse connectives.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Building an Annotated Corpus for Text Summarization and Question Answering

We describe ongoing work in semi-automatic annotating corpus, with the goal to answer “why” question in question answering system and give a construction of the coherent tree for text summarization. In this paper we present annotation schemas for identifying the discourse relations that hold between the parts of text as well as the particular textual of span that are related via the discourse r...

متن کامل

Japanese Dialogue Corpus of Multi-Level Annotation

This paper describes a Japanese dialogue corpus annotated with multi-level information built by the Japanese Discourse Research Initiative, Japanese Society for Artificial Intelligence. The annotation information consists of speech, transcription delimited by slash units, prosodic, part of speech, dialogue acts and dialogue segmentation. In the project, we used the corpus for obtaining new find...

متن کامل

The Penn Discourse TreeBank as a Resource for Natural Language Generation

While many advances have been made in Natural Language Generation (NLG), the scope of the field has been somewhat restricted because of the lack of annotated corpora from which properties of texts can be automatically acquired and applied towards the development of generation systems. In this paper, we describe how the Penn Discourse TreeBank (PDTB) can serve as a valuable large scale annotated...

متن کامل

Creating and Exploiting Multimodal Annotated Corpora: The ToMA Project

The paper presents a project aiming at collecting, annotating and exploiting a dialogue corpus from a multimodal perspective. The goal of the project is the description of the different parameters involved in a natural interaction process. Describing such complex mechanism requires corpora annotated in different domains. This paper first presents the corpus and the scheme used in order to annot...

متن کامل

Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory

We describe our experience in developing a discourse-annotated corpus for community-wide use. Working in the framework of Rhetorical Structure Theory, we were able to create a large annotated resource with very high consistency, using a well-defined methodology and protocol. This resource is made publicly available through the Linguistic Data Consortium to enable researchers to develop empirica...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011