Syntactic Sentence Fusion Techniques for Bengali

Authors

  • Amitava Das
  • Sivaji Bandyopadhyay

Abstract

The present paper describes various syntactic sentence fusion techniques for Bengali, a language of the Indo-Aryan family. First, a clause identification and classification system marks clause boundaries and classifies each clause as principal or subordinate. A rule-based sentence classification system then categorizes sentences as simple, complex, or compound. Finally, the syntactic sentence fusion system uses the sentence class and the clause types to fuse two textually entailed sentences, drawing on verb paradigm information and noun morphological information. The system outputs are compared with a gold-standard data set using manual evaluation and BLEU. The evaluation shows good accuracy. The syntactic sentence fusion technique developed in the present work may be applied to other Indian languages.

Keywords: Clause Identification and Classification, Sentence Type, Syntactic Sentence Fusion, Evaluation.
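The abstract does not state the classification rules or the BLEU configuration, so the following is only a sketch under stated assumptions. It maps the clause labels produced by a clause classifier to the three sentence types using the textbook definitions (one principal clause alone is simple, a principal clause with subordinate clauses is complex, two or more principal clauses is compound), and scores a fused sentence against a gold-standard sentence with standard unsmoothed, single-reference BLEU up to 4-grams. The label strings `principal`/`subordinate` and the function names `classify_sentence`, `ngrams`, and `sentence_bleu` are hypothetical and are not the authors' code.

```python
import math
from collections import Counter

# Hypothetical clause labels; the paper's actual tag set is not given in the abstract.
PRINCIPAL = "principal"
SUBORDINATE = "subordinate"


def classify_sentence(clause_types: list[str]) -> str:
    """Map one sentence's clause labels to simple / complex / compound.

    Assumes textbook definitions. Only three classes are named in the abstract,
    so a sentence with two or more principal clauses is labelled compound even
    if it also contains subordinate clauses.
    """
    principal = clause_types.count(PRINCIPAL)
    subordinate = clause_types.count(SUBORDINATE)
    if principal == 0:
        raise ValueError("a sentence needs at least one principal clause")
    if principal >= 2:
        return "compound"
    if subordinate >= 1:
        return "complex"
    return "simple"


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: list[str], reference: list[str], max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU.

    Geometric mean of clipped n-gram precisions (n = 1..max_n),
    multiplied by the brevity penalty.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precision_sum += math.log(clipped / total)
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)
```

For example, `classify_sentence(["principal", "subordinate"])` returns `"complex"`. Because unsmoothed sentence-level BLEU drops to zero whenever some n-gram order has no match, BLEU is usually reported at corpus level or with smoothing.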

Similar Resources

Finding Emotion Holder from Bengali Blog Texts---An Unsupervised Syntactic Approach

This paper presents two different approaches for identifying emotion holders from Bengali blog sentences. Two types of strategies yield average agreement measures of 0.78 and 0.80 for annotating emotion holders with respect to all emotion classes. The baseline model is developed based on the combinations of various part-of-speech (POS) features extracted from the phrase-based similarities. The ...

Bengali text summarization by sentence extraction

Text summarization is a process to produce an abstract or a summary by selecting a significant portion of the information from one or more texts. In an automatic text summarization process, a text is given to the computer and the computer returns a shorter, less redundant extract or abstract of the original text(s). Many techniques have been developed for summarizing English text(s). But, a very f...

Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach

Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...

Topic-Based Bengali Opinion Summarization

In this paper the development of an opinion summarization system that works on Bengali News corpus has been described. The system identifies the sentiment information in each document, aggregates them and represents the summary information in text. The present system follows a topic-sentiment model for sentiment identification and aggregation. Topic-sentiment model is designed as discourse lev...

Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity

In this study we analyze texts used in Russian Unified State Exam on English language. Texts that formed small research corpora were retrieved from 2 resources: official USE database as a reference point, and popular website used by pupils for USE training “Neznaika” (https://neznaika.pro/). The size of two corpora is balanced: USE has 11934 tokens and “Neznaika” - 11918 tokens. We share Biber’...

Publication date: 2010