On Generalized-Topic-Based Chinese Discourse Structure
Authors
Song Rou, Jiang Yuru, Wang Jingyi
Beijing Language and Culture University; Beijing University of Polytechnic Technology; Beijing Forest University; Beijing University of Information Science and Technology
Abstract
Because Chinese lacks external formal markers, the components of Chinese discourse can hardly be categorized within the traditional syntactic system. Chinese is in fact a typical topic-prominent language, so it is better analyzed from the perspective of topic. Aimed at computer processing, this paper introduces the concepts of punctuation clause, generalized topic, discourse structure, and topic clause, and reveals the properties of Chinese discourse structure based on generalized topic. The applicability of the theory has been validated in an initial experiment.
Similar resources
Building a Chinese discourse topic corpus with a micro-topic scheme based on theme-rheme theory
Background: How to build a suitable discourse topic structure is an important issue in discourse topic analysis, which is the core of natural language understanding. Not only is it the key ba...
Topic Identification in Chinese Discourse Based on Centering Model
In this article we are concerned with identifying the topics of utterances in texts, which are discourse elements reflecting the links between a sentence and its context. The information carried by the topics can contribute to a number of natural language processing applications, such as information retrieval, text categorization, and discourse segmentation. However, the phenomenon of...
Segmentation of Chinese Discourse in Content-Based Information Retrieval
In this paper, we present a novel approach to automatic discourse segmentation without a full semantic understanding. In order to analyse the textual bonds and determine the degree of coherence that a discourse may exhibit, we first represent the tremendous diversity of textual relations as a discourse network. A set of mutual linguistic constraints that largely determines the similarity of m...
PDTB-style Discourse Annotation of Chinese Text
We describe a discourse annotation scheme for Chinese and report on the preliminary results. Our scheme, inspired by the Penn Discourse TreeBank (PDTB), adopts the lexically grounded approach; at the same time, it makes adaptations based on the linguistic and statistical characteristics of Chinese text. Annotation results show that these adaptations work well in practice. Our scheme, taken toge...
Content Modeling Using Latent Permutations
We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that...