Empirical Verification of Meaning-Game-based Generalization of Centering Theory with Large Japanese Corpus

Authors

  • Shun Shiramatsu
  • Kazunori Komatani
  • Takashi Miyata
  • Koichi Hashida
  • Hiroshi G. Okuno
Abstract

Centering theory (Grosz et al., 1995) tries to explain relations among attention, anaphora, and cohesion. However, it has two theoretical limitations. The first is the lack of a principle behind these phenomena. The second is that the salience of discourse entities has not been quantitatively defined, although it plays a critical role in this theory. Hasida et al. (1995, 1996) proposed the meaning game as a more principled model of intentional communication based on game theory and claimed that it can derive centering theory. This claim, however, has not yet been verified on the basis of substantial linguistic data. In this paper, we formulate salience as a measurable quantity in terms of a reference probability. We also formulate preferences subsuming centering theory under this quantitative formulation of salience. The preferences are derived from the meaning game and entail more general predictions than those of conventional centering theory. They overcome the above limitations of centering theory. By following them, we verify our generalization with a large Japanese corpus. The experimental results show a correlation between the salience (reference probability) of an entity and the [...] a noun phrase that refers to the entity. They also indicate a correspondence between the ranking of expected utility and the ranking of the transition states. These results indicate that our [...] is appropriate.

1. Introduction

Quantitative modeling of discourse is important for analyzing and generating [...]. Centering theory (CT) is a model of discourse structures. It explains the relations among attention, anaphora, and cohesion (Iida, 1997). However, CT has had two theoretical limitations. The first is the lack of a general principle behind the discourse phenomena. Although some studies [...] on analyzing surface linguistic features without general principles, we [...] principle of discourse phenomena must be addressed based on measurable [...]. The second is that "salience", which plays a critical role in CT, cannot be verified against linguistic data because it is formulated not as a measurable quantity but as heuristic rules. [...] the general principle of CT. We adopted the meaning game (MG) (Hasida et al. [...]) as a framework because it gives a more principled explanation of the discourse phenomena than CT does.
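As a minimal illustration of treating salience as a measurable quantity rather than a heuristic rank, the sketch below estimates a reference probability, Pr(an entity is referred to in the next utterance | a feature of its latest mention), by relative frequency over annotated observations. The single feature (grammatical function), the function name `reference_probability`, and the toy data are hypothetical; the paper reports using multiple regression over a corpus, which this sketch does not reproduce.

```python
from collections import defaultdict

# One observation: (grammatical function of the entity's latest mention,
#                   whether the entity is referred to again in the next utterance).
# Hypothetical illustration only; not the paper's feature set.
Observation = tuple[str, bool]


def reference_probability(observations: list[Observation]) -> dict[str, float]:
    """Estimate Pr(entity is referred to next | feature) by relative frequency."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # feature -> [referred, total]
    for feature, referred in observations:
        counts[feature][1] += 1
        if referred:
            counts[feature][0] += 1
    return {f: referred / total for f, (referred, total) in counts.items()}


# Toy data, invented for illustration (not corpus results).
toy = [("topic", True), ("topic", True), ("topic", False),
       ("subject", True), ("subject", False),
       ("object", False), ("object", False), ("object", True), ("object", False)]

probs = reference_probability(toy)
for feature, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{feature:8s} {p:.2f}")
```

Unlike an ordinal Cf-ranking, these estimates are numbers that can be checked against held-out data; with several features (for example, distance to the latest mention), a regression model would replace the per-feature counts.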
MG is a model of intentional communication (e.g., anaphora) based on game theory. Players in game theory correspond to interlocutors in MG, and they decide their [...] interpretations at the Pareto optimum. Although Hasida et al. (1995) claimed that CT can be derived from the MG by formulating salience in terms of a reference probability, their claim has yet to be verified on the basis of substantial linguistic data. In this paper, we formulate the MG-based generalization of CT and verify it with a large corpus of Japanese newspaper articles. Furthermore, for both the MG-based generalization and its verification, we quantitatively define salience by multiple regression on a corpus.

2. Centering Theory and Its Two Issues

2.1 Centering Theory

In CT, a discourse is represented as a sequence of utterances [U1, U2, ..., Un]. The "center" is a discourse entity that draws attention and is likely to be pronominalized. "Salience" represents the degree of attention to a discourse entity, and hence also the likelihood of its pronominalization. In previous studies, salience has been defined as a heuristic ranking (see Section 2.2). Centers are categorized as follows:

  • Cb(Ui): the backward-looking center of utterance Ui, i.e., the most salient discourse entity referenced in both the previous context and the current utterance Ui.
  • Cf(Ui): the forward-looking centers of Ui, i.e., the list of entities in Ui sorted by salience.
  • Cp(Ui): the preferred center of Ui, i.e., the most salient discourse entity in Cf(Ui).

CT embodies the following rules (preferences), based on the heuristic definition of salience:

  • Rule 1 (pronominalization): If any element of Cf(Ui) is pronominalized, Cb(Ui) is also pronominalized.
  • Rule 2 (topic continuity): The transition states of centers between utterances (Table 1) are preferred in the following order: CONTINUE > RETAIN > SMOOTH-SHIFT > ROUGH-SHIFT.
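The center definitions and Rule 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each utterance comes with its Cf list already ordered by salience; Cb(Ui) is computed as the highest-ranked element of Cf(Ui−1) that is realized in Ui (the usual centering formulation, reading "previous context" as the immediately preceding utterance); and the names `Utterance`, `cb`, `cp`, `satisfies_rule1`, and the entities are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Utterance:
    cf: list[str]  # Cf(U_i): entities ordered by salience, most salient first
    pronouns: set[str] = field(default_factory=set)  # entities realized as (zero) pronouns


def cp(u: Utterance) -> Optional[str]:
    """Cp(U_i): the most salient entity in Cf(U_i)."""
    return u.cf[0] if u.cf else None


def cb(prev: Optional[Utterance], cur: Utterance) -> Optional[str]:
    """Cb(U_i): highest-ranked entity of Cf(U_{i-1}) that is also realized in U_i."""
    if prev is None:
        return None  # the first utterance has no backward-looking center
    realized = set(cur.cf)
    return next((e for e in prev.cf if e in realized), None)


def satisfies_rule1(prev: Optional[Utterance], cur: Utterance) -> bool:
    """Rule 1: if any element of Cf(U_i) is pronominalized, Cb(U_i) must be too."""
    center = cb(prev, cur)
    if not cur.pronouns or center is None:
        return True  # rule does not apply (assumption for an undefined Cb)
    return center in cur.pronouns


# Hypothetical three-utterance discourse.
u1 = Utterance(cf=["Taro", "Hanako", "book"])
u2 = Utterance(cf=["Taro", "book"], pronouns={"Taro"})
u3 = Utterance(cf=["Hanako", "Taro"], pronouns={"Hanako"})

print(cb(u1, u2), satisfies_rule1(u1, u2))
print(cb(u2, u3), satisfies_rule1(u2, u3))
```

In this toy example, U3 pronominalizes Hanako but not its Cb (Taro), so the check reports a Rule 1 violation for U3.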
Table 1: Transition states of centers between utterances

                       Cb(Ui) = Cb(Ui−1)    Cb(Ui) ≠ Cb(Ui−1)
  Cb(Ui) = Cp(Ui)      CONTINUE             SMOOTH-SHIFT
  Cb(Ui) ≠ Cp(Ui)      RETAIN               ROUGH-SHIFT

Rule 1 means that pronouns are more likely to refer to Cb than non-pronouns are. Rule 2 represents the preference order among transition states according to the strength of topic continuity.

2.2 Two Issues

Conventional CT studies face two limitations:

  1. Lack of principles behind the rules. CT does not explain why the two rules hold in discourse phenomena.
  2. Salience is formalized neither objectively nor quantitatively, but heuristically (e.g., Cf-ranking). Such a ranking is non-falsifiable (unscientific) and cannot be verified against real linguistic data.

The first limitation means that CT needs a hypothesis about the mechanisms behind discourse phenomena. The second means that CT should be based on a quantitative definition of salience. Salience in CT is approximated by a heuristic ranking called "Cf-ranking" (Walker et al., 1994), as follows:

  • English Cf-ranking: subject > object > indirect object > complement > adjunct
  • Japanese Cf-ranking: topic (zero or grammatical) > subject > indirect object > object > others

Proceedings of PACLIC 19, the 19th Asia-Pacific Conference on Language, Information and Computation.

The above Cf-ranking depends only on grammatical function. While Strube et al. (1999) proposed an extended Cf-ranking that integrates information status, and Nariyama (2001) proposed an extended ranking that integrates contextual information, these rankings are based on surface observations without sufficient theoretical grounds. Although Poesio et al. (2004) discussed parameter settings in CT, their discussion was also based on heuristic ranking.
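Table 1 and Rule 2, together with the Japanese Cf-ranking listed above, can be combined into a small classifier. This is a minimal sketch under stated assumptions: grammatical functions are given as plain strings, with zero and grammatical topics collapsed into `"topic"`; an undefined Cb is represented as `None` and compared like any other value, a case Table 1 does not specify; and all names (`rank_cf`, `transition`, `most_preferred`) are hypothetical.

```python
from typing import Optional

# Japanese Cf-ranking (Walker et al., 1994) as listed above; earlier = more salient.
JAPANESE_CF_RANKING = ["topic", "subject", "indirect object", "object", "others"]

# Rule 2 preference order over transition states, most preferred first.
RULE2_ORDER = ["CONTINUE", "RETAIN", "SMOOTH-SHIFT", "ROUGH-SHIFT"]


def rank_cf(mentions: list[tuple[str, str]]) -> list[str]:
    """Order (entity, grammatical function) pairs into Cf(U_i) by the heuristic ranking."""
    def rank(function: str) -> int:
        # Functions not in the ranking fall into the lowest class, "others".
        if function not in JAPANESE_CF_RANKING:
            function = "others"
        return JAPANESE_CF_RANKING.index(function)
    # sorted() is stable, so mentions with equal rank keep their textual order.
    return [entity for entity, function in sorted(mentions, key=lambda m: rank(m[1]))]


def transition(cb_prev: Optional[str], cb_cur: Optional[str], cp_cur: Optional[str]) -> str:
    """Classify the transition from U_{i-1} to U_i according to Table 1."""
    if cb_cur == cb_prev:
        return "CONTINUE" if cb_cur == cp_cur else "RETAIN"
    return "SMOOTH-SHIFT" if cb_cur == cp_cur else "ROUGH-SHIFT"


def most_preferred(candidates: list[str]) -> str:
    """Return the candidate transition ranked highest by Rule 2."""
    return min(candidates, key=RULE2_ORDER.index)


cf = rank_cf([("book", "object"), ("Taro", "topic"), ("Hanako", "subject")])
print(cf)                                                        # ['Taro', 'Hanako', 'book']
print(transition("Taro", "Taro", cf[0]))                         # CONTINUE
print(most_preferred(["ROUGH-SHIFT", "RETAIN", "SMOOTH-SHIFT"]))  # RETAIN
```

Note that `rank_cf` can only produce an ordinal list: there is no principled way to add a numerical feature such as distance to it, which is the integration problem raised in the text.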
Besides this second limitation, we also note that a heuristic ranking is difficult to integrate with other features that influence salience (e.g., the distance between the current utterance and the latest expression referring to the target entity). We address these two issues in the following sections.

3. Generalization of Centering Theory based on the Meaning Game

The meaning game (MG) is a hypothesis about a model of intentional communication based on game theory (Hasida, 1996). We adopted MG to give CT a general principle. The MG-based account of anaphora is a more principled hypothesis than that of CT, because MG is based on the general principle of decision-making. In MG, the interlocutors' expected utility is represented as:
[...]


Similar resources

Meaning-Game-based Centering Model with Statistical Definition of Utility of Referential Expression and Its Verification Using Japanese and English Corpora

This paper presents a quantitative modeling of referential coherence by which conversation systems measure the smoothness of discourse. Investigations of the corpora show that referential coherence depends on languages or genres of discourse. Our goal is to establish a quantitative model that can be statistically adapted to various languages. Centering theory explains referential coherence by u...


Decision Theory and Discourse Particles: A Case Study from a Large Japanese Sentiment Corpus

The distribution and use of the Japanese particle yo is examined using a large annotated sentiment corpus. The data is shown to support a decision-theoretic account of yo’s meaning (Davis, 2009). A decision-theoretic approach to the analysis of sentiment corpora is proposed, by which empirical predictions of decision-theoretic formal analyses can be tested using large sets of naturalistic data.


A game theory approach to the sawnwood and pulpwood markets in the north of Iran

Duopoly game theory is applied to the wood industrial markets (sawnwood and pulpwood markets) in the North of Iran. The Nash equilibrium and the dynamic properties of the system based on marginal adjustments are determined. The probability that the Nash equilibrium will be reached is almost zero. The dynamical properties of sawnwood and pulpwood prices derived via the duopoly game model are fou...


A novel cooperative game between client and subcontractors based on technical characteristics

Large projects often have several activities which are performed by some subcontractors with several skills. Costs and time reduction and quality improvement of the project are very important for client and subcontractors. Therefore, in real large projects, subcontractors join together and form coalitions for improving the project profit. A key question is how an extra profit of cooperation amo...


A game theory approach to the Iranian forest industry raw material market

Dynamic game theory is applied to analyze the timber market in northern Iran as a duopsony. The Nash equilibrium and the dynamic properties of the system based on marginal adjustments are determined. When timber is sold, the different mills use mixed strategies to give sealed bids. It is found that the decision probability combination of the different mills follow a special form of attractor an...



Publication date: 2005