Data Types In Computational Phonology
Abstract
This paper examines certain aspects of phonological structure from the viewpoint of abstract data types. Our immediate goal is to find a format for phonological representation which will be reasonably faithful to the concerns of theoretical phonology while being rigorous enough to admit a computational interpretation. The longer term goal is to incorporate such representations into an appropriate general framework for natural language processing.¹

¹ The work reported in this paper has been carried out as part of the research programmes of the Human Communication Research Centre, supported by the UK Economic and Social Research Council, and the project Computational Phonology: A Constraint-Based Approach, funded by the UK Science and Engineering Research Council, under grant GR/G-22081. I am grateful to Steven Bird, Kimba Newton and Tony Simon for discussions relating to this work.

Actes de COLING-92, Nantes, 23-28 août 1992

1 Introduction

One of the dominant paradigms in current computational linguistics is provided by unification-based grammar formalisms. Such formalisms (cf. (Shieber 1986; Kasper & Rounds 1986)) describe hierarchical feature structures, which in many ways would appear to be an ideal setting for formal phonological analyses. Feature bundles have long been used by phonologists, and more recent work on so-called feature geometry (e.g. (Clements 1985; Sagey 1986)) has introduced hierarchy into such representations. Nevertheless, there are reasons to step back from standard feature-based approaches, and instead to adopt the algebraic perspective of abstract data types (ADT) which has been widely adopted in computer science.

One general motivation, which we shall not explore here, is that the activity of grammar writing, viewed as a process of programme specification, should be amenable to stepwise refinement in which the set of (not necessarily isomorphic) models admitted by a loose specification is gradually narrowed down to a unique algebra (cf. (Sannella & Tarlecki 1987) for an overview, and (Newton in prep.) for the application to grammar writing). A second motivation, discussed in detail by (Beierle & Pletat 1988; Beierle & Pletat 1989; Beierle et al. 1988), is to use equational ADTs to provide a mathematical foundation for feature structures. A third motivation, dominant in this paper, is to use the ADT approach to provide a richer array of explicit data types than are readily admitted by 'pure' feature structure approaches. Briefly, in their raw form, feature terms (i.e., formalisms for describing feature structures) do not always provide a perspicuous format for representing structure.

On the ADT approach, complex data types are built up from atomic types by means of constructor functions. For example, _._ (where we use the underscore '_' to mark the position of the function's arguments) creates elements of type List. A data type may also have selector functions for taking data elements apart. Thus, selectors for the type List are the functions first and last. Standard feature-based encoding of lists uses only selectors for the data type; i.e. the feature labels FIRST and LAST in

(1) FIRST : σ1 ⊓ LAST : (FIRST : σ2 ⊓ LAST : nil)

However, the list constructor is left implicit. That is, the feature term encoding tells you how lists are pulled apart, but does not say how they are built up.
When we confine our attention just to lists, this is not much to worry about. However, the situation becomes less satisfactory when we attempt to encode a larger variety of data structures into one and the same feature term; say, for example, standard lists, associative lists (i.e. strings), constituent structure hierarchy, and autosegmental association. In order to distinguish adequately between elements of such data types, we really need to know the logical properties of their respective constructors, and this is awkward when the constructors are not made explicit. For computational phonology, it is not an unlikely scenario to be confronted with such a variety of data structures, since one may well wish to study the complex interaction between, say, non-linear temporal relations and prosodic hierarchy. As a vehicle for computational implementation, the uniformity of standard attribute/value notation is extremely useful. As a vehicle for theory development, it can be extraordinarily unperspicuous.

The approach which we present here treats phonological concepts as abstract data types. A particularly convenient development environment is provided by the language OBJ (Goguen & Winkler 1988), which is based on order sorted equational logic, and all the examples given below (except where explicitly indicated to the contrary) run in the version of OBJ3 released by SRI in 1988. The denotational semantics of an OBJ module is an algebra, while its operational semantics is based on order sorted rewriting. §§ 1.1 and 1.2 give a more detailed introduction into the formal framework, while §§ 2 and 3 illustrate the approach with some phonological examples.
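The contrast drawn in this introduction, between a feature term that exposes only selectors and a data type whose constructors are explicit, can be sketched in Python (used here purely for illustration; the paper's own examples are written in OBJ). The class names Nil and Cons, the helper to_feature_term, and the placeholder values "s1" and "s2" standing in for the elements of example (1) are all assumptions introduced for this sketch.

```python
from dataclasses import dataclass
from typing import Union

# Selector-only view: a feature term is a nested mapping keyed by the
# feature labels FIRST and LAST; the atom "nil" marks the empty list.
feature_term = {"FIRST": "s1", "LAST": {"FIRST": "s2", "LAST": "nil"}}

def first(term):
    """Selector: read the value of the FIRST feature."""
    return term["FIRST"]

def last(term):
    """Selector: read the value of the LAST feature (the rest of the list)."""
    return term["LAST"]

# Constructor view: the data type states explicitly how lists are built.
@dataclass(frozen=True)
class Nil:
    """Hypothetical constructor for the empty list."""

@dataclass(frozen=True)
class Cons:
    """Hypothetical binary constructor, playing the role of _._ in the text."""
    first: str
    last: "Union[Cons, Nil]"

def to_feature_term(lst):
    """Translate a constructed list into the selector-only encoding."""
    if isinstance(lst, Nil):
        return "nil"
    return {"FIRST": lst.first, "LAST": to_feature_term(lst.last)}

# Build the two-element list explicitly, then recover the feature term.
built = Cons("s1", Cons("s2", Nil()))
```

In this sketch the nested mapping can be taken apart with first and last, but nothing in it records which combinations of features count as well-formed lists; that information lives only in the constructors Nil and Cons.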
1.1 Abstract Data Types

A data type consists of one or more domains of data items, of which certain elements are designated as basic, together with a set of operations on the domains which suffice to generate all data items in the domains from the basic items. A data type is abstract if it is independent of any particular representational scheme. A fundamental claim of the ADJ group (cf. (Goguen, Thatcher & Wagner 1976)) and much subsequent work (cf. (Ehrig & Mahr 1985)) is that abstract data types are (to be modelled as) algebras; and moreover, that the models of abstract data types are initial algebras.²

The signature of a many-sorted algebra is a pair Σ = ⟨S, O⟩ consisting of a set S of sorts and a set O of constant and operation symbols. A specification is a pair ⟨Σ, E⟩ consisting of a signature together with a set E of equations over terms constructed from symbols in O and variables of the sorts in S. A model for a specification is an algebra over the signature which satisfies all the equations E. Initial algebras play a special role as the semantics of an algebra. An initial algebra is minimal, in the sense expressed by the principles 'no junk' and 'no confusion'. 'No junk' means that the algebra only contains data which are denoted by variable-free terms built up from operation symbols in the signature. 'No confusion' means that two such terms t and t′ denote the same object in the algebra only if the equation t = t′ is derivable from the equations of the specification.

Specifications are written in a conventional format consisting of a declaration of sorts, operation symbols (op), and equations (eq). Preceding the equations we list all the variables (var) which figure in them.

² An initial algebra is characterized uniquely up to isomorphism as the semantics of a specification: there is a unique homomorphism from the initial algebra into every algebra of the specification.
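As a rough illustration of 'no junk' and 'no confusion', the following Python sketch enumerates the variable-free terms over a small signature and identifies terms by rewriting them with a single equation. Everything named here is an assumption made for the example: the sorts Elt and List, the constants a and b, the symbol cons (standing in for a binary list constructor), and the equation head(cons(x, l)) = x. It is not the OBJ machinery itself, only a toy approximation of the idea.

```python
from itertools import product

# Hypothetical signature: each operation symbol maps to its rank,
# i.e. (argument sorts, result sort). Constants have no arguments.
OPS = {
    "a":    ((), "Elt"),
    "b":    ((), "Elt"),
    "nil":  ((), "List"),
    "cons": (("Elt", "List"), "List"),
    "head": (("List",), "Elt"),
}

def ground_terms(sort, depth):
    """All variable-free terms of `sort` with nesting depth at most `depth`.

    Terms are tuples: (symbol, subterm, subterm, ...).
    """
    if depth == 0:
        return []
    terms = []
    for name, (arg_sorts, result) in OPS.items():
        if result != sort:
            continue
        choices = [ground_terms(s, depth - 1) for s in arg_sorts]
        for combo in product(*choices):
            terms.append((name, *combo))
    return terms

def normalise(term):
    """Rewrite innermost-first with the equation head(cons(x, l)) = x."""
    name, *args = term
    args = [normalise(arg) for arg in args]
    if name == "head" and args[0][0] == "cons":
        return args[0][1]
    return (name, *args)

# 'No junk': the carrier of sort List is exactly what the terms denote.
lists_up_to_depth_2 = ground_terms("List", 2)

# 'No confusion': two terms are identified only if the equation forces it.
same = normalise(("head", ("cons", ("a",), ("nil",)))) == normalise(("a",))
```

Under these assumptions, head(cons(a, nil)) and a share a normal form and so denote the same element, whereas a and b stay distinct. The term head(nil) is not rewritten by any equation, so in this toy specification it remains an element of sort Elt in its own right.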
As an illustration, we give below an OBJ specification of the data type LIST1.

(2) obj LIST1 is
      sorts Elt List .
      op nil : -> List .
      op _._ : Elt List -> List .
      op head : List -> Elt .
      op tail : List -> List .
Similar resources
Computational Phonology
Phonology, as it is practiced, is deeply computational. Phonological analysis is data-intensive and the resulting models are nothing other than specialized data structures and algorithms. In the past, phonological computation – managing data and developing analyses – was done manually with pencil and paper. Increasingly, with the proliferation of affordable computers, IPA fonts and drawing sof...
Towards a Computational Articulatory Model of Spanish Phonology
Many aspects of Spanish phonology remain poorly described within feature-based frameworks, in part because it is not well understood to what extent phonological features are grounded in the phonetic domain. In this paper, we introduce a computational model of Spanish phonology based on articulatory primitives (Browman & Goldstein 1992) that is currently being developed as an extension of the Ta...
Perplexity of bi-phone phonotactic models in Korean loanword phonology
The paper presents a corpus study which shows that the probability distribution of bi-phones in a lexicon of Korean loanwords is significantly different from that in a typical Korean lexicon or a lexicon consisting solely of native Korean and Sino-Korean words. This is demonstrated by comparing the perplexity of two types of bi-phone phonotactic models: a model trained on a set of Korean loanwo...
Computational Phonology - Part I: Foundations
Computational phonology approaches the study of sound patterns in the world’s languages from a computational perspective. This article explains this perspective and its relevance to phonology. A restrictive, universal property of phonological patterns— they are regular—is established, and the hypothesis that they are subregular is presented. This article is intended primarily for phonologists w...
Modelling, Formality and the Phonetics–Phonology Interface
Arguing for an increased cooperation between the fields of formal modelling and phonology, we illustrate the potential of several models from the computational sciences in phonology. We propose the skeleton of a multi-layer formal model from phonology through production and perception back to phonology.
Computational phonology today*
This thematic issue almost did not happen. One of us (JH) was almost killed two days after the deadline for article submissions. As a pedestrian on a sidewalk minding his own business, he was struck by a car that ran a red light and lost control after a collision. So when we write that we are delighted to be writing this introduction, over one year later, we both really mean it. Broadly speakin...