Wanpela deitabeis long Tok Pisin bilong baim tiket bilong balus. (An ATIS database in Tok Pisin.) Methodological observations with regard to the collection of human–human data
Abstract
This paper describes the collection of authentic human–human air travel information data in Tok Pisin, the pidgin/creole language spoken in Papua New Guinea. Pros and cons of authentic data are discussed, as compared to data collected in more controlled settings like Wizard-of-Oz simulations. Some unexpected real-life phenomena that affect the data, and normally do not occur in corpora compiled from Wizard-of-Oz simulations, are described.
Similar resources
"ko tok ples ensin bilong tok pisin" or the TP-CLE: a first report from a pilot speech-to-speech translation project from Swedish to tok pisin
This paper describes an operational speech-to-speech translation system from Swedish to Tok Pisin within the framework of the Spoken Language Translator project, SLT [1]. The domain of translation is ATIS [11]. The grammar formalism used in the SLT project is the Core Language Engine, CLE [2]. A general presentation of Tok Pisin is provided, as well as a description of some grammatical characte...
Crosslinguistic disfluency modeling: a comparative analysis of Swedish and tok pisin human-human ATIS dialogues
This paper studies disfluencies in authentic human–human dialogues in Swedish and Tok Pisin. It is found that while there are no major differences as to types or frequencies on a macro level, there are dissimilarities on a micro level, notably in the characteristics of how prolonged segments are realized. The paper also discusses the results in the light of reported disfluencies in English, Ger...
Using a Pidgin Language in Formal Education: Help or Hindrance?
Pidgin and Creole languages are rarely used in formal education because of three arguments: (1) they are degenerate languages, (2) it is a waste of time to use a pidgin or creole when the standard language is the key to success in education and employment, and (3) the use of a pidgin or creole will interfere with students' subsequent acquisition of the standard language. Linguists can easily refute the...
Prolongations: A dark horse in the disfluency stable
This paper studies a specific type of disfluency, viz. segment prolongation (PR), i.e., the “stretching out” of speech sounds as a means of hesitation. It is shown that the occurrence of PRs varies as a function of phone type, position in the word, lexical factors and word class, and that PRs are subject to phonotactic constraints in Swedish. A comparison between Swedish and Tok Pisin suggests ...
Exploring minimal pronunciation modeling for low resource languages
Pronunciation lexicons can range from fully graphemic (modeling each word using the orthography directly) to fully phonemic (first mapping each word to a phoneme string). Between these two options lies a continuum of modeling options. We analyze techniques that can improve the accuracy of a graphemic system without requiring significant effort to design or implement. The analysis is performed i...
Publication date: 2017