Wanpela deitabeis long Tok Pisin bilong baim tiket bilong balus. (An ATIS database in Tok Pisin.) Methodological observations with regard to the collection of human–human data

Authors

  • Robert Eklund
  • Antonis Botinis
Abstract

This paper describes the collection of authentic human–human air travel information data in Tok Pisin, the pidgin/creole language spoken in Papua New Guinea. Pros and cons of authentic data are discussed, as compared to data collected in more controlled settings like Wizard-of-Oz simulations. Some unexpected real-life phenomena that affect the data, and normally do not occur in corpora compiled from Wizard-of-Oz simulations, are described.

Similar resources

"ko tok ples ensin bilong Tok Pisin" or the TP-CLE: a first report from a pilot speech-to-speech translation project from Swedish to Tok Pisin

This paper describes an operational speech-to-speech translation system from Swedish to Tok Pisin within the framework of the Spoken Language Translator project, SLT [1]. The domain of translation is ATIS [11]. The grammar formalism used in the SLT project is the Core Language Engine, CLE [2]. A general presentation of Tok Pisin is provided, as well as a description of some grammatical characte...

Crosslinguistic disfluency modeling: a comparative analysis of Swedish and Tok Pisin human-human ATIS dialogues

This paper studies disfluencies in authentic human–human dialogues in Swedish and Tok Pisin. It is found that while there are no major differences as to types or frequencies on a macro level, there are dissimilarities on a micro level, notably in the characteristics of how prolonged segments are realized. The paper also discusses the results in the light of reported disfluencies in English, Ger...

Using a Pidgin Language in Formal Education: Help or Hindrance?

Pidgin and Creole languages are rarely used in formal education because of three arguments: (1) they are degenerate languages, (2) it is a waste of time to use a pidgin or creole when the standard language is the key to success in education and employment, and (3) the use of a pidgin or creole will interfere with students' subsequent acquisition of the standard language. Linguists can easily refute the...

Prolongations: A dark horse in the disfluency stable

This paper studies a specific type of disfluency, viz. segment prolongation (PR), i.e., the “stretching out” of speech sounds as a means of hesitation. It is shown that the occurrence of PRs varies as a function of phone type, position in the word, lexical factors and word class, and that PRs are subject to phonotactic constraints in Swedish. A comparison between Swedish and Tok Pisin suggests ...

Exploring minimal pronunciation modeling for low resource languages

Pronunciation lexicons can range from fully graphemic (modeling each word using the orthography directly) to fully phonemic (first mapping each word to a phoneme string). Between these two options lies a continuum of modeling options. We analyze techniques that can improve the accuracy of a graphemic system without requiring significant effort to design or implement. The analysis is performed i...

Publication date: 2017