Sorting by Sound: Arbitrary Lexical Ordering for Transcribed Thai Text
Abstract
When either Thai or transcribed (Romanized) Thai is sorted alphabetically, words that sound very much alike usually end up far apart. For example, "maay" and "may" are thrown to opposite ends of the m entries, even though confusing one with the other causes problems both for foreign students who cannot speak clearly and for Thais who cannot spell. This paper explains how and why the difficulty occurs, and shows why both Thai script and its transcription are inherently difficult to sort by sound. It introduces a preprocessing method, the derivation of phonemic signatures, that lets us define improved lexical (dictionary) orders while requiring nothing beyond standard sorting code. The method can also be applied to other languages, such as Lao, Khmer, and Burmese, that, like Thai, distinguish words on the basis of vowel length and/or tone.
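The abstract does not say how the phonemic signatures are constructed, so the following is only a minimal sketch of the general idea under our own assumptions, not the paper's method. We assume a hypothetical transcription in which long vowels are written as doubled ASCII vowels ("aa") and tone, when marked, is a single trailing digit 0-4. The illustrative function `phonemic_signature` (our name) turns each word into a key with three parts: a length-free consonant/vowel skeleton, then the vowel lengths, then the tone. An ordinary sort on that key keeps words that differ only in length or tone next to each other.

```python
import re

# Assumed transcription (hypothetical, for illustration only): plain ASCII
# vowels a, e, i, o, u; a long vowel is written doubled ("aa"); tone, if
# marked at all, is a single trailing digit 0-4 (e.g. "maay2").
VOWELS = set("aeiou")
SHORT, LONG = "1", "2"  # length codes chosen so short sorts before long


def phonemic_signature(word: str) -> tuple[str, str, str]:
    """Build a sort key: length-free skeleton, then vowel lengths, then tone."""
    m = re.fullmatch(r"([a-z]+)([0-4]?)", word.lower())
    if m is None:
        raise ValueError(f"unsupported transcription: {word!r}")
    letters, tone = m.groups()
    skeleton, lengths = [], []
    i = 0
    while i < len(letters):
        ch = letters[i]
        if ch in VOWELS and i + 1 < len(letters) and letters[i + 1] == ch:
            # Doubled vowel: keep one copy in the skeleton, record "long".
            skeleton.append(ch)
            lengths.append(LONG)
            i += 2
        else:
            skeleton.append(ch)
            if ch in VOWELS:
                lengths.append(SHORT)
            i += 1
    # Words that differ only in vowel length or tone share a skeleton, so an
    # ordinary sort on this tuple keeps them adjacent.
    return "".join(skeleton), "".join(lengths), tone


words = ["maay", "mak", "may", "mii", "maa"]
print(sorted(words))                          # ['maa', 'maay', 'mak', 'may', 'mii']
print(sorted(words, key=phonemic_signature))  # ['maa', 'mak', 'may', 'maay', 'mii']
```

In plain alphabetical order "mak" separates "maay" from "may"; with the signature as the key the two sound-alikes become neighbours. The sort itself is Python's standard `sorted`. All of the sound-related work happens in the key function, which matches the abstract's point that only a preprocessing step is needed.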