Characterizing Ranked Chinese Syllable-to-Character Mapping Spectrum: A Bridge Between the Spoken and Written Chinese Language
نویسنده
چکیده
One important aspect of the relationship between spoken and written Chinese is the ranked syllable-tocharacter mapping spectrum, which is the ranked list of syllables by the number of characters that map to the syllable. Previously, this spectrum is analyzed for more than 400 syllables without distinguishing the four intonations. In the current study, the spectrum with 1280 toned syllables is analyzed by logarithmic function, Beta rank function, and piecewise logarithmic function. Out of the three fitting functions, the two-piece logarithmic function fits the data the best, both by the smallest sum of squared errors (SSE) and by the lowest Akaike information criterion (AIC) value. The Beta rank function is the close second. By sampling from a Poisson distribution whose parameter value is chosen from the observed data, we empirically estimate the pvalue for testing the two-piece-logarithmic-function being better than the Beta rank function hypothesis, to be 0.16. For practical purposes, the piecewise logarithmic function and the Beta rank function can be considered a tie.
منابع مشابه
Input Chinese sentences using digits
Chinese character input is always a key issue in a variety of Chinese based applications especially when only a small number keypad is available. Though many kinds of Chinese character encoding schemes are proposed according to Chinese character characteristics, such as the shape, they are not straightforward and will take users a long time to learn. An easy way is to input via Chinese pinyins....
متن کاملSyllable frequency and word frequency effects in spoken and written word production in a non-alphabetic script
The effects of word frequency (WF) and syllable frequency (SF) are well-established phenomena in domain such as spoken production in alphabetic languages. Chinese, as a non-alphabetic language, presents unique lexical and phonological properties in speech production. For example, the proximate unit of phonological encoding is syllable in Chinese but segments in Dutch, French or English. The pre...
متن کاملImproved search strategy for large vocabulary continuous Mandarin speech recognition
This paper presents a new search strategy for large vocabulary continuous Mandarin speech recognition considering the special structure of Chinese language. This strategy is composed of a forward and a backward passes, between which a high-quality syllable lattice is generated to bridge the syllable-level and word-level decoding processes. In the forward pass, considering the small number of sy...
متن کاملThe Spoken/Written Language Classification of English Sentences with Bilingual Information
To alleviate the problem with Chinese being poor at telling the difference between spoken and written English which is important for learning and using the language, we propose to classify English sentences with bilingual information into the two categories automatically. Based on the text categorization technology, we explore a variety of features, including words, statistics and their combina...
متن کاملRecognizing named entities in spoken Chinese dialogues with a character-level maximum entropy tagger
Named Entity Recognition (NER) is an important task in information extraction, where major attention has been paid to written texts of a news or academic paper (esp. biomedical) style. In this paper we report the first piece of work on NER in spoken Chinese dialogues, as a preliminary step for spoken language understanding. The NER task is taken as a sequential classification problem and solved...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of Quantitative Linguistics
دوره 20 شماره
صفحات -
تاریخ انتشار 2013