Generating Phonemes from Written Thai using Lexical Analysis based on Regular Expressions

Authors

  • Leo van Moergestel
  • John-Jules Ch. Meyer
Abstract

This document describes the approach and techniques used in software developed to generate phonemes from written Thai. The software has been used to generate the phonetic transcriptions of Thai words in a Thai-Dutch dictionary. Its most important part is a lexical analyzer based on regular expressions that match patterns in the Thai writing system. Because most software tools that use regular expressions are still based on the 7-bit ASCII character set, Thai characters are mapped to ASCII characters.
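
The idea in the abstract (map Thai characters to ASCII, then run regular expressions over the ASCII text) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the mapping table, the token classes, and the names `THAI_TO_ASCII`, `to_ascii`, and `tokenize` are assumptions made up for this example, and only six Thai characters are covered.

```python
import re

# Hypothetical mapping for illustration only; the mapping actually used
# by the software described in the abstract is not given on this page.
THAI_TO_ASCII = {
    "\u0e01": "k",  # THAI CHARACTER KO KAI
    "\u0e21": "m",  # THAI CHARACTER MO MA
    "\u0e19": "n",  # THAI CHARACTER NO NU
    "\u0e32": "A",  # THAI CHARACTER SARA AA
    "\u0e34": "I",  # THAI CHARACTER SARA I
    "\u0e40": "E",  # THAI CHARACTER SARA E (written before the consonant)
}

# Assumed token classes, expressed over the ASCII stand-ins.
TOKEN_RE = re.compile(
    r"(?P<PREVOWEL>E)|(?P<CONSONANT>[kmn])|(?P<VOWEL>[AI])"
)


def to_ascii(text):
    """Replace each Thai character with its ASCII stand-in."""
    try:
        return "".join(THAI_TO_ASCII[ch] for ch in text)
    except KeyError as exc:
        raise ValueError(f"no ASCII mapping for {exc.args[0]!r}") from None


def tokenize(ascii_text):
    """Split mapped text into (token class, text) pairs, left to right."""
    tokens = []
    pos = 0
    while pos < len(ascii_text):
        match = TOKEN_RE.match(ascii_text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    word = "\u0e01\u0e34\u0e19"  # KO KAI, SARA I, NO NU
    print(tokenize(to_ascii(word)))
```

A real analyzer would need the full Thai character inventory plus rules for tone marks, consonant clusters, and syllable structure before any phonemes could be produced; the sketch stops at classifying characters.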

Similar resources

Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities

This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...

Generating flex Lexical Scanners for Perl Parse::Yapp

Perl is known for its versatile regular expressions. Nevertheless, using Perl regular expressions to create a fast lexical analyzer is not easy. As an alternative, the authors defend the automated generation of the lexical analyzer in a well-known fast application (flex) based on a simple Perl definition in the syntactic analyzer. In this paper we extend the syntax used by Parse::Yapp, one of ...

Lex - A Lexical Analyzer Generator

Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream. It is well suited for editor-script type transformations and for segmenting input in preparation for a parsing routine. Lex source is a table of regular expressions and corresponding program fragments. The table is translated to a program which reads an input stream, copying it to an...

Generating and Interpreting Referring Expressions as Belief State Planning and Plan Recognition

Planning-based approaches to reference provide a uniform treatment of linguistic decisions, from content selection to lexical choice. In this paper, we show how the issues of lexical ambiguity, vagueness, unspecific descriptions, ellipsis, and the interaction of subsective modifiers can be expressed using a belief-state planner modified to support context-dependent actions. Because the number o...

A Corpus-Based Study of Phoneme Distribution in Thai

This paper presents steps in accessing Thai phoneme distribution from large-scale written Thai corpora. The data were from 12 text genres from InterBEST [1], considered the biggest Thai corpora. Each word was transliterated using the grapheme-to-phoneme software [2]. Then, frequency of words, frequency of 81 Thai phonemes in each genre, and the 95% CIs of average occurrences of each phoneme wer...

Publication date: 2012