Recent progress in developing grapheme-based speech recognition for Indonesian ethnic languages: Javanese, Sundanese, Balinese and Bataks
Abstract
With the advent of globalization, multilingualism in Indonesia is gradually facing a state of catastrophe. Of the 726 ethnic languages currently spoken in the Indonesian archipelago, 146 are endangered. Several cultural preservation projects have been initiated to prevent endangered languages from being lost. Nevertheless, technology that could support communication within indigenous communities, as well as with people outside those communities, is still very rare in Indonesia. Speech translation is one technology that may help indigenous communities in Indonesia overcome language barriers, bridge cultural gaps, and face globalization. Our long-term goal is to establish the infrastructure for a speech translation system from ethnic languages to English/Indonesian. This paper presents recent progress in data resource collection and speech recognition system development for four major Indonesian ethnic languages: Javanese, Sundanese, Balinese and Bataks.
Similar resources
Development of Indonesian Large Vocabulary Continuous Speech Recognition System within A-STAR Project
The paper outlines the development of a large vocabulary continuous speech recognition (LVCSR) system for the Indonesian language within the Asian speech translation (A-STAR) project. An overview of the A-STAR project and Indonesian language characteristics will be briefly described. We then focus on a discussion of the development of Indonesian LVCSR, including data resources issues, acoustic ...
Lipid profiles among diverse ethnic groups in Indonesia.
AIM: to describe the differences in plasma lipid profiles of four Indonesian ethnic groups, i.e., Minangkabau, Sundanese, Javanese and Buginese. METHODS: this cross-sectional population study consisted of adults aged 18 years and older. Lipid profiles were assessed by collecting fasting blood samples from all four ethnic groups. Sub samples of those 4 groups of ethnicity were randoml...
Modified Grapheme Encoding and Phonemic Rule to Improve PNNR-Based Indonesian G2P
A grapheme-to-phoneme conversion (G2P) is very important in both speech recognition and synthesis. The existing Indonesian G2P based on pseudo nearest neighbour rule (PNNR) has two drawbacks: the grapheme encoding does not adapt all Indonesian phonemic rules and the PNNR should select a best phoneme from all possible conversions even though they can be filtered by some phonemic rules. In this p...
Fatty acids intake among diverse ethnic groups in Indonesia
The use of dietary patterns, specifically fatty acid intake, should prove to be an informative and powerful means to augment our understanding of the role of diet in chronic disease, particularly CHD. A cross-sectional study was implemented to describe the nutrient intake, specifically fatty acid intake, of four ethnic groups in Indonesia: Minangkabau, Sundanese, Javanese and Buginese. T...
Indonesian speech recognition for hearing and speaking impaired people
This paper outlines our efforts in developing Indonesian speech recognition for hearing and speaking impaired people. The lack of speech-enabling technology and research, as well as a shortage of data on the Indonesian language presents a major challenge for us to deal with. Difficulties arise in developing an Indonesian speech corpus since Indonesian is actually most people’s second language a...