Improved structure-based automatic estimation of pronunciation proficiency
Authors
Abstract
Automatic estimation of pronunciation proficiency poses a specific difficulty. How adequately a learner controls the vocal organs is often estimated from the spectral envelopes of input utterances, but the envelope patterns also change from speaker to speaker. To develop a good and stable method for automatic estimation, the envelope changes caused by linguistic factors must be properly separated from those caused by extra-linguistic factors. To this end, our previous study [1] proposed a mathematically guaranteed and linguistically valid speaker-invariant representation of pronunciation, called speech structure. Since that proposal, we have also tested the representation for ASR [2, 3, 4] and, through these works, have learned better how to apply speech structures to various tasks. In this paper, we focus on a proficiency estimation experiment done in [1] and, using the recently developed techniques for the structures, carry out that experiment again under different conditions. Here, we use a smaller unit of structural analysis, speaker-invariant substructures, and relative structural distances between a learner and a teacher. Results show a higher correlation between human and machine ratings and far greater robustness to speaker differences than the widely used GOP scores.
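The idea of a speaker-invariant structure and of a structural distance between a learner and a teacher can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: each speech event is modelled as a one-dimensional Gaussian `(mean, std)`, the Bhattacharyya distance is used as the distance between events, a speaker change is simulated by a toy affine map `x -> a*x + b`, and the function names `structure`, `structural_distance` and `affine` are hypothetical.

```python
import math
from itertools import combinations


def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two 1-D Gaussians given as (mean, std)."""
    (m1, s1), (m2, s2) = g1, g2
    var_sum = s1 * s1 + s2 * s2
    return 0.25 * (m1 - m2) ** 2 / var_sum + 0.5 * math.log(var_sum / (2.0 * s1 * s2))


def structure(events):
    """Structure vector: the distance between every pair of speech events."""
    return [bhattacharyya(a, b) for a, b in combinations(events, 2)]


def structural_distance(s, t):
    """Euclidean distance between two structure vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s, t)))


def affine(events, a, b):
    """Toy speaker change (assumption): map every feature value x to a*x + b, a != 0."""
    return [(a * m + b, abs(a) * s) for m, s in events]


# Made-up event distributions for a teacher and a learner.
teacher = [(0.0, 1.0), (2.0, 0.8), (5.0, 1.2)]
learner = [(0.0, 1.0), (2.6, 0.8), (4.1, 1.2)]

# The same teacher pronunciation, seen through a different "voice".
teacher_other_voice = affine(teacher, a=1.7, b=-3.0)

d_same = structural_distance(structure(teacher), structure(teacher_other_voice))
d_learner = structural_distance(structure(learner), structure(teacher))
d_learner_other = structural_distance(structure(learner), structure(teacher_other_voice))
```

In this toy setting `d_same` is zero up to rounding error, because the pairwise distances do not change under the affine map, while `d_learner` and `d_learner_other` agree with each other and are clearly non-zero. The sketch only shows why comparing structures rather than raw spectral features can separate pronunciation differences from speaker differences; it does not reproduce the substructure analysis or the experimental conditions described in the abstract.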
Similar resources
Pronunciation Proficiency Estimation Based on Multilayer Regression Analysis Using Speaker-independent Structural Features
Teachers can assess the pronunciation of students independently of extra-linguistic features, such as age and gender, observed in the students’ utterances. This capacity is, however, difficult to realize on machines because linguistic and extra-linguistic differences both change the acoustic features. Therefore, the performance of automatic pronunciation assessment is inevitably affe...
GOP performance improvement of automatic pronunciation assessment in a noisy environment
Compared to traditional language education methodologies, CALL systems have many potential benefits. CALL systems are faster and cheaper, allowing learners to get feedback immediately and study by themselves without requiring the sole attention of a teacher. In CALL systems, a good pronunciation evaluation method is needed to inform learners about their proficiency and to correct their pronun...
Analysis of the non-native English pronunciation based on structural representation of speech
Recently, a novel acoustic representation of speech was proposed, where dimensions of the non-linguistic factors can hardly be seen. Using this structural representation, individual learners were described as distorted phonemic structures and automatic evaluation of the pronunciation was investigated. This paper describes two new analyses using the proposed method. The first analysis is done to...
Assessment of Non-native Prosody for Spanish as L2 using quantitative scores and perceptual evaluation
In this work we present SAMPLE, a new pronunciation database of Spanish as L2, and first results on the automatic assessment of non-native prosody. Listen-and-repeat and reading tasks are carried out by native and foreign speakers of Spanish. The corpus has been designed to support comparative studies and evaluation of automatic pronunciation error assessment at both the phonetic and prosodic levels. Fo...
Automatic Grading System for Mandarin Proficiency Test based on PSO-ANN
Having the National Mandarin Proficiency Test (NMPT) completed by computer-aided evaluation not only removes the human factor and establishes uniform standards, but also saves time, improves efficiency, and accelerates the promotion and popularization of Mandarin. In this paper, we focus on the optimization of an artificial neural network by PSO for automatic grading of the NMPT. A framework of the automatic...