Speech production of vowel sequences using a physiological articulatory model
Abstract
This report describes the development of a physiologically based articulatory model that consists of the tongue, mandible, hyoid bone, and vocal tract wall. These organs are represented in a quasi-3D shape that replicates a midsagittal layer 2 cm thick for the tongue tissue and 3 cm thick for the tract wall. The geometry of these organs and muscles is extracted from volumetric MR images of a male speaker. Both the soft and rigid structures are represented by mass-points and viscoelastic springs for connective tissue, with the springs for bony organs set to an extremely large stiffness. This design allows soft tissue deformations and rigid organ displacements to be computed simultaneously with a single algorithm, and thus reduces the computational complexity of the simulation. A novel control method is developed to produce dynamic actions of the vocal tract and to handle collisions of the tongue with the surrounding walls. Area functions are obtained for vowel sequences from the model's vocal tract widths in the midsagittal and parasagittal planes. The proposed model demonstrated plausible dynamic behaviors for human speech articulation.

1. MODEL CONSTRUCTION

To replicate the behaviors of human speech organs, the model was customized to a specific speaker by replicating anatomical information obtained from volumetric MRI data of a male Japanese speaker.

1.1 Design of the Tongue Shape

The tongue tissue model is designed as a thick sagittal layer bounded by three sagittal planes. This design was chosen to form the midsagittal groove of the tongue and the side airway when producing vowels and consonants. The tongue tissue has commonly been modeled using the finite element method [1,2]. Our earlier study developed an integrated model that combined an FEM model of the tongue with a beam-muscle model of the jaw-larynx system [3].
Computing movements in this hybrid model was slow because reaching an equilibrium between the soft tissue and the rigid organs took considerable time. One possible solution to this problem is to model all speech organs with the same method. To this end, the current model uses a mass-spring network for both the soft tissue and the rigid organs. The basic structure of the tongue tissue model roughly follows the fiber orientation of the genioglossus muscle. The central part of the tongue, which contains this muscle, is represented by a 2-cm-thick layer with three sagittal planes. Each plane is divided into six sections at nearly equal intervals in the anterior-posterior direction and ten sections along the tongue surface. The tongue tissue model is shown in Fig. 1 together with the vocal tract wall. In the tongue model, the mesh lines represent viscoelastic springs, and mass-points are located at the intersections of the mesh lines. Each mass-point in the midsagittal plane is also connected by springs to the corresponding mass-points in the right and left planes. To relate deformation to stress in the mass-spring network, each mass-point is also connected by springs to its diagonally adjacent neighbors. Thus, when external forces are removed, the strain forces restore the original shape from a deformation. The mass per unit volume of the tongue tissue is set to 1 g/cm³, the same as that of water.

Fig. 1 Oblique view of the three-dimensional model of the speech organs (labeled: teeth, hard palate, velum, pharyngeal wall, piriform fossa, larynx, jaw). All dimensions are in cm.

The Voigt model, which consists of a spring in parallel with a dashpot, was adopted to approximate the properties of the tongue tissue.
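As a minimal sketch of the Voigt element, the function below computes the force that one spring-dashpot pair exerts on a mass-point. It assumes the element acts along the line joining its two mass-points, with stiffness `k` (dyne/cm) and viscosity `c` (dyne·s/cm) in CGS units as in the text; the function name `voigt_force` and its argument layout are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def voigt_force(p_i: Vec3, p_j: Vec3, v_i: Vec3, v_j: Vec3,
                rest_length: float, k: float, c: float) -> Vec3:
    """Force on mass-point i from a Voigt element linking it to mass-point j.

    Spring and dashpot act in parallel along the i -> j direction:
    tension = k * elongation + c * (rate of elongation).
    The force on j is the negative of the returned vector.
    """
    d = _sub(p_j, p_i)
    length = math.sqrt(_dot(d, d))
    u = (d[0] / length, d[1] / length, d[2] / length)  # unit vector i -> j
    elongation = length - rest_length                  # cm
    elongation_rate = _dot(_sub(v_j, v_i), u)          # cm/s
    tension = k * elongation + c * elongation_rate     # dyne
    return (tension * u[0], tension * u[1], tension * u[2])
```

Summing `voigt_force` over every spring attached to a mass-point (mesh-line, cross-plane, and diagonal links alike) gives the net internal force on that point; a stretched element pulls i toward j, and a compressed one pushes it away.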
The mechanical parameters for the spring and the dashpot reported in previous studies differed widely: the stiffness ranged from 10-10 dyne/cm, and the viscosity from 10-10 dyne•s/cm [4]. In the present model, the parameters were chosen to be 1.54x10 dyne/cm for the stiffness and 1.75x10 dyne•s/cm for the viscosity.

1.2 Modeling of the Rigid Organs

The outlines of the rigid organs (i.e., the jaw and hyoid bone in the present work) were also traced from the MRI data of the target subject. The contours of the bony organs were identifiable in the MR images where they were surrounded by soft tissue. Based on the extracted geometries, the mandible is modeled by four mass-points on each side, which form two triangles using five rigid beams, including one shearing beam [5]. The mandible model is joined to the tongue model at the mandibular symphysis.

5th International Conference on Spoken Language Processing (ICSLP 98), Sydney, Australia, November 30 - December 4, 1998. ISCA Archive, http://www.isca-speech.org/archive

The temporomandibular joint is designed to produce two types of motion: rotation and translation. The hyoid bone model has three segments, corresponding to the body and the bilateral greater horns, and also allows rotation and translation. Each segment of the hyoid bone is modeled by two mass-points connected by a rigid beam. Eight muscles are incorporated in the model of the mandible-hyoid bone complex.

1.3 Construction of the Vocal Tract Wall

To determine the vocal tract shape, the organs surrounding the tongue must be incorporated in the model. These surrounding organs are the lips, teeth, hard palate, soft palate (velum), pharyngeal wall, and laryngeal tube. At this stage, the present model has no lips and treats the other organs as a single rigid wall. Therefore, movements of the velum and larynx are not taken into account in the present model.
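The abstract notes that the springs for bony organs are simply given an extremely large stiffness, so the rigid beams of the jaw and hyoid and the soft tissue can be advanced by one algorithm. The one-dimensional sketch below illustrates that idea: a chain of three mass-points with one soft Voigt spring and one very stiff "bone" spring, integrated with semi-implicit Euler. The integrator, the `Spring` class, `simulate_1d`, and all numerical values are hypothetical choices for illustration; the paper does not specify them. A stiff spring forces a small time step: with semi-implicit Euler, `dt` must stay well below 2/ω, where ω is the angular frequency of the stiffest spring.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Spring:
    """Voigt element between mass-points i and j (assumes x[j] > x[i])."""
    i: int
    j: int
    rest: float  # rest length (cm)
    k: float     # stiffness (dyne/cm)
    c: float     # viscosity (dyne*s/cm)


def simulate_1d(x0: List[float], m: List[float], springs: List[Spring],
                fixed: Set[int], f_ext: List[float],
                dt: float, steps: int) -> List[float]:
    """Advance a 1-D mass-spring chain with semi-implicit (symplectic) Euler.

    Soft-tissue springs and near-rigid bone beams share this one code path;
    a bone beam is just a spring with a very large k.
    """
    x = list(x0)
    v = [0.0] * len(x)
    for _ in range(steps):
        f = list(f_ext)
        for s in springs:
            stretch = (x[s.j] - x[s.i]) - s.rest
            rate = v[s.j] - v[s.i]
            tension = s.k * stretch + s.c * rate  # spring and dashpot in parallel
            f[s.i] += tension                     # pulls i toward j
            f[s.j] -= tension                     # pulls j toward i
        for n in range(len(x)):
            if n in fixed:
                continue
            v[n] += dt * f[n] / m[n]  # update velocity first,
            x[n] += dt * v[n]         # then position with the new velocity
    return x


# Hypothetical values: node 0 is anchored, node 2 is pulled with 100 dyne.
springs = [
    Spring(0, 1, rest=1.0, k=100.0, c=20.0),  # soft tissue
    Spring(1, 2, rest=1.0, k=1.0e6, c=50.0),  # near-rigid bone beam
]
x = simulate_1d([0.0, 1.0, 2.0], [1.0, 1.0, 1.0], springs, fixed={0},
                f_ext=[0.0, 0.0, 100.0], dt=1e-4, steps=30000)
soft_stretch = x[1] - x[0] - 1.0  # approaches F / k_soft = 1 cm
beam_stretch = x[2] - x[1] - 1.0  # approaches F / k_bone = 1e-4 cm
```

At equilibrium the same 100-dyne tension runs through both springs, so the soft spring stretches by about 1 cm while the bone beam stretches by only about 10⁻⁴ cm. The beam therefore behaves as a rigid link without a separate rigid-body solver, at the cost of the small time step.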
The outlines of the vocal tract wall were extracted from MRI data in the midsagittal plane and in the parasagittal planes 0.7 and 1.4 cm to the right of the midsagittal plane. Assuming that the left and right sides are symmetric, 3D surface models of the vocal tract wall and the mandibular symphysis were reconstructed from the outlines at 0.7-cm intervals in the left-right direction, as shown in Fig. 1. Because of the geometrical complexity, an analytic function could not be derived for the wall surfaces. For this reason, the surfaces of the tract wall and the mandibular symphysis were approximated by small triangular planes: 432 for the tract wall and 192 for the mandible.

1.4 Arrangement of the Tongue Muscles

The anatomical arrangement of the major tongue muscles was determined from high-resolution MR images of the same target speaker. The genioglossus (GG), geniohyoid (GH), and mylohyoid (MH) were extracted in the midsagittal plane. The superior longitudinal (SL) and inferior longitudinal (IL) muscles were identified in the plane 0.6 cm from the midsagittal plane. The hyoglossus (HG) and styloglossus (SG) were distinguished in the plane 1.5 cm from the midsagittal plane. The orientation of all these tongue muscles was also checked against the literature [6]. Figure 2 shows the locations of the extrinsic muscles in the model: (a) in the midsagittal plane and (b) in the parasagittal plane. The genioglossus (GG), the largest muscle of the tongue, runs midsagittally through the central part of the tongue. Because the triangular GG affects tongue deformation differently in different regions, it is functionally divided into three muscle bundles: the anterior (GGa), middle (GGm), and posterior (GGp) portions. The thickness of each line represents the size of the muscle unit: the thicker the line, the larger the maximum force produced.
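One simple way to turn the muscle lines of Fig. 2 into forces on the mass-spring network is sketched below. It assumes, hypothetically, that each muscle bundle (for example GGa, GGm, or GGp) is a polyline through mass-points in the midsagittal plane, that its tension equals an activation between 0 and 1 times a maximum force standing in for the line thickness, and that each segment pulls its two end points toward each other. The function name `muscle_forces` and this force-distribution rule are illustrative assumptions, not the paper's documented method.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) position in the midsagittal plane, cm


def muscle_forces(path: List[int], points: Dict[int, Point],
                  activation: float, f_max: float) -> Dict[int, Point]:
    """Distribute one muscle bundle's contraction force over its fiber path.

    Tension = activation * f_max, with activation clipped to [0, 1].
    Every segment of the path pulls its two end points toward each other,
    so an interior point receives the resultant of its two segments.
    """
    tension = min(max(activation, 0.0), 1.0) * f_max
    forces = {n: (0.0, 0.0) for n in path}
    for a, b in zip(path, path[1:]):
        (xa, ya), (xb, yb) = points[a], points[b]
        length = math.hypot(xb - xa, yb - ya)
        ux, uy = (xb - xa) / length, (yb - ya) / length  # unit vector a -> b
        fx, fy = forces[a]
        forces[a] = (fx + tension * ux, fy + tension * uy)  # a pulled toward b
        fx, fy = forces[b]
        forces[b] = (fx - tension * ux, fy - tension * uy)  # b pulled toward a
    return forces
```

On a straight fiber path the interior forces cancel and only the end points are pulled together; on a curved path, such as an arching genioglossus bundle, interior points receive a net force toward the concave side of the curve, which is how a single bundle can reshape the tongue body.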
The hyoglossus (HG) and styloglossus (SG), shown in the parasagittal plane, are designed to be symmetrical on the left and right sides. In total, eleven tongue muscles were included in the model.
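The abstract states that area functions are obtained from the model's vocal tract widths in the midsagittal and parasagittal planes, but this excerpt does not give the conversion. The sketch below assumes one straightforward approach: at each position along the tract, take the tongue-to-wall opening in the midsagittal plane and in parasagittal planes spaced 0.7 cm apart on one side (matching the wall outlines in Section 1.3), integrate across the planes with the trapezoidal rule, and double the result under the left-right symmetry assumption. The function names `cross_sectional_area` and `area_function` are hypothetical.

```python
from typing import List, Sequence


def cross_sectional_area(openings: Sequence[float], spacing: float = 0.7) -> float:
    """Approximate one cross-sectional area (cm^2) of the vocal tract.

    openings[0] is the tongue-to-wall distance (cm) in the midsagittal plane;
    openings[1:] are the distances in successive parasagittal planes on one
    side, `spacing` cm apart. Negative values are clipped to zero (closed).
    Left-right symmetry is assumed, so the one-sided integral is doubled.
    """
    if len(openings) < 2:
        raise ValueError("need the midsagittal and at least one parasagittal opening")
    d = [max(o, 0.0) for o in openings]
    half = sum(0.5 * (d0 + d1) * spacing for d0, d1 in zip(d, d[1:]))
    return 2.0 * half


def area_function(sections: Sequence[Sequence[float]],
                  spacing: float = 0.7) -> List[float]:
    """Cross-sectional area at each sampled position along the tract."""
    return [cross_sectional_area(s, spacing) for s in sections]
```

This simple integration ignores any airway lateral to the outermost plane, so it gives a lower bound for wide sections; an empirical width-to-area mapping is a common alternative.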