IrcamCorpusTools: an Extensible Platform for Spoken Corpora Exploitation
Abstract
Corpus-based methods are increasingly used for speech technology applications and for the development of theoretical or computer models of spoken languages. These uses range from unit selection speech synthesis to statistical modeling of speech phenomena such as prosody or expressivity. In all cases, they require a wide range of tools for corpus creation, labeling, symbolic and acoustic analysis, storage and query. However, although a variety of tools exist for each of these individual tasks, they are rarely integrated into a single platform made available to a large community of researchers. In this paper, we propose IrcamCorpusTools, an open and easily extensible platform for analysis, query and visualization of speech corpora. It is already used for unit selection speech synthesis, for prosody and expressivity studies, and to exploit various corpora of spoken French or other languages.
Related resources
IRCAM Corpus Tools: Managing speech corpora
Corpus based methods are increasingly used for speech technology applications and for the development of theoretical or computer models of spoken languages. These usages range from unit selection speech synthesis to statistical modeling of speech phenomena like prosody or expressivity. In all cases, these usages require a wide range of tools for corpus creation, labeling, symbolic and acoustic ...
Towards Multimodal Spoken Language Corpora: TransTool And SyncTool
This paper argues for the usefulness of multimodal spoken language corpora and specifies components of a platform for the creation, maintenance and exploitation of such corpora. Two of the components, which have already been implemented as prototypes, are described in more detail: TransTool and SyncTool. TransTool is a transcription editor meant to facilitate and partially automate the task of ...
The Database for Spoken German ― DGD2
The Database for Spoken German (Datenbank für Gesprochenes Deutsch, DGD2, http://dgd.ids-mannheim.de) is the central platform for publishing and disseminating spoken language corpora from the Archive of Spoken German (Archiv für Gesprochenes Deutsch, AGD, http://agd.ids-mannheim.de) at the Institute for the German Language in Mannheim. The corpora contained in the DGD2 come from a variety of so...
A Configurable Dialogue Platform for ASORO Robots
This paper is concerned with the architectural design and development of a spoken dialogue platform for robots. The platform adopts a modular software architecture and an event-driven communication paradigm, which make speech-enabled hardware devices and software components configurable and reusable. The platform is able to integrate heterogeneous dialogue components (such as speech recognizer, natu...
A Framework for Multilevel linguistic Annotations
This article presents a 3-step model for multilayer annotations of corpora. Each kind of annotation for a textual corpora corresponds to a different view on the same document. This principle can be expressed first with a general relational model dedicated to the organisation of LR. This abstract model is then implemented as an application of the XML formalism for the encoding of large corpora. The ...