JUPITER Data Collection and Analysis
Authors
Abstract
We have been actively collecting data within the JUPITER domain since the beginning of 1997. As in previous domains, we first developed a prototype JUPITER system and used it to collect spontaneous speech under a Wizard-of-Oz paradigm, with a human typist in the loop and subjects brought into the lab and given scenarios to solve. At the same time, we solicited read speech using both our Web-based data collection facility and a phone number that subjects could call to read from pre-distributed lists. Once these data had been collected, we were able to train a recognizer and move on to system-based data collection. We currently have a toll-free number that subjects can call 24 hours a day to obtain weather information.¹ The utterances collected through this facility are also used as training data. The toll-free number has been a particularly powerful way to collect data from a wide variety of subjects in a short period of time, and we feel that these calls accurately reflect the way users want to interact with such systems. In recent months, we have been receiving approximately 17 calls per day.

To ensure that these data are ready for both speech recognition and natural language development as quickly as possible, we process incoming data daily. Every morning, a script automatically sends an email listing the previous day's calls. These calls are then transcribed manually, usually over the course of the following day. The transcribed calls are bundled into sets of approximately 500 utterances and added to the training corpus as they become available.

Over the past year, we have continued to refine our transcription tool, which was originally developed for orthographic transcription of read speech. It uses a Tcl/Tk interface that provides an editable window in which the transcriber can listen to utterances, correct existing transcriptions, and add specialized markings for noise, partial words, etc.
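The daily processing loop described above (list the previous day's calls each morning, then fold transcribed utterances into training sets of roughly 500) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the `Call` record, its fields, and the function names are hypothetical, and the set size of 500 is taken from the text as a fixed target.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Iterable, List


@dataclass
class Call:
    """Hypothetical record for one logged call (not the JUPITER log format)."""
    call_id: str
    started: datetime
    utterances: List[str]  # IDs of the utterances recorded during the call


def previous_days_calls(calls: Iterable[Call], today: date) -> List[Call]:
    """Return calls that started on the day before `today`, sorted by start time."""
    yesterday = today - timedelta(days=1)
    selected = [c for c in calls if c.started.date() == yesterday]
    return sorted(selected, key=lambda c: c.started)


def daily_report(calls: List[Call]) -> str:
    """Format one line per call (ID, start time, utterance count) for the morning email."""
    return "\n".join(
        f"{c.call_id}\t{c.started:%H:%M}\t{len(c.utterances)}" for c in calls
    )


def bundle_utterances(utterance_ids: List[str], set_size: int = 500) -> List[List[str]]:
    """Split transcribed utterances into consecutive sets of `set_size`.

    The last set may be smaller; a running pipeline would keep filling it
    until it reaches `set_size` before adding it to the training corpus.
    """
    if set_size <= 0:
        raise ValueError("set_size must be positive")
    return [
        utterance_ids[i:i + set_size]
        for i in range(0, len(utterance_ids), set_size)
    ]
```

For example, 1,234 transcribed utterances would yield sets of 500, 500, and 234; sending the report by email is left out, since the source does not describe how the script delivers it.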
The initial transcription for each utterance was the orthography hypothesized by the recognizer during the call. The transcriber could also use the tool to label the talker as male, female, or child. The tool checks orthographic transcriptions against a lexicon, which helps catch typographical errors before they are saved. If a word does not appear in the lexicon, the transcriber is warned and given the option of …
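The lexicon check can be illustrated with a small sketch that flags out-of-vocabulary words before a transcription is saved. It assumes a plain set of known words and a hypothetical markup convention in which noise markers are bracketed (e.g. `<noise>`) and partial words end in a hyphen (e.g. `bos-`); the actual JUPITER marking conventions are not given here, so these rules are placeholders.

```python
import re
from typing import Iterable, List, Set

# Hypothetical markup conventions (assumed, not taken from the source):
#   <...>   bracketed noise or non-speech marker, e.g. <noise>
#   word-   partial word, ending in a hyphen
NOISE_MARKER = re.compile(r"^<[^<>]+>$")


def load_lexicon(words: Iterable[str]) -> Set[str]:
    """Build a case-insensitive lexicon from an iterable of words."""
    return {w.strip().lower() for w in words if w.strip()}


def out_of_vocabulary(transcription: str, lexicon: Set[str]) -> List[str]:
    """Return tokens that are neither markers, partial words, nor lexicon entries.

    Unknown tokens are listed in order of first appearance, without
    duplicates, so the transcriber is warned about each one once.
    """
    unknown: List[str] = []
    for token in transcription.split():
        if NOISE_MARKER.match(token):
            continue  # non-speech marker: not checked against the lexicon
        if token.endswith("-"):
            continue  # partial word: not expected to be in the lexicon
        if token.lower() not in lexicon and token not in unknown:
            unknown.append(token)
    return unknown
```

With a lexicon of `what is the weather in boston`, the transcription `<noise> what is the wether in bos- boston` yields `["wether"]`. The function only reports unknown words; how the transcriber responds is left to the calling interface.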
Similar resources
On the Speed of Gravity and the Jupiter/quasar Measurement
I present the theory and analysis behind the experiment by Fomalont and Kopeikin involving Jupiter and quasar J0842+1845 that purported to measure the speed of gravity. The computation of the v_J/c correction to the gravitational time delay difference relevant to the experiment is derived, where v_J is the speed of Jupiter as measured from Earth. Since the v_J/c corrections are too small to have b...
How Common Are Earths? How Common Are Jupiters?
Among the billions of planetary systems that fill the Universe, we would like to know how ours fits in. Exoplanet data can already be used to address the question: How common are Jupiters? Here we discuss a simple analysis of recent exoplanet data indicating that Jupiter is a typical massive planet rather than an outlier. A more difficult question to address is: How common are Earths? However,...
Mokusei: a telephone-based Japanese conversational system in the weather domain
This paper describes MOKUSEI, an end-to-end Japanese version of our JUPITER weather information system. MOKUSEI delivers weather information over the phone through natural conversation with the user. For the most part, MOKUSEI uses the same components for recognition, understanding, and generation that JUPITER uses, and the database and the semantic frames for the weather information content ar...
Radiation Synthesis of New Molecules on Jupiter's Icy
JIMO presents an opportunity to the planetary science community to expand the detailed knowledge (data) of the composition and geomorphology of Jupiter's Icy Moons – Callisto, Ganymede, Europa. This data will be used to develop understanding of the origin and evolution of these bodies as well as determination of life and sustainability potential. This presentation focuses on a class of data coll...
Analysis of the sensor characteristics of the Galileo dust detector with collimated Jovian dust stream particles
The Dust Detector System onboard Galileo records dust impacts in the Jupiter system. Impact events are classified into four quality classes. Class 3 – our highest quality class – has always been noise-free and, therefore, contains only true dust impacts. Depending on the noise environment, class 2 are dust impacts or noise. Within 20 R_J from Jupiter (Jupiter radius, R_J = 71,492 km) class 2 s...