Accessing speech data using strategic fixation
Abstract
When users access information from text, they engage in strategic fixation, visually scanning the text to focus on regions of interest. However, because speech is both serial and ephemeral, it does not readily support strategic fixation. This paper describes two design principles, indexing and transcript-centric access, that address the problem of speech access by supporting strategic fixation. Indexing involves users constructing external visual indices into speech. Users visually scan these indices to find information-rich regions of speech for more detailed processing and playback. Transcript-centric access involves transcribing speech using automatic speech recognition (ASR) and enriching that transcription with visual cues. The resulting enriched transcript is time-aligned to the original speech, allowing users to scan the transcript as a whole, or the additional visual cues within it, to fixate on and play regions of interest. We tested the effectiveness of these two approaches on a set of reference tasks derived from observations of current voicemail practice. A field trial evaluation of JotMail, an index-based interface similar to commercial unified messaging clients, showed that our approaches were effective in supporting speech scanning, information extraction and status tracking, but not archive management. However, users found it onerous to take manual notes with JotMail to provide effective retrieval indices. We therefore built SCANMail, a transcript-based interface that constructs indices automatically, using ASR to generate a transcript of the speech data. SCANMail also uses information extraction techniques to identify regions of potential interest, e.g., telephone numbers, within the transcript. Laboratory and field trials showed that SCANMail overcame most of the problems users reported with JotMail, supporting scanning, information extraction and archiving.
Importantly, our evaluations showed that, despite errors, ASR transcripts provide a highly effective tool for browsing. Users exploited the enriched transcript to determine the gist of the underlying speech, and as a guide to identifying areas of speech that it was critical for them to play. Long-term field trials also showed the utility of transcripts to support notification and mobile access.
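The idea of a time-aligned, enriched transcript can be illustrated with a minimal sketch. It is not SCANMail's implementation: the Word record, the phone_regions function, and the simple digit pattern for telephone numbers are all hypothetical, and the sketch assumes the recognizer emits numerals together with per-word start and end times. It shows how a region of interest found in the text maps back to a span of audio that a user could choose to play.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str     # token as emitted by a (hypothetical) ASR system
    start: float  # start time in seconds within the recording
    end: float    # end time in seconds within the recording


# Assumed pattern: three groups of digits such as "555 123 4567".
PHONE_RE = re.compile(r"\b\d{3}[-\s]\d{3}[-\s]\d{4}\b")


def phone_regions(words):
    """Return (matched_text, start_time, end_time) for each phone-like span."""
    # Record the character offsets each word occupies in the joined transcript.
    offsets = []
    pos = 0
    for w in words:
        offsets.append((pos, pos + len(w.text)))
        pos += len(w.text) + 1  # +1 for the joining space
    transcript = " ".join(w.text for w in words)

    regions = []
    for m in PHONE_RE.finditer(transcript):
        # Keep the words whose character range overlaps the match.
        hit = [w for w, (s, e) in zip(words, offsets)
               if s < m.end() and e > m.start()]
        regions.append((m.group(), hit[0].start, hit[-1].end))
    return regions


# Made-up voicemail fragment with per-word timings.
demo = [
    Word("call", 0.0, 0.3), Word("me", 0.3, 0.5), Word("at", 0.5, 0.7),
    Word("555", 0.7, 1.2), Word("123", 1.2, 1.7), Word("4567", 1.7, 2.4),
    Word("thanks", 2.4, 2.9),
]
regions = phone_regions(demo)
```

Each returned tuple carries the matched text plus the audio interval covering it, so an interface could highlight the text as a visual cue and play only that interval. Because recognition errors are expected, playing the underlying audio for a highlighted region is a way to check what the transcript shows.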
Related resources
TSAB - Web Interface for Transcribed Speech Collections
This paper describes a new web interface for accessing large transcribed spoken data collections. The system uses automatic or manual time-aligned transcriptions with speaker and topic segmentation information to present structured speech data more efficiently and make accessing relevant speech data quicker. The system is independent of the underlying speech processing technology. The software ...
Using Non-speech Sounds to Improve Access to 2D Tabular Numerical Information for Visually Impaired Users
We investigated two solutions for numerical (2D) tabular data discovery and overview for visually impaired and blind users. One involved accessing information in tables (26 rows x 10 columns containing integers between and including 0 and 100) by this target user group using both speech and non-speech sounds. The other involved accessing similar information in tables of the same size through sp...
Identifying Strategic Information from Scientific Articles through Sentence Classification
We address here the need to assist users in rapidly accessing the most important or strategic information in the text corpus by identifying sentences carrying specific information. More precisely, we want to identify contribution of authors of scientific papers through a categorization of sentences using rhetorical and lexical cues. We built local grammars to annotate sentences in the corpus ac...
Searching through spatial relationships using the 2DR-tree
The 2DR-tree is a novel approach for accessing spatial data, particularly when spatial relationships are defined among objects. The 2DR-tree uses nodes that are the same dimensionality as the data space. Therefore, all relationships between objects are preserved and different binary search strategies are supported. This paper presents the 2DR-tree binary search strategy. A preliminary performan...
Multi-Stage DNN Training for Automatic Recognition of Dysarthric Speech
Incorporating automatic speech recognition (ASR) in individualized speech training applications is becoming more viable thanks to the improved generalization capabilities of neural network-based acoustic models. The main problem in developing applications for dysarthric speech is the relative in-domain data scarcity. Collecting representative amounts of dysarthric speech data is difficult due t...
Journal: Computer Speech & Language
Volume: 21
Publication date: 2007