C2D2E2: Using Call Centers to Motivate the Use of Dialog and Diarization in Entity Extraction

Authors

  • Kenneth Church
  • Weizhong Zhu
  • Jason Pelecanos
Abstract

This paper introduces a deceptively simple entity extraction task intended to encourage more interdisciplinary collaboration between fields that don’t normally work together: diarization, dialog and entity extraction. Given a corpus of 1.4M call center calls, extract mentions of trouble ticket numbers. The task is challenging because first mentions need to be distinguished from confirmations to avoid undesirable repetitions. It is common for agents to say part of the ticket number, and customers confirm with a repetition. There are opportunities for dialog (given/new) and diarization (who said what) to help remove repetitions. New information is spoken slowly by one side of a conversation; confirmations are spoken more quickly by the other side of the conversation.

1 Extracting Ticket Numbers

Much has been written on extracting entities from text (Etzioni et al., 2005), and even speech (Kubala et al., 1998), but less has been written in the context of dialog (Clark and Haviland, 1977) and diarization (Tranter and Reynolds, 2006; Anguera et al., 2012; Shum, 2011). This paper describes a ticket extraction task illustrated in Table 1. The challenge is to extract a 7 byte ticket number, “902MDYK,” from the dialog. Confirmations ought to improve communication, but steps need to be taken to avoid undesirable repetition in extracted entities. Dialog theory suggests it should be possible to distinguish first mentions (bold) from confirmations (italics) based on prosodic cues such as pitch, energy and duration.

t0      t1      S1/S2
278.16  281.07  I do have the new hardware case number for you when you’re ready
282.60  282.85  okay
284.19  284.80  nine
285.03  285.86  zero
286.22  286.74  two
290.82  291.30  nine
292.87  293.95  zero two
297.87  298.24  okay
299.30  300.49  M. as in Mike
301.97  303.56  D. as in delta
304.89  306.31  Y. as in Yankee
307.50  308.81  K. as in kilo
310.14  310.57  okay
310.77  311.70  nine zero two
311.73  312.49  M. D.
312.53  313.18  Y. T.
313.75  314.21  correct
314.21  317.28  and thank you for calling IBM is there anything else I can assist you with

Table 1: A ticket dialog: 7 bytes (902MDYK) at 1.4 bps. First mentions (bold) are slower than confirmations (italics).

phone matches  calls  ticket matches (edit dist)
66%            238    0
59%            82     1
55%            40     2
4.1%           4033   3+

Table 2: Phone numbers are used to confirm ticket matches. Good ticket matches (top row) are confirmed more often than poor matches (bottom row). Poor matches are more common because ticket numbers are relatively rare, and most calls don’t
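
Table 2 groups ticket matches by edit distance, and the caption of Table 1 quotes a rate of 1.4 bps. The Python sketch below illustrates both calculations. It assumes that "edit dist" means the standard Levenshtein distance and that the rate is measured over the whole excerpt, from the first t0 to the last t1, at 8 bits per byte; the excerpt confirms neither assumption. The function names `edit_distance` and `bits_per_second` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions that turn a into b.
    (Assumed metric; the excerpt only says "edit dist".)"""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def bits_per_second(n_bytes: int, t_start: float, t_end: float) -> float:
    """Information rate, assuming 8 bits per byte over [t_start, t_end]."""
    return n_bytes * 8 / (t_end - t_start)


if __name__ == "__main__":
    # The ticket as first spoken vs. the letters heard in the confirmation.
    print(edit_distance("902MDYK", "902MDYT"))             # 1
    # 7 bytes spread over the excerpt in Table 1 (278.16 s to 317.28 s).
    print(f"{bits_per_second(7, 278.16, 317.28):.2f}")     # 1.43
```

Under these assumptions the rate comes out at about 1.43 bps, which agrees with the 1.4 bps in the caption, and the confirmation "Y. T." puts the repeated string one substitution away from the ticket number.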

Similar resources

An iterative speaker re-diarization scheme for improving speaker-based entity extraction in multimedia archives

In this paper we present a novel scheme for improving speaker diarization by making use of repeating speakers across multiple recordings within a large corpus. We call this technique speaker re-diarization and demonstrate that it is possible to reuse the initial speaker-linked diarization outputs to boost diarization accuracy within individual recordings. We first propose and evaluate two novel...

A TESLA-based mutual authentication protocol for GSM networks

The widespread use of wireless cellular networks has made security an ever increasing concern. GSM is the most popular wireless cellular standard, but security is an issue. The most critical weakness in the GSM protocol is the use of one-way entity authentication, i.e., only the mobile station is authenticated by the network. This creates many security problems including vulnerability against m...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Data-Driven Language Understanding for Spoken Language Dialogue

We present a natural-language customer service application for a telephone banking call center, developed as part of the AMITIES dialogue project (Automated Multilingual Interaction with Information and Services). Our dialogue system, based on empirical data gathered from real call-center conversations, features data-driven techniques that allow for spoken language understanding despite speech ...

Exposure to electromagnetic fields at two call centers in Turkey, 2015

Background: This study aims to evaluate the negative health impacts of exposure to electromagnetic fields and to prepare a risk map of two selected call centers. Materials and Methods: The electromagnetic field values of two call centers were measured with a calibrated low, high and point frequency measurement device. The measurements were performed following the EN 50492 standard. 178 employees ...

Publication date: 2016