New feature weighting approaches for speech-act classification

Author

  • Youngjoong Ko
Abstract

A dialogue system is a software program that enables a user to interact with a computer in natural language (Kang et al. 2014). Since an essential task of a dialogue system is to understand what the user says, it must be able to determine the intention expressed in the user’s utterance. A speech-act is a linguistic action that implies the user’s intention, so the dialogue system must identify the speech-act of each user utterance. Although researchers have developed many techniques for speech-act classification, they have mainly used the binary feature weighting scheme because it is simpler yet more effective than other schemes such as tf (traditional term frequency), idf (inverse document frequency) and tf.idf (Manning and Schütze, 1999; Salton and Buckley, 1998; Sebastiani, 2002). An utterance is usually much shorter than a document, which means that it contains only a small number of features. The two major factors of traditional tf.idf are tf, the number of times a term occurs in a document, and df (document frequency), the number of documents in a collection in which a term occurs. In particular, because utterances are short, tf rarely exceeds 2 in an utterance; the few terms whose frequency does exceed 2 bias the distribution of term weights, which degrades speech-act classification performance.
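
As a rough illustration of the weighting schemes named in the abstract, the sketch below computes binary, tf, idf and tf.idf weights for a few made-up utterances. The utterances, the function names, and the idf form log(N/df) are assumptions chosen for illustration. This is not the paper's proposed weighting method, which the abstract does not describe.

```python
import math
from collections import Counter

# Hypothetical toy utterances (not taken from the paper's corpus).
utterances = [
    "can you book a room",
    "yes yes yes please book it",
    "what time is the meeting",
]


def tokenize(utterance):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return utterance.lower().split()


def document_frequency(token_lists):
    """Count, for each term, the number of utterances that contain it."""
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return df


def weight(tokens, df, n_docs, scheme):
    """Return a {term: weight} dict for one utterance under the given scheme.

    Assumes idf = log(N / df); other idf variants exist.
    """
    tf = Counter(tokens)
    if scheme == "binary":
        return {t: 1.0 for t in tf}
    if scheme == "tf":
        return {t: float(c) for t, c in tf.items()}
    if scheme == "idf":
        return {t: math.log(n_docs / df[t]) for t in tf}
    if scheme == "tf.idf":
        return {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    raise ValueError(f"unknown scheme: {scheme}")


token_lists = [tokenize(u) for u in utterances]
df = document_frequency(token_lists)
n_docs = len(token_lists)

# Weights for the second utterance, in which "yes" occurs three times.
binary_weights = weight(token_lists[1], df, n_docs, "binary")
tf_weights = weight(token_lists[1], df, n_docs, "tf")
tfidf_weights = weight(token_lists[1], df, n_docs, "tf.idf")

for scheme, weights in [("binary", binary_weights),
                        ("tf", tf_weights),
                        ("tf.idf", tfidf_weights)]:
    print(scheme, weights)
```

In this toy setting the repeated term receives three times the weight of its neighbours under tf and tf.idf, while the binary scheme treats every term that is present equally, which is the kind of skew the abstract attributes to high-frequency terms in short utterances.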

Similar resources

A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection

The K nearest neighbor algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affect the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...

Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

This article presents a new feature extraction technique based on the temporal tracking of clusters in spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the rapid increase in the number of documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the large size of the feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...

An MCE based classification tree using hierarchical feature-weighting in speech recognition

In this paper a hierarchical classification framework using the feature-weighting tree for the objective of applying diverse weighting to acoustic features is proposed for speech recognition. The hierarchical feature-weighting tree with a flexible structure complexity can be constructed optimally with the optimal splitting for the recognition confusion graph. Based on the minimum classification...

Towards a Contrastive Pragmatic Analysis of Congratulation Speech Act in Persian and English

This paper aims at studying the speech act of congratulation in Persian and English with regard to semantic formulas. To gather the semantic formulas related to congratulation, the researchers chose 100 movies (50 in Persian and 50 in English) as the instrument of the study. The only model of cross-cultural comparison was related to that of Elwood (2004). Therefore, we used Elwood’s model as th...

Journal title:
  • Pattern Recognition Letters

Volume 51

Publication date 2015