SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing

Abstract

Paralinguistic speech processing is important in addressing many issues, such as sentiment and neurocognitive disorder analyses. Recently, Transformer has achieved remarkable success in the natural language processing field and has demonstrated its adaptation to speech. However, previous works on Transformer in the speech field have not incorporated the properties of speech, leaving the full potential of Transformer unexplored. In this paper, we consider the characteristics of speech and propose a general structure-based framework, called SpeechFormer++, for paralinguistic speech processing. More concretely, following the component relationship in the speech signal, we design a unit encoder to model the intra- and inter-unit information (i.e., frames, phones, and words) efficiently. According to the hierarchical relationship, we utilize merging blocks to generate features at different granularities, which is consistent with the structural pattern in the speech signal. Moreover, a word encoder is introduced to integrate word-grained features into each unit encoder, which effectively balances fine-grained and coarse-grained information. SpeechFormer++ is evaluated on the speech emotion recognition (IEMOCAP & MELD), depression classification (DAIC-WOZ) and Alzheimer's disease detection (Pitt) tasks. The results show that SpeechFormer++ outperforms the standard Transformer while greatly reducing the computational cost. Furthermore, it delivers superior results compared to the state-of-the-art approaches.
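As a rough illustration of the merging idea mentioned in the abstract (coarser features built from finer ones), the sketch below average-pools consecutive feature vectors into fewer, coarser tokens. It is not the authors' implementation: the function name `merge`, the fixed window sizes, and the use of plain averaging are all assumptions made for this example.

```python
from typing import List

Vector = List[float]


def merge(tokens: List[Vector], window: int) -> List[Vector]:
    """Average each run of `window` consecutive tokens into one coarser token.

    Hypothetical helper: plain average pooling stands in for whatever
    merging operation the paper actually uses.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    merged: List[Vector] = []
    for start in range(0, len(tokens), window):
        chunk = tokens[start:start + window]  # the last chunk may be shorter
        dim = len(chunk[0])
        merged.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return merged


# Eight toy "frame" vectors of dimension 2 (made-up values).
frames = [[float(i), float(2 * i)] for i in range(8)]

# Assumed window sizes: 2 frames per phone-like unit, 2 phones per word-like unit.
phones = merge(frames, window=2)
words = merge(phones, window=2)
```

Each merging step shortens the sequence, so later stages operate on fewer tokens; this is one plausible reading of how a hierarchical design could reduce computational cost.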

Similar resources

Paralinguistic elements in speech synthesis

Corpus-based text-to-speech systems currently produce very natural synthetic sentences, though limited to a neutral, inexpressive speaking style. Paralinguistic elements are some of the expressive features one would most like to introduce. In this paper, we describe a new method for introducing laughter and hesitation in synthetic speech. Thanks to a small dedicated acoustic database, this metho...

Efficient processing of hierarchical graphs

Efficient processing of hierarchical graphs (1990). Retrospective Theses and Dissertations. Paper 9385. The most advanced technology has been used to photograph and reproduce this manuscript from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer ...

A Hierarchical Framework for Efficient Multilevel Visual Exploration and Analysis

The purpose of data visualization is to offer intuitive ways for information perception and manipulation, especially for non-expert users. Most traditional visualization tools and methods operate in an offline way, limited to accessing static (preprocessed) sets of data. They also restrict themselves to dealing with small dataset sizes, which can be easily visually analysed with conventional vi...

An Efficient Curvelet Framework for Denoising Images

The Wiener filter suppresses noise efficiently. However, it blurs the output image. The Curvelet transform preserves the edges of natural images well, but it introduces visual distortion artifacts and fuzzy edges in the restored image, especially in homogeneous regions. In this paper, a new image denoising framework based on the Curvelet transform and the Wiener filter is proposed, which can stop nois...

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from speech signals has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2023

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2023.3235194