Gesture recognition corpora and tools: A scripted ground truthing method

Authors

Simon Ruffieux

Denis Lalanne

Elena Mugellini

Omar Abou Khaled

Abstract

This article presents a framework that supports rapid prototyping of multimodal applications, the creation and management of datasets, and the quantitative evaluation of classification algorithms in the specific context of gesture recognition. A review of the available corpora for gesture recognition highlights their main features and characteristics. The central part of the article describes a novel method that facilitates the cumbersome task of corpus creation. The method supports automatic ground truthing of the data while subjects are being recorded, by enabling automatic labeling and temporal segmentation of gestures through scripted scenarios. The temporal errors generated by the proposed method are quantified, and their impact on the performance of recognition algorithms is evaluated and discussed. The proposed solution offers an efficient approach to reducing the time required to ground truth corpora for natural gestures in the context of close human–computer interaction.

In recent years, the field of human gesture and activity recognition has evolved rapidly, driven by research and development in novel sensors for human action, activity and gesture recognition. These new sensors can be split into three types: vision (color, depth or heat), position (inertial motion units, global positioning system or motion capture) and physiological (temperature, heart rate or electromyography). Advances in technology have allowed engineers to produce smaller, cheaper and more efficient sensors and to embed them in wearable devices such as necklaces, watches and controllers. These new sensors offer interesting exploration paths for research, but they also complicate quantitative comparisons of methods, algorithms and sensors.

We identified three linked issues hindering research in the domain of natural gesture recognition. First, the recognition of gestures performed in the air by a human is often considered only a subdomain of action and activity recognition, which may confuse researchers. Second, the lack of standards and of a common structure among corpora restricts valid quantitative comparisons of methods. Third, the increasing complexity and cost of creating multi-purpose corpora may become a problem for researchers.

The first issue concerns the confusion between research domains. Three main paths of exploration can be distinguished: human action and activity recognition, human surveillance, and human gesture recognition. These three areas of research share many common aspects and are often confused. Action and activity recognition focuses on recognizing high-level actions or activities performed by humans, such as walking, hiking, cycling, eating, lying on a couch, working or preparing a meal. The result of the recognition is …

Similar resources

Efficient OCR Training Data Generation with Aletheia*

We present how the ground-truthing tool Aletheia can be used to efficiently create training data for an open-source text recognition engine. The labelling process is sped up considerably through a top-down approach. Text content is thereby entered on region level. The characters are then propagated automatically to glyph objects. In addition, segmentation is simplified by several semi-automated...

Human Computer Interaction Using Vision-Based Hand Gesture Recognition

With the rapid emergence of 3D applications and virtual environments in computer systems; the need for a new type of interaction device arises. This is because the traditional devices such as mouse, keyboard, and joystick become inefficient and cumbersome within these virtual environments. In other words, evolution of user interfaces shapes the change in the Human-Computer Interaction (HCI). In...

Why Table Ground-Truthing is Hard

The principle that for every document analysis task there exists a mechanism for creating well-defined ground-truth is a widely held tenet. Past experience with standard datasets providing ground-truth for character recognition and page segmentation tasks supports this belief. In the process of attempting to evaluate several table recognition algorithms we have been developing, however, we have...

Multi-track Annotation of Child Language and Gestures

This paper presents the method and tools applied to the annotation of a corpus of children’s oral and multimodal discourse. The multimodal reality of speech has been long established and is now studied extensively. Linguists and psycholinguists who focus on language acquisition also begin to study child language with a multimodal perspective. In both cases, the annotation of multimodal corpora ...

Journal:

Computer Vision and Image Understanding

Volume 131

Publication date: 2015
