Analyzing Aviation Safety Reports: From Topic Modeling to Scalable Multi-Label Classification

Authors

  • Amrudin Agovic
  • Hanhuai Shan
  • Arindam Banerjee
Abstract

The Aviation Safety Reporting System (ASRS) collects voluntarily submitted aviation safety reports from pilots, controllers, and others, which makes it particularly useful for researching aviation safety deficiencies. In this paper we address two challenges in the analysis of ASRS data: (1) the unsupervised extraction of meaningful and interpretable topics from ASRS reports, and (2) multi-label classification of ASRS data based on a set of predefined categories. For topic modeling, we investigate the practical usefulness of Latent Dirichlet Allocation (LDA) for modeling ASRS reports in terms of interpretable topics. We also use LDA to generate a more compact representation of ASRS reports for use in multi-label classification. For multi-label classification, we propose a novel and highly scalable algorithm based on multivariate regression. Empirical results indicate that our approach is superior to several baseline and state-of-the-art approaches.
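
To make the idea of regression-based multi-label classification over a compact topic representation concrete, the following is a minimal sketch only. It is not the algorithm proposed in the paper, whose details are not given in this abstract. It assumes each report has already been reduced to a short vector of topic proportions (for example by LDA), fits one ridge-regularized linear map from topics to all labels at once, and thresholds the scores. The toy data, the function names `fit_ridge` and `predict_labels`, the regularization strength, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
# Illustrative sketch: multi-label prediction via multivariate ridge regression.
# All names, data, and parameter values here are assumptions, not from the paper.

def fit_ridge(X, Y, lam=0.01):
    """Solve (X^T X + lam*I) W = X^T Y by Gauss-Jordan elimination.

    X: n x d list of feature rows (e.g. topic proportions per report).
    Y: n x k list of 0/1 label rows. Returns W as a d x k list.
    """
    d, k = len(X[0]), len(Y[0])
    # Augmented matrix [X^T X + lam*I | X^T Y], one row per feature.
    A = [
        [sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0) for j in range(d)]
        + [sum(x[i] * y[c] for x, y in zip(X, Y)) for c in range(k)]
        for i in range(d)
    ]
    for col in range(d):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(d):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[d:] for row in A]


def predict_labels(W, x, threshold=0.5):
    """Score every label with one linear map, then threshold each score."""
    k = len(W[0])
    scores = [sum(x[i] * W[i][c] for i in range(len(x))) for c in range(k)]
    return [int(s >= threshold) for s in scores]


# Hypothetical topic proportions for six reports (three topics each).
TOPICS = [
    [0.90, 0.05, 0.05], [0.80, 0.10, 0.10],
    [0.05, 0.90, 0.05], [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90], [0.10, 0.10, 0.80],
]
# Hypothetical labels: label 0 follows topic 0, label 1 follows topic 1.
LABELS = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]]

W = fit_ridge(TOPICS, LABELS)
print(predict_labels(W, [0.5, 0.5, 0.0]))
```

Because all labels share a single linear solve over a low-dimensional input, the cost of fitting grows with the number of topics rather than the vocabulary size, which is one plausible reason a compact representation helps scalability. In practice a numerical library would replace the hand-written solver.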

Similar resources

Correlated Topics in a Scalable Multidimensional Text Cube: Algorithms and Aviation Safety Case Study

As world-wide air traffic continues to grow even at a modest pace, the overall complexity of the system will increase significantly. This increased complexity can lead to a larger number of fatalities per year even if the extremely low fatality rate that we currently enjoy is maintained. One important source of information about the safety of the aviation system is in Aviation Safety Text Repor...

Using Structural Topic Modeling to Explore Aviation Safety Reporting System Data

The Aviation Safety Reporting System includes over a million confidential reports describing safety incidents. Natural language processing techniques allow for relatively rapid and largely automated analysis of large collections of text data. Meaningful interpretation of the results and further investigations by subject matter experts can follow. This article describes the application of struct...

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As there are often relationships between the labels, extracting these relationships and taking advantage of them during the training or prediction phases ...

Cause Identification from Aviation Safety Incident Reports via Weakly Supervised Semantic Lexicon Construction

The Aviation Safety Reporting System collects voluntarily submitted reports on aviation safety incidents to facilitate research work aiming to reduce such incidents. To effectively reduce these incidents, it is vital to accurately identify why these incidents occurred. More precisely, given a set of possible causes, or shaping factors, this task of cause identification involves identifying all ...

Scalable multi-output label prediction: From classifier chains to classifier trellises

Multi-output inference tasks, such as multi-label classification, have become increasingly important in recent years. A popular method for multi-label classification is classifier chains, in which the predictions of individual classifiers are cascaded along a chain, thus taking into account inter-label dependencies and improving the overall performance. Several varieties of classifier chain met...

Publication date: 2010