Explainable predictive modeling for limited spectral data

Abstract

Feature selection of high-dimensional labeled data with limited observations is critical for making powerful predictive modeling accessible, scalable, and interpretable for domain experts. Spectroscopy data, which records the interaction between matter and electromagnetic radiation, holds a particularly large amount of information in a single sample. Since acquiring such data is a complex task, it is crucial to exploit the best analytical tools to extract the necessary information. In this paper, we investigate the most commonly used feature selection techniques and introduce the application of recent explainable AI techniques to interpret the prediction outcomes of spectral data. Interpreting the prediction outcome is beneficial for domain experts, as it ensures the transparency and faithfulness of the ML models to domain knowledge. Due to instrument resolution limitations, pinpointing the important regions of the spectroscopy data creates a pathway to optimize the data collection process through miniaturization of the spectrometer device. Reducing the device size and power, and therefore its cost, is a requirement for real-world deployment of the sensor-to-prediction system as a whole. Furthermore, we consider a wide range of machine learning models that have been proven to be successful in predicting the Cetane Number of fuels. We specifically design three different scenarios to ensure that the evaluation is robust for real-time practice of the developed methodologies and to uncover the hidden effect of noise sources on the final outcome. The evaluation is performed on both the full model and the reduced model using a real dataset. Finally, we propose a correctness metric to assess the conformance of the selected subset of features with domain expertise. As a result, Support Vector Regression yields better accuracy and generalization, and is less computationally demanding and more efficient than the Neural Network. More importantly, [...] from original [...] deploying complex [...] models.
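The abstract does not define its feature selection procedure or its correctness metric, so the following is only a minimal sketch of the general idea under stated assumptions. It uses a simple correlation filter as a stand-in for the feature selection techniques the paper studies, and it assumes a hypothetical conformance score defined as the fraction of selected spectral channels that fall inside expert-marked regions. The names `top_k_features`, `conformance`, and `expert_regions`, as well as the toy data, are illustrative and do not come from the paper.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def top_k_features(spectra, target, k):
    """Rank channels by |correlation| with the target; return the k best indices, sorted.

    Assumption: a correlation filter stands in for the paper's selection methods.
    """
    n_channels = len(spectra[0])
    scores = []
    for j in range(n_channels):
        column = [row[j] for row in spectra]
        scores.append((abs(pearson(column, target)), j))
    # Highest score first; ties broken by the lower channel index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:k])


def conformance(selected, expert_regions):
    """Fraction of selected channels inside any expert-marked (lo, hi) range, inclusive.

    Assumption: this is a hypothetical metric, not the paper's definition.
    """
    if not selected:
        return 0.0
    hits = sum(any(lo <= j <= hi for lo, hi in expert_regions) for j in selected)
    return hits / len(selected)


# Toy data: 5 samples x 6 channels; channels 2 and 3 track the target.
spectra = [
    [0.5, 0.1, 1.0, 2.1, 0.3, 0.9],
    [0.4, 0.9, 2.0, 3.9, 0.3, 0.1],
    [0.6, 0.2, 3.0, 6.2, 0.3, 0.8],
    [0.5, 0.8, 4.0, 8.0, 0.3, 0.2],
    [0.4, 0.3, 5.0, 9.8, 0.3, 0.7],
]
target = [10.0, 20.0, 30.0, 40.0, 50.0]

selected = top_k_features(spectra, target, k=2)
score = conformance(selected, expert_regions=[(2, 4)])
print(selected, score)
```

A score near 1 would suggest that the reduced feature set agrees with the regions a domain expert considers meaningful, while a low score would flag a subset that predicts well on limited data for reasons the expert cannot account for.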


Similar resources

Modeling loss data by phase-type distribution

Insurers are always concerned about the losses on the policies they cover, and they look for methods that let them model past loss data with the aim of making an optimal decision. In this study, phase-type distributions are introduced for modeling loss data, including the associated statistical inference and the use of the EM algorithm to estimate the distribution parameters. Finally, the possibility of using this distribution for modeling grouped data ...

Language Modeling for limited-data domains

With the increasing focus of speech recognition and natural language processing applications on domains with limited amount of in-domain training data, enhanced system performance often relies on approaches involving model adaptation and combination. In such domains, language models are often constructed by interpolating component models trained from partially matched corpora. Instead of simple...


Predictive soil mapping with limited sample data

A. X. Zhu (a,b,c,d), J. Liu (d), F. Du (d), S. J. Zhang (c), C. Z. Qin (c), J. Burt (d), T. Behrens (e) & T. Scholten (e). (a) School of Geography Science, Nanjing Normal University, 1 Wenyuan Road, Nanjing 210023, China; (b) Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, 1 Wenyuan Road, Nanjing 210023, China; (c) State ...


Mining Facebook Data for Predictive Personality Modeling

Beyond being facilitators of human interactions, social networks have become an interesting target of research, providing rich information for studying and modeling users' behavior. Identification of personality-related indicators encrypted in Facebook profiles and activities is of special concern in our current research efforts. This paper explores the feasibility of modeling user personality...


Data-intensive analytics for predictive modeling

The Data Abstraction Research Group was formed in the early 1990s, to bring focus to the work of the Mathematical Sciences Department in the emerging area of knowledge discovery and data mining (KD & DM). Most activities in this group have been performed in the technical area of predictive modeling, roughly at the intersection of machine learning, statistical modeling, and database technology. ...


Journal

Journal title: Chemometrics and Intelligent Laboratory Systems

Year: 2022

ISSN: 1873-3239, 0169-7439

DOI: https://doi.org/10.1016/j.chemolab.2022.104572