A New Efficient Approach for Variable Selection Based on Multiregression: Prediction of Gas Chromatographic Retention Times and Response Factors

Authors

  • Bono Lučić
  • Nenad Trinajstić
  • Sulev Sild
  • Mati Karelson
  • Alan R. Katritzky
Abstract

Selecting the most relevant variables is a frequent problem in the analysis of chemical data, especially given the large amounts of data now produced by increased computing power and analytical resolution. A novel procedure for variable selection based on multiregression (MR) analysis is developed and applied to the quantitative structure-property relationship (QSPR) modeling of gas chromatographic retention times tR and Dietz response factors RF for 152 diverse chemical compounds. Using 296 descriptors generated by the CODESSA program, “absolutely the best” linear MR models containing from 1 to 5 descriptors were first selected (∼2 × 10¹⁰ models were checked), and then “the best” linear stepwise MR models with six and seven descriptors were obtained through “i by i” stepwise selection. In this paper i was varied from 1 to 4, so that in each successive step i descriptors were added to the previously selected descriptors. Nonlinear models were developed by including cross-products of the initial descriptors. The most important descriptors selected for tR were the number of C-H and C-X bonds, connectivity indices of order 3, the highest normal mode vibrational frequency, and the rotational entropy of the molecule at 300 K. For RF modeling, the most important descriptors are those related to the relative number and weight of effective C atoms, the orbital electronic population, and the bond order and valency of C and H atoms. Comparison with the best six-descriptor models obtained by the normal CODESSA procedure shows that the nonlinear seven-descriptor MR models obtained here achieve 30% (0.3520 vs 0.5032) and 12% (0.0472 vs 0.0530) lower standard errors of estimate for tR and RF, respectively. Our novel procedure of selecting a small number of the most important descriptors from a data set allows us to extract a larger amount of useful information than the procedure implemented in CODESSA.
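The two selection stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`std_error`, `best_subset`, `i_by_i_step`) and the small synthetic data set are assumptions made for illustration, and ordinary least squares with an intercept is assumed as the regression step. The sketch also checks the order of magnitude of the number of 1- to 5-descriptor models that can be drawn from 296 descriptors.

```python
from itertools import combinations
from math import comb

import numpy as np


def std_error(X, y, cols):
    """Standard error of estimate of an OLS fit (with intercept) on columns `cols`."""
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]  # observations minus fitted parameters
    return float(np.sqrt(resid @ resid / dof))


def best_subset(X, y, k):
    """Exhaustively check every k-descriptor model and return the best one."""
    return min(combinations(range(X.shape[1]), k),
               key=lambda c: std_error(X, y, c))


def i_by_i_step(X, y, selected, i):
    """Keep `selected` fixed and add the block of i new descriptors with the lowest error."""
    rest = [j for j in range(X.shape[1]) if j not in selected]
    return min((tuple(selected) + extra for extra in combinations(rest, i)),
               key=lambda c: std_error(X, y, c))


# Number of linear models with 1 to 5 descriptors drawn from 296 descriptors
n_models = sum(comb(296, k) for k in range(1, 6))

# Hypothetical toy data: 60 "compounds", 8 "descriptors", 3 of them informative
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + 0.8 * X[:, 6] + 0.05 * rng.normal(size=60)

start = best_subset(X, y, 2)            # exhaustive stage
extended = i_by_i_step(X, y, start, 1)  # one stepwise stage with i = 1
```

With larger values of `i`, each stepwise stage searches all blocks of `i` remaining descriptors at once, which costs more than adding one descriptor at a time but can find descriptors that are only useful in combination.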
Thus, our new procedure enables the selection of the best possible MR models from 10¹⁰ possibilities. Through the introduction of cross-product terms, we obtained nonlinear MR models that are superior to the corresponding linear models.
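As a rough illustration of how cross-product terms turn a linear regression into a nonlinear model, the sketch below appends pairwise products of descriptors to the descriptor matrix and refits. The function names and the synthetic data are hypothetical, and the choice to include only products of distinct descriptors (no squares) is an assumption, since the abstract does not specify it.

```python
from itertools import combinations

import numpy as np


def add_cross_products(X):
    """Append every pairwise product x_i * x_j (i < j) to the descriptor matrix."""
    pairs = list(combinations(range(X.shape[1]), 2))
    products = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    return np.hstack([X, products]), pairs


def fit_error(X, y):
    """Standard error of estimate of an OLS fit (with intercept) on all columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(np.sqrt(resid @ resid / (len(y) - A.shape[1])))


# Hypothetical toy data in which the property depends on an interaction term
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 3))
y = X[:, 0] + 0.5 * X[:, 1] * X[:, 2] + 0.05 * rng.normal(size=80)

linear_err = fit_error(X, y)            # model on the initial descriptors only
X_aug, pairs = add_cross_products(X)    # initial descriptors plus cross-products
nonlinear_err = fit_error(X_aug, y)     # model that can use the interaction term
```

The model stays linear in its coefficients, so the same regression and selection machinery applies to the augmented matrix; only the relationship between the property and the initial descriptors becomes nonlinear.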

Similar resources

Novel Atom-Type-Based Topological Descriptors for Simultaneous Prediction of Gas Chromatographic Retention Indices of Saturated Alcohols on Different Stationary Phases

In this work, novel atom-type-based topological indices, named AT indices, were presented as descriptors to encode structural information of a molecule at the atomic level. The descriptors were successfully used for simultaneous quantitative structure-retention relationship (QSRR) modeling of saturated alcohols on different stationary phases (SE-30, OV-3, OV-7, OV-11, OV-17 and OV-25). At first...

A quantitative structure-property relationship of gas chromatographic/mass spectrometric retention data of 85 volatile organic compounds as air pollutant materials by multivariate methods

A quantitative structure-property relationship (QSPR) study is suggested for the prediction of retention times of volatile organic compounds. Various kinds of molecular descriptors were calculated to represent the molecular structure of compounds. Modeling of retention times of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (...

Application of Genetic Algorithms for Pixel Selection in MIA-QSAR Studies on Anti-HIV HEPT Analogues for New Design Derivatives

Quantitative structure-activity relationship (QSAR) analysis has been carried out with a series of 107 anti-HIV HEPT compounds with antiviral activity, which was performed by chemometrics methods. Bi-dimensional images were used to calculate some pixels and multivariate image analysis was applied to QSAR modelling of the anti-HIV potential of HEPT analogues by means of multivariate calibration,...

QSRR prediction of the chromatographic retention behavior of painkiller drugs.

Quantitative structure-retention relationship (QSRR) analysis is a useful technique capable of relating chromatographic retention time to the chemical structure of a solute. A QSRR study has been carried out on the reversed-phase high-performance liquid chromatography retention times (log tR) of 62 diverse drugs (painkillers) by using molecular descriptors. Multiple linear regression (MLR) is u...

Journal:
  • Journal of Chemical Information and Computer Sciences

Volume 39

Publication date: 1999