Protein crystallization image classification with elastic net

Authors

  • Jeffrey Hung
  • John Collins
  • Mehari Weldetsion
  • Oliver Newland
  • Eric Chiang
  • Steve Guerrero
  • Kazunori Okada
Abstract

Protein crystallization plays a crucial role in pharmaceutical research by supporting the investigation of a protein’s molecular structure through X-ray diffraction of its crystal. Because crystals occur rarely, images must be inspected manually, which is a laborious process. We develop a solution incorporating a regularized logistic regression model for automatically evaluating these images. Standard image features, such as shape context, Gabor filters and Fourier transforms, are first extracted to represent the heterogeneous appearance of our images. The proposed solution then uses Elastic Net to select relevant features. Its L1-regularization mitigates the effects of our large dataset, and its L2-regularization ensures proper operation when the number of features exceeds the number of samples. A two-tier cascade classifier based on naïve Bayes and random forest algorithms categorized the images. To validate the proposed method, we experimentally compare it with naïve Bayes, linear discriminant analysis, random forest, and their two-tier cascade classifiers, using 10-fold cross-validation. Our experimental results demonstrate a 3-category accuracy of 74%, outperforming the other models. In addition, Elastic Net better reduces the false negatives responsible for a high, domain-specific risk. To the best of our knowledge, this is the first attempt to apply Elastic Net to classifying protein crystallization images. Performance measured on a large pharmaceutical dataset also compared well with results reported in previous studies, and the reduction of high-risk false negatives is promising.
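The feature-selection idea in the abstract can be illustrated with a minimal sketch of Elastic Net regularized logistic regression. This is not the authors' implementation: the function names, the penalty values, the proximal gradient solver and the toy data are all assumptions made for illustration. The sketch minimizes mean log-loss + l1·‖w‖₁ + (l2/2)·‖w‖₂², and treats features whose weights are driven exactly to zero by the L1 term as deselected.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_elastic_net_logistic(X, y, l1=0.05, l2=0.05, lr=0.1, n_iter=500):
    """Proximal gradient descent for binary logistic regression with an
    Elastic Net penalty. Returns (weights, intercept).

    Hypothetical helper: l1, l2, lr and n_iter are illustrative defaults,
    and the intercept is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        # Gradient step on the smooth part (log-loss + L2), then L1 shrinkage.
        w = [
            soft_threshold(w[j] - lr * (grad_w[j] + l2 * w[j]), lr * l1)
            for j in range(d)
        ]
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    random.seed(0)
    # Toy data (an assumption): only feature 0 determines the label.
    X, y = [], []
    for _ in range(200):
        x = [random.uniform(-1, 1) for _ in range(3)]
        X.append(x)
        y.append(1 if x[0] > 0 else 0)
    w, b = fit_elastic_net_logistic(X, y)
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    print("weights:", w)
    print("selected feature indices:", selected)
```

In practice a library solver would normally replace this loop, with the two penalty strengths chosen by cross-validation; the sketch only shows how the L1 term zeroes out weights while the L2 term keeps the problem well-posed when features outnumber samples.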


Similar resources

Support Matrix Machines

In many classification problems such as electroencephalogram (EEG) classification and image classification, the input features are naturally represented as matrices rather than vectors or scalars. In general, the structure information of the original feature matrix is useful and informative for data analysis tasks such as classification. One typical structure information is the correlation betw...


Learning Deep Convolutional Neural Networks for X-Ray Protein Crystallization Image Analysis

Obtaining a protein’s 3D structure is crucial to the understanding of its functions and interactions with other proteins. It is critical to accelerate the protein crystallization process with improved accuracy for understanding cancer and designing drugs. Systematic high-throughput approaches in protein crystallization have been widely applied, generating a large number of protein crystallizati...


Surface effects in the crystallization process of elastic flexible polymers

Investigating thermodynamic properties of liquid–solid transitions of flexible homopolymers with elastic bonds by means of multicanonical Monte Carlo simulations, we find crystalline conformations that resemble ground-state structures of Lennard-Jones clusters. This allows us to set up a structural classification scheme for finite-length flexible polymers and their freezing mechanism in analogy...


The structured elastic net for quantile regression and support vector classification

In view of its ongoing importance for a variety of practical applications, feature selection via ℓ1-regularization methods like the lasso has been subject to extensive theoretical as well as empirical investigations. Despite its popularity, mere ℓ1-regularization has been criticized for being inadequate or ineffective, notably in situations in which additional structural knowledge about the predic...


Non-parametric Image Registration Using Generalized Elastic Nets

We introduce a novel approach for non-parametric non-rigid image registration using generalized elastic nets. The concept behind the algorithm is to adapt an elastic net in spatial-intensity space of one image to fit the second image. The resulting configuration of the net, when it achieves its minimum energy state, directly represents correspondence between images in a probabilistic sense and ...




Publication date: 2014