Protein crystallization image classification with elastic net
Abstract
Protein crystallization plays a crucial role in pharmaceutical research by supporting the investigation of a protein’s molecular structure through X-ray diffraction of its crystal. Because crystals occur rarely, images must be inspected manually, which is a laborious process. We develop a solution incorporating a regularized logistic regression model for automatically evaluating these images. Standard image features, such as shape context, Gabor filters and Fourier transforms, are first extracted to represent the heterogeneous appearance of our images. The proposed solution then uses Elastic Net to select relevant features. Its L1-regularization mitigates the effects of our large dataset, and its L2-regularization ensures proper operation when the number of features exceeds the number of samples. A two-tier cascade classifier based on naïve Bayes and random forest algorithms categorizes the images. To validate the proposed method, we experimentally compare it with naïve Bayes, linear discriminant analysis, random forest, and their two-tier cascade classifiers, using 10-fold cross-validation. Our experimental results demonstrate a 3-category accuracy of 74%, outperforming the other models. In addition, Elastic Net better reduces the false negatives responsible for a high, domain-specific risk. To the best of our knowledge, this is the first attempt to apply Elastic Net to classifying protein crystallization images. Performance measured on a large pharmaceutical dataset also compares well with that reported in previous studies, and the reduction of the high-risk false negatives is promising.
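The abstract does not state the exact model formulation, so the following is only a minimal sketch of how an elastic-net-penalized logistic regression can zero out irrelevant features. It assumes the standard penalty lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2), a binary label rather than the three categories used in the study, and a plain proximal gradient solver. The names `lam`, `alpha`, `lr`, `n_iter`, and the toy data are hypothetical and are not taken from the paper.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, lr=0.1, n_iter=500):
    """Binary logistic regression with an elastic net penalty (assumed form):
    lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
    Solved by proximal gradient descent; the intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual of the predicted probability for this sample.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step on the smooth part (log-loss plus L2 term) ...
            step = w[j] - lr * (grad_w[j] + lam * (1.0 - alpha) * w[j])
            # ... followed by soft-thresholding for the L1 term.
            w[j] = soft_threshold(step, lr * lam * alpha)
    return w, b


# Toy data: feature 0 determines the label, feature 1 is balanced within
# each class and therefore carries no information about it.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]] * 5
y = [1, 1, 0, 0] * 5

w, b = fit_elastic_net_logistic(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("weights:", w)
print("selected feature indices:", selected)
```

On this toy data the informative feature keeps a positive weight, while the uninformative one stays at exactly zero, which is the feature selection behavior the L1 term provides. The L2 term keeps the informative weight bounded even though the two classes are perfectly separable.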
Similar resources
Support Matrix Machines
In many classification problems such as electroencephalogram (EEG) classification and image classification, the input features are naturally represented as matrices rather than vectors or scalars. In general, the structure information of the original feature matrix is useful and informative for data analysis tasks such as classification. One typical structure information is the correlation betw...
Learning Deep Convolutional Neural Networks for X-Ray Protein Crystallization Image Analysis
Obtaining a protein’s 3D structure is crucial to the understanding of its functions and interactions with other proteins. It is critical to accelerate the protein crystallization process with improved accuracy for understanding cancer and designing drugs. Systematic high-throughput approaches in protein crystallization have been widely applied, generating a large number of protein crystallizati...
Surface effects in the crystallization process of elastic flexible polymers
Investigating thermodynamic properties of liquid–solid transitions of flexible homopolymers with elastic bonds by means of multicanonical Monte Carlo simulations, we find crystalline conformations that resemble ground-state structures of Lennard-Jones clusters. This allows us to set up a structural classification scheme for finite-length flexible polymers and their freezing mechanism in analogy...
The structured elastic net for quantile regression and support vector classification
In view of its ongoing importance for a variety of practical applications, feature selection via ℓ1-regularization methods like the lasso has been subject to extensive theoretical as well as empirical investigations. Despite its popularity, mere ℓ1-regularization has been criticized for being inadequate or ineffective, notably in situations in which additional structural knowledge about the predic...
Non-parametric Image Registration Using Generalized Elastic Nets
We introduce a novel approach for non-parametric non-rigid image registration using generalized elastic nets. The concept behind the algorithm is to adapt an elastic net in spatial-intensity space of one image to fit the second image. The resulting configuration of the net, when it achieves its minimum energy state, directly represents correspondence between images in a probabilistic sense and ...