Visualisation of the chemical space of fragments, lead-like and drug-like molecules in PubChem
Abstract
The 4.5 million organic molecules with up to 20 non-hydrogen atoms in PubChem were analyzed using the MQN (molecular quantum numbers) system, which consists of 42 integer-valued descriptors of molecular structure. The 42-dimensional MQN-space was visualised by principal component analysis, representing the (PC1, PC2), (PC1, PC3) and (PC2, PC3) planes. The molecules were organized according to ring count (PC1, 38% of variance), molecular size (PC2, 25% of variance) and H-bond acceptor count (PC3, 12% of variance). Compounds satisfying Lipinski's bioavailability, Oprea's lead-likeness and Congreve's fragment-likeness criteria formed distinct groups in MQN-space, visible in the (PC2, PC3) plane. MQN-similarity searches of the 4.5 million molecules (see the browser available at www.gdb.unibe.ch) gave significant enrichment factors for recovering groups of fragment-sized bioactive compounds related to ten different biological targets taken from ChEMBL. These searches revealed lead-hopping relationships not detected by substructure-fingerprint similarity searches. The diversity of different compound series was analyzed using MQN-distance histograms.
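The abstract combines three operations on MQN vectors: a PCA projection of the 42-dimensional space, similarity searching scored by enrichment factors, and histograms of pairwise MQN distances within a compound series. The following is a minimal sketch of these steps on synthetic data, under several stated assumptions:

- The descriptor vectors are random integers, not real MQN values computed from chemical structures.
- The PCA is a plain covariance PCA of the centred data. The paper's exact scaling is not given here.
- MQN similarity is assumed to be the city-block (Manhattan) distance between vectors.
- The enrichment factor uses the common definition: the hit rate in the top fraction of the ranked list divided by the overall hit rate.

All function names (`pca_project`, `city_block`, `enrichment_factor`, `distance_histogram`) are hypothetical.

```python
import numpy as np

N_MQN = 42  # MQN vectors have 42 integer components


def pca_project(X, n_components=3):
    """Centre X and project it onto its first principal components.

    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    Xc = X - X.mean(axis=0)
    # SVD of the centred data: rows of Vt are the principal axes, sorted by
    # decreasing singular value; S**2 is proportional to the variance captured.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = S**2 / np.sum(S**2)
    scores = Xc @ Vt[:n_components].T
    return scores, ratio[:n_components]


def city_block(query, X):
    """City-block (Manhattan) distance from one query vector to every row of X."""
    return np.abs(X - query).sum(axis=1)


def enrichment_factor(distances, is_active, fraction=0.01):
    """EF = (active rate among the top `fraction` of the ranking) / (overall active rate)."""
    n = len(distances)
    n_top = max(1, int(round(fraction * n)))
    top = np.argsort(distances, kind="stable")[:n_top]  # closest compounds first
    hit_rate_top = is_active[top].sum() / n_top
    hit_rate_all = is_active.sum() / n
    return hit_rate_top / hit_rate_all


def distance_histogram(X):
    """Counts of all pairwise city-block distances within a compound series."""
    i, j = np.triu_indices(len(X), k=1)  # each unordered pair counted once
    d = np.abs(X[i] - X[j]).sum(axis=1)
    return np.bincount(d)  # hist[k] = number of pairs at distance k


# --- toy usage with synthetic, hypothetical data (not real MQN values) ---
rng = np.random.default_rng(0)
library = rng.integers(0, 10, size=(1000, N_MQN))  # random "decoy" vectors
query = rng.integers(0, 10, size=N_MQN)
# "actives": small perturbations (-1, 0 or +1 per component) of the query
actives = np.clip(query + rng.integers(-1, 2, size=(10, N_MQN)), 0, None)

X = np.vstack([library, actives])
is_active = np.r_[np.zeros(len(library), bool), np.ones(len(actives), bool)]

scores, ratio = pca_project(X)
ef = enrichment_factor(city_block(query, X), is_active, fraction=0.01)
hist = distance_histogram(actives)
print("PC1-3 variance ratio:", np.round(ratio, 3))
print("EF(1%):", ef)
print("distances occurring within the active series:", hist.nonzero()[0])
```

Because MQN components are integer counts, city-block distances are integers as well. That is why `np.bincount` can serve directly as the distance histogram. The synthetic actives are deliberately built as close perturbations of the query, so they rank first and give a high enrichment factor. With real data, the 42 MQN values would first be computed for each molecule.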
Related articles
Visualisation and subsets of the chemical universe database GDB-13 for virtual screening
The chemical universe database GDB-13, which enumerates 977 million organic molecules up to 13 atoms of C, N, O, S and Cl following simple chemical stability and synthetic feasibility rules, represents a vast reservoir for new fragments. GDB-13 was classified using the MQN-system discussed in the preceding paper for the analysis of PubChem fragments. Two hundred and fifty-five subsets of GDB-13...
Expanding the fragrance chemical space for virtual screening
The properties of fragrance molecules in the public databases SuperScent and Flavornet were analyzed to define a "fragrance-like" (FL) property range (Heavy Atom Count ≤ 21, only C, H, O, S, (O + S) ≤ 3, Hydrogen Bond Donor ≤ 1) and the corresponding chemical space including FL molecules from PubChem (NIH repository of molecules), ChEMBL (bioactive molecules), ZINC (drug-like molecules), and GD...
Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17
Drug molecules consist of a few tens of atoms connected by covalent bonds. How many such molecules are possible in total and what is their structure? This question is of pressing interest in medicinal chemistry to help solve the problems of drug potency, selectivity, and toxicity and reduce attrition rates by pointing to new molecular series. To better define the unknown chemical space, we have...
QSAR-assisted virtual screening of lead-like molecules from marine and microbial natural sources for antitumor and antibiotic drug discovery.
A Quantitative Structure-Activity Relationship (QSAR) approach for classification was used to predict compounds as active or inactive relative to overall biological activity, antitumor and antibiotic activities, using a data set of 1746 compounds from PubChem with empirical CDK descriptors and semi-empirical quantum-chemical descriptors. A data set of 183 active pharmaceutical ingredie...
MMsINC®: A New Public Large-Scale Chemoinformatics Database System
MMSinc is a database of commercially available compounds. It currently contains over 4 million non-redundant chemical compounds in 3D format. The whole database was studied in terms of uniqueness, diversity, frameworks, chemical reactivity, drug-like and lead-like properties. There are more than 175,000 frameworks in our database. There are 3.89 million (98%) drug-like molecules among whic...
Journal title:
- Journal of Computer-Aided Molecular Design
Volume 25, Issue 7
Publication date: 2011