Supporting Information for: Weighted distance functions improve analysis of high-dimensional data: application to molecular dynamics simulations

Authors

  • Nicolas Blöchliger
  • Amedeo Caflisch
Abstract

SAPPHIRE plot for Beta3S (Figure 6 in the main text). We represented the peptide by the sine and cosine values of 99 nonsymmetric dihedral angles. We used a stochastic, approximate algorithm to generate the progress indices for the SAPPHIRE plots in Figure 6. The stochastic algorithm scales to large data sets because the data are preorganized via tree-based hierarchical clustering. The upper threshold radius and the tree height for the clustering were set to 1 and 8, respectively. The lower threshold radius was set to 0.487, 0.433, and 0.449 for the SAPPHIRE plots based on the UW (eq 1), GW (eq 2), and LAW (eq 4) measures, respectively. These settings were chosen to yield roughly 100,000 clusters at the leaf level. All the SAPPHIRE plots use snapshot 468441 as the starting snapshot. The number of guesses to find near neighbors was set to 4000. We made use of two recent improvements to the algorithm for generating the approximate progress index (Vitalis, manuscript submitted). First, after the initial clustering of the data, we cluster the data on the three levels of finest resolution again, which improves the homogeneity of the clustering on these levels. Second, the algorithm requires the computation of near neighbors for the individual snapshots, and the hierarchical clustering is used to focus the search space; here, we allow this search space to be enlarged if the target of 4000 guesses cannot otherwise be met. This is controlled via the CAMPARI keyword “FMCSC_CPROGRDEPTH,” which was set to 3.
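
The sine and cosine representation of the dihedral angles can be illustrated with a minimal sketch. It is not taken from the Supporting Information: the function name `featurize_dihedrals`, the random trajectory, and the use of a plain Euclidean norm are assumptions made for illustration only, and the norm is not meant to reproduce the UW, GW, or LAW measures. The sketch shows that 99 angles give 198 features and that two angles on either side of the ±180° boundary stay close in feature space.

```python
import numpy as np


def featurize_dihedrals(angles_deg):
    """Map dihedral angles in degrees to their sine and cosine values.

    The last axis of length n becomes length 2n: all sines, then all cosines.
    (Hypothetical helper, not part of CAMPARI or the original work.)
    """
    angles = np.radians(np.asarray(angles_deg, dtype=float))
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


# Hypothetical trajectory: 5 snapshots with 99 dihedral angles each.
rng = np.random.default_rng(0)
traj = rng.uniform(-180.0, 180.0, size=(5, 99))
features = featurize_dihedrals(traj)
print(features.shape)

# Angles of 179 and -179 degrees differ by only 2 degrees on the circle.
a = featurize_dihedrals([179.0])
b = featurize_dihedrals([-179.0])
gap = float(np.linalg.norm(a - b))
print(gap)
```

Working with sines and cosines rather than raw angles avoids the artificial jump of almost 360° between values that are physically adjacent, at the cost of doubling the number of features.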

Related resources

Weighted Distance Functions Improve Analysis of High-Dimensional Data: Application to Molecular Dynamics Simulations.

Data mining techniques depend strongly on how the data are represented and how distance between samples is measured. High-dimensional data often contain a large number of irrelevant dimensions (features) for a given query. These features act as noise and obfuscate relevant information. Unsupervised approaches to mine such data require distance measures that can account for feature relevance. Mo...

Nonparametric Spectral-Spatial Anomaly Detection

Because hyperspectral images contain abundant spectral information, they are well suited to the detection of anomalous targets. Using spatial features in addition to spectral ones can improve anomaly detection performance. An anomaly detector called the nonparametric spectral-spatial detector (NSSD) is proposed in this work, which utilizes the benefits of spatial features and local st...

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression provides a wealth of information across thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and the differences between tumors. In this study we try to reduce high-dimensional data by a statistical method to select valuable genes with high impact as biomarkers and then classify ovarian tumors based on gene expression data of...

A molecular dynamics simulation of water transport through C and SiC nanotubes: Application for desalination

In this work the conduction of ion-water solution through two discrete bundles of armchair carbon and silicon carbide nanotubes, as useful membranes for water desalination, is studied. In order that studies on different types of nanotubes be comparable, the chiral vectors of C and Si-C nanotubes are selected as (7,7) and (5,5), respectively, so that a similar volume of fluid is investigated ...

Publication date: 2015