PRIME Tutorial

Overview 

This tutorial is intended for Molecular Dynamics (MD) trajectory datasets. PRIME assumes an MD trajectory that contains a well-sampled ensemble of conformations. The PRIME algorithm predicts the native structure of a protein from simulation or clustering data. In the systems studied, these methods mapped all the structural motifs and required only linear scaling.

Tutorial 

The following tutorial will guide you through the process of determining the native structure of a biomolecule using the PRIME algorithm. If you do not have clustered data, first cluster your data with another clustering algorithm such as NANI, then follow all the steps below.

1. Clone the MDANCE Repository 

First things first, clone the MDANCE repository if you haven’t already.

$ git clone https://github.com/mqcomplab/MDANCE.git
$ cd MDANCE/scripts/prime

2. Cluster Normalization 

normalize.py: given already clustered data, this script normalizes the trajectory data to the range $[0,1]$ using Min-Max Normalization.

# System info - EDIT THESE
input_top = '../../examples/md/aligned_tau.pdb'
unnormed_cluster_dir = '../outputs/labels_*'
output_dir = 'normed_clusters'
output_base_name = 'normed_clusttraj'
atomSelection = 'resid 3 to 12 and name N CA C O H'
n_clusters = 6

Inputs

System info

input_top is the topology file used in the clustering.
unnormed_cluster_dir is the directory where the clustering files are located, as produced by the clustering run (for example, NANI) performed before this tutorial.
output_dir is the directory where the normalized clustering files will be saved.
output_base_name is the base name for the output files.
atomSelection is the atom selection used in the clustering.
n_clusters is the number of clusters used in PRIME. If this number is less than the total number of clusters, only the top n_clusters clusters are used.

Execution

Make sure your pwd is $PATH/MDANCE/scripts/prime.

$ python normalize.py

Outputs

normed_clusttraj.c*.npy files, normalized clustering files.

normed_data.npy, all normalized files appended together.
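
As a rough illustration of the Min-Max Normalization step, the sketch below scales toy cluster arrays to $[0,1]$ with $(x - \min) / (\max - \min)$ and then appends them together. It is not the code in normalize.py: the function name min_max_normalize, the toy arrays, and the choice of a single global minimum and maximum shared by all clusters are assumptions made for this example.

```python
import numpy as np

def min_max_normalize(data, lo, hi):
    """Scale values to [0, 1] using (x - lo) / (hi - lo)."""
    data = np.asarray(data, dtype=float)
    if hi == lo:
        # Constant data: avoid division by zero and map everything to 0.
        return np.zeros_like(data)
    return (data - lo) / (hi - lo)

# Two toy "clusters": rows are frames, columns are flattened coordinates.
cluster_0 = np.array([[1.0, 2.0], [3.0, 4.0]])
cluster_1 = np.array([[5.0, 0.0]])

# Assumption: one global min and max so that all clusters share a scale.
stacked = np.concatenate([cluster_0, cluster_1])
lo, hi = stacked.min(), stacked.max()

normed = [min_max_normalize(c, lo, hi) for c in (cluster_0, cluster_1)]
normed_all = np.concatenate(normed)  # analogous to appending all files
print(normed_all)
```

Sharing one minimum and maximum keeps distances comparable across clusters; normalizing each cluster separately would put them on different scales.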

3. Similarity Calculations 

prime_sim generates a similarity dictionary from running PRIME.

-h    Help with the argument options.
-m    Methods, {pairwise, union, medoid, outlier} (required).
-n    Number of clusters (required).
-i    Similarity index, {RR or SM} (required).
-t    Fraction of outliers to trim in decimals (default is None).
-w    Weight clusters by the number of frames they contain (default is True).
-d    Directory where the normed_clusttraj.c*.npy files are located (required)
-s    Location where summary file is located with population of each cluster (required)
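
To make the effect of the -w option more concrete, the sketch below compares a plain average of per-cluster similarities with an average weighted by cluster population. The numbers and variable names are hypothetical, and the exact weighting scheme used inside prime_sim is not shown in this tutorial, so treat this only as an illustration of the idea.

```python
import numpy as np

# Hypothetical similarities of one frame to each of three clusters.
cluster_sims = np.array([0.7, 0.9, 0.5])

# Hypothetical cluster populations (number of frames per cluster),
# such as those that a summary file might list.
populations = np.array([600, 300, 100])

# Unweighted: every cluster counts equally.
unweighted = cluster_sims.mean()

# Weighted: larger clusters contribute more to the average.
weighted = np.average(cluster_sims, weights=populations)

print(unweighted, weighted)
```

With weighting, the sparsely populated third cluster pulls the average down less than it does in the unweighted case.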

Execution

Make sure your pwd is $PATH/MDANCE/scripts/prime.

$ prime_sim -m union -n 6 -i SM -t 0.1 -d normed_clusters -s ../nani/outputs/summary_6.csv

This command generates a similarity dictionary from the data in normed_clusters (make sure you are in the prime directory) using the union method (2.2 in Fig 2) and the Sokal-Michener index. In addition, 10% of the outliers are trimmed.
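
For readers unfamiliar with the RR and SM index names, the sketch below computes the classical pairwise Russell-Rao and Sokal-Michener similarities between two binary vectors. This is only the textbook binary definition. prime_sim works on normalized real-valued data and on many frames at once, so its implementation may differ; the function names here are assumptions for illustration.

```python
import numpy as np

def russell_rao(x, y):
    """Fraction of positions where both vectors are 1: a / p."""
    x, y = np.asarray(x, dtype=bool), np.asarray(y, dtype=bool)
    a = np.sum(x & y)  # 1-1 matches
    return a / x.size

def sokal_michener(x, y):
    """Fraction of positions where the vectors agree: (a + d) / p."""
    x, y = np.asarray(x, dtype=bool), np.asarray(y, dtype=bool)
    a = np.sum(x & y)    # 1-1 matches
    d = np.sum(~x & ~y)  # 0-0 matches
    return (a + d) / x.size

x = [1, 1, 0, 0, 1]
y = [1, 0, 0, 1, 1]

rr = russell_rao(x, y)
sm = sokal_michener(x, y)
print(rr, sm)
```

The difference between the two is that Sokal-Michener also counts shared absences (0-0 matches) as agreement, while Russell-Rao only counts shared presences.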

Outputs

w_union_SM_t10.txt file with the similarity dictionary.

The result is a dictionary organized as follows:

{
    "frame_0": [
        0.7,    # cluster 1 similarity.
        0.9,    # cluster 2 similarity.
        ...,
        0.8     # average similarity of all above similarities.
    ]
}
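
The dictionary layout above can be explored with a few lines of Python. The sketch below assumes the content can be parsed as JSON and that the last entry of each list is the average similarity, as in the layout shown. Picking the frame with the highest average is one plausible way to read the dictionary, not necessarily what prime_rep does. The example string stands in for the file contents and its values are made up.

```python
import json

# Stand-in for the contents of a similarity dictionary file
# (hypothetical values; the last number in each list is the average).
example = """
{
    "frame_0": [0.7, 0.9, 0.8],
    "frame_1": [0.6, 0.8, 0.7],
    "frame_2": [0.9, 0.9, 0.9]
}
"""

sim_dict = json.loads(example)

# Average similarity per frame is the last element of each list.
averages = {frame: sims[-1] for frame, sims in sim_dict.items()}

# Assumption: the most representative frame has the highest average.
best_frame = max(averages, key=averages.get)
print(best_frame, averages[best_frame])
```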

4. Representative Frames 

prime_rep will determine the native structure of the protein using the similarity dictionary generated in step 3.

-h    Help with the argument options.
-m    Method (a single method name, or None for all methods).
-s    Folder containing the w_union_SM_t10.txt file.
-i    Similarity index (required).
-t    Fraction of outliers to trim in decimals (default is None).
-d    Directory where the normed_clusttraj.c* files are located (required if method is None).

Execution

Make sure your pwd is $PATH/MDANCE/scripts/prime.

$ prime_rep -m union -s outputs -d normed_clusters -t 0.1 -i SM

Outputs

w_rep_SM_t10_union.txt file with the representative frames index.
