PRIME Tutorial
==============

.. contents::
   :local:
   :depth: 2

Overview
--------

This tutorial is meant for Molecular Dynamics (MD) trajectory datasets. PRIME assumes an MD trajectory that contains a well-sampled ensemble of conformations. The PRIME algorithm predicts the native structure of a protein from simulation or clustering data. In the systems studied, these methods mapped all of the structural motifs while requiring only linear scaling.

Tutorial
--------

The following tutorial guides you through determining the native structure of a biomolecule with the PRIME algorithm. If you do not have clustered data yet, first cluster your data with another clustering algorithm, such as `NANI `__, following all of its steps.

1. Clone the MDANCE Repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First things first, clone the MDANCE repository if you haven't already.

.. code:: bash

   $ git clone https://github.com/mqcomplab/MDANCE.git
   $ cd MDANCE/scripts/prime

2. Cluster Normalization
~~~~~~~~~~~~~~~~~~~~~~~~

`normalize.py `__

Given already clustered data, this script normalizes the trajectory data to the range :math:`[0, 1]` using Min-Max normalization. Edit the system info variables before running it:

::

   # System info - EDIT THESE
   input_top = '../../examples/md/aligned_tau.pdb'
   unnormed_cluster_dir = '../outputs/labels_*'
   output_dir = 'normed_clusters'
   output_base_name = 'normed_clusttraj'
   atomSelection = 'resid 3 to 12 and name N CA C O H'
   n_clusters = 6

Inputs
^^^^^^

System info
'''''''''''

| ``input_top`` is the topology file used in the clustering.
| ``unnormed_cluster_dir`` is the location of the cluster files produced by your clustering run.
| ``output_dir`` is the directory where the normalized cluster files will be saved.
| ``output_base_name`` is the base name for the output files.
| ``atomSelection`` is the atom selection used in the clustering.
| ``n_clusters`` is the number of clusters used in PRIME. If it is less than the total number of clusters, only the top *n* clusters are used.

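
Min-Max normalization rescales every value with :math:`x' = (x - x_{\min}) / (x_{\max} - x_{\min})`. The sketch below is a minimal illustration of that formula, not the code in ``normalize.py``. It assumes each cluster has already been read into a NumPy array of shape (frames, flattened coordinates of the selected atoms), and it assumes a single global minimum and maximum taken over all clusters. The function name ``min_max_normalize`` and the toy arrays are hypothetical.

.. code:: python

   import numpy as np


   def min_max_normalize(clusters):
       """Scale a list of cluster arrays into [0, 1] with one shared min and max.

       Each element of ``clusters`` is a 2D array of shape
       (n_frames, n_features), e.g. flattened coordinates of the selected atoms.
       ASSUMPTION: one global minimum and maximum over all clusters is used,
       which may differ from what normalize.py actually does.
       """
       all_data = np.concatenate(clusters, axis=0)
       x_min, x_max = all_data.min(), all_data.max()
       span = x_max - x_min
       if span == 0:
           raise ValueError("all values are identical; cannot min-max normalize")
       normed_clusters = [(c - x_min) / span for c in clusters]
       # Stacking the normalized clusters mirrors the combined normed_data.npy output.
       normed_data = np.concatenate(normed_clusters, axis=0)
       return normed_clusters, normed_data


   # Hypothetical toy data standing in for two clusters with two features each.
   c0 = np.array([[0.0, 2.0], [4.0, 6.0]])
   c1 = np.array([[8.0, 10.0]])
   normed_clusters, normed_data = min_max_normalize([c0, c1])
   print(normed_data.shape)                     # (3, 2)
   print(normed_data.min(), normed_data.max())  # 0.0 1.0

Sharing one minimum and maximum across clusters keeps their normalized values on a common scale, which per-cluster scaling would not.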

Execution
^^^^^^^^^

Make sure your current working directory is ``$PATH/MDANCE/scripts/prime``.

.. code:: bash

   $ python normalize.py

Outputs
^^^^^^^

| ``normed_clusttraj.c*.npy`` files, the normalized cluster files.
| ``normed_data.npy``, all normalized files appended together.

3. Similarity Calculations
~~~~~~~~~~~~~~~~~~~~~~~~~~

``prime_sim`` generates a similarity dictionary by running PRIME.

| ``-h`` Help with the argument options.
| ``-m`` Method, {pairwise, union, medoid, outlier} (*required*).
| ``-n`` Number of clusters (*required*).
| ``-i`` Similarity index, {RR (Russell-Rao), SM (Sokal-Michener)} (*required*).
| ``-t`` Fraction of outliers to trim, as a decimal (default is None).
| ``-w`` Weight clusters by the number of frames they contain (default is True).
| ``-d`` Directory where the ``normed_clusttraj.c*.npy`` files are located (*required*).
| ``-s`` Path to the ``summary`` file listing the population of each cluster (*required*).

Execution
^^^^^^^^^

Make sure your current working directory is ``$PATH/MDANCE/scripts/prime``.

.. code:: bash

   $ prime_sim -m union -n 6 -i SM -t 0.1 -d normed_clusters -s ../nani/outputs/summary_6.csv

This command generates a similarity dictionary from the data in `normed_clusters `__ using the ``union`` method (2.2 in *Fig 2*) and the Sokal-Michener index, and trims 10% of the outliers (``-t 0.1``).

.. _outputs-1:

Outputs
^^^^^^^

| ``w_union_SM_t10.txt``, the file with the similarity dictionary.
| The result is a dictionary organized as follows:

.. code:: plaintext

   {
       "frame_0": [
           0.7, # cluster 1 similarity.
           0.9, # cluster 2 similarity.
           ...,
           0.8  # average similarity of all above similarities.
       ]
   }

4. Representative Frames
~~~~~~~~~~~~~~~~~~~~~~~~

``prime_rep`` determines the native structure of the protein using the similarity dictionary generated in step 3.

| ``-h`` Help with the argument options.
| ``-m`` Method to use (one method name, or None for all methods).
| ``-s`` Folder containing the ``w_union_SM_t10.txt`` file.
| ``-i`` Similarity index (*required*).
| ``-t`` Fraction of outliers to trim, as a decimal (default is None).
| ``-d`` Directory where the ``normed_clusttraj.c*`` files are located (required if the method is None).

.. _example-1:

Execution
^^^^^^^^^

Make sure your current working directory is ``$PATH/MDANCE/scripts/prime``.

.. code:: bash

   $ prime_rep -m union -s outputs -d normed_clusters -t 0.1 -i SM

.. _outputs-2:

Outputs
^^^^^^^

``w_rep_SM_t10_union.txt``, the file with the indices of the representative frames.

Further Reading
---------------

For more information on the PRIME algorithm, please refer to the `PRIME paper `__.

Please cite:

.. code:: bibtex

   @article{chen_protein_2024,
       title = {Protein Retrieval via Integrative Molecular Ensembles (PRIME) through Extended Similarity Indices},
       issn = {1549-9618},
       url = {https://doi.org/10.1021/acs.jctc.4c00362},
       doi = {10.1021/acs.jctc.4c00362},
       journal = {Journal of Chemical Theory and Computation},
       author = {Chen, Lexin and Mondal, Arup and Perez, Alberto and Miranda-Quintana, Ramón Alain},
       month = jul,
       year = {2024},
       note = {Publisher: American Chemical Society},
   }

.. image:: ../img/methods.jpg
   :width: 500
   :alt: Six techniques of protein refinement

*Fig 2. Six techniques of protein refinement. Blue is the top cluster.*