Preprocessing of Molecular Dynamics Data

MDANCE provides tools for preprocessing molecular dynamics trajectories before clustering, including reading, normalizing, and aligning them. This example shows how to read a trajectory and save it as a numpy array.

Imports
  • numpy for manipulating and saving arrays.

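  • gen_traj_numpy, which uses the MDAnalysis library to read trajectories and convert them to numpy arrays.

  • data for the paths to the example topology and trajectory files used below (data.top and data.traj).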

import numpy as np

from mdance import data
from mdance.inputs.preprocess import gen_traj_numpy
Inputs
  • input_top is the path to the topology file. See the MDAnalysis documentation for all accepted topology formats.

  • input_traj is the path to the trajectory file. See the MDAnalysis documentation for all accepted trajectory formats.
    • Align and center the trajectory beforehand if needed.

  • output_base_name is the base name of the output file. The output file will be saved as {output_base_name}.npy for faster loading in the future.

  • atomSelection is the atom selection used for clustering. It must be a valid expression in the MDAnalysis atom selection language.

  • gen_traj_numpy converts the trajectory to a numpy array of shape (n_frames, n_atoms * 3), where n_atoms is the number of selected atoms, so that frames can be compared as flat vectors.
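The note about centering can be illustrated with a minimal numpy-only sketch. It is not MDANCE's or MDAnalysis's own routine: the helper name center_frames and the synthetic coordinates are assumptions made for illustration, and the sketch only removes translation (it subtracts the unweighted geometric center of each frame) without performing any rotational alignment.

```python
import numpy as np


def center_frames(coords):
    """Subtract each frame's geometric center from its atom coordinates.

    coords: array of shape (n_frames, n_atoms, 3).
    Hypothetical helper for illustration; not part of MDANCE.
    """
    # One centroid per frame, kept as shape (n_frames, 1, 3) for broadcasting.
    centroids = coords.mean(axis=1, keepdims=True)
    return coords - centroids


# Synthetic stand-in for a trajectory: 4 frames, 5 atoms, shifted away from the origin.
rng = np.random.default_rng(0)
coords = rng.normal(size=(4, 5, 3)) + 10.0

centered = center_frames(coords)
```

After centering, the mean position of every frame sits at the origin. For real trajectories, a dedicated alignment tool that also handles rotation and periodic boundaries is usually the better choice.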

input_top = data.top
input_traj = data.traj
output_base_name = 'backbone'
atomSelection = 'resid 3 to 12 and name N CA C O H'

traj_numpy = gen_traj_numpy(input_top, input_traj, atomSelection)
Number of atoms in trajectory: 217
Number of frames in trajectory: 6001
Number of atoms in selection: 50
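The shape (n_frames, n_atoms * 3) can be understood with a small numpy sketch. It assumes the per-atom (x, y, z) coordinates are flattened in selection order using numpy's default row-major layout. The placeholder array below stands in for real coordinates and is not produced by gen_traj_numpy.

```python
import numpy as np

# Frame and selected-atom counts taken from the output printed above.
n_frames, n_atoms = 6001, 50

# Placeholder coordinates with one (x, y, z) triple per atom per frame.
coords_3d = np.arange(n_frames * n_atoms * 3, dtype=float).reshape(n_frames, n_atoms, 3)

# Flatten each frame into a single row: x0, y0, z0, x1, y1, z1, ...
coords_2d = coords_3d.reshape(n_frames, -1)

print(coords_2d.shape)  # (6001, 150)
```

Under this layout, column 3 * i of a row holds the x coordinate of atom i in that frame, followed by its y and z coordinates.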
Outputs
  • The output is a numpy array of shape (n_frames, n_atoms * 3). For the selection above, 6001 frames and 50 atoms give (6001, 150).

output_name = output_base_name + '.npy'
np.save(output_name, traj_numpy)
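A saved array can later be reloaded with np.load instead of re-reading the trajectory. The sketch below is a self-contained round trip: the small synthetic array and the temporary directory are stand-ins chosen for illustration, and the final reshape assumes the x, y, z per-atom column layout described earlier.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for traj_numpy: 3 frames, 2 atoms, flattened to 6 columns.
traj_numpy = np.arange(18, dtype=float).reshape(3, 6)

with tempfile.TemporaryDirectory() as tmp_dir:
    output_name = os.path.join(tmp_dir, 'backbone.npy')
    np.save(output_name, traj_numpy)
    # np.load reads the whole array into memory by default.
    loaded = np.load(output_name)

# Recover per-atom coordinates with shape (n_frames, n_atoms, 3).
per_atom = loaded.reshape(loaded.shape[0], -1, 3)
```

In a real workflow, output_name would point to the file written by the script, for example backbone.npy in the working directory.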
