Overview

IDPET is a Python library for analyzing multiple structural ensembles of disordered proteins in parallel. Ensembles can be loaded from local data files or downloaded, stored, and analyzed directly through the web APIs of the Protein Ensemble Database (PED) and ATLAS, two important databases for studying disordered and flexible proteins. IDPET uses mdtraj as its backend engine for reading and loading multiple datasets, implements several dimensionality reduction algorithms to support a wide variety of analyses, and provides plotting functions for visualizing the results.

As an example:

# Import the IDPET modules for reading, analyzing, and visualizing IDP ensembles
from idpet.ensemble import Ensemble
from idpet.ensemble_analysis import EnsembleAnalysis
from idpet.visualization import Visualization

There are two ways to load the data:

  • Downloading directly from the supported databases, PED and ATLAS:

# Download a single ensemble from ATLAS
ensembles = [
  Ensemble(code='3a1g_B', database='atlas')
]

# Or download several ensembles from PED
ensembles = [
  Ensemble(code='PED00156e001', database='ped'),
  Ensemble(code='PED00157e001', database='ped'),
  Ensemble(code='PED00158e001', database='ped')
]

  • Loading from specified file paths:

    • using multi-model pdb files

    • using trajectory files such as .dcd or .xtc

    • using a directory with separate pdb files for each model

ensembles = [
  # Multi-model PDB file
  Ensemble(code='PED00156e001', data_path='path/to/data/PED00156e001.pdb'),
  # Trajectory file plus a separate topology file
  Ensemble(code='PED00157e001', data_path='path/to/data/PED00157e001.dcd', top_path='path/to/data/PED00157e001.top.pdb'),
  # Directory with one PDB file per model
  Ensemble(code='PED00158e001', data_path='path/to/data/directory_contains_separate_pdb_for_each_model')
]
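
Judging from the example above, the three local-file cases differ only in whether a separate topology file is passed. A minimal sketch of that decision follows. The helper `ensemble_kwargs` is hypothetical and not part of IDPET, and it assumes that only the trajectory formats named above (.dcd and .xtc) need a `top_path`. The resulting dictionary could then be unpacked as `Ensemble(**kwargs)`.

```python
from pathlib import Path

# Trajectory formats named above; assumed here to need a separate topology file.
TRAJECTORY_SUFFIXES = {".dcd", ".xtc"}


def ensemble_kwargs(code, data_path, top_path=None):
    """Build keyword arguments for Ensemble(...) from a local path.

    Hypothetical helper for illustration only; not part of IDPET.
    """
    suffix = Path(data_path).suffix.lower()
    kwargs = {"code": code, "data_path": str(data_path)}
    if suffix in TRAJECTORY_SUFFIXES:
        # A trajectory without a topology cannot be interpreted, so fail early.
        if top_path is None:
            raise ValueError(f"{data_path} is a trajectory file and needs top_path")
        kwargs["top_path"] = str(top_path)
    return kwargs


pdb_kwargs = ensemble_kwargs("PED00156e001", "path/to/data/PED00156e001.pdb")
dcd_kwargs = ensemble_kwargs(
    "PED00157e001",
    "path/to/data/PED00157e001.dcd",
    top_path="path/to/data/PED00157e001.top.pdb",
)
```

Failing early on a missing topology keeps the error close to the place where the path was written, rather than deferring it to the loading step.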

  • How to visualize the analysis:

# Create an EnsembleAnalysis object with the given ensembles and specify the output directory for saving the results
analysis = EnsembleAnalysis(ensembles=ensembles, output_dir='path/to/output_directory')
# Load the trajectories for each ensemble
analysis.load_trajectories()

# Create a Visualization object from the EnsembleAnalysis object
# to enable visualization of the analysis results
vis = Visualization(analysis)

# Visualize the distribution of the radius of gyration
vis.radius_of_gyration()

# Visualize the contact probability maps
vis.contact_prob_maps()

# Visualize the comparison matrix between loaded ensembles
vis.comparison_matrix()
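
For readers unfamiliar with the first two plots, the quantities behind them can be sketched in plain Python. This is an illustration under stated assumptions, not IDPET's implementation: the radius of gyration is computed without mass weighting, each residue is represented by a single point, and the distance cutoff of 8.0 used to define a contact is an arbitrary choice.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of one conformation.

    coords is a list of (x, y, z) tuples, one point per residue.
    """
    n = len(coords)
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    mean_sq = sum(math.dist(c, center) ** 2 for c in coords) / n
    return math.sqrt(mean_sq)


def contact_probability_map(conformations, cutoff=8.0):
    """Fraction of conformations in which each residue pair lies within cutoff."""
    n_res = len(conformations[0])
    counts = [[0] * n_res for _ in range(n_res)]
    for coords in conformations:
        for i in range(n_res):
            for j in range(n_res):
                if math.dist(coords[i], coords[j]) <= cutoff:
                    counts[i][j] += 1
    n_conf = len(conformations)
    return [[c / n_conf for c in row] for row in counts]


# Toy ensemble (assumed data): three residues, two conformations.
compact = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
ensemble = [compact, extended]

rg_values = [radius_of_gyration(c) for c in ensemble]
contacts = contact_probability_map(ensemble)
```

In this toy ensemble the extended conformation has the larger radius of gyration, and residues 0 and 2 are in contact in only one of the two conformations, so their contact probability is 0.5.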

Notebooks Overview

Here’s a summary of the example notebooks available in the repository:

comparing_ensembles

Compare multiple conformational ensembles using selected metrics and visualizations.

https://github.com/BioComputingUP/EnsembleTools/blob/main/notebooks/comparing_ensembles.ipynb

featurization

Generate numerical features from protein ensembles for downstream analysis.

https://github.com/BioComputingUP/EnsembleTools/blob/main/notebooks/featurization.ipynb

kpca_analysis

Perform Kernel PCA to capture non-linear variance in ensemble structures.

https://github.com/BioComputingUP/EnsembleTools/blob/main/notebooks/kpca_analysis.ipynb

loading_data

Load and preprocess ensemble data from various formats.

https://github.com/BioComputingUP/EnsembleTools/blob/main/notebooks/loading_data.ipynb

pca_analysis

Principal Component Analysis (PCA) for dimensionality reduction and visualization.

https://github.com/BioComputingUP/EnsembleTools/blob/main/notebooks/pca_analysis.ipynb

plot_customization

Customize plots for clarity and publication-quality visualizations.

https://github.com/BioComputingUP/EnsembleTools/blob/main/notebooks/plot_customization.ipynb

sh3_example

Case study: global and local analysis of the drkN SH3 domain (the N-terminal SH3 domain of the Drk protein).

https://github.com/BioComputingUP/EnsembleTools/blob/main/notebooks/sh3_example.ipynb

tsne_analysis

t-SNE embedding of ensemble features to explore local structure.

https://github.com/BioComputingUP/EnsembleTools/blob/main/notebooks/tsne_analysis.ipynb

umap_analysis

UMAP embedding for global manifold learning and visualization.

https://github.com/BioComputingUP/EnsembleTools/blob/main/notebooks/umap_analysis.ipynb