Systematic Analysis of Biomolecular Conformational Ensembles with PENSA

6 Dec 2022 · Martin Vögele, Neil J. Thomson, Sang T. Truong, Jasper McAvity, Ulrich Zachariae, Ron O. Dror

Molecular simulations enable the study of biomolecules and their dynamics on an atomistic scale. A common task is to compare several simulation conditions, such as mutations or different ligands, to find significant differences and interrelations between them. However, the large amount of data produced for ever larger and more complex systems often makes it difficult to identify the structural features that are relevant for a particular phenomenon. We present a flexible software package named PENSA that enables a comprehensive investigation of biomolecular conformational ensembles. It provides a wide variety of featurizations and feature transformations that allow for a complete representation of biomolecules such as proteins and nucleic acids, including water and ion cavities within the biomolecular structure, thus avoiding the bias that would come with manual selection of features. PENSA implements various methods to systematically compare the distributions of these features across ensembles, find the significant differences between them, and identify regions of interest. It also includes a novel approach to quantify the state-specific information between two regions of a biomolecule, which allows, e.g., the tracing of information flow to identify signaling pathways. PENSA further comes with convenient tools for loading data and visualizing results in ways that make them quick to process and easy to interpret. PENSA is an open-source Python library maintained at https://github.com/drorlab/pensa along with an example workflow and a tutorial. Here we demonstrate its usefulness in real-world examples by showing how it helps to determine molecular mechanisms efficiently.
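To make the idea of comparing feature distributions across ensembles concrete, the following is a minimal sketch that uses only NumPy and does not call PENSA. It assumes that each ensemble is already available as an array of shape (frames, features) and that the Jensen-Shannon distance on shared histogram bins is an acceptable measure of difference. The abstract does not name the metric, so that choice is an assumption, as are the function names `js_distance` and `rank_features`, the feature names, and the synthetic data.

```python
import numpy as np


def js_distance(samples_a, samples_b, bins=30):
    """Jensen-Shannon distance (log base 2, range 0 to 1) between two 1D samples.

    Both samples are histogrammed on the same bin edges so that the
    resulting discrete distributions are directly comparable.
    """
    samples_a = np.asarray(samples_a, dtype=float)
    samples_b = np.asarray(samples_b, dtype=float)
    lo = min(samples_a.min(), samples_b.min())
    hi = max(samples_a.max(), samples_b.max())
    if hi == lo:  # degenerate case: all values identical
        hi = lo + 1.0
    edges = np.linspace(lo, hi, bins + 1)

    p, _ = np.histogram(samples_a, bins=edges)
    q, _ = np.histogram(samples_b, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(x, y):
        # Kullback-Leibler divergence, skipping bins where x is zero.
        mask = x > 0
        return np.sum(x[mask] * np.log2(x[mask] / y[mask]))

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.sqrt(max(jsd, 0.0)))


def rank_features(ens_a, ens_b, names, bins=30):
    """Score every feature column and return (name, score) pairs, largest first."""
    scores = [
        js_distance(ens_a[:, i], ens_b[:, i], bins=bins)
        for i in range(ens_a.shape[1])
    ]
    order = np.argsort(scores)[::-1]
    return [(names[i], scores[i]) for i in order]


# Synthetic stand-in for two simulation conditions (not real simulation data):
# three features per frame, with only the second feature shifted in ensemble B.
rng = np.random.default_rng(0)
ens_a = rng.normal(0.0, 1.0, size=(2000, 3))
ens_b = rng.normal(0.0, 1.0, size=(2000, 3))
ens_b[:, 1] += 3.0

names = ["feat_0", "feat_1", "feat_2"]  # hypothetical feature labels
ranking = rank_features(ens_a, ens_b, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this sketch the shifted feature should rank first, which mirrors the goal of flagging features, and hence regions, that differ between conditions. Histogram-based estimates depend on the number of bins and frames, so the scores are best read as a ranking rather than as absolute values.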
