Learning millisecond protein dynamics from what is missing in NMR spectra
Wayment-Steele, H. K., El Nesr, G., Hettiarachchi, R., Kariyawasam, H., Ovchinnikov, S., Kern, D.
bioRxiv·2025
Many proteins' biological functions rely on interconversions between multiple conformations occurring at micro- to millisecond (µs-ms) timescales. A lack of standardized, large-scale experimental data has hindered a more predictive understanding of these motions. After curating >100 Nuclear Magnetic Resonance (NMR) relaxation datasets, we realized that an observable for µs-ms motion was hiding in plain sight. Millisecond motions can cause NMR signals to broaden beyond detection, leaving some residues unassigned in the chemical shift datasets of ~10,000 proteins deposited in the Biological Magnetic Resonance Data Bank (BMRB) [1]. We made the bold assumption that residues missing assignments are exchange-broadened due to µs-ms motions, and trained various deep learning models to predict missing assignments as markers for such dynamics. Strikingly, these models also predict µs-ms motion directly measured in NMR relaxation experiments. The best of these models, which we named Dyna-1, leverages an intermediate layer of the multimodal language model ESM-3 [2]. Notably, dynamics directly linked to biological function, including enzyme catalysis and ligand binding, are particularly well predicted by Dyna-1, which parallels our finding that residues with µs-ms motions are highly conserved. We anticipate that the datasets and models presented here will be transformative in unlocking the common language of dynamics and function.