ZhaohanM/ExplainBind
Explainable Physicochemical Determinants of Protein–Ligand Binding via Non-Covalent Interactions
🔥 News
- [March 2026] ⛳ Our preprint is now available on bioRxiv.
- [Feb 2026] 🚀 ExplainBind demo UI is now live on Hugging Face Spaces!
🧩 Overview
ExplainBind is an interaction-aware framework for protein–ligand binding (PLB) prediction.
It supervises token-level cross-attention using non-covalent interaction maps (e.g. hydrogen bonds, salt bridges, hydrophobic contacts, van der Waals, π–π, and cation–π interactions) derived from curated PDB protein–ligand complexes in InteractBind.
By aligning model attention with these physically grounded signals, ExplainBind transforms PLB prediction from black-box reasoning into a chemistry-grounded process suitable for large-scale screening.
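To make the idea of attention supervision concrete, here is a minimal pure-Python sketch. It is not the ExplainBind loss, which this README does not specify. It assumes a cross-entropy objective between softmax-normalised cross-attention scores and a binary interaction map; the function name `attention_alignment_loss` and the ligand-token by residue layout are illustrative choices.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_alignment_loss(scores, interaction_map, eps=1e-9):
    """Cross-entropy between interaction targets and attention weights.

    scores: [n_ligand_tokens][n_residues] raw cross-attention logits.
    interaction_map: same shape, 1 where a non-covalent contact is annotated.
    Ligand tokens without any annotated contact are skipped (an assumption).
    """
    total, counted = 0.0, 0
    for score_row, target_row in zip(scores, interaction_map):
        n_contacts = sum(target_row)
        if n_contacts == 0:
            continue
        attn = softmax(score_row)
        # Target distribution: uniform over the contacting residues.
        total += -sum((t / n_contacts) * math.log(a + eps)
                      for t, a in zip(target_row, attn))
        counted += 1
    return total / counted if counted else 0.0


# Toy example: one ligand token contacts residue 0, the other has no contact.
contacts = [[1, 0, 0], [0, 0, 0]]
aligned = [[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
misaligned = [[0.0, 5.0, 0.0], [0.0, 0.0, 0.0]]
print(attention_alignment_loss(aligned, contacts)
      < attention_alignment_loss(misaligned, contacts))
```

Attention that concentrates on the annotated contact residue gives a lower loss than attention placed elsewhere, which is the behaviour the supervision is meant to encourage.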
⚙️ Installation
Tip
Clone this GitHub repo and set up a new conda environment.
# clone the source code of ExplainBind
$ git clone https://github.com/ZhaohanM/ExplainBind.git
$ cd ExplainBind
# create a new conda environment
$ conda create --name ExplainBind python=3.9
$ conda activate ExplainBind
# install required Python dependencies
$ pip install -r requirements.txt
Requires: Python ≥ 3.9 and a CUDA-compatible GPU.
⚡ Quick Start
Command-Line Inference
bash run.sh
🔬 Foundation Models
🧬 Protein Foundation Models
| Model Name | HuggingFace Link | Input Type |
|---|---|---|
| ESM2 | facebook/esm2_t33_650M_UR50D | Amino Acid Sequence |
| SaProt | westlake-repl/SaProt_650M_AF2 | Structure-aware Sequence |
| SaProt | westlake-repl/SaProt_650M_PDB | Structure-aware Sequence |
💊 Molecular Foundation Models
| Model Name | HuggingFace Link | Input Type |
|---|---|---|
| MoLFormer-XL | ibm-research/MoLFormer-XL-both-10pct | SMILES |
| SELFormer | HUBioDataLab/SELFormer | SELFIES |
| SELFIES-TED | ibm-research/materials.selfies-ted | SELFIES |
Note
All foundation models remain frozen. ExplainBind trains only the Fusion Module, supervised with structure-derived attention maps, and the Classifier.
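As an illustration of what keeping a foundation model frozen can look like, the sketch below loads a checkpoint with the Hugging Face `transformers` Auto classes and disables gradients. The helper names `freeze` and `load_frozen_encoder` are hypothetical and are not taken from the ExplainBind code base. Some checkpoints in the tables above may need extra loading arguments (for example `trust_remote_code=True`); check each model card.

```python
def freeze(model):
    """Disable gradients for every parameter and switch to eval mode."""
    for param in model.parameters():
        param.requires_grad = False
    model.eval()
    return model


def load_frozen_encoder(checkpoint="facebook/esm2_t33_650M_UR50D"):
    """Load a Hugging Face checkpoint and freeze it.

    Requires the `transformers` and `torch` packages. They are imported
    lazily so that freeze() stays usable without them.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    return tokenizer, freeze(model)
```

With the encoders frozen this way, an optimiser would only be given the parameters of the trainable modules, so the pretrained weights stay unchanged during training.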
🧫 Dataset
We provide nine benchmarks with ground-truth residue-level interaction maps for evaluating protein–ligand interaction (PLI) prediction. They will be released soon!
| Dataset | Type | Example Use |
|---|---|---|
| InteractBind (affinity) | Affinity score splits | Evaluate in-domain performance |
| InteractBind-P-25%/28%/31%/33% | Protein similarity splits | Evaluate sequence-level generalisation |
| InteractBind-L-08%/35%/40%/59% | Ligand similarity splits | Evaluate generalisation to dissimilar ligands |
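The README does not state how the similarity splits are built. As a rough illustration only, the sketch below shows one common way to cap train/test similarity for ligands, assuming Tanimoto similarity over sets of fingerprint bits. The function names and the threshold semantics are assumptions, not the InteractBind procedure.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_train_similarity(test_fp, train_fps):
    """Similarity of a test ligand to its nearest training ligand."""
    return max((tanimoto(test_fp, fp) for fp in train_fps), default=0.0)


def filter_test_set(test_fps, train_fps, threshold):
    """Keep test ligands whose nearest training ligand is at or below `threshold`."""
    return [fp for fp in test_fps
            if max_train_similarity(fp, train_fps) <= threshold]


# Toy fingerprints written as sets of "on" bit indices.
train = [{1, 2, 3, 4}]
test = [{1, 2, 3, 5}, {7, 8}]
print(filter_test_set(test, train, threshold=0.35))
```

In this toy case the first test ligand shares three of five bits with the training ligand (similarity 0.6) and is dropped, while the second shares none and is kept.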
📚 Acknowledgments
This work was supported in part by National Institutes of Health grants HL155107 and HL166137, and by American Heart Association MERIT award AHA1185447 to JL.
K.Y. acknowledges support from Cancer Research UK (EDDPGM-Nov21/100001, DRCMDP-Nov23/100010 and core funding to the CRUK Scotland Institute (A31287)), BBSRC BB/V016067/1, Prostate Cancer UK MA-TIA22-001 and EU Horizon 2020 grant ID: 101016851.
📜 License
This project is licensed under the MIT License — see the LICENSE file for details.
🧰 Intended Use
ExplainBind is designed to assist computational biologists, AI researchers, and drug-discovery scientists in analysing and explaining molecular interactions.
Applications
- 🔬 Drug Discovery — Identify explainable binding fingerprints between novel compounds and proteins.
- 🧠 Model Explainability — Quantify token-level biological grounding via attention-map supervision.
- 🧪 Cross-Domain Generalisation — Diagnose prediction drop-offs across protein similarity strata.
Important
This framework is intended solely for research purposes and should not be used for clinical decision-making.

