ZhaohanM/ExplainBind

	Explainable Physicochemical Determinants of Protein–Ligand Binding via Non-Covalent Interactions

🔥 News

[March 2026] ⛳ Our preprint is now available on bioRxiv.
[Feb 2026] 🚀 ExplainBind demo UI is now live on Hugging Face Spaces!

🧩 Overview

ExplainBind is an interaction-aware framework for protein–ligand binding (PLB) prediction.
It supervises token-level cross-attention using non-covalent interaction maps (e.g. hydrogen bonds, salt bridges, hydrophobic contacts, van der Waals, π–π, and cation–π interactions) derived from curated PDB protein–ligand complexes in InteractBind.
By aligning model attention with these physically grounded signals, ExplainBind transforms PLB prediction from black-box reasoning into a chemistry-grounded process suitable for large-scale screening.
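
As a rough illustration of the attention supervision described above, the sketch below scores a protein-by-ligand attention map against a binary interaction map using a per-cell binary cross-entropy. The function name, the choice of loss, and the toy values are assumptions made for illustration; the loss actually used by ExplainBind may differ.

```python
import math

def attention_supervision_loss(attention, interaction, eps=1e-7):
    """Mean binary cross-entropy between an attention map and a 0/1 interaction map.

    Both arguments are lists of rows (protein tokens) by columns (ligand tokens).
    Hypothetical helper for illustration only, not the ExplainBind implementation.
    """
    total, count = 0.0, 0
    for att_row, int_row in zip(attention, interaction):
        for a, y in zip(att_row, int_row):
            a = min(max(a, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(a) + (1 - y) * math.log(1.0 - a))
            count += 1
    return total / count

# Toy 3-residue x 2-atom example: 1 marks a residue-atom pair that interacts.
interaction = [[1, 0], [0, 0], [0, 1]]
aligned = [[0.9, 0.1], [0.1, 0.1], [0.1, 0.9]]      # attention follows the contacts
misaligned = [[0.1, 0.9], [0.9, 0.9], [0.9, 0.1]]   # attention ignores the contacts

print(attention_supervision_loss(aligned, interaction))
print(attention_supervision_loss(misaligned, interaction))
```

In this toy setting, attention that concentrates on the interacting pairs yields the lower loss, which is the behaviour a supervision term of this kind is meant to encourage.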

ExplainBind Framework

⚙️ Installation

Tip

Clone this GitHub repo and set up a new conda environment.

# clone the ExplainBind source code
$ git clone https://github.com/ZhaohanM/ExplainBind.git
$ cd ExplainBind

# create a new conda environment
$ conda create --name ExplainBind python=3.9
$ conda activate ExplainBind

# install required Python dependencies
$ pip install -r requirements.txt

Requires: Python ≥ 3.9 and a CUDA-compatible GPU.
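
The prerequisites above can be checked before installing with a short script such as the one below. This is a minimal sketch: `check_environment` is a hypothetical helper, and finding `nvidia-smi` on the `PATH` is only a heuristic for the presence of an NVIDIA driver. It does not confirm that the GPU is compatible with the CUDA build of the installed dependencies.

```python
import shutil
import sys

def check_environment(min_python=(3, 9)):
    """Return basic prerequisite checks as a dict of booleans.

    Hypothetical helper, not part of the ExplainBind repository.
    """
    return {
        # Compare (major, minor) of the running interpreter with the minimum.
        "python_ok": sys.version_info[:2] >= min_python,
        # Heuristic only: the NVIDIA driver utility is on the PATH.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }

if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```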

⚡ Quick Start

Command-Line Inference

bash run.sh

🔬 Foundation Models

🧬 Protein Foundation Models

Model Name	HuggingFace Link	Input Type
ESM2	facebook/esm2_t33_650M_UR50D	Amino Acid Sequence
SaProt	westlake-repl/SaProt_650M_AF2	Structure-aware Sequence
SaProt	westlake-repl/SaProt_650M_PDB	Structure-aware Sequence

💊 Molecular Foundation Models

Model Name	HuggingFace Link	Input Type
MoLFormer-XL	ibm-research/MoLFormer-XL-both-10pct	SMILES
SELFormer	HUBioDataLab/SELFormer	SELFIES
SELFIES-TED	ibm-research/materials.selfies-ted	SELFIES

Note

All foundation models remain frozen. ExplainBind trains the Fusion Module (with structure-derived attention-map supervision) and the Classifier.

🧫 Dataset

We provide 9 benchmarks with ground-truth residue-level interaction maps for evaluating protein–ligand interaction (PLI) prediction. They will be released soon!

Dataset	Type	Example Use
InteractBind (affinity)	Affinity score splits	Evaluate in-domain performance
InteractBind-P-25%/28%/31%/33%	Protein similarity splits	Evaluate sequence-level generalisation
InteractBind-L-08%/35%/40%/59%	Ligand similarity splits	Evaluate ligand-level generalisation

📚 Acknowledgments

This work was supported in part by National Institutes of Health grants HL155107 and HL166137, and by American Heart Association MERIT award AHA1185447 to JL.
K.Y. acknowledges support from Cancer Research UK (EDDPGM-Nov21/100001, DRCMDP-Nov23/100010 and core funding to the CRUK Scotland Institute (A31287)), BBSRC BB/V016067/1, Prostate Cancer UK MA-TIA22-001 and EU Horizon 2020 grant ID: 101016851.

📜 License

This project is licensed under the MIT License — see the LICENSE file for details.

🧰 Intended Use

ExplainBind is designed to assist computational biologists, AI researchers, and drug-discovery scientists in analysing and explaining molecular interactions.

Applications

🔬 Drug Discovery — Identify explainable binding fingerprints between novel compounds and proteins.
🧠 Model Explainability — Quantify token-level biological grounding via attention-map supervision.
🧪 Cross-Domain Generalisation — Diagnose prediction drop-offs across protein similarity strata.
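
One simple way to diagnose drop-offs across similarity strata is to compute a metric separately for each stratum and compare the results. The sketch below does this for accuracy. The helper name, the record layout, and the toy labels are made up for illustration, and the stratum names are only borrowed from the dataset table; none of this reflects ExplainBind's own evaluation code or results.

```python
from collections import defaultdict

def accuracy_by_stratum(records):
    """Compute accuracy per stratum.

    records: iterable of (stratum, true_label, predicted_label) tuples.
    Hypothetical helper for illustration only.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for stratum, y_true, y_pred in records:
        totals[stratum] += 1
        hits[stratum] += int(y_true == y_pred)
    return {s: hits[s] / totals[s] for s in totals}

# Toy predictions, not real results.
records = [
    ("P-33%", 1, 1), ("P-33%", 0, 0), ("P-33%", 1, 1), ("P-33%", 0, 1),
    ("P-25%", 1, 0), ("P-25%", 0, 0), ("P-25%", 1, 1), ("P-25%", 0, 1),
]

for stratum, acc in accuracy_by_stratum(records).items():
    print(stratum, acc)
```

A gap between strata in such a breakdown points to where generalisation weakens; the same grouping works for other metrics.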

Important

This framework is intended solely for research purposes and should not be used for clinical decision-making.