Ultidock: An Open-Source Docking Pipeline with GPU Acceleration, Automated Cavity Detection, and Molecular Dynamics Integration
Ultidock (gmx-dev branch)
Ultidock is a high-throughput molecular docking workflow that automates ligand
staging, grid preparation, AutoDock-GPU execution, and post-processing. The
gmx-dev branch focuses on reproducible automation today and prepares the
groundwork for future GROMACS-based molecular dynamics integration.
This document explains how to run the pipeline step by step, details the
major components, and highlights the features that make Ultidock different from
traditional docking scripts.
Table of Contents
- Requirements
- Repository Layout
- Quick Start: End-to-End Run
- Command Reference
- Configuration Reference
- Pipeline Segments & What Makes Ultidock Different
- Spotlight: Grid Boxing & Cavity Finder Algorithm
- Working with the Example Pipelines
- Troubleshooting
- Citation & License
Requirements
Ultidock targets modern Linux systems. Windows and macOS users should rely on a
Linux container or VM.
Hardware
| Component | Requirement |
|---|---|
| CPU | x86-64 with AVX (for preprocessing and optional CPU docking) |
| GPU | NVIDIA GPU with CUDA compute capability 7.0 or newer (Volta, Turing, Ampere, Ada, Hopper, or Blackwell, which covers the RTX 40/50 series). CPU-only mode is supported but slower. |
| RAM | ≥ 16 GB recommended for large ligand batches |
| Storage | ≥ 20 GB free space for ligand archives, grids, and outputs |
Operating System
- Ubuntu 22.04+, Debian 12+, Fedora 39+, or a comparable modern Linux distro
- Bash shell and coreutils available on `$PATH`
System Packages
Install the build toolchain and helper utilities once:

    sudo apt update && sudo apt install -y \
      automake autoconf libtool m4 perl pkg-config \
      build-essential gcc g++ gfortran make cmake \
      unzip tar csh wget git \
      libstdc++-dev libx11-dev libncurses-dev \
      python3 python3-venv python3-pip

Tip: Replace the `apt` commands with the equivalent package manager commands
for your distribution.
GPU Runtimes
- The latest CUDA Toolkit available for your hardware is required for NVIDIA GPU execution. Install
  it from NVIDIA's official downloads.
- Ultidock defaults to AutoDock-GPU. AutoGrid will also be compiled on first run.
Python Environment
Ultidock requires Python 3.10+. A virtual environment is recommended:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install --upgrade pip

Install all required packages and the `ultidock` CLI in one step:

    pip install -r requirements.txt
    pip install -e .

This installs numpy, scipy, psutil, matplotlib, pandas, and the `molguard`
I/O hardening layer (exposed as the `ultidock` command).
pandas and matplotlib are used only by the post-run analysis stage.
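
Before a long batch it can help to confirm that the interpreter and packages match the notes above. The following is a minimal sketch, not part of Ultidock: the helper names `python_ok` and `missing_packages` are hypothetical, and the package lists are copied from this section.

```python
import importlib.util
import sys

# Package names taken from the install notes above.
REQUIRED = ("numpy", "scipy", "psutil")
OPTIONAL = ("pandas", "matplotlib")  # only used by the post-run analysis stage


def python_ok(version_info=sys.version_info, minimum=(3, 10)):
    """Return True when the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("Missing required packages:", missing_packages(REQUIRED))
    print("Missing optional packages:", missing_packages(OPTIONAL))
```

Run it inside the activated virtual environment so that it inspects the same interpreter the pipeline will use.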
Repository Layout
ultidock/
├─ pyproject.toml # Package manifest; defines the `ultidock` CLI entry-point
├─ requirements.txt # Runtime dependencies (numpy, scipy, click, ...)
├─ requirements-dev.txt # Development/test dependencies
├─ SETUP.md # Step-by-step new-user guide
├─ molguard/ # I/O hardening + validation layer
│ ├─ io/
│ │ ├─ fixedfmt.py # Fixed-width, locale-safe float formatter
│ │ └─ pdbqt.py # PDBQT linter, normalizer, receptor canonicalizer
│ ├─ grids/
│ │ └─ check.py # AutoGrid .fld / .map sanity checker
│ ├─ cli.py # `ultidock` command-line interface
│ └─ tests/ # Unit and regression tests (pytest)
├─ docking/
│ ├─ run.py # Main entry point for the entire pipeline
│ ├─ setup.py # Idempotent environment + dependency setup
│ ├─ dock_v02.py # AutoDock-GPU / AutoGrid orchestration
│ ├─ analyse_docking_results.py
│ ├─ extract.py # Front-end for AutoDock Vina's vina_split utility
│ ├─ clean.py # Resets compiled binaries and outputs
│ ├─ ligands.wget # Example ligand download manifest
│ ├─ MACRO_MOL_DIR/ # Receptors, grids, and generated sites
│ ├─ LIGANDS_DIR/ # Archived ligands and split PDBQT files
│ ├─ DOCKING_DIR/ # AutoDock-GPU/Vina output poses
│ ├─ AUTODOCK_GPU_DIR/ # Compiled AutoDock-GPU + AutoGrid binaries
│ ├─ VINA_DIR/ # AutoDock Vina binaries
│ ├─ ANALYSIS_DIR/ # Intermediate scoring/aggregation artifacts
│ └─ RESULTS_DIR/ # Final CSV/JSON summaries
├─ examples/ # Self-contained example runners
└─ data-analyses/, results/ # Optional downstream notebooks & exports
The workflow assumes you copy or generate receptor .pdbqt files inside
docking/MACRO_MOL_DIR/ and provide a .wget manifest (or existing ligand
archives) inside docking/LIGANDS_DIR/.
Quick Start: End-to-End Run
Follow this checklist whenever you want to run Ultidock from a clean workspace.
1. Clone the repository (or update your local copy):

       git clone https://github.com/taka78/ultidock.git
       cd ultidock

2. Activate your Python environment and install the requirements (see
   Requirements).

3. Reset the docking workspace to avoid stale binaries and outputs:

       python3 docking/clean.py -y --all

4. Stage inputs:
   - Copy your receptor(s) to `docking/MACRO_MOL_DIR/`. Each receptor can live in
     its own subdirectory if you plan to run multi-site docking.
   - Provide ligands via one of the following:
     - Populate `docking/ligands.wget` with direct links to `.pdbqt.gz` archives
       (one per line). Ultidock will download, verify, and extract them.
     - Manually place `.pdbqt` or `.pdbqt.gz` files in `docking/LIGANDS_DIR/`.
     - Pass `--skip-wget` when running `setup.py`/`run.py` to skip downloads and
       rely entirely on pre-populated ligand files.

5. Run the setup + docking pipeline:

       python3 docking/run.py --mode gpu

   - Use `--mode cpu` to skip AutoDock-GPU compilation and rely on AutoGrid and
     Vina.
   - Add `--skip-wget` if your ligands are already staged and you want to avoid
     executing the download manifest.
   - Override directories as needed with `--LIGANDS_DIR`, `--MACRO_MOL_DIR`, etc.
     Absolute paths are recommended for scripted automation.

6. Monitor progress:
   - Setup output reports where AutoDock-GPU, AutoGrid, and Vina binaries are
     compiled or reused.
   - Docking output prints the number of ligands discovered, grid preparation
     steps, worker launches, and database insertions.

7. Review results:
   - Raw poses are written to `docking/DOCKING_DIR/`.
   - Per-receptor metadata (centers, grids) lives in `docking/MACRO_MOL_DIR/`.
   - An auto-managed SQLite database (`docking/RESULTS_DIR/ultidock_results.db`)
     is updated throughout the run for incremental result parsing and can be
     inspected or queried at any time.
   - If `pandas` is installed, aggregated CSV/JSON summaries will be produced in
     `docking/RESULTS_DIR/`.

8. Optional post-run steps:
   - Run `python3 docking/extract.py --help` to (re)split ligand archives via
     AutoDock Vina's `vina_split` utility or prepare filtered subsets for
     downstream MD.
   - Use the notebooks in `data-analyses/` for visualization or scoring audits.

Repeat steps 3–8 for each new batch to ensure deterministic runs.
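
For scripted batches, the reset, stage, and run steps can be driven from Python. The sketch below only uses flags documented in this README (`--mode`, `--skip-wget`, `--LIGANDS_DIR`, `--MACRO_MOL_DIR`). The function names and the `stage_inputs` callback are hypothetical, and the sketch assumes it is launched with the repository root as `repo_root`.

```python
import subprocess
import sys
from pathlib import Path


def build_run_command(mode="gpu", skip_wget=False, ligands_dir=None, macro_mol_dir=None):
    """Assemble the argument list for docking/run.py from documented flags."""
    if mode not in ("gpu", "cpu"):
        raise ValueError(f"mode must be 'gpu' or 'cpu', got {mode!r}")
    cmd = [sys.executable, "docking/run.py", "--mode", mode]
    if skip_wget:
        cmd.append("--skip-wget")
    # Absolute paths are recommended for scripted automation.
    if ligands_dir is not None:
        cmd += ["--LIGANDS_DIR", str(Path(ligands_dir).resolve())]
    if macro_mol_dir is not None:
        cmd += ["--MACRO_MOL_DIR", str(Path(macro_mol_dir).resolve())]
    return cmd


def run_batch(repo_root, stage_inputs, **run_options):
    """Reset the workspace, stage inputs, then launch the pipeline."""
    clean = [sys.executable, "docking/clean.py", "-y", "--all"]
    subprocess.run(clean, cwd=repo_root, check=True)
    stage_inputs(repo_root)  # caller copies receptors and ligands into place
    subprocess.run(build_run_command(**run_options), cwd=repo_root, check=True)


if __name__ == "__main__":
    print(" ".join(build_run_command(mode="cpu", skip_wget=True)))
```

Staging happens after the reset because `clean.py --all` also removes downloads; `check=True` stops the batch as soon as either script exits with an error.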
Command Reference
| Command | Purpose |
|---|---|
| `python3 docking/run.py [options]` | Primary entry point. Validates the environment, runs setup, downloads ligands, launches docking, and triggers analysis. |
| `python3 docking/setup.py [options]` | Runs the setup stage only (directory creation, AutoDock-GPU/AutoGrid/Vina checks). All CLI flags mirror `run.py`. |
| `python3 docking/dock_v02.py [options]` | Executes the docking stage against prepared ligands and receptors. Used internally by `run.py`. |
| `python3 docking/extract.py` | Wrapper around AutoDock Vina's `vina_split` for splitting ligand archives and optional filtering. |
| `python3 docking/clean.py -y --all` | Removes compiled binaries, cached grids, downloads, and generated configs. Use before starting a fresh run. |
| `ultidock run [options]` | Run the full docking pipeline from anywhere in the repo (no need to `cd docking/`). Forwards all flags to `docking/run.py`. |
| `ultidock clean [-y] [--all]` | Reset compiled binaries and outputs. Forwards all flags to `docking/clean.py`. |
| `ultidock pdbqt check <file>` | Lint a receptor or ligand PDBQT for AutoDock column-format issues (exponent notation, missing decimals, bad atom types). |
| `ultidock pdbqt normalize <file> -o <out>` | Rewrite all numeric columns in a ligand PDBQT through the fixed-width formatter. Torsion tree is left untouched. |
| `ultidock pdbqt canonicalize-receptor <file> -o <out>` | Sort, renumber, and reformat a receptor PDBQT deterministically. Returns a SHA-256 digest for reproducibility checks. |
| `ultidock grids check <maps.fld>` | Validate AutoGrid output: checks for all-zero maps, NaN/Inf energies, missing files, and atom-type mismatches. |
| `ultidock doctor` | Print tool locations and versions. Distinguishes between binaries not compiled yet (source present) and not found at all. |
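
To make the column-format issues concrete, here is a small sketch of a coordinate lint. It is not molguard's implementation. It assumes the standard PDB fixed columns 31-54 for x, y, and z, which PDBQT inherits, and it reuses the `EXPONENT` and `NO_DECIMAL` labels that appear in the Troubleshooting section. The function name is hypothetical.

```python
# 0-based slices for the x, y, z fields (PDB columns 31-38, 39-46, 47-54).
COORD_SLICES = (("x", 30, 38), ("y", 38, 46), ("z", 46, 54))


def lint_coordinates(lines):
    """Return (line_number, axis, issue) tuples for malformed coordinate fields."""
    issues = []
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith(("ATOM", "HETATM")):
            continue  # REMARK, ROOT, BRANCH, TORSDOF, ... carry no coordinates
        for axis, lo, hi in COORD_SLICES:
            field = line[lo:hi]
            if "e" in field.lower():
                issues.append((lineno, axis, "EXPONENT"))
            elif "." not in field:
                issues.append((lineno, axis, "NO_DECIMAL"))
    return issues


if __name__ == "__main__":
    prefix = "ATOM".ljust(30)  # placeholder for the serial/name/residue columns
    sample = [
        prefix + "%8.3f%8.3f%8.3f" % (1.0, -2.5, 3.25),
        prefix + "1.00e+01" + "%8.3f%8.3f" % (0.0, 0.0),
    ]
    for issue in lint_coordinates(sample):
        print(issue)
```

All issues are collected before returning, so a single pass reports every bad field instead of stopping at the first one.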
Key run.py Flags
| Flag | Description |
|---|---|
| `--mode {gpu,cpu}` | Select GPU (AutoDock-GPU) or CPU-only (Vina) execution mode. |
| `--skip-setup` | Assume setup has already been run and use the existing config. |
| `--LIGANDS_DIR PATH` | Override ligand staging directory. |
| `--MACRO_MOL_DIR PATH` | Override receptor directory. |
| `--AUTODOCK_GPU_DIR PATH` | Override AutoDock-GPU build/install directory. |
| `--VINA_DIR PATH` | Override AutoDock Vina install directory. |
| `--RESULTS_DIR PATH`, `--ANALYSIS_DIR PATH`, `--DOCKING_DIR PATH` | Customize other pipeline locations. |
| `--wget FILE` | Use a custom `.wget` manifest for ligand downloads. |
| `--skip-wget` | Skip executing wget commands even if a manifest is present. |
All flags are optional; defaults point to directories within docking/.
Configuration Reference
Running python3 docking/run.py or python3 docking/setup.py writes a fully
resolved configuration to docking/config.py. The file records the exact
directories, binaries, and grid parameters that Ultidock will reuse on the next
invocation. Edit the file directly (or pass CLI overrides) to fine-tune a run.
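
To review what a previous run resolved, the configuration can be loaded and printed. This sketch assumes `docking/config.py` is a plain Python module made of uppercase assignments such as `GRID_MODE = "centers"`; that layout is inferred from the variable tables in this section, not verified against the generated file. `load_config` is a hypothetical helper, and the demo values are made up.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_config(path):
    """Execute a config module and return its UPPERCASE settings as a dict."""
    spec = importlib.util.spec_from_file_location("ultidock_config", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file, so only load configs you trust
    return {key: value for key, value in vars(module).items() if key.isupper()}


if __name__ == "__main__":
    # Demo against a throwaway file; swap in docking/config.py for a real run.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "config.py"
        demo.write_text('GRID_MODE = "centers"\nGRID_SPACING = 0.375\nCENTERS_TSV = None\n')
        for key, value in sorted(load_config(demo).items()):
            print(f"{key} = {value!r}")
```

Saving this dump next to each batch's results gives a simple record of the settings that produced them.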
Directory Layout Variables
| Variable | Meaning |
|---|---|
| `LIGANDS_DIR` | Absolute path where ligand archives and split PDBQT files are staged. |
| `DOCKING_DIR` | Output directory for AutoDock-GPU / Vina poses and logs. |
| `ANALYSIS_DIR` | Workspace for intermediate scoring, per-ligand summaries, and temporary exports. |
| `VINA_DIR` | Location of the AutoDock Vina binaries used for ligand splitting or CPU docking. |
| `AUTODOCK_GPU_DIR` | Location of the AutoDock-GPU and AutoGrid toolchains compiled during setup. |
| `MACRO_MOL_DIR` | Root folder for receptor structures, generated grids, and per-site artifacts. |
| `RESULTS_DIR` | Destination for final CSV/JSON exports and the SQLite results database. |
| `DB_PATH` | Full path to the SQLite database (`ultidock_results.db`) that receives live docking updates. |
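
The results database can be inspected while a run is in progress. The README does not document its schema, so the sketch below is schema-agnostic: it lists whatever tables exist and counts their rows. The helper names are hypothetical; opening the file read-only through a SQLite URI is a precaution so that the inspection cannot modify the database.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path):
    """Open a SQLite file read-only via a file: URI."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0] for name in names}


if __name__ == "__main__":
    db = Path("docking/RESULTS_DIR/ultidock_results.db")
    if db.exists():
        conn = open_read_only(db)
        try:
            print(table_row_counts(conn))
        finally:
            conn.close()
    else:
        print(f"{db} not found; run the pipeline first")
```

Once the table names are known, ordinary `SELECT` queries on the same connection can pull individual results.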
Runtime Controls
| Variable | Description |
|---|---|
| `GPU_TYPE` | Which accelerator build to prepare (`CPU`, `CUDA`, or `OCL`). In CPU mode only AutoGrid and Vina are compiled. |
| `NUMWI` | Number of AutoDock-GPU work items queued per ligand batch. Increase to better saturate large GPUs; reduce on memory-constrained devices. |
| `AUTO_GRID_BIN` | Resolved path to the `autogrid4` binary. Adjust if you provide a prebuilt AutoGrid installation. |
| `GRID_MODE` | Strategy for identifying grid centers: `ligand`, `residues`, `centers` (hotspot-driven default), or `blind` (whole-protein). |
| `GRID_SPACING` | Ångström spacing between grid points. Smaller values yield finer resolution at the cost of longer AutoGrid runtimes. |
| `GRID_MARGIN` | Extra Ångström padding applied to each hotspot-derived grid to ensure the box fully encloses the binding site. |
| `GRID_CAP` | Maximum Å-length per axis when running in blind mode to prevent runaway grid sizes. |
| `CENTERS_TSV` | Optional path to a precomputed `centers.tsv`. Leave as `None` to let Ultidock regenerate hotspot centers automatically. |
| `REF_LIGAND_PDB` | Reference ligand file used when `GRID_MODE="ligand"` to seed the search box from a co-crystal pose. |
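
The cost of a smaller `GRID_SPACING` grows with the cube of the resolution. The sketch below estimates the number of grid points per map for a given box. It relies on the AutoGrid convention that `npts` is an even number of intervals per axis and that each axis therefore holds `npts + 1` points. How Ultidock itself rounds box sizes is not documented here, so treat the rounding as an assumption; the function names are hypothetical.

```python
import math


def autogrid_npts(side_angstrom, spacing):
    """Smallest even number of grid intervals that covers the given side length."""
    n = math.ceil(side_angstrom / spacing)
    return n if n % 2 == 0 else n + 1


def total_grid_points(sides_angstrom, spacing):
    """Number of points in one map: (npts + 1) along each axis, multiplied."""
    total = 1
    for side in sides_angstrom:
        total *= autogrid_npts(side, spacing) + 1
    return total


if __name__ == "__main__":
    box = (22.5, 22.5, 22.5)  # illustrative box size in Å
    for spacing in (0.375, 0.25):
        print(spacing, total_grid_points(box, spacing))
```

For this illustrative 22.5 Å cube, going from 0.375 Å to 0.25 Å spacing raises the count from 61³ to 91³ points per map, roughly a 3.3-fold increase, and AutoGrid writes one such map per atom type.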
Grid Boxing & Cavity Finder Dials
These parameters feed the hotspot detection routine described in the
Spotlight section.
| Variable | Description |
|---|---|
| `HOTSPOT_NMS_MINSEP_A` | Minimum Å separation between detected hotspots when applying non-maximum suppression. Prevents duplicate centers in dense regions. |
| `R_MIN_CAVITY_A` | Minimum inscribed sphere radius (Å) required for a cavity to be considered viable. Filters out shallow surface pockets. |
| `SURFACE_SHELL__MIN_A` / `SURFACE_SHELL__MAX_A` | Inner/outer Å bounds for the surface shell used to classify near-surface voxels. |
| `SURFACE_NMS_MINSEP_A` | Non-maximum suppression distance (in voxels) when evaluating surface cavities. Larger values merge nearby openings. |
| `MAX_CENTER_DIST_A` | Å-distance threshold from the protein surface for accepting automatically detected centers. |
| `CONTACT_SHELL_A` | Thickness of the contact shell (in voxels ≈ Å) counted when evaluating pocket accessibility. |
| `HOTSPOT_BOX_ANGLE` | Minimum side length (Å) for the automatically generated search box, ensuring consistent grid volumes even for narrow cavities. |
| `MIN_SURFACE_FRAC` | Minimum fraction of grid voxels that must belong to the surface shell for a box to qualify as a surface pocket. |
| `AUTOSITES` | Target number of hotspots (grid boxes) to generate per receptor when running in automatic centers mode. |
Tweak these parameters only when you need to bias the hotspot finder—for
example, tightening MIN_SURFACE_FRAC to focus on buried cavities or lowering
AUTOSITES to restrict the number of generated docking boxes.
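
To show how a minimum-separation dial and a site budget interact, here is a generic greedy non-maximum suppression sketch. It is not the routine in `dock_v02.py`. The candidate scores, the `greedy_nms` name, and the mapping of `min_sep` to `HOTSPOT_NMS_MINSEP_A` and of `max_sites` to `AUTOSITES` are assumptions for illustration.

```python
import math


def greedy_nms(candidates, min_sep, max_sites=None):
    """Keep the best-scoring centers that are at least min_sep apart.

    candidates: iterable of (score, (x, y, z)); a higher score means a better hotspot.
    """
    kept = []
    for score, center in sorted(candidates, key=lambda c: c[0], reverse=True):
        # Accept a candidate only if it is far enough from every accepted center.
        if all(math.dist(center, other) >= min_sep for _, other in kept):
            kept.append((score, center))
            if max_sites is not None and len(kept) == max_sites:
                break
    return kept


if __name__ == "__main__":
    demo = [(0.9, (0, 0, 0)), (0.8, (1, 0, 0)), (0.7, (10, 0, 0)), (0.6, (20, 0, 0))]
    print(greedy_nms(demo, min_sep=4.0, max_sites=2))
```

In the demo the second candidate sits 1 Å from the best one and is suppressed, so the two retained sites are the candidates at the origin and at 10 Å along x.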
Note: Values expressed in voxels (e.g., `SURFACE_NMS_MINSEP_A` and
`CONTACT_SHELL_A`) can be converted to Ångström by multiplying by
`GRID_SPACING`.
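
As a worked example of that conversion, the sketch below goes in both directions. The spacing of 0.375 Å is only an illustrative value, and the helper names are hypothetical.

```python
import math


def voxels_to_angstrom(n_voxels, grid_spacing):
    """Convert a voxel count to a length in Å."""
    return n_voxels * grid_spacing


def angstrom_to_voxels(length_angstrom, grid_spacing):
    """Smallest whole number of voxels that spans the given length."""
    return math.ceil(length_angstrom / grid_spacing)


if __name__ == "__main__":
    spacing = 0.375  # illustrative GRID_SPACING in Å
    print(voxels_to_angstrom(8, spacing))    # 8 voxels -> 3.0 Å
    print(angstrom_to_voxels(3.0, spacing))  # 3.0 Å -> 8 voxels
```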
Pipeline Segments & What Makes Ultidock Different
Ultidock is organized into five primary segments. Each segment has been
engineered for reliability and reproducibility compared to ad-hoc docking
scripts.
1. Setup (`setup.py`)
   - Idempotently creates the full directory tree (LIGANDS, MACRO_MOL, DOCKING,
     RESULTS, etc.).
   - Detects GPU availability and compiles AutoDock-GPU/AutoGrid with the
     correct compute capabilities.
   - Respects explicit CLI paths so scripted runs can reuse shared toolchains.

2. Ligand Preparation
   - `ligands.wget` entries are executed with robust retry logic and optional
     HTTPS upgrades (HSTS aware) unless `--skip-wget` is specified, in which case
     pre-seeded ligand archives are used as-is.
   - `extract.py` orchestrates AutoDock Vina's `vina_split` to extract, split,
     and stage ligands with deterministic filenames so downstream consumers can
     glob without guessing naming schemes.

3. Docking (`dock_v02.py`)
   - Per-receptor grid caching eliminates redundant AutoGrid runs even when the
     pipeline is restarted.
   - Semaphore-guarded worker pool maintains one AutoDock-GPU process per GPU
     while CPU preparation remains concurrent.
   - Metadata (grid centers, hotspots, cavity statistics) is persisted for MD
     seeding and reproducibility.

4. I/O Hardening (`molguard`)
   - Fixed-width float formatter (`fixedfmt.py`) ensures every number written to
     AutoGrid/AutoDock files respects the Fortran-style column widths those tools
     parse: no exponent notation, no missing decimals, no locale drift.
   - PDBQT linter and normalizer catches column-format bugs before they reach
     AutoGrid, with fail-slow error collection and a regression test suite.
   - Receptor canonicalizer produces deterministic, byte-identical files across
     machines given the same input, simplifying reproducibility audits.
   - Grid map checker validates AutoGrid output immediately after each run:
     all-zero maps, NaN/Inf energies, and missing files are caught with
     actionable error messages pointing to the `.glg` log.

5. Analysis (`analyse_docking_results.py`)
   - Optional stage that aggregates top poses, binding energies, and summary
     statistics. If `pandas` is unavailable the pipeline logs a warning and
     continues so production runs are never blocked by optional tooling.
   - Results are parsed directly from the automatically maintained SQLite
     database so reruns can resume and analytics scripts can attach without
     bespoke exports.

What makes Ultidock different:

- Single-command automation: `run.py` orchestrates everything from toolchain
  compilation to final scoring, eliminating manual multi-step checklists.
- Directory-first design: explicit, user-configurable directories keep
  receptors, ligands, grids, and results isolated and reproducible.
- Deterministic I/O: the `molguard` layer guarantees that the same input
  always produces byte-identical PDBQT and grid files regardless of machine or
  locale, enabling reliable comparative studies.
- Example-driven: the `examples/` directory demonstrates full CPU and GPU
  runs, including workspace reset, staging, and pipeline invocation.
- Resilient defaults: built-in fallbacks for missing optional dependencies
  (e.g., pandas, wget SSL issues) keep long batches running with informative
  warnings.
- Database-native: every docking job streams its status into the SQLite
  results store, enabling instant post-processing without manual log parsing.
- Future-ready: the branch maintains alignment with planned GROMACS
  integration by preserving metadata required for MD restarts and analysis.
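
The semaphore-guarded worker pool mentioned under the docking segment follows a common pattern: many workers prepare inputs in parallel, but only as many as there are GPUs may be inside the docking call at once. The sketch below shows that pattern with the standard library. It is not the code in `dock_v02.py`; `dock_all`, `prepare`, and `dock_on_gpu` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def dock_all(ligands, prepare, dock_on_gpu, n_gpus=1, n_workers=4):
    """Prepare ligands concurrently while limiting GPU docking to n_gpus at a time."""
    gpu_slots = threading.Semaphore(n_gpus)

    def job(ligand):
        prepared = prepare(ligand)  # CPU-side work, runs in parallel across workers
        with gpu_slots:             # blocks until a GPU slot is free
            return dock_on_gpu(prepared)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves input order, so results line up with the ligand list.
        return list(pool.map(job, ligands))


if __name__ == "__main__":
    results = dock_all(
        ["lig1", "lig2", "lig3"],
        prepare=str.upper,                       # stand-in for ligand preparation
        dock_on_gpu=lambda name: f"{name}:docked",  # stand-in for the GPU call
    )
    print(results)
```

Threads are sufficient when the GPU step launches an external process, because a worker that is waiting on a subprocess does not hold up the other workers.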
Spotlight: Grid Boxing & Cavity Finder Algorithm
Ultidock proudly features a high-precision grid boxing and cavity finder
algorithm that automatically identifies docking hotspots, sizes grids to the
appropriate search volume, and surfaces cavity statistics for every receptor.
This algorithm—implemented in docking/dock_v02.py—is a
cornerstone of what sets Ultidock apart. Special acknowledgement goes to the
original contributors who engineered the routine: their work enables the
pipeline to deliver reproducible, multi-site docking without manual box tuning.
- Multi-scale hotspot detection guards against missed binding pockets even on
  flexible receptors.
- Adaptive bounding boxes trim unneeded search space, speeding up AutoGrid and
  AutoDock-GPU runs while preserving accuracy.
- Persisted cavity metadata (center coordinates, occupancy metrics, and grid
  spacing) feeds directly into MD seeding and downstream analysis.
Whenever you run Ultidock, this algorithm silently prepares precise search
volumes so that the subsequent docking stages focus on the most promising
regions.
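
As a rough illustration of adaptive box sizing, the sketch below wraps a set of pocket points in an axis-aligned box, pads it, and enforces a minimum and an optional maximum side length. It is a simplified stand-in, not the algorithm in `dock_v02.py`. The parameters loosely mirror `GRID_MARGIN`, `HOTSPOT_BOX_ANGLE`, and `GRID_CAP`, and that mapping is an assumption.

```python
def padded_box(points, margin, min_side, max_side=None):
    """Return (center, sides) of an axis-aligned box around 3D points.

    Each side is the point extent plus margin on both ends, raised to min_side
    and, when max_side is given, capped at max_side.
    """
    center, sides = [], []
    for axis in range(3):
        lo = min(p[axis] for p in points)
        hi = max(p[axis] for p in points)
        side = max((hi - lo) + 2 * margin, min_side)
        if max_side is not None:
            side = min(side, max_side)
        center.append((lo + hi) / 2)
        sides.append(side)
    return tuple(center), tuple(sides)


if __name__ == "__main__":
    pocket = [(0, 0, 0), (10, 4, 2)]  # illustrative pocket voxel centers in Å
    print(padded_box(pocket, margin=2, min_side=8))
```

For the two illustrative points, the x side becomes 14 Å (10 Å extent plus 2 Å on each end), while the y and z sides are raised to the 8 Å minimum.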
Working with the Example Pipelines
Two curated examples (gabaa-benzos and sert-escitalopram) showcase the full
workflow. Each example runner performs the same steps a user would follow:

    python3 examples/sert-escitalopram/example-run.py

What the helper (`examples/common.py`) does:
- Calls `python3 docking/clean.py -y --all` to ensure a fresh workspace.
- Recreates the canonical directories under `docking/`.
- Copies the example receptor and ligands into the main pipeline directories.
- Executes `python3 docking/run.py` with explicit path overrides.

Use these scripts as blueprints for your own automation or CI workflows.
Troubleshooting
- SSL errors while downloading ligands
  - Corporate firewalls or strict TLS inspection can block `files.docking.org`.
    Download the required archives manually and place them in
    `docking/LIGANDS_DIR/` before running the pipeline.
- AutoDock-GPU compilation failures
  - Ensure CUDA 12.8+ is installed and `nvcc --version` reports the expected
    toolkit. Re-run `python3 docking/clean.py -y --all` followed by
    `python3 docking/run.py --mode gpu`.
- `ultidock doctor` shows `[WARN] not compiled` for AutoGrid or AutoDock-GPU
  - The source tree is present but the binaries have not been built yet.
    Run `cd docking && python setup.py` to compile them. After a successful
    build, `doctor` will report `[OK]` with the resolved binary path.
- `ultidock pdbqt check` reports `NO_DECIMAL` or `EXPONENT` errors
  - These indicate the PDBQT file was written by a tool that does not respect
    AutoDock column widths (e.g., some OpenBabel versions or AMBER converters).
    Run `ultidock pdbqt normalize <file> -o <fixed.pdbqt>` to reformat the
    numeric columns before docking.
- Optional analysis skipped
  - If you see `ModuleNotFoundError: No module named 'pandas'`, install it with
    `pip install pandas` and re-run the analysis stage:
    `python3 docking/analyse_docking_results.py`.
- Out-of-disk-space errors
  - Ligand archives can be large. Clean up with `python3 docking/clean.py -y` or
    remove unused files from `docking/LIGANDS_DIR/` and `docking/DOCKING_DIR/`.
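
The `NO_DECIMAL` and `EXPONENT` errors above come from numbers that were not written in fixed-point, fixed-width form. The sketch below shows the general idea of such a formatter for an 8.3 coordinate field, which is the width PDB-style files use for x, y, and z. It is not the code in `fixedfmt.py`, and the function name is hypothetical.

```python
import math


def fixed_width(value, width=8, decimals=3):
    """Format a float as fixed-point text that fits exactly in `width` columns.

    The "f" presentation type never emits exponent notation and ignores the
    process locale, so the decimal separator is always ".".
    """
    if not math.isfinite(value):
        raise ValueError(f"cannot format non-finite value {value!r}")
    text = f"{value:{width}.{decimals}f}"
    if len(text) > width:
        raise ValueError(f"{value!r} does not fit in {width} columns")
    return text


if __name__ == "__main__":
    for v in (1.0, -12.3456, 1e-7):
        print(repr(fixed_width(v)))
```

Raising on overflow, instead of silently widening the field, keeps every later column in the record aligned.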
Citation & License
If you use Ultidock in academic or industrial research, please cite:
Turgut, T. (2025). Ultidock: A Lightweight Parallelized Docking Pipeline for
Ligand Screening. GitHub Repository. https://github.com/taka78/ultidock
Ultidock is released under the MIT License. When applicable, please
also cite:
- Trott, O., & Olson, A. J. (2010). AutoDock Vina: Improving the speed and
  accuracy of docking with a new scoring function, efficient optimization, and
  multithreading. Journal of Computational Chemistry, 31(2), 455–461.
- Santos-Martins, D., et al. (2021). Accelerating AutoDock4 with GPUs and
  Gradient-Based Local Search. Journal of Chemical Theory and Computation,
  17(2), 1060–1073.
If you find Ultidock useful, please star the repository and consider sharing
your improvements via pull requests.