
bfroehle/slither


Slither: Static Python builds for HPC Systems

Slither is a set of patches for Python (and related modules) and a
command-line tool for building static CPython binaries. In addition,
Slither supports byte-compiling Python module sources into frozen
modules.
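A frozen module is byte-code compiled ahead of time and embedded in the
interpreter as a C byte array, so importing it need not touch the file
system. The sketch below illustrates the general idea, modeled on
CPython's Tools/freeze and the Python 2.7-era `struct _frozen` layout
(name, code, size, where a negative size marks a package). The function
name `freeze_source` and its exact output format are illustrative
assumptions, not Slither's actual code. Because marshal output is
specific to the interpreter version, the generator must be run with the
same Python version that will load the frozen bytes.

```python
import marshal


def freeze_source(module_name, source, is_package=False):
    """Return C source embedding the marshalled byte-code of one module
    (hypothetical helper, modeled on Tools/freeze output)."""
    code = compile(source, "<frozen %s>" % module_name, "exec")
    data = marshal.dumps(code)
    # Tools/freeze-style symbol: dots in the module name become "__".
    symbol = "M_" + module_name.replace(".", "__")
    # Emit the bytes as a C initializer, 16 values per line.
    rows = []
    for i in range(0, len(data), 16):
        rows.append("    " + ",".join(str(b) for b in data[i:i + 16]) + ",")
    # In the Python 2.7 frozen table, a negative size marks a package.
    size = -len(data) if is_package else len(data)
    array = "static unsigned char %s[] = {\n%s\n};\n" % (symbol, "\n".join(rows))
    entry = '/* frozen table entry: */ {"%s", %s, %d},\n' % (module_name, symbol, size)
    return array + entry
```

For example, `freeze_source("hello", "x = 6 * 7\n")` yields an `M_hello`
array plus the matching table entry, ready to be compiled into the
interpreter.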

In its most optimized configuration, Slither can produce binaries
that are entirely self-contained and keep the number of file system
calls (e.g., stat and open) to a strict minimum.

Rationale

While newer HPC systems do support dynamic libraries, that support
remains unoptimized. Locating and loading each shared object can cause
heavy file system contention, especially for large jobs involving
thousands of processors. The same contention arises when importing
regular Python (.py) modules: depending on the Python configuration,
each imported module can require ten or more stat system calls.
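To see where these stat calls come from, consider a simplified model of
the Python 2 import search: for each sys.path entry, the importer checks
for a package directory and then tries each registered file suffix in
turn. The suffix list below is an assumption modeled on a typical Linux
build of Python 2.7 (`imp.get_suffixes()` on the target interpreter
reports the real one), and the model ignores details such as `__init__`
checks and path hooks.

```python
import posixpath

# Assumed suffix order, modeled on a typical Linux build of Python 2.7;
# imp.get_suffixes() on the target interpreter reports the real list.
ASSUMED_SUFFIXES = [".so", "module.so", ".py", ".pyc"]


def candidate_paths(name, path_entries, suffixes=ASSUMED_SUFFIXES):
    """Return, in order, the paths a simplified Python 2-style importer
    would probe for top-level module `name` if it is found nowhere."""
    probes = []
    for entry in path_entries:
        base = posixpath.join(entry, name)
        probes.append(base)  # package directory check
        probes.extend(base + suffix for suffix in suffixes)  # one probe per suffix
    return probes
```

Under this model, a module missing from two sys.path entries costs ten
probes, and every additional sys.path entry adds five more.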

This contention manifests itself as a very long startup time: the time
required for the Python interpreter to start and import the
prerequisite modules. On one node this is often under a minute and
generally negligible; on a hundred nodes it can take hours. See
Shared Library Performance on Hopper (slides) by Z. Zhao et al. and
Python in a Parallel Environment by D. Grote.
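Because every rank performs the same lookups independently, the
metadata load on the shared file system grows with the product of
ranks, imported modules, and probes per module. The figures in this
back-of-the-envelope estimate are hypothetical placeholders, not
measurements from the studies cited above.

```python
def total_probes(nodes, ranks_per_node, modules, probes_per_module):
    """Metadata operations hitting the shared file system during startup,
    assuming every rank imports the same modules independently."""
    return nodes * ranks_per_node * modules * probes_per_module


# Hypothetical job: 100 nodes, 24 MPI ranks per node, 200 imported
# modules, 10 probes per import.
print(total_probes(100, 24, 200, 10))  # 4800000
```

The same job on a single node issues one hundredth of that load (48,000
probes), which is why the problem only becomes visible at scale.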

Previous Work

The GPAW package provided instructions and patches for static Python
builds (see GPAW's "static python builds" documentation), which this
project builds upon.

Method of Operation

This project is composed of the following pieces.

  1. Build a custom Python 2.7.3 interpreter which contains a patched
    version of distutils.

  2. Use the custom interpreter to build and install Python
    modules. Most modules can be built with little or no additional
    configuration.

  3. Use the bin/slither script to build custom, statically linked
    Python interpreters that bake in Python module byte-code.
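Baking extension modules into a static interpreter means registering
them as built-in modules, so they are imported without a dlopen. In
Python 2.7 this uses `struct _inittab` entries that pair a module name
with its `init<name>` function, the same mechanism as Modules/config.c;
a custom `main` can add extra entries with `PyImport_ExtendInittab`
before calling `Py_Initialize`. The generator below is a hypothetical
sketch of that bookkeeping (the helper name, the `extra_inittab` symbol,
and the example module names are made up), not what bin/slither
actually emits. It only handles top-level module names.

```python
def inittab_source(modules):
    """Generate C declarations and an inittab array for statically linked
    Python 2.7 extension modules (hypothetical helper)."""
    for name in modules:
        # This sketch only supports top-level names; reject dotted ones.
        if not name.isidentifier():
            raise ValueError("only top-level module names are supported: %r" % name)
    lines = ["extern void init%s(void);" % name for name in modules]
    lines.append("")
    lines.append("struct _inittab extra_inittab[] = {")
    lines.extend('    {"%s", init%s},' % (name, name) for name in modules)
    lines.append("    {0, 0}  /* sentinel */")
    lines.append("};")
    return "\n".join(lines)
```

For example, `inittab_source(["_fastmath", "_solver"])` declares
`init_fastmath` and `init_solver` and lists both in a sentinel-terminated
array.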

Getting Started

Please see INSTALL.md for installation instructions.

Alternative Approaches

Some tools have been developed to speed up Python startup times. For
example, NERSC provides the DLcache and FMcache tools on Hopper.

DLcache is a general purpose tool to optimize the importing of shared
objects (i.e., dlopen). FMcache is a more specialized tool to
optimize the importing of Python modules (.py and .pyc files). In
each case, a cache is built by running the program on a small number
of MPI ranks; the cache is then distributed and the job is run on a
large number of MPI ranks. The main drawbacks are the following:

  • It must be easy to run the same job (or at least the same initial
    import statements) at small width. Often this means creating a
    separate .py script that only imports the modules you will be
    using.

  • The libraries must be loaded in the same order on every rank. In
    practice, this means importing every module you might use at the
    beginning of your script.

  • The batch job script must be altered with several pre- and
    post-processing steps.
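For the cache-based tools, the small-width import script from the first
bullet could also record which files it actually loaded. The helper
below is a hypothetical illustration of that bookkeeping using only the
Python standard library; it is not how DLcache or FMcache work
internally.

```python
import importlib
import sys


def record_import_manifest(module_names):
    """Import the given modules and return sorted (module name, file path)
    pairs for the requested modules plus anything they pulled in."""
    before = set(sys.modules)
    for name in module_names:
        importlib.import_module(name)
    loaded = (set(sys.modules) - before) | set(module_names)
    manifest = []
    for name in sorted(loaded):
        module = sys.modules.get(name)
        path = getattr(module, "__file__", None)
        if path:  # built-in modules such as sys have no __file__
            manifest.append((name, path))
    return manifest
```

Running it on a single rank with the modules the real job needs yields
a list of files that could, for example, be staged before the
full-width run.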


BSD 3-Clause "New" or "Revised" License
Created March 8, 2013
Updated July 17, 2024