RAPIDS docker-compose environment for Ubuntu 18.04/20.04
Quick links
Synopsis
This is a collection of scripts for setting up an environment for RAPIDS
development and notebooks starting from a clean install of Ubuntu 18.04
or 20.04. It automates the following steps:
- Installation of dependencies:
- Clangd - a language server used to
add smart features to an IDE, e.g. VSCode. - Visual Studio Code - an IDE.
- Docker - A container
technology for packaging up code with all its dependencies. docker-compose- A tool for building,
configuring, and running groups of docker containers- NVIDIA Container Toolkit - For
building GPU-accelerated docker containers.
- Clangd - a language server used to
- Creation of a
docker-composeenvironment:- Contains suitable settings of compiler and Python versions and other build
parameters.
- Contains suitable settings of compiler and Python versions and other build
- Creation of a VSCode workspace:
- Sets up syntax highlighting, intellisense for C++, CUDA and Python
- Set up some Git conveniences.
- Forking and cloning of the RAPIDS repositories:
- Creation of containers for development work:
dind: The Docker-in-Docker container. This container is built first to
provide a clean environment for runningdocker-composein to build the
other containers. This allows us to mount your local filesystem as docker
volumes while selectively ignoring folders that bloat the docker build
cache and slow down container builds.rapids: Contains builds of all the RAPIDS repositories listed above and
tooling to support development with Visual Studio Code.notebooks: Serves Jupyter notebooks using the RAPIDS libraries built by
therapidscontainer. This allows you to easily test out new features or
fixes to C++ or Python in any of our example demo notebooks.
Installation
Starting with the assumption that this repository has not yet been cloned, and
no other dependencies are installed, these steps:
- Install all dependencies,
- Fork and clone the RAPIDS repositories,
- Set up the VSCode environment and Intellisense.
During execution, you will be asked if you would like to install various
dependencies - it is generally a sensible and safe to answer "yes" to these
questions.
For the forking of Github repositories, you will also be asked for Github login
details so that the script can use the Github CLI to fork from the RAPIDS Github
account to your account.
# 1. Create a directory for all the RAPIDS workspace (any directory is OK)
mkdir -p ~/dev/rapids && cd ~/dev/rapids
# 2. Install Makefile and script dependencies:
sudo apt install curl jq make
# 3. Check out the rapids-compose repo into ./compose
git clone https://github.com/trxcllnt/rapids-compose.git compose
cd compose
# 4. Initialize the local compose environment:
make initYou will be asked to reboot if make init installed docker for you.
Updating
This is a living repo and will evolve along with the RAPIDS projects. It's a good
idea to periodically pull the latest version of the repository to ensure your dev
workflow is kept up-to-date with any changes to the RAPIDS projects' environments:
make init is idempotent, so it's safe to run it from scratch as many times as
necessary. It will ask permission before installing any programs or modifying
any local files, so don't worry about losing any local modifications to your
VSCode workspace or .env files.
# `cd` to the directory where you checked out this repo
cd ~/dev/rapids/compose
# pull the latest changes
git pull origin master
# re-run `make init` to ensure any new settings are applied
make initUsage
Building the containers
To build the dev and notebook containers:
# 1. Build the RAPIDS dev and notebook containers
# a. Only build the rapids container: `make rapids.build`
# b. Only build the notebook container: `make notebooks.build`
make rapids.build
make notebooks.buildThis only needs to be done once to create the containers. During normal
development, RAPIDS components can be rebuilt inside the dev container.
make is the "easy button" to build the containers and compile each of
the RAPIDS projects from source.
If you want to rebuild your containers (e.g. after a docker pull nvidia/cuda,
or modifing the container Dockerfiles), or if you want to take a coffee break
and come back to a fully re-built set of RAPIDS projects, it's always safe to
docker rm -f any existing rapids containers and re-run make from top.
Working interactively in the RAPIDS container
To run the rapids container and launch a tty to explore it interactively, run:
make rapids.runLaunching the Notebook container
To launch the notebook container (needed each time the system is started):
make notebooks.runWorking inside the RAPIDS container
See the header for etc/bash-utils.sh for the suite of useful commands to run inside the dev container.
Miscellaneous Troubleshooting
Conda environment conflicts
"I changed something in my .env, restarted the container and now conda is spending hours resolving
conflicts. What do I do?"
rapids-compose builds a combined conda environment from the environments of all the RAPIDS
repos (cuDF, cuML, cuGraph, RMM, etc.). So if one of those repos on your system is out of sync,
there can be conflicts. To solve this:
-
ensure you have all repos pulled to the same branch. If you are working on a feature branch,
you may need to merge the latest from the base branch into your feature branch. -
compose/scripts/git-checkout-same-branch.shis a script that automates checking out all of your
repos to the same branch. It examines the repos and prompts with a choice of common branches. -
If you are using a CUDA toolkit release candidate or prerelease (e.g. CUDA 11.0 RC), there may
not be conda packages for this available yet. In this case, you can still build RAPIDS libraries
(e.g. libcudf.so) from source, but you may not be able to build and use the Python
bindings/libraries. In this situation, in thecompose/.envfile, you need to setCUDA_VERSION
to the version of the CUDA toolkit you want to use, but setCONDA_CUDA_TOOLKIT_VERSIONto the
latest earlier version for which conda packages exist. So in the case of working with CUDA 11.0
RC, setCUDA_VERSION=11.5.1 CONDA_CUDA_TOOLKIT_VERSION=11.5
DNS Resolution Issues
When creating the Docker-in-Docker container, downloading of the Alpine package
release key fails with the following error:
+ wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://alpine-pkgs.sgerrand.com/sgerrand.rsa.pub
wget: bad address 'alpine-pkgs.sgerrand.com'This is a DNS resolution failure in the container - there can be numerous causes
for DNS resolution failures, but one that can occur within the NVIDIA
environment is the Cisco AnyConnect client breaking networking for containers.
If this occurs, a possible resolution is to restart the machine and attempt
make again prior to making any VPN connections.
Alternatively, you can use the Cisco-compatible OpenConnect VPN client, which
seems to have better integration with systemd:
sudo apt install network-manager-openconnect-gnomeError running groupadd
When building the RAPIDS container, an error running groupadd is encountered
just after the build of ccache:
...
INSTALL ccache
INSTALL ccache.1
/
groupadd: GID '0' already existsThis occurs if you are running make as root. You should run it under your
normal user account.
Wishlist
Acknowledgements
Inspired by container development patterns from @millerhooks