2023-07-28

Setting Up vGPU in a Docker Container (with a Specific Python Version)

Our team has recently been building a backend service with the Django framework. Alongside it, we developed an in-house machine learning command-line tool. To run this tool efficiently, we use an instance with an NVIDIA GPU. The challenge is to set up an environment in which both Django and the tool are containerized with Docker and the container can use the GPU. In addition, the tool requires a specific Python version (3.9 in our case).

This task was not as straightforward as we had imagined. We could not simply pull an official Python 3.9 image, install the CUDA driver, and expect it to run smoothly; after a few attempts, that route proved to be a considerable challenge. We then took a different approach: start from an official nvidia/cuda image and install the required Python version. After several tweaks, it finally worked.

Here are the main steps involved:

Ensure that the GPU driver is installed properly on the instance. This is relatively easy. We followed this guide. To check that everything works as expected, run the nvidia-smi command. If all is well, its output shows the driver version, the CUDA version, and other details about the GPU. For additional guidance, refer to the official CUDA installation guide.
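If you want to script this host check, something like the following could work. It is only a sketch under a few assumptions: the function names are ours for illustration, and it relies on the "CUDA Version: X.Y" field that nvidia-smi prints in its banner. It returns None instead of failing when nvidia-smi is missing or exits with an error.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_version(smi_output: str) -> Optional[str]:
    """Extract the 'CUDA Version: X.Y' value from the nvidia-smi banner."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


def host_cuda_version() -> Optional[str]:
    """Run nvidia-smi; return None if it is missing or exits with an error."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return parse_cuda_version(result.stdout)


if __name__ == "__main__":
    version = host_cuda_version()
    if version:
        print(f"CUDA version reported by the driver: {version}")
    else:
        print("nvidia-smi is not available or failed")
```

The version it reports is the one to compare against the CUDA version in the image tag chosen in the next step.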

Build the backend from the base Dockerfile below. We used the official NVIDIA CUDA Docker images as a base. The image you choose should match the CUDA version that the driver in your environment supports (the version nvidia-smi reports); if they do not match, the container might throw errors. Ubuntu 20.04 ships with Python 3.8 by default, so if the symbolic links are not set up correctly, packages are installed for Python 3.8, resulting in errors.

 # Use an official NVIDIA CUDA image (devel variant with cuDNN 8) as a parent image
 FROM nvidia/cuda:11.4.3-cudnn8-devel-ubuntu20.04

 # Set environment variables
 ENV PYTHONUNBUFFERED=1
 ENV DEBIAN_FRONTEND=noninteractive
 ENV TZ=America/Toronto

 # Install Python and pip
 RUN apt-get update && apt-get install -y \
     python3.9 \
     python3-pip \
     && rm -rf /var/lib/apt/lists/*

 # Create symbolic links for Python and pip
 RUN ln -sf /usr/bin/python3.9 /usr/bin/python \
     && ln -sf /usr/bin/python3.9 /usr/bin/python3 \
     && ln -sf /usr/bin/pip3 /usr/bin/pip

 # Set the working directory to /backend
 WORKDIR /backend
 COPY requirements.txt requirements.txt

 RUN pip install --upgrade pip
 RUN pip install -r requirements.txt

 COPY . .
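To confirm that the symbolic links took effect, a small check can be run inside the built image. The following is a minimal sketch, not part of our setup: the function names and the EXPECTED constant are assumptions for illustration. It compares the running interpreter against 3.9 and parses the interpreter version that `pip --version` prints at the end of its output, for example "(python 3.9)".

```python
import re
import shutil
import subprocess
import sys
from typing import Optional, Tuple

EXPECTED = (3, 9)  # assumption: the Python version the tool requires


def pip_python_version(pip_output: str) -> Optional[Tuple[int, int]]:
    """Parse the trailing '(python X.Y)' from `pip --version` output."""
    match = re.search(r"\(python (\d+)\.(\d+)\)", pip_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_environment() -> bool:
    """Return True if both the interpreter and `pip` are bound to EXPECTED."""
    interpreter_ok = tuple(sys.version_info[:2]) == EXPECTED
    if shutil.which("pip") is None:
        return False
    result = subprocess.run(
        ["pip", "--version"], capture_output=True, text=True, check=False
    )
    pip_ok = pip_python_version(result.stdout) == EXPECTED
    return interpreter_ok and pip_ok


if __name__ == "__main__":
    print("Python/pip OK" if check_environment() else "Python/pip mismatch")
```

If pip reports 3.8 here, the packages from requirements.txt were installed for the wrong interpreter, which is the failure described above.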

Use Docker Compose to manage the services. Below is a base YAML file you can use if Django is your backend. Make sure the runtime and environment settings are configured correctly so that the container can use the GPU. The nvidia runtime must be available to Docker on the host, which is what the NVIDIA Container Toolkit provides.

 version: '3'

 services:
   backend:
     build:
       context: .
       dockerfile: Dockerfile
     ports:
       - 8000:8000
     container_name: backend
     runtime: nvidia
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
     restart: unless-stopped
     volumes:
       - ./:/backend
     env_file:
       - .env
     command: >
       bash -c "python manage.py makemigrations
       && python manage.py migrate
       && python manage.py runserver 0.0.0.0:8000"

Test your configuration. If all the previous steps were successful, you should be able to run the nvidia-smi command inside the container and see the GPU listed.
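The same check can be done from Python inside the container, for example before the ML tool starts. This is a hedged sketch rather than our actual code: the function names and the choice of query fields are assumptions. It uses nvidia-smi's `--query-gpu` and `--format=csv,noheader` options and returns an empty list when no GPU is visible.

```python
import csv
import io
import shutil
import subprocess
from typing import Dict, List

# Fields requested from nvidia-smi; the selection is illustrative.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text: str) -> List[Dict[str, str]]:
    """Turn 'csv,noheader' output into one dict per GPU."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        dict(zip(QUERY_FIELDS, (cell.strip() for cell in row)))
        for row in reader
        if row  # skip blank lines
    ]


def visible_gpus() -> List[Dict[str, str]]:
    """List the GPUs nvidia-smi can see; empty if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = visible_gpus()
    print(gpus if gpus else "No GPU visible inside this container")
```

An empty result usually points back to the runtime or NVIDIA_VISIBLE_DEVICES settings in the Compose file, or to the driver on the host.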