Tips for Optimizing Docker Builds

Photo by Ian Taylor on Unsplash

In this post, we will go over some best practices you should follow when writing a Dockerfile or when building an image from a Dockerfile.

One of the most important tools in any developer’s toolbox is the container engine Docker. Docker has earned its popularity because it lets you package apps in images that can run almost anywhere, even on old commodity hardware, and it makes doing so easy. Having worked with Docker for a sizable amount of time, I have picked up a few best practices that have helped me optimize Docker images and containers. In this post, I will touch on the ones I have found most useful. They largely revolve around consistency, security, and reducing build time and image size.

Keep Your Docker Image Small

One of the most important things to internalise while working with Docker is keeping your image size small. Docker images can quickly grow into the gigabyte range. While a 1GB Docker image is insignificant in terms of disk space on a local development machine, the disadvantages become apparent in CI/CD pipelines, where you may need to pull a specific image several times to run jobs. Bandwidth and disk space are inexpensive, but time is not. Every additional minute of CI time adds up.

For example, every 1 or 2 minutes of avoidable build time adds up to hours of lost time per year. If your CI pipeline runs 50 times per day, that equates to 90,000 to 180,000 lost seconds (25 to 50 hours) per 30-day month.

At 60 seconds of extra build time, a development team could be waiting 60s x 50 = 3,000s (50 minutes) per day for CI feedback that could have arrived sooner.

Number of Builds Per Day | Additional Build Time | Lost Time Per Year
50                       | 60s                   | (60s x 50) x 5 days x 52 weeks = 780,000s, or about 217 hrs/year

The real benefit, however, is not the faster CI/CD pipeline itself but the quicker feedback loop for all developers. Slow builds, and the long wait times they cause, are the enemy of staying “in the zone” (see point 8 of the Joel Test).

How do you keep the size of your image small?

  • Only install the dependencies your application needs and nothing more.
  • Use a small base image. Don’t use Ubuntu as your base image where you can use Alpine Linux, for example, which is smaller.
  • Instructions that change the filesystem (RUN, COPY, ADD) each add a layer, and every layer adds to the size of your image. Try to use as few layers as possible.
  • Don’t install dependencies only to remove them in a later layer. The files still take up space in the earlier layer.
  • If a package or file comes in several variants, use the smallest one that does the job.
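
The points above can be sketched in a few lines. This is a minimal illustration rather than a recommended setup: it assumes a Debian-based image and a hypothetical application that only needs curl at runtime, and the base image tag is a stand-in for whatever small image fits your app.

```dockerfile
# Assumption: debian:bookworm-slim stands in for any small base image
FROM debian:bookworm-slim

# Install only what the app needs, skip "recommended" extras, and remove
# the apt package index in the same RUN instruction so it never ends up
# in a layer of the final image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*
```

Cleaning up in a separate RUN instruction would not shrink the image, because the files would still exist in the earlier layer.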

Sort Multi-line Instructions

Unsorted lines are hard to read. We tend to skim them during code reviews, consciously or not, because they take more concentration to understand. Unsorted Docker instructions become a haven for unnecessary and duplicated dependencies, even though a Docker container should contain only the bare minimum needed to run an app. Sorting the arguments of a multi-line instruction, one per line, makes a Dockerfile easier to read and helps you quickly spot a duplicated argument or a dependency that could be avoided.

Consider the examples below:

Example 1 (bad):

RUN apt-get update && apt-get install -y libsqlite4-dev aufs-tools automake build-essential curl dpkg-sig libcap-dev libsqlite3-dev mercurial reprepro cowsay ruby1.9.1 s3cmd=1.1.*

Example 2 (good):

RUN apt-get update && apt-get install -y \
   aufs-tools \
   automake \
   build-essential \
   cowsay \
   curl \
   dpkg-sig \
   libcap-dev \
   libsqlite3-dev \
   libsqlite4-dev \
   mercurial \
   reprepro \
   ruby1.9.1 \
   s3cmd=1.1.*

Both instructions include an unnecessary dependency: cowsay. In the first example it is buried in one long line and easy to miss. In the second, with one package per line, it stands out during review.

Identify Cacheable Layers and Combine Them

Layers are one of the most interesting and useful attributes of a Docker image. Docker images are constructed in layers. Each layer corresponds to a Dockerfile instruction and represents the filesystem change between the state of the image before the instruction ran and the state after it. Docker caches layers to speed up builds. If nothing has changed in a layer (neither the instruction nor the files it uses), Docker reuses the previously built layer from the cache instead of rebuilding it.

Unnecessary layers, on the other hand, add overhead. Since each RUN instruction creates a new layer, it is usually more efficient to install all dependencies in a single RUN instruction, which produces a single cacheable layer, than to split the work across several. The time saved by identifying cacheable layers and leveraging them adds up in the long run. The example below splits one logical step across two RUN instructions:

RUN apt-get update && apt-get install -y \
   aufs-tools \
   automake \
   build-essential \
   curl

RUN apt-get install -y \
   dpkg-sig \
   libcap-dev \
   libsqlite3-dev \
   mercurial \
   reprepro \
   ruby1.9.1 \
   ruby1.9.1-dev \
   s3cmd=1.1.*

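The trick is to identify cacheable layers and combine them, as shown below. Combining them also keeps apt-get update and apt-get install in the same layer, so a stale cached package index is never reused after the package list changes.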

RUN apt-get update && apt-get install -y \
   aufs-tools \
   automake \
   build-essential \
   curl \
   dpkg-sig \
   libcap-dev \
   libsqlite3-dev \
   mercurial \
   reprepro \
   ruby1.9.1 \
   ruby1.9.1-dev \
   s3cmd=1.1.*

Always Specify a Tag

The examples below show two ways you could start a Dockerfile:

FROM node

And

FROM node:12.1

Starting a Dockerfile with the second option is recommended because it pins the base image to a specific version. The first example resolves to the latest tag, which points at whatever build is newest at the time. The problem with an unpinned dependency is that consistency is not guaranteed: a new release may introduce a breaking change, and two builds of the same Dockerfile may produce different images. We mainly pin versions for certainty and visibility. When the base image is pinned, you know exactly which version is in use at any time.

Specifying tags also applies to the images you build. Never rely on the automatically created latest tag. Always tag your images explicitly.
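
One way to keep the pinned version visible and easy to bump is to declare it once at the top of the Dockerfile. The sketch below is only an illustration: NODE_VERSION is a hypothetical build argument name, not something from the examples above.

```dockerfile
# Assumption: NODE_VERSION is an illustrative argument name.
# An ARG declared before FROM can be used in the FROM line.
ARG NODE_VERSION=12.1

# Resolves to node:12.1 unless overridden with --build-arg at build time
FROM node:${NODE_VERSION}
```

When building, the image itself can be tagged explicitly with the -t flag, for example "docker build -t my-app:1.0.0 ." (my-app and 1.0.0 are placeholder values).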

Create a Common Base Image

Suppose you’re working with multiple microservices that have a lot in common. Perhaps they all start from the same base image and share certain dependencies. In that case, it’s best to create a base image with the shared components that all the other images can build on. This lets you apply common changes in one place. You also benefit from Docker layer caching: because the services share the same layers, Docker loads them from the cache, which saves build time. You build once and reuse the layers.

# Dockerfile for service A
FROM my-common-base-image:2.3.1 AS base

# more Dockerfile instructions

The other services can start from the same common base image.

# Dockerfile for service B
FROM my-common-base-image:2.3.1 AS base

WORKDIR /app
....
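
For illustration, the Dockerfile of the shared base image itself might look something like the sketch below. Everything in it is an assumption made for the example: it supposes the services are Node.js apps that all need curl, which the text above does not specify.

```dockerfile
# Hypothetical Dockerfile for my-common-base-image (contents are assumptions)
FROM node:14.14.0-alpine3.12

# Tools every service is assumed to need; --no-cache avoids storing
# the apk package index in the image
RUN apk add --no-cache curl

# Shared working directory for all services
WORKDIR /app
```

It would be built and tagged once, for example with "docker build -t my-common-base-image:2.3.1 .", and then referenced from each service's FROM line.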

Scan your Image

Applications packaged in containers are not immune to security vulnerabilities. Your application will rarely consist solely of code you wrote yourself. It will have dependencies and libraries written by other people, and the more dependencies you have, the broader the attack surface. You will not know how vulnerable your Docker images are unless you scan them. Docker leverages the Snyk engine to provide vulnerability scanning through the “docker scan” command (newer Docker releases replace it with “docker scout cves”), which you can use as follows:

docker scan IMAGE_NAME

If you’re in a large team, the easiest way to enforce the use of secure Docker images and dependencies is to make scanning part of your CI/CD pipeline. Leveraging “docker scan” will help you build a more secure application.

Use Image Layer Ordering to Your Advantage

Docker images are made up of layers stacked on top of each other, and there is an important lesson here if you want to use layering to your advantage: once a layer changes, all layers that come after it are rebuilt. The trick is to place instructions whose inputs rarely change early in the Dockerfile, and instructions whose inputs change frequently as late as possible.

Consider the example Dockerfile below. We can optimize it by using image layering to our advantage.

FROM node:14.14.0-alpine3.12 AS base

WORKDIR /app

# copy the source code, which changes often
COPY . .

# your dependencies don't change very often
RUN npm install

Application dependencies don’t change as often as the source code does. The code you’re actively working on will change several times a day as you add features. To prevent the dependencies layer from rebuilding each time your code changes, you can rearrange the Dockerfile’s instructions. The dependencies should be installed in their own layer, before the frequently changing source code is copied, as follows:

FROM node:14.14.0-alpine3.12 AS base

WORKDIR /app

# install dependencies, which change rarely, before copying your source code
COPY package*.json ./
RUN npm install

# copy your source code, which changes the most, in the last layer
COPY . .

Leverage Multi-stage Build

A multi-stage build is a way of organizing your Docker instructions so that the final image is as small as possible and contains only what you need. Applications often have dependencies that are not required at runtime, such as linters, compilers, fakers, and testing frameworks. These are known as dev dependencies, and you should keep them out of your production build. In the past, people often worked around this problem by keeping separate Dockerfiles for the test build and the production build. That approach comes with code duplication and the complexity of maintaining multiple Dockerfiles that differ only slightly. This is where a multi-stage build helps. A single Dockerfile can contain instructions for multiple images (stages), and you copy just the files you need from one stage into another.

# Common build stage
FROM node:14.14.0-alpine3.12 AS base

WORKDIR /app

COPY package*.json ./
RUN npm install

COPY . .

RUN npm run build

# Production build stage
FROM node:14.14.0-alpine3.12 AS production-build

WORKDIR /app

ENV NODE_ENV=production

# Install production dependencies only (npm ci requires a package-lock.json)
COPY package*.json ./
RUN npm ci --only=production

# Copy only the build output from the first stage
COPY --from=base /app/build/ ./build/

# EXPOSE belongs in the final stage; it is not inherited from the build stage
EXPOSE 8000

CMD ["npm", "run", "start"]

Use the Official Docker Image as Your Base Image

Docker Inc. sponsors a dedicated team responsible for reviewing and publishing all content in the official Docker repositories. Generally, the Dockerfiles of those images are well written and follow best practices. Docker Inc. also ensures that patches and fixes are applied to those images in a timely manner.

Instead of duplicating work by creating your own base image, you can take advantage of the ongoing maintenance work. There are times when you should deviate from an official image, such as when you need to create an image that is optimized for your use case. If there is no compelling reason to use unofficial Docker images as your base image, you should always use official Docker images.

Don’t Omit LABEL and EXPOSE Instructions

Docker also includes instructions such as EXPOSE and LABEL that make life easier for you and for the people working with your images. If your container listens on a port, be explicit about it and declare which one. Use labels to attach metadata, such as a description or version, that makes your images more descriptive.
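
As a small sketch of both instructions together: the label keys below follow the OCI image annotation names, while the values (my-service, the description, the version) are placeholders invented for this example.

```dockerfile
FROM node:14.14.0-alpine3.12

# Label keys follow the OCI image annotation names; values are placeholders
LABEL org.opencontainers.image.title="my-service" \
      org.opencontainers.image.description="Example service used for illustration" \
      org.opencontainers.image.version="1.0.0"

# Document the port the application is assumed to listen on
EXPOSE 8000
```

Note that EXPOSE only documents the port. It does not publish it, so whoever runs the container still maps it explicitly, for example with the -p flag of "docker run".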

Wrapping Up

To sum up, if you take one thing away from this post, it should be the idea of keeping your image small. Having a small Docker image is beneficial because it:

  • Improves security
  • Reduces disk usage
  • Reduces build time
  • Improves your CI speed thus improving productivity
  • Shortens the development feedback-loop and keeps you in the zone