Building Better Docker Images

2015-06-06 tech programming linux docker

It’s been about eight months since I wrote down some of my thoughts on building good Docker images. The Docker ecosystem has continued to move quickly, and I’m pleased to say that most of the principles listed in the old article have aged well. I don’t have a ton to add, but here are a few things that I’ve discovered since then:

Base images off of Alpine

alpine is a 5 megabyte image based on Alpine Linux (Docker Hub link). Despite being small, it comes with a package manager (apk) that has access to a well-maintained, modern package repository. This base image is ideal for building tiny application containers (e.g. a 6 MB redis image, a 17 MB node.js 0.10 image, a 6 MB PostgreSQL client).

Alpine presents some challenges if you need to stray beyond the package manager. Alpine uses musl libc, so most dynamically linked “Linux” binaries that you can download off the web, which are typically built against glibc, will not run. Instead, you may find yourself building from source, in which case it’s helpful to know that the Alpine build-base package is roughly equivalent to Debian’s build-essential. Alpine’s default shell is ash, but you can install bash through the package manager if you need or prefer it.
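
As a rough illustration of that equivalence, here is a minimal sketch of a helper that maps a distribution ID to the package providing a basic compiler toolchain. The function name toolchain_package is made up for this example, and the ID strings are assumed to match the ID field in /etc/os-release.

```shell
# Hypothetical helper (not from the original article): print the name of the
# package that provides a basic C/C++ toolchain for a given distribution ID.
toolchain_package() {
  case "$1" in
    alpine) echo build-base ;;
    debian|ubuntu) echo build-essential ;;
    *) echo "unknown distro: $1" >&2; return 1 ;;
  esac
}

# Possible usage on Alpine:
#   apk --update add "$(toolchain_package alpine)"
```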

Write tests

I picked this up from the bright guys at Aptible. When you want to guarantee that your image has a certain feature set, it can be useful to run a suite of tests as part of the Dockerfile. If the image is just for you, this is probably overkill, but if other people are using or building off of your image, or if you want to make things explicit and maintainable (e.g. for others on your development team), it can be very helpful.

With tests in the Dockerfile, a built and published image comes with guarantees about the image’s behavior. If you try to rebuild the image and the tests fail, you get some indication of what’s changed since the last successful build (typically some external state, as discussed in the previous article).

Here’s an example of some simple tests from my jbergknoff/sass repository (link):
```
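#!/bin/sh
# Smoke tests for the image; any failure exits nonzero and aborts the build.
echo "--- Tests ---"

# The installed binary should report the expected sassc version.
echo -n "it should install sassc 3.2.1... "
sass -v | grep sassc | grep "3.2.1" > /dev/null
[ "$?" -ne 0 ] && echo nope && exit 1
echo ok

# A variable defined in SCSS should be substituted in the compiled CSS.
echo -n "it should compile SCSS... "
echo '$blue: #00f; .thing { color: $blue; }' > /tmp/test.scss
sass /tmp/test.scss | grep "color: #00f" > /dev/null
[ "$?" -ne 0 ] && echo nope && exit 1
rm /tmp/test.scss
echo ok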
```
This content lives in a file, test.sh, which gets RUN as part of the Dockerfile. If a test fails, the script exits with a nonzero status and the image build fails.
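
The print, run, and check-the-exit-status pattern repeats for every test, so it could be factored into a small helper. What follows is only a sketch, not code from the jbergknoff/sass repository, and the function name check is an assumption made for illustration.

```shell
# check DESCRIPTION COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with its output discarded, then prints
# "ok" if it succeeded or "nope" (and returns 1) if it failed.
check() {
  description=$1
  shift
  printf '%s... ' "$description"
  if "$@" > /dev/null 2>&1; then
    echo ok
  else
    echo nope
    return 1
  fi
}

# Possible usage inside a test script (pipelines need a wrapping shell):
#   check "it should install sassc 3.2.1" \
#     sh -c 'sass -v | grep sassc | grep "3.2.1"' || exit 1
```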

If you’d prefer a test runner/framework, consider bats. It’s a light wrapper around bash scripting, adding some structure for testing (the ability to skip tests, setup/teardown steps, etc.).

Use scripts

Sometimes it makes sense to break out a part of a Dockerfile into a shell script. For instance, while it’s good to clean up after installing a package through a package manager, it can get awkward to have a long && chain in a RUN command just to enforce cleanliness.

Instead, consider making a script. Here is another example from jbergknoff/sass, the build.sh script (link):
```
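#!/bin/sh
# Stop at the first failing command so a broken build can't yield an image.
set -e

# build: install the toolchain, then fetch and compile sassc with libsass
# (the absolute paths below assume the script is run from /)
apk --update add git build-base
git clone https://github.com/sass/sassc
cd sassc
git clone https://github.com/sass/libsass
SASS_LIBSASS_PATH=/sassc/libsass make

# install
mv bin/sassc /usr/bin/sass

# cleanup: remove the sources and build-time packages within the same RUN
cd /
rm -rf /sassc
apk del git build-base
apk add libstdc++ # sass binary still needs this because of dynamic linking.
rm -rf /var/cache/apk/*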
```
This script is responsible for grabbing the SASS source code, building it, and then cleaning up. Obviously the final image (a 9 megabyte SASS application container) shouldn’t have git installed, but imagine encoding that entire sequence as one long && chain in a single RUN simply to keep git out. It would be hard to read and harder to maintain.
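
One possible refinement, which the script above does not use, is apk’s --virtual flag: it installs the build-time packages under a single made-up name so they can all be removed in one step. Here is a minimal sketch; the function name with_build_deps and the group name .build-deps are assumptions chosen for this example.

```shell
# with_build_deps COMMAND [ARGS...]
# Hypothetical helper: installs git and build-base as one virtual package,
# runs COMMAND, then removes the whole group again, preserving COMMAND's
# exit status.
with_build_deps() {
  apk --update add --virtual .build-deps git build-base || return 1
  rc=0
  "$@" || rc=$?
  apk del .build-deps
  return "$rc"
}

# Possible usage, where compile_sass is a placeholder for the build steps:
#   with_build_deps compile_sass
```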

Because all of the installation and cleanup happens in one RUN command, there is no extra bloat hanging around (recall that each RUN produces its own layer, so files introduced in one RUN and removed in a subsequent RUN still take up space in the image). This technique can help all sorts of Dockerfiles, making them cleaner and easier to understand. In the case of building from source, it’s almost always beneficial.