Docker From Scratch, Part 2: Dockerfiles

In the last post, we pulled and ran a Docker container from the command line. Even though we only needed a few commands, it becomes tedious to retype the same lengthy command each time we want to work with our container.

Worse, such repetition is prone to error. One thing we want from Docker is a repeatable environment: something that builds the same way each time we build it, on each system we build it on, for all the developers on our team. You might be tempted to just put everything in a script file, but even that has portability problems. Bash doesn’t work on Windows (without Cygwin). We need something that only depends on Docker itself.

How to Make a Server with One Text File

Docker can build containers from a Dockerfile. The Dockerfile is like a script, but only for creating a ready-to-use Docker container. It tells Docker which container on the Docker Hub to start with, what files to copy into the container, what commands to run while building the container, and what command to run when the container is ready to use.

Dockerfiles are weird. The syntax may look oddly archaic, but it is simple and straightforward enough to learn quickly. A typical Dockerfile starts with two lines:

FROM debian:wheezy
MAINTAINER your_email@example.com

The FROM statement tells Docker which container on the Docker Hub this container will start with. Containers are a bit like that phrase, “It’s turtles all the way down.” All containers in existence are based on another container on the hub. It’s containers all the way down -- until you hit the mother of all containers, “scratch”. The “scratch” container is an empty *.tar.gz file. Even if you want to build your own container from an install disk, you’re still starting, quite literally, from scratch. In the above example we’re pulling from the “debian” container, but not just any Debian container: we’re specifically pulling Debian 7, or “Wheezy”. There’s typically a special version keyword, “latest”, that refers to the newest version of that Linux distro. You separate the name of the container from the version with a colon.
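To make the name-and-version rule concrete, here is a small shell sketch that splits a reference the way we have described it: the part before the colon is the name, the part after is the version, and a missing version falls back to “latest”. The split_ref helper is made up for illustration and is not part of Docker; it also ignores details such as registry addresses that contain their own colon.

```shell
# Hypothetical helper (not part of Docker): split "name:tag" into its parts,
# falling back to "latest" when no tag is given.
split_ref() {
  ref="$1"
  case "$ref" in
    *:*) name="${ref%%:*}"; tag="${ref#*:}" ;;
    *)   name="$ref"; tag="latest" ;;
  esac
  printf '%s %s\n' "$name" "$tag"
}

split_ref debian:wheezy   # prints: debian wheezy
split_ref debian          # prints: debian latest
```

So “FROM debian” and “FROM debian:latest” should refer to the same starting point, while “FROM debian:wheezy” pins a specific release.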

The MAINTAINER line simply specifies who built the Dockerfile. While the text that follows MAINTAINER can be anything, the common convention is an email address.

Building the Container Using a Dockerfile

The default name of a Dockerfile is just “Dockerfile”. No extension. It’s rather like a Vagrantfile in that respect, although Docker’s syntax is a bit more foolproof. To build your new container, you use the “build” docker command:

$ docker build .
Sending build context to Docker daemon 2.048 kB
Sending build context to Docker daemon 
Step 0 : FROM debian:wheezy
 ---> 60c52dbe9d91
Step 1 : MAINTAINER your_email@example.com
 ---> Running in 486f3a2255b4
 ---> 64b0d7e8eef9
Removing intermediate container 486f3a2255b4
Successfully built 64b0d7e8eef9

The build command takes one required argument: the path to the build context, the directory whose contents are sent to the Docker daemon. In this case, we specified a single period, “.”, to indicate the current directory, so Docker looks for a Dockerfile there. Docker then starts building our image from the container we specified in FROM, pulling it from the hub if needed. Then it reads the MAINTAINER line and comes to the end of the file.
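Because the argument is a directory, it can help to build from a folder that holds nothing but the Dockerfile. The sketch below does that in a temporary directory and also passes the -t flag to give the result a name. The name “myserver” and the use of mktemp are our own choices for illustration, and the build step assumes the debian:wheezy tag is still available on the hub.

```shell
# Work in a scratch directory so the build context stays tiny.
workdir="$(mktemp -d)"
cd "$workdir"

# Write the two-line Dockerfile from this post.
cat > Dockerfile <<'EOF'
FROM debian:wheezy
MAINTAINER your_email@example.com
EOF

# "." is the build context; -t names the image (hypothetical name: myserver).
if command -v docker >/dev/null 2>&1; then
  docker build -t myserver .
else
  echo "docker not found; skipping build"
fi
```

With a name attached, later commands can refer to “myserver” instead of a hashcode.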

Once that’s built, you’ll probably want to run the “ps” command to see if it’s running. We’ll use the “-q” switch to only see the container IDs:

$ docker ps -q
$

Nothing! What the heck? Did the build command do anything at all? It did! Look back to the results of the “build” command. Notice the line “Removing intermediate container”? This hints at something else we need to learn about how Docker does things.

Docker Images

If you’ve worked with VMs, you know that the virtual hard drive is stored as a file or files on the host’s file system. At some point, you may have configured the system just as you wanted and wanted to preserve the state so you can fall back to it later. To do so, you created a “snapshot.” Version control systems like git also use a similar concept, where you create a snapshot of your source code as a commit which is identifiable by a unique hashcode.

Docker isn’t like a VM though, and doesn’t create a virtual hard drive like one. Instead, it works a lot more like a git repository. With each statement in the Dockerfile, Docker takes the current state of the container and commits it, generating an image. Like a git commit, Docker images are identified by a unique hashcode.

An image isn’t a running container: it’s more like a snapshot of a container that once was. We can list images using the “images” command:

$ docker images
REPOSITORY   TAG        IMAGE ID       CREATED           VIRTUAL SIZE
<none>       <none>     64b0d7e8eef9   1 minutes ago     85.02 MB
debian       wheezy     60c52dbe9d91   4 weeks ago       85.02 MB
debian       latest     9a61b6b1315e   4 weeks ago       125.2 MB

Wait, there are two debian images? Yes! There’s one from when we ran a Debian container interactively in the last post. Since we only specified “debian” it pulled the “debian:latest” repository. In our Dockerfile, we specified “debian:wheezy”, an earlier version of the OS. We also have one more image that has no repository and no version tag.

Look back to the output from the “build” command, paying close attention to the hashcodes. Then look at the output of the “images” command. After step 0 -- the FROM statement -- we see an image ID starting with “60c”. Then, in the MAINTAINER step, we get a new container ID starting with “486” and a new image ID starting with “64b”. A curious thing happened: the 486 container was removed! Why?

Remember, the Docker “build” command doesn’t build a running container, it only builds images. That’s why the so-called “intermediate” container was removed.

Running an Image

We can run a container from an image using the “run” command, just like before:

$ docker run -i -t 64b /bin/bash

Notice this time that instead of specifying the image name on the Docker Hub, we actually specify one we have locally. Even better, we only need to specify part of it. As long as it’s unique on the host, Docker will find it. Once inside, we can verify the Debian version:

root@71724ddf2f01:/# cat /etc/issue
Debian GNU/Linux 7 \n \l

We have Debian Wheezy, just like we expected. The trouble is, we have just a barebones container. What if we needed software like Apache or MySQL? Sure, we could run a script each time to add software to the container, but then we have the same problem again: shouldn’t the Dockerfile do this for us?

Installing Software in a Dockerfile

Dockerfiles can do more than specify FROM and MAINTAINER. Perhaps the most versatile command you can use in a Dockerfile is RUN.

The RUN statement does what it says on the tin: it runs a command on the container. We can use this statement to do all sorts of things in our container while building it, such as installing software.

FROM debian:wheezy
MAINTAINER your_email@example.com

RUN apt-get update
RUN apt-get install -y apache2

Here we’ve added two RUN statements. The first one updates the Debian software repositories. The second installs Apache. We can now create our container like we did before with the “build” command:

$ docker build .
Sending build context to Docker daemon 2.048 kB
Sending build context to Docker daemon 
Step 0 : FROM debian:wheezy
 ---> 60c52dbe9d91
Step 1 : MAINTAINER your_email@example.com
 ---> Using cache
 ---> 64b0d7e8eef9
Step 2 : RUN apt-get update
 ---> Running in 8652b461d629
Get:1 http://security.debian.org wheezy/updates Release.gpg [1554 B]
...
Fetched 8438 kB in 7s (1094 kB/s)
Reading package lists...
 ---> eefa2fcb15e7
Removing intermediate container 8652b461d629
Step 3 : RUN apt-get install -y apache2
 ---> Running in ef474ef3a976
Reading package lists...
Building dependency tree...
Reading state information...
…
Setting up libswitch-perl (2.16-2) ...
 ---> aa084388ac30
Removing intermediate container ef474ef3a976
Successfully built aa084388ac30

Looking carefully at the command output, we can see that we get two new container IDs and two new image IDs:

$ docker images
REPOSITORY   TAG        IMAGE ID       CREATED           VIRTUAL SIZE
<none>       <none>     aa084388ac30   2 minutes ago     145 MB
debian       wheezy     60c52dbe9d91   4 weeks ago       85.02 MB
debian       latest     9a61b6b1315e   4 weeks ago       125.2 MB

Now that’s interesting... Our old image ID is gone, replaced with the newest one, “aa0”. What happened? Docker images, like commits, stack on top of each other. It’s rather like how a git commit doesn’t have a complete copy of the source code at one point in time, but rather a list of changes to files. Each image is like a list of changes. This is one wonderful thing about Docker over VMs. Each time you update the container, you only record changes. This makes updating and saving changes in a container very, very fast and light on resources.
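If you want to see the stack for yourself, two docker subcommands help: “images -a” also lists the intermediate images that the plain listing hides, and “history” prints the layers behind a single image. Here is a minimal sketch that wraps both. The show_layers helper and the DOCKER variable are our own inventions for illustration, and the exact output depends on your Docker version.

```shell
# Set DOCKER=echo for a dry run that prints the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: list every image, then the layer stack of one image.
show_layers() {
  image="$1"
  # -a includes intermediate images that the plain listing hides.
  "$DOCKER" images -a
  # history prints one row per layer, newest first.
  "$DOCKER" history "$image"
}

# Example (needs a Docker daemon and the image ID from your own build):
# show_layers aa084388ac30
```

On a setup like the one in this post, the older “64b” image should show up again in both listings, sitting underneath the newer one.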

The Union FS and the Build Process

This entire system of stacking images is based on an existing file system technology called a union file system. It’s a method of mounting these lists of changes into a cohesive whole. The thing is, as you go back in the history of your container, you have to start from an initial image, or base image. The base image for our container is “debian:wheezy” on the Docker Hub.
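To get a feel for what “mounting lists of changes into a cohesive whole” means, here is a toy simulation in plain shell. It is not how Docker does it: a real union file system overlays the layers at mount time instead of copying files, and it can also record deletions. All directory and file names below are invented for the example.

```shell
work="$(mktemp -d)"
mkdir -p "$work/base/etc" "$work/layer1/var/lib" "$work/layer2/etc" \
         "$work/layer2/var/www" "$work/merged"

# Base image: the starting file system.
echo "Debian GNU/Linux 7" > "$work/base/etc/issue"
echo "base" > "$work/base/etc/motd"

# Layer 1 only adds a file; layer 2 adds a file and changes an existing one.
echo "updated" > "$work/layer1/var/lib/lists"
echo "It works!" > "$work/layer2/var/www/index.html"
echo "apache installed" > "$work/layer2/etc/motd"

# Stack the layers bottom-up; a file in a later layer replaces an earlier one.
for layer in base layer1 layer2; do
  cp -R "$work/$layer/." "$work/merged/"
done

# The merged view contains files from all three layers.
find "$work/merged" -type f | sort
```

The merged directory ends up with the base image’s files, plus everything the later layers added, with the newest version of each changed file on top.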

Each statement in a Dockerfile results in a new image being taken. Even with two consecutive RUN statements, Docker will create a new image after each successful statement. This means that if any RUN statement (or any other statement in the Dockerfile) fails for whatever reason, Docker will automatically roll back to the last good image.
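Since every statement produces its own image, a common variation is to chain related commands into one RUN statement with “&&”. The sketch below writes such a Dockerfile to a temporary directory. This is an alternative to the two-statement version in this post, not a correction of it, and the temporary directory is just our choice for the example.

```shell
workdir="$(mktemp -d)"

cat > "$workdir/Dockerfile" <<'EOF'
FROM debian:wheezy
MAINTAINER your_email@example.com

# One statement, therefore one image: update and install together.
RUN apt-get update && apt-get install -y apache2
EOF

# Count the RUN statements: one here, versus two in the earlier version.
grep -c '^RUN' "$workdir/Dockerfile"
```

With this layout the update and the install succeed or fail as a unit, and the build records a single image for both.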

There’s also another implication that’s much more subtle. For each Dockerfile statement, Docker starts up a new container based on the last image. Docker then performs the statement on the container. For RUN commands, the command is run within the container. When the statement finishes successfully, Docker shuts down the container and takes a new image. That’s why we see “Running in…” and “Removing intermediate container…” pairs during our build command.
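We can imitate one build step by hand with three ordinary docker commands: run a command in a new container, commit the result as an image, and remove the container. The sketch below is only an approximation of what “build” does for a RUN statement. The manual_step helper, the DOCKER variable, and the image name in the example are all made up for illustration.

```shell
# Set DOCKER=echo for a dry run that prints the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: apply one build step by hand.
#   $1 = image to start from, $2 = name for the new image, rest = command to run
manual_step() {
  base="$1"; target="$2"; shift 2
  name="manual-step-$$"
  # 1. Start a container from the previous image and run the command in it.
  "$DOCKER" run --name "$name" "$base" "$@" || return 1
  # 2. Commit the stopped container's file system as a new image.
  "$DOCKER" commit "$name" "$target" || return 1
  # 3. Remove the intermediate container, as the build output reports.
  "$DOCKER" rm "$name"
}

# Example (needs a Docker daemon):
# manual_step debian:wheezy manual:step2 apt-get update
```

Note that if the command fails, this sketch stops before the commit, so no new image is recorded and the last good image is still the one we started from.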

Summary

In this post, we’ve built a new Docker container using a repeatable and easy-to-understand Dockerfile. We learned about how Docker uses the UnionFS to store snapshots of containers as images. We’ve used the RUN statement to add software to our containers. Next time, we’ll expand our container to be easily runnable so we don’t have to specify the command to run in the container.

Read Part 3.
