Docker images are very different between local and production, and they should not be

Like anyone who has read the 12-factor manifesto, we want our projects to run in production the same way they run on our laptops, and we accept only minimal changes between environments: configuration values and persistent data, at most.
“Just use a Docker image,” says the container enthusiast! It generally goes very wrong very fast: one day you patch a minor version of an obscure library, and the next day all hell breaks loose and a major overhaul is required. UNLESS… unless you pin every version down, monitor the build sequence for drift, and implement a handful of other good practices, all of which we detail in this blog post!
Docker will not fix your process
Docker fever has been raging for the past seven years. It has changed the IT ecosystem from the bottom up, and everyone and their grandmother is packaging software into containers. Heck, it has even turned into an action verb: “last sprint was all about dockerizing the stack”!
While its simplicity of use won the hearts of millions of developers across all stacks, Docker also won C-level support with a powerful promise: build once, run everywhere. At its core, you could take a Docker image from a laptop, test it on the local on-prem cluster, and finally deploy it on a CaaS somewhere at your favorite cloud provider, all with the same build!
In reality, the workflow where “developer A” tests on their laptop and pushes directly to production is quite rare, simply because mature teams at scale prefer continuous integration, generally triggered by git push events on the shared repo. And if a second build is not enough, we can add another Docker build event on production, just before deployment, which is not uncommon in scenarios where the build artefact is generated by an Ops team after being vetted by the Dev team.
And now for the bitter truth about Docker images.
Drifting — same Dockerfile producing different images
While pondering some comments on Hacker News, after we shared our path to containers in Documenting our migration to Docker, one statement kept generating discussion, and for a good reason: its boldness and apparently naive tone:
“Whatever is the underlying executing system, Docker can run it with the exact same code, byte by byte.”
Just because Docker “can” run the same code does not mean it will, and we want to make it obvious in this post that, most of the time, Docker images are not the same, even when the original Dockerfile stays unchanged across builds.
This symptom, a true pain point in the SDLC (hey there, fellow SRE and QA peeps!), is what we call drifting. As time goes by, a loosely written Dockerfile will drift from its original state and produce different software with potentially different behaviour.
Before jumping to solutions and good practices, let’s take a real-life scenario of “drifting”.
The case of “dependabot” security updates
Drifting between builds is the set of involuntary changes introduced by external sources of code, causing adverse effects on systems in production.
Take the nifty dependabot feature on GitHub: it conveniently watches code repos for updates in vendor packages. It is so easy that all it takes is to approve a pre-generated Pull Request and merge it to push a new fix to production.
Once in a while you get these Pull Requests, and it is easy to overlook their impact and push a “quick” fix to production. After a couple of months, the accumulated patches have updated an important part of the dependency graph, changing system libs and enabling other features, without the team’s knowledge or control.
During these seemingly innocent patch upgrades, other potential vulnerabilities, and even regressions, were introduced in areas that dependabot cannot monitor yet, as the GitHub docs stress.
Reproducible Docker builds
So you have the same Dockerfile for different builds, yet it does not yield the same artefact. Look into one of the seven following drift sources. We sorted them from the most obvious to the most specific.
Different ARG == different TAG
ARG is the configuration layer of Docker builds. It only exists during the build stage and is preferred to the ENV keyword if you don’t want to leak data at container run time. Depending on the ARG instructions and the build args, the resulting image will be very different. As a good practice, the image should be tagged with a unique label pertaining to the ARG combination. We found that adding this information as an additional tag, or even as a label, can decrease troubleshooting time.
# Build arg declared before FROM to select the base image flavor
ARG BASE_FLAVOR=buster
FROM debian:${BASE_FLAVOR}
# Re-declared after FROM so it can be used inside this stage
ARG UNIQ_VERSION
LABEL version=${UNIQ_VERSION}
export BASE_FLAVOR=bullseye-slim
docker build --build-arg BASE_FLAVOR --build-arg UNIQ_VERSION=$(echo "BASE_FLAVOR=${BASE_FLAVOR}" | md5sum | cut -d' ' -f1) .
No fixed OS version
The biggest and easiest decision is to pin the underlying OS. Contrary to virtual machines, where maximum isolation is achieved (at a greater expense of time and resources), Docker containers share the host kernel, leaving each container with a choice of Linux distribution, e.g. a Debian-based or CentOS flavor, or Alpine, etc.
Good practice for reproducible builds: don’t choose an OS based on its tag label on Docker Hub, but by referencing a specific sha256 digest. We will see later how to deal with updates.
❌ Non-reproducible
FROM debian:buster
✅ Reproducible
ARG SRCSHA=870b187824ecb90dbb968ff713e30ad950c42c6b2ddbae30559a48694b87fd77
FROM debian:buster@sha256:${SRCSHA}
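To find the digest to pin, one option is to inspect a locally pulled image, as in this minimal sketch (debian:buster is just the tag used in the example above):
docker pull debian:buster
docker inspect --format='{{index .RepoDigests 0}}' debian:buster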
Different target on a multi-stage Dockerfile
Multi-stage support, available since Docker 17.05, is in my top 3 great features of the Dockerfile specification. While by nature it apparently breaks the reproducible, sequential workflow of the Dockerfile, it also unlocks many opportunities. The same Dockerfile can now be used to instrument an image with troubleshooting tooling or to optimise a production version for weight and speed. The good practice here is to stick to the same target during test and production. Whatever the situation, do not test one target and deploy another.
# First stage: build environment with the compiler toolchain
FROM ubuntu AS compiler
RUN apt-get update
RUN apt-get install -y build-essential
COPY hello.c /
RUN make hello
# Final stage: lean runtime image, only the compiled binary is copied in
FROM ubuntu
COPY --from=compiler /hello /hello
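A minimal sketch of what this looks like on the CLI (image names are illustrative; the unnamed final stage is what docker builds by default):
# Same target everywhere: test and production both get the lean final stage
docker build -t hello:test .
docker build -t hello:prod .
# Drift source: testing the instrumented compiler stage but shipping the final one
docker build --target compiler -t hello:test .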
External dependencies from the internet (curl … | bash)
Remember the small npm package that broke the web? During a previous project, our CTO established one strong rule: artefacts must be built with zero outbound connections. In other words, we could proxy and cache all required packages, as long as they stayed available in our internal datacenter. There were many good reasons behind this rule: bandwidth expenses, performance and resilience. And then another important one: the opportunity to audit the downloaded resources after an incident and understand the “root cause”. When external dependencies are pulled from the internet today, nothing guarantees their presence or content the next day. A good practice here is to eliminate external coupling as much as possible, and always keep a cached copy on private storage as a fallback. Using a global asset proxy like Artifactory or Nexus also has its risks, as the recent “dependency confusion” security incidents showed this year.
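As an illustration, a minimal sketch of pulling Python packages through a private mirror instead of the public index (the nexus.internal.example URL is a made-up placeholder for your own proxy):
# Point pip at the internal proxy/cache rather than pypi.org
pip install --no-cache-dir --index-url https://nexus.internal.example/repository/pypi/simple -r requirements.txt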

Different compiling options in Makefiles
This one is obvious and still relevant to our topic. Compilers are powerful beasts that understand many options and arguments. To obtain the closest binaries across different builds, a sane rule is to check options for differences and find the best compromise. One could be tempted to compile for different platforms in Go, or to set more optimisation passes in production than in dev.
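For instance, a minimal sketch of pinning the compile options for a Go binary so every environment builds with the same flags (the flag values here are illustrative, not a recommendation):
# Same target platform and same flags, whether on a laptop or in CI
GOOS=linux GOARCH=amd64 go build -trimpath -ldflags="-s -w" -o hello .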
Compile outside container and COPY inside
Compiling code outside the container makes the underlying platform yet another requirement. Containers can instead build the code themselves by including all the necessary tooling: git, compilers, etc. That makes Docker the only tool needed to produce a complete build artefact, which is a neat process. Outside a container, the resulting binaries are prone to many changes, and this comes with more challenges: where to store the binaries, in a git repo or as assets, and so on.
The ideal practice is to build inside a container, leveraging the multi-stage pattern as illustrated by the sample Dockerfile below:
FROM ubuntu AS compiler
RUN apt-get update
RUN apt-get install -y build-essential
COPY hello.c /
RUN make hello
FROM ubuntu
COPY --from=compiler /hello /hello
CMD /hello
Loosely vendoring libraries
Modern stacks tend to speed up development by pulling all available libraries into a mega dependency graph. Each and every package inside the dependency graph has its own lifecycle, including major versions. The typical web app pulls dozens of sources from all over the world, some abandoned, some deprecated, depending on their authors’ availability.
Living with this moving target requires a disciplined practice of vendoring, down to the patch version. Don’t fret, “semver” is here to help: minor versions are generally safe, but major upgrades are definitely a showstopper. Patch with caution and force the exact same dependency graph. Always use lock files, e.g. package-lock.json or composer.lock, which provide solid ground for reproducibility.
# Base image left unpinned for brevity; see the sha256 pinning advice above
FROM python
# Install dependencies from a pinned requirements file before copying the sources
COPY requirements.txt /tmp/requirements.txt
RUN pip install -qr /tmp/requirements.txt
WORKDIR /src
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]
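Note that a plain requirements.txt only acts as a lock file if every version is pinned. One possible way to generate a fully pinned file is sketched below (pip-tools is one option among others, not something the setup above prescribes):
# Compile a loose requirements.in into an exactly pinned requirements.txt, hashes included
pip install pip-tools
pip-compile --generate-hashes -o requirements.txt requirements.in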
Where the diffing helps the drifting
Since drifting will happen, we might as well embrace it and enable continuous measurement. Before deployments, detect drift sources and block image promotion to production when there is too much difference in size or in number of files, or both.
Let’s use git in my preferred pattern, as a source of truth for data, not just for code. For each build event, we run container-diff to compare images and store the difference. Even better, thresholds help select the acceptable drift and reject builds that drift too much. It is also useful for observation purposes: after a release, the team can assess unwanted changes by reporting the exact state of a Docker image at build time. As you “git commit” the file list from build to build, the track record grows, and so does the ability to improve the system.
Continuous diffing is the practice of recording changes occurring in a system as events happen, aspiring to learn from the data, find patterns and implement new ideas.
$ container-diff diff daemon://artifakt-public:2021-11-12 daemon://artifakt-public:2021-11-11 --type=apt
-----Apt-----
Packages found only in artifakt-public:2021-11-12: None
Packages found only in artifakt-public:2021-11-11: None
Version differences: None
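To turn this into the promotion gate described above, one possible sketch (the 50-file threshold and image tags are arbitrary, and the JSON field names follow container-diff’s file differ, so double-check them against your version):
container-diff diff daemon://myapp:build-42 daemon://myapp:build-41 --type=file --json > diff.json
# Count added and deleted files between the two builds
CHANGES=$(jq '[.[].Diff.Adds // [], .[].Diff.Dels // []] | flatten | length' diff.json)
if [ "$CHANGES" -gt 50 ]; then
  echo "Too much drift between builds, refusing promotion" && exit 1
fi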
Not all drifting is bad
Like many cloud-native practitioners, the 12-factor app manifesto holds no secrets for you, especially chapter 10 on “dev/prod parity”. Since state is needed, apps in production are expected to drift, somewhat.
So when is container drifting actually good? For a start, the less state inside the container, the better. That means using the “volumes” abstraction and declaring volumes wherever data persistence is needed.
The other kind of data we can afford is the disposable type: caches, intermediate computations, temporary data transformations. Our experience tells us that “tmpfs” mounts are a nice place to store these, not only for performance, since the data stays purely in memory, but also by nature, as they make it explicit that this data is volatile in essence.
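A minimal sketch of the distinction (image name, paths and sizes are placeholders): persistent data lives on a named volume, disposable data on a tmpfs mount:
# Named volume for data that must survive the container, tmpfs for throwaway cache
docker run -d \
  --mount type=volume,source=app-data,target=/var/lib/app \
  --mount type=tmpfs,target=/tmp/cache,tmpfs-size=256m \
  myapp:latest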
Closing words and TL;DR
Tools are here to help, and because of over-expectations, we end up blaming them for our own sloppiness. Docker is no exception: while it changed the way we package and deploy apps for the better, it certainly requires focus during the process.
So here are our 7 good practices to apply for minimal image drifting:
- Use ARG, and explicitly version builds with different ARG values using different image tags
- Pin base images and Linux distros down to the sha256 digest
- Test and deploy the same target of a multi-stage Dockerfile
- Treat external dependencies as internal assets with a private cache
- Ensure compile options stay stable across builds
- Compile from inside a container and leverage multi-stage Dockerfiles
- Reference vendor libraries through lock files only
Building happens many times: several times on a laptop, a few more times in CI/CD and remote test environments, and maybe another couple of times before the final release to production. With this approach, you will always know when the deployed artefacts have changed and whether that is acceptable. Good luck in the process, and let us know if you learned anything else!
Do you have questions or suggestions on how to make developers’ lives easier? We’d love to hear about it! Reach out to us on Twitter or submit your ideas here.
And if you’re already interested in what Artifakt has to offer, why not book a demo with us?