First of all, I want to make it clear I\'ve done due diligence in researching this topic. Very closely related is this SO question, which doesn\'t really address my confusio
Instructions like VOLUME
and EXPOSE
are a bit anachronistic. Named volumes as we know them today were introduced in Docker 1.9, almost three years ago.
Before Docker 1.9, running a container whose image had one or more VOLUME
instructions (or using the --volume
option) was the only way to create volumes for data sharing or persistence. In fact, it used to be a best practice to create data-only containers whose sole purpose was to hold one or more volumes, and then share those volumes with your application containers using the --volumes-from
option. Here's some articles that describe this outdated pattern.
Also, check out moby/moby#17798 (Data-only containers obsolete with docker 1.9.0?) where the change from data-only containers to named volumes was discussed.
Today, I consider the VOLUME
instruction as an advanced tool that should only be used for specialized cases, and after careful thought. For example, the official postgres image declares a VOLUME at /var/lib/postgresql/data. This can improve the performance of postgres containers out of the box by keeping the database data out of the layered filesystem. Docker doesn't have to search through all the layers of the container image for file requests at /var/lib/postgresql/data
.
However, the VOLUME
instruction does come at a cost.
The latter issue results in problems like these.
For the GitLab question, someone wants to extend the GitLab image with pre-configured data for testing purposes, but it's impossible to commit that data in a downstream image because of the VOLUME at /var/opt/gitlab in the parent image.
tl;dr: VOLUME
was designed for a world before Docker 1.9. Best to just leave it out.