Optimizing image storage in Docker repository using Jib for Spring Boot

Question

Does using Jib to build Docker images help optimize remote Docker repository storage?

We are using Spring Boot in Docker with Gradle. Currently, we are creating standard fat Boot jars with all the dependencies packed inside, and then we create an image with it, like so:

FROM gcr.io/distroless/java:11
COPY ./build/libs/*.jar app.jar
CMD ["app.jar"]

This results in a big (250 MB) new image each time we build, even if very little code is actually changed. This is due to the fact that the fat jar contains both the shared dependencies (which change infrequently) and our code. This is inefficient usage of storage space in our private repository and we would like to change that.

For this, the idea is the following:

We create a base image which contains only the dependencies in /opt/libs, let's call it spring-base:1.0.0 and push to our private Docker registry.

We use that image as a parent/base of the application image which contains our code only. The Dockerfile looks similar to this (untested, just to present the concept):

FROM our-registry/spring-base:1.0.0
# Copy the directories themselves (no trailing /*) so the package structure is preserved
COPY ./build/classes/kotlin/main/ /opt/classes/
COPY ./build/resources/main/ /opt/resources/
ENTRYPOINT ["java", "-cp", "/opt/libs/*:/opt/resources:/opt/classes", "com.example.MainKt"]

The expectation is that these images are much smaller, and the big base image with dependencies is stored only once, saving a lot of storage.

A colleague of ours looked into Jib and insists it does exactly this, but after reading the whole documentation and FAQ and playing around a bit with it, I am not so sure. We integrated it and use ./gradlew jibDockerBuild and it does seem to create layers for the dependencies, resources, and classes, but there is still just one big image. Jib seems to focus on speeding up build times (by utilizing Docker layer caching) and reproducible builds, but I think that when we upload that image to our repository nothing will change relative to our current solution - we will still store the 'static' dependencies multiple times, but now we will have multiple layers instead of just one in each new image.

Could anyone with more Docker and Jib experience explain whether Jib gives us the storage space optimization we are looking for?

EDIT: while I was waiting for an answer, I played around with all of this and used https://github.com/wagoodman/dive, docker system df and docker images to check sizes and look into the images and layers, and it seems Jib does exactly what we need.

Answer 1:

Does using Jib to build Docker images help optimize remote Docker repository storage?

Yes. Indeed, it helps significantly, because Jib builds image layers with strong reproducibility. With a plain Dockerfile, you usually lose reproducibility for most layers, because file timestamps are factored into checking whether layers are identical. For example, even if the bytes of your .class file didn't change at all, you lose reproducibility if the file is generated again. This is worse for a jar: not only can its timestamp change, but the jar metadata (for example, META-INF/MANIFEST.MF) can contain build-time information such as a timestamp, build tool info, and the JVM version. A jar built on a different machine will be considered different in the Docker world.
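To make the timestamp point concrete, here is a minimal Python sketch. It is not how Docker or Jib are implemented; it only assumes that a layer is a tar archive identified by the SHA-256 digest of its bytes. The function name layer_digest and the file path are hypothetical.

```python
import hashlib
import io
import tarfile


def layer_digest(files, mtime):
    """Return the SHA-256 digest of an uncompressed tar layer built from `files`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime  # the only thing varied between builds here
            tar.addfile(info, io.BytesIO(data))
    return "sha256:" + hashlib.sha256(buf.getvalue()).hexdigest()


# Hypothetical class file whose bytes are identical in every build.
files = {"app/classes/Main.class": b"\xca\xfe\xba\xbe identical bytes"}

# Same bytes, different timestamps: the digests differ, so storage sees two layers.
first_build = layer_digest(files, mtime=1_600_000_000)
second_build = layer_digest(files, mtime=1_600_000_060)
print(first_build == second_build)  # False

# Same bytes, one pinned timestamp: the digest is stable across rebuilds.
print(layer_digest(files, mtime=1) == layer_digest(files, mtime=1))  # True
```

Pinning file metadata to fixed values is the general idea behind reproducible layers: identical content then always yields an identical digest.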

This results in a big (250 MB) new image each time we build, even if very little code is actually changed. This is due to the fact that the fat jar contains both the shared dependencies (which change infrequently) and our code.

Partially correct. The image is big (250MB), but not because of the fat jar. The built image will be 250MB even if it does not contain a fat jar and even if you designate a separate layer for the shared libraries. The size of your final image always includes the size of the base image (gcr.io/distroless/java:11) and the size of the shared libraries, no matter how or with which tool the image is built.

However, Docker engines do not duplicate layers that they already have in their storage. Likewise, remote registries do not duplicate layers that already exist in a repository. Moreover, registries often store exactly one copy of a layer even across different repositories. Therefore, when you update only your code (hence your jar), only the layer containing that jar takes up new storage space. And Docker and Jib send only new layers to remote registries over the network. That is, the base image layers for gcr.io/distroless/java:11 will not be sent.
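The deduplication described above can be sketched with a toy, in-memory model. ToyRegistry and its methods are hypothetical names for illustration, not a real registry API, and the byte strings only stand in for real layers.

```python
import hashlib


class ToyRegistry:
    """Hypothetical in-memory registry that stores each layer once, keyed by digest."""

    def __init__(self):
        self.blobs = {}      # digest -> layer bytes
        self.manifests = {}  # tag -> ordered list of layer digests

    def push(self, tag, layers):
        """Store an image and return how many bytes had to be uploaded."""
        uploaded = 0
        digests = []
        for layer in layers:
            digest = hashlib.sha256(layer).hexdigest()
            if digest not in self.blobs:  # skip layers the registry already has
                self.blobs[digest] = layer
                uploaded += len(layer)
            digests.append(digest)
        self.manifests[tag] = digests
        return uploaded

    def stored_bytes(self):
        """Total bytes held by the registry, counting each distinct layer once."""
        return sum(len(blob) for blob in self.blobs.values())


base = b"B" * 1000    # stands in for the gcr.io/distroless/java:11 layers
libs = b"L" * 800     # stands in for the shared dependencies
code_v1 = b"v1" * 10  # application code, first build
code_v2 = b"v2" * 10  # application code, second build

registry = ToyRegistry()
print(registry.push("app:1", [base, libs, code_v1]))  # 1820
print(registry.push("app:2", [base, libs, code_v2]))  # 20
print(registry.stored_bytes())                        # 1840
```

The second push uploads only the changed code layer, and total storage grows by that layer alone, even though each image still "contains" all three layers.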

We create a base image which contains only the dependencies in /opt/libs, let's call it spring-base:1.0.0 and push to our private Docker registry.

Creating a separate image only to contain shared libraries is not unheard of, and I have seen some people attempt it. However, I don't think you intend to treat this special base image as an independent, standalone image that is meant to be shared across different kinds of images in your organization. So I think doing so is unconventional in this situation, and the trick is most likely unnecessary if it is only an idea that came off the top of your head for saving storage space (and network bandwidth). Please continue reading.

The expectation is that these images are much smaller

No. As I explained, you will create an image of the same size, 250MB, no matter what. It includes the size of the base image, which in your scheme includes your shared libraries. When you run docker images, your local Docker engine will show that the image size is 250MB. But as I said, that does not mean your Docker engine takes up an additional 250MB of space whenever you build a new image.

the big base image with dependencies is stored only once

Yes, but this can also be true when you start with FROM gcr.io/distroless/java:11. It is pointless to shove your shared libraries into another "base image", as long as you can create a separate layer for the shared libraries and keep that layer stable (i.e., reproducible). And Jib is very good at building such a layer reproducibly. The granularity of what registries store is layers, not images, so there is really no need to "mark" the libraries layer as belonging to some "base image" (as long as the libraries get their own layer). Registries only see layers, and the notion of an "image" is formed just by declaring that "this image is comprised of layer A, layer B, and layer C, along with this metadata." An image doesn't even have a notion of a base image; it doesn't say something like "this image is made by putting layer A on top of this base image." As long as layer B is a shared libraries layer, you get better optimization than with a fat jar layer.
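A rough back-of-the-envelope comparison shows why a stable libraries layer matters more than where it lives. Only the 250 MB total comes from the question; the split into base, libraries, and code is an assumption made up for illustration, and the sketch assumes the libraries layer is reproducible and unchanged across builds.

```python
# Illustrative sizes in MB; only the 250 MB total comes from the question.
BASE = 130  # assumed size of the gcr.io/distroless/java:11 layers
LIBS = 115  # assumed size of the shared dependencies
CODE = 5    # assumed size of the application classes and resources


def storage_fat_jar(builds):
    """Base layers stored once; every build adds a new fat-jar layer (libs + code)."""
    return BASE + builds * (LIBS + CODE)


def storage_split_layers(builds):
    """Base and libs layers stored once; every build adds only a new code layer."""
    return BASE + LIBS + builds * CODE


for builds in (1, 10, 100):
    print(builds, storage_fat_jar(builds), storage_split_layers(builds))
# 1 250 250
# 10 1330 295
# 100 12130 745
```

In both layouts every single image is still reported as 250 MB; the difference is only in how much new storage each additional build consumes.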

saving a lot of storage.

Therefore, this is not true: a separate base image saves no storage beyond what a stable libraries layer already saves. After all, Docker engines and registries do not store the same layer multiple times for no good reason.

We integrated it and use ./gradlew jibDockerBuild and it does seem to create layers for the dependencies, resources, and classes, but there is still just one big image.

Yes. The image size will be 250MB. This will still be true when you use a Dockerfile or any other image building tool. However, when using Jib, if you change only your application .java files, Jib will send only the small application layer (which does not contain shared libraries or resources) over the network to a remote registry when rebuilding; it doesn't send the whole 250MB of layers, because Jib keeps strong reproducibility. Similarly, if you update only your shared libraries, Jib will send only the libraries layer, saving time, bandwidth, and storage.

Note, however, that the Docker engine API lacks a way for Jib to check whether certain layers are already stored in a Docker engine, so Jib has to load the whole 250MB of layers when using jibDockerBuild. This is usually not an issue, because loading is done locally without going through the network. But because of this API limitation, it is, surprisingly, often faster for Jib to push an image directly to a remote registry than to load it into a local Docker engine; when pushing, Jib only needs to send layers that have changed. However, as I have stressed multiple times, even if Jib (or any other image building tool) loads the whole 250MB of layers into a Docker engine, the engine will save only what is necessary (i.e., new layers that it has never seen, or believes it has never seen). It won't duplicate the base image or the shared libraries layers; only new, different layers will take up storage. And with a Dockerfile, you'll usually end up generating "new" layers that are not new in practice, because of poor reproducibility.

Source: https://stackoverflow.com/questions/60731374/optimizing-image-storage-in-docker-repository-using-jib-for-spring-boot

Tags

spring-boot

Docker

Jib