> The packages—gcc and so on—needed to compile Python are removed once they are no longer needed.
Is there a reason to prefer this method, where installation, usage, and removal all happen in one RUN, vs. using a multi-stage build? I tend to prefer the latter but am not aware of tradeoffs beyond the readability of the Dockerfile.
The reason is to minimize the number of layers, and in particular to avoid layers whose contents aren't needed in the final image: each RUN creates a layer, so if one layer installs gcc and a later one removes it, you still have to download the layer that contains gcc.
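A small sketch of what that means in practice (the package names and build step are illustrative, not taken from the article's Dockerfile):

```dockerfile
# Variant A: three separate RUNs. The purge layer does NOT shrink the image;
# the earlier layer that contains gcc still exists and still gets pulled.
RUN apt-get update && apt-get install -y gcc make
RUN ./configure && make && make install        # hypothetical build step
RUN apt-get purge -y gcc make && rm -rf /var/lib/apt/lists/*

# Variant B: one RUN. gcc is installed and removed inside the same layer,
# so it never shows up in any layer of the final image.
RUN apt-get update && apt-get install -y gcc make \
    && ./configure && make && make install \
    && apt-get purge -y gcc make && rm -rf /var/lib/apt/lists/*
```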
I thought non-final stages in a multi-stage build are left out of the final artifact; is that incorrect?
I'm thinking the "build step" would be done in an earlier stage; it could be the exact same RUN statement, or it could be split into several RUNs for readability, and it wouldn't bother removing any installed packages, since they won't carry over to the final stage. Then the big RUN in the final stage would be replaced with something like `COPY --from=builder ...`.
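Roughly something like this (a minimal sketch; the image names, paths, and build commands are assumptions, not the article's actual Dockerfile):

```dockerfile
FROM debian:bookworm AS builder
# No cleanup needed in this stage; none of it ends up in the final image.
RUN apt-get update && apt-get install -y gcc make
COPY . /src
RUN cd /src && ./configure --prefix=/opt/app && make && make install

FROM debian:bookworm-slim
# Only the build output crosses over; gcc stays behind in the builder stage.
COPY --from=builder /opt/app /opt/app
CMD ["/opt/app/bin/app"]
```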
If you are doing multi-stage builds, combining as many statements as possible into one RUN only matters in the final stage.
I agree that for clarity it is nice not to optimize layers in the build stage - those will be thrown away anyway.
I vastly prefer multi-stage builds over having to chain install and cleanup statements.
Example: I usually want to use the python:3-slim image, but it doesn't have the tools to compile certain Python libraries with C extensions. Generally I will use the python:3 image for my build stage to do my `pip install -r requirements.txt` and then copy the libraries over to my final stage based on the python:3-slim image.
Of course I could install and uninstall GCC and other tools in a single stage, but that actually takes longer and is messier in my opinion.
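A rough sketch of that pattern; the `--prefix` trick and the paths are one common way to do the copy, not necessarily the commenter's exact setup:

```dockerfile
FROM python:3 AS builder
COPY requirements.txt .
# python:3 ships gcc and headers, so C extensions build here without extra setup.
# Installing into a throwaway prefix makes the result easy to copy out.
RUN pip install --prefix=/install -r requirements.txt

FROM python:3-slim
# /usr/local is the default prefix in the official Python images, so the
# copied site-packages are found without further configuration.
COPY --from=builder /install /usr/local
COPY . /app
CMD ["python", "/app/main.py"]
```

One caveat: the builder and final images need the same Python minor version, since the site-packages path includes it; python:3 and python:3-slim normally track the same release, but stale cached base images can make them diverge.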
> Example: I usually want to use the python:3-slim image, but it doesn't have the tools to compile certain Python libraries with C extensions. Generally I will use the python:3 image for my build stage to do my `pip install -r requirements.txt` and then copy the libraries over to my final stage based on the python:3-slim image.
For image size, it doesn't matter. It does matter for build time.
Specifically, multi-stage builds give you better caching and therefore faster rebuilds: the layer with gcc etc. pre-installed can stay in the cache while you still ship the small final image.
So if you have a human being waiting on frequent Docker build results, yes, multi-stage is better.
In this case the builds are automated and no one waits for them, so it doesn't really matter (except for burning some extra CPU cycles).
Is there no downside to multi-stage? Even aside from caching behavior I prefer multi-stage builds, as I'd much rather read & maintain a bunch of RUN lines which do one specific thing, rather than dozens joined with &&.
There isn't too much of a downside, except that you need to be a little more careful in your CI, otherwise you can end up rebuilding from scratch each time, thus losing the benefit of faster builds.
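For example, a CI job that starts from a clean machine usually pulls the previous images and passes them as cache sources. The exact setup depends on whether you're on the classic builder or BuildKit (BuildKit needs inline cache metadata, e.g. `BUILDKIT_INLINE_CACHE=1`), and the registry/image names below are placeholders, so treat this as a sketch:

```sh
# Seed the layer cache from the previous build (ignore failures on the first run)
docker pull registry.example.com/myapp:builder || true
docker pull registry.example.com/myapp:latest  || true

# Rebuild and re-tag the builder stage so the next run can reuse its layers
docker build --target builder \
  --cache-from registry.example.com/myapp:builder \
  -t registry.example.com/myapp:builder .

# Build the final stage, reusing both cached images
docker build \
  --cache-from registry.example.com/myapp:builder \
  --cache-from registry.example.com/myapp:latest \
  -t registry.example.com/myapp:latest .

docker push registry.example.com/myapp:builder
docker push registry.example.com/myapp:latest
```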