> The packages—gcc and so on—needed to compile Python are removed once they are no longer needed.
Is there a reason to prefer this method, where installation, usage, and removal all happen in one RUN, vs. using a multi-stage build? I tend to prefer the latter but am not aware of tradeoffs beyond the readability of the Dockerfile.
The reason is to minimize the number of layers, and in particular to avoid layers whose contents aren't needed in the final image: each RUN creates a layer, so if one layer installs gcc and a later one removes it, you still have to download the layer that contains gcc.
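A small sketch of what that means in practice (the package names and build step are illustrative, not taken from the article's Dockerfile):

```dockerfile
# Variant A: three separate RUNs. The purge layer does NOT shrink the image;
# the earlier layer that contains gcc still exists and still gets pulled.
RUN apt-get update && apt-get install -y gcc make
RUN ./configure && make && make install        # hypothetical build step
RUN apt-get purge -y gcc make && rm -rf /var/lib/apt/lists/*

# Variant B: one RUN. gcc is installed and removed inside the same layer,
# so it never shows up in any layer of the final image.
RUN apt-get update && apt-get install -y gcc make \
    && ./configure && make && make install \
    && apt-get purge -y gcc make && rm -rf /var/lib/apt/lists/*
```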
I thought non-final stages in a multi-stage build are left out of the final artifact; is that incorrect?
I'm thinking the "build step" would be done in an earlier stage; it could be the exact same RUN statement, or it could be split into several RUNs for readability, and it wouldn't bother removing any installed packages, since they won't carry over to the final stage. Then the big RUN in the final stage would be replaced with something like `COPY --from=builder ...`.
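Roughly something like this (a minimal sketch; the image names, paths, and build commands are assumptions, not the article's actual Dockerfile):

```dockerfile
FROM debian:bookworm AS builder
# No cleanup needed in this stage; none of it ends up in the final image.
RUN apt-get update && apt-get install -y gcc make
COPY . /src
RUN cd /src && ./configure --prefix=/opt/app && make && make install

FROM debian:bookworm-slim
# Only the build output crosses over; gcc stays behind in the builder stage.
COPY --from=builder /opt/app /opt/app
CMD ["/opt/app/bin/app"]
```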
If you are doing multi-stage builds, combining as many statements as possible into one RUN only matters in the final stage.
I agree that for clarity it is nice not to optimize layers in the build stage - those will be thrown away anyway.
I vastly prefer multi-stage builds over having to chain install and cleanup statements.
Example: I usually want to use the python:3-slim image, but it doesn't have the tools to compile certain Python libraries with C extensions. Generally I will use the python:3 image for my build stage to do my `pip install -r requirements.txt` and then copy the libraries over to my final stage based on the python:3-slim image.
Of course I could install and uninstall GCC and other tools in a single stage, but that actually takes longer and is messier in my opinion.
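A rough sketch of that pattern; the `--prefix` trick and the paths are one common way to do the copy, not necessarily the commenter's exact setup:

```dockerfile
FROM python:3 AS builder
COPY requirements.txt .
# python:3 ships gcc and headers, so C extensions build here without extra setup.
# Installing into a throwaway prefix makes the result easy to copy out.
RUN pip install --prefix=/install -r requirements.txt

FROM python:3-slim
# /usr/local is the default prefix in the official Python images, so the
# copied site-packages are found without further configuration.
COPY --from=builder /install /usr/local
COPY . /app
CMD ["python", "/app/main.py"]
```

One caveat: the builder and final images need the same Python minor version, since the site-packages path includes it; python:3 and python:3-slim normally track the same release, but stale cached base images can make them diverge.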
> Example: I usually want to use the python:3-slim image, but it doesn't have the tools to compile certain Python libraries with C extensions. Generally I will use the python:3 image for my build stage to do my `pip install -r requirements.txt` and then copy the libraries over to my final stage based on the python:3-slim image.
For image size, it doesn't matter. It does matter for build time.
Specifically, multi-stage builds give you better caching and therefore faster rebuilds: the layer with gcc etc. pre-installed can stay in the cache while you still ship the small final image.
So if you have a human being waiting on frequent Docker build results, yes, multi-stage is better.
In this case the builds are automated and no one waits for them, so it doesn't really matter (except for burning some extra CPU cycles).
Is there no downside to multi-stage? Even aside from caching behavior I prefer multi-stage builds, as I'd much rather read & maintain a bunch of RUN lines which do one specific thing, rather than dozens joined with &&.
There isn't too much of a downside, except that you need to be a little more careful in your CI, otherwise you can end up rebuilding from scratch each time, thus losing the benefit of faster builds.
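For example, a CI job that starts from a clean machine usually pulls the previous images and passes them as cache sources. The exact setup depends on whether you're on the classic builder or BuildKit (BuildKit needs inline cache metadata, e.g. `BUILDKIT_INLINE_CACHE=1`), and the registry/image names below are placeholders, so treat this as a sketch:

```sh
# Seed the layer cache from the previous build (ignore failures on the first run)
docker pull registry.example.com/myapp:builder || true
docker pull registry.example.com/myapp:latest  || true

# Rebuild and re-tag the builder stage so the next run can reuse its layers
docker build --target builder \
  --cache-from registry.example.com/myapp:builder \
  -t registry.example.com/myapp:builder .

# Build the final stage, reusing both cached images
docker build \
  --cache-from registry.example.com/myapp:builder \
  --cache-from registry.example.com/myapp:latest \
  -t registry.example.com/myapp:latest .

docker push registry.example.com/myapp:builder
docker push registry.example.com/myapp:latest
```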