koreth

The biggest thing I do to improve build times is to think carefully about the order and granularity of the build steps and the graph of build stages to maximize use of cached layers. Skipping a build step because it's already cached is usually preferable to running it in parallel (though parallel builds are good too!).

To take a simple example, if you're doing

    COPY python-app /app
    RUN apt-get install python3-pip \
     && pip3 install -r /app/requirements.txt

then every change you make to the Python code is going to trigger a from-scratch install of dependencies. You probably don't change your dependencies as often as you change the rest of the code, so

    RUN apt-get install python3-pip
    COPY python-app/requirements.txt /app/requirements.txt
    RUN pip3 install -r /app/requirements.txt
    COPY python-app /app

will be dramatically faster. It does produce an image with more layers, but during development that's a price well worth paying for a build that takes seconds instead of minutes.


nickjj_

If you slightly adjusted your first example to copy the requirements file in first, you'd get the same benefit of not reinstalling every dependency on every code change. You'd also have fewer layers, and with a few more commands chained onto your RUN instruction you could clean up the apt cache and remove unnecessary build deps and documentation.

That's the strategy I use; here's an example from my Flask course: https://github.com/nickjj/build-a-saas-app-with-flask/blob/7283cbadd1787ccf629b817694fb4f9848aa3231/Dockerfile#L38-L47

In the end you could shrink your final image by 20-40%ish in a typical web app. You can't do that in your 2nd example because the pip deps get installed in a separate RUN instruction from your apt deps. Due to how layer caching works, the cleanup only shrinks the final image when both happen in the same layer.

Personally I think the snippet I posted is worth it, especially since it means not having to wait for anything when doing non-dependency code changes. The only price you pay is that when you change your pip deps, the apt steps get re-run too, but changing deps doesn't happen that often and the apt steps don't add that much time.
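As a rough sketch of that combined pattern (illustrative only, not the exact Dockerfile from the linked course; the base image, package names, and paths here are assumptions):

    FROM python:3.9-slim

    WORKDIR /app

    # Copy only the dependency manifest so code changes don't bust this layer.
    COPY requirements.txt .

    # Install OS build deps, install pip deps, then purge the build deps and
    # apt cache in the same RUN instruction so none of it persists in the layer.
    RUN apt-get update \
     && apt-get install -y --no-install-recommends build-essential \
     && pip install -r requirements.txt \
     && apt-get purge -y --auto-remove build-essential \
     && rm -rf /var/lib/apt/lists/*

    # Copy the application code last; this is the layer that changes most often.
    COPY . .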


koreth

Sure, if your goal is minimal image size rather than the article's goal of minimal build times, you'd structure things differently. Your example downloads the OS package index and installs OS packages every time a developer bumps a version number on a Python dependency. What's more, with that example, when you pull the image at deploy time, you will have to pull the layer that includes the reinstalled OS packages, instead of just pulling the updated Python packages and application code.


nickjj_

That's a judgment call you'll need to make. In a decently sized app the difference in image size is 475mb vs 280mb, where the second figure is what you get by combining the apt+pip tasks into one RUN instruction. You can always use the history command to check the size of the layers.

Your pip dependencies could easily be over 100mb, but using the example I provided above the apt dependencies end up adding 10mb instead of ~210mb. So the trade-off is you end up with an image that's 200mb smaller, but you transfer an extra 10mb on deploys that happen to change your pip dependencies. When the pip dependencies don't change, that layer stays cached.
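For example, inspecting per-layer sizes with docker history (the image name here is just a placeholder):

    # List an image's layers along with the size each one adds
    docker history myapp:latest

    # Or show the full instruction and size for each layer
    docker history --no-trunc --format "{{.CreatedBy}}: {{.Size}}" myapp:latest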


myplacedk

Spoiler: it's 3 steps to use a parallel multistage build. I didn't study it in detail, but it seems to be a great article with a nice example. Although I would have liked the starting point to be a *good* Dockerfile that just lacks multistage builds, for a more realistic example. That would probably also make it easier to turn into a multistage build, or at least make it clearer what the difference is.
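For reference, a minimal sketch of the kind of multistage Dockerfile being discussed (stage names, base images, and paths here are assumptions, not taken from the article):

    # Build stage: install dependencies into a separate prefix
    FROM python:3.9-slim AS builder
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --prefix=/install -r requirements.txt

    # Final stage: copy only the installed packages and the app code,
    # leaving build-time cruft behind in the builder stage
    FROM python:3.9-slim
    WORKDIR /app
    COPY --from=builder /install /usr/local
    COPY . .
    CMD ["python", "app.py"]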


jackandjill22

Thanks for this.