
Docker's reliance on overlay filesystems is one of the biggest problems I have with Docker. Stacking opaque disk images on top of each other just isn't a great design, and it makes for a cache strategy that is invalidated all too often (because a Dockerfile is linear, there is no dependency graph). The file system is just the wrong layer of abstraction for solving such an issue.

If you change paradigms from the imperative sequence of mutations in a Dockerfile to a declarative specification that produces immutable results and a package dependency graph structure, you get a much better cache, no need for disk image layering, a real GC, etc. For example, the GNU Guix project (purely functional package manager and GNU/Linux distro) has a container implementation in the works named call-with-container. Rather than using overlay file systems, it can just bind mount packages, files, etc. inside the container file system as read-only from the host. Not only is this a much simpler design, but it allows for trivial deduplication of dependencies. Multiple containers running software with overlapping dependency graphs will find that the software is on disk exactly once. Since the results of builds are "pure" and immutable, the software running inside the container cannot damage those shared components. It's nice how some problems can simply disappear when an alternative programming paradigm is used.
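The deduplication story can be sketched in a few lines of Python. The hashing scheme below is a loose simplification of what Guix/Nix actually do (they hash the complete build recipe, not just a name and version), and the path format is only illustrative:

```python
import hashlib

def store_path(name, version, deps):
    """Derive a content-addressed store path from a package plus the
    store paths of its dependencies, so the whole dependency graph is
    folded into the result. Simplified sketch, not the real Guix hash."""
    h = hashlib.sha256()
    h.update(f"{name}-{version}".encode())
    for dep in sorted(deps):  # deps are themselves store paths
        h.update(dep.encode())
    return f"/gnu/store/{h.hexdigest()[:12]}-{name}-{version}"

# Shared subgraph: both applications depend on the same glibc build...
glibc = store_path("glibc", "2.22", [])
app_a = store_path("app-a", "1.0", [glibc])
app_b = store_path("app-b", "1.0", [glibc])

# ...so the store holds exactly one copy of glibc; each container just
# bind mounts the same read-only path from the host.
assert store_path("glibc", "2.22", []) == glibc
assert app_a != app_b
```

Because the path is a pure function of the package and its inputs, two containers with overlapping dependency graphs necessarily resolve to the same on-disk paths, and deduplication falls out for free.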

https://www.gnu.org/software/guix/news/container-provisionin...



As an LXC user since very early days, I wrote something similar to Docker but architecturally far more generic, begun earlier (~201-2015). It's called cims. It was explicitly designed to be portable to *BSD as well as across arbitrary storage drivers (Docker added this later), cloud provider interfaces (VM-only, AWS/GCE-style full APIs, an existing server with LXC or Docker in place, etc.), and between arbitrary logical machine and service topologies. See docs @ http://stani.sh/walter/cims/ and early architectural forethought @ http://stani.sh/walter/pfcts/ Its scope was arguably broader (ease of system administration, explicit support for integrating with existing CI/CD processes), and to date I still believe the architectural paradigm is superior. Unfortunately my former employer decided to ask me to pay them to release the code, so I can't release it as open source. But the high-level docs have been open since early on, so knock yourself out on a clone. I may even write one myself in future.

Storage-wise, it had LVM2, ZFS, and loopback drivers. I never bothered with overlay; it just used a generic clone API on the storage driver to spin up identical VMs before modification. That's very easy with snapshot/thin-provisioning-capable backends like LVM2 and ZFS. Loopback just used cp, but because the images could live in memory they could also be very fast (if memory-hungry). Cloud-wise, we'd implemented a few providers, and I was also iterating an orchestration system based on internal requirements (high security/availability) and existing, proven solutions (pacemaker/corosync). It was designed for fully repeatable builds, something Docker only began to add at a later date.
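A rough sketch of what such a generic clone API can look like (illustrative Python under my own assumptions, not the actual cims code): snapshot-capable backends clone via copy-on-write, while loopback falls back to a plain copy.

```python
import shutil

class StorageDriver:
    """Generic clone interface: give me a source volume, get back an
    identical, independently writable copy."""
    def clone(self, src: str, dst: str) -> str:
        raise NotImplementedError

class ZfsDriver(StorageDriver):
    def clone(self, src, dst):
        # A real driver would shell out to ZFS; cloning a snapshot is
        # near-instant and thin-provisioned (copy-on-write).
        return f"zfs snapshot {src}@base && zfs clone {src}@base {dst}"

class LoopbackDriver(StorageDriver):
    def clone(self, src, dst):
        # No snapshot support: plain copy. Slow on disk, but fast when
        # the loopback image lives on tmpfs, at the cost of RAM.
        shutil.copyfile(src, dst)
        return dst
```

The point of the abstraction is that the orchestration layer only ever asks for a clone; whether that is a CoW snapshot or a cp is the driver's business.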


One can get a very similar experience with Docker. The idea is to build a single Docker image containing all the software one needs on the server, and to start every container from it with the read-only flag. Persistent data then lives in volumes.


That's not really the same thing. What about different applications that share dependencies? With Docker, they would have to share an exact prefix of a linear Dockerfile to get any deduplication, so you're back to image layers. It's telling that the proposed solutions involve completely avoiding one of Docker's primary features; maybe Docker isn't very well designed to begin with.

With Guix, any sub-graph that is shared is naturally deduplicated, because we have a complete and precise dependency graph of the software, all the way down to libc. I find myself playing lots of games with Docker to take the most advantage of its brittle cache in order to reduce build times and share as much as possible. Furthermore, Docker's cache has no knowledge of temporal changes and therefore the cache becomes stale. Guix builds aren't subject to change with time because builds are isolated from the network. Docker needs the network, otherwise nothing would work because it's just a layer on top of an imperative distro's package manager. Docker will happily cache the image resulting from 'RUN apt-get upgrade' forever, but what happens when a security update to a package is released? You won't know about it unless you purge the cache and rebuild. Docker is completely disconnected from the real dependencies of an application, and is therefore fundamentally broken.


Docker needs the network only when one uses a Dockerfile for deployment, which is a rather bad idea. Instead, prebuilt Docker images should be used. That allows verifying them on a development/testing machine before deployment. With this setup, all the "bad pieces" of Docker stay on the developer's notebook; in production everything is read-only and shared among all containers.



