
Do you know where I can read more about this? This is the first I have heard that processes running inside containers interfere with the kernel's scheduler. Or, sorry again, but maybe I'm not following your meaning in some way?


Each 'container' is in effect an application plus all its dependencies.

So even when two or more active containers could in principle share resources, they won't, which leads to inefficiencies: you'll be running a much larger number of processes than you would otherwise (because of duplication), requiring a larger memory footprint and probably less efficient cache and/or IO utilization.

The deployment of the apps will be easier (which is a definite plus) but machine utilization will be lower and the amount of software running on a single machine will be far larger than otherwise, especially if multiple versions of dependencies are present on the same system.

A container is very much not a single process: it can contain many processes, and some of those processes will likely duplicate components in other containers, but without the resource optimizations that a kernel can normally perform.


Are you talking about shared libraries, that kind of deduplication?

Although true, that probably isn't very significant compared to the vast wasted resources of idle dedicated machines, which are hard to avoid without the vast wasted resources of a highly paid somebod(y|ies).


Yes, but rather than shared libraries I mean the kind of de-duplication the kernel does when it runs multiple instances of the same binary. This is (normally) very cache and IO efficient, since it is done at the VM page level.
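A quick way to see the mechanism (a Linux-only sketch, nothing container-specific): executable text is mapped read-only and demand-paged from the file, so every process mapping the same binary or library shares the same physical pages.

```python
# Sketch (Linux-only): list this process's read-only executable mappings.
# These pages are backed by the on-disk file itself, so any processes
# that map the same binary or library share the underlying pages.

def shared_text_mappings():
    """Return the 'r-xp' (read/execute, file-backed) mappings of this process."""
    with open('/proc/self/maps') as f:
        return [line.rstrip() for line in f if ' r-xp ' in line]

for line in shared_text_mappings():
    print(line)
```

Running this twice in two processes started from the same binary shows the same file-backed segments, which is exactly what the kernel de-duplicates.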

I also don't quite understand how one can reserve CPU cycles and memory and deliver IO guarantees without the same over-provisioning that you'd have to do using regular virtualization. After all, as soon as you make a guarantee nobody else can use the reserved capacity, even when it sits idle, so in that respect I see little difference between virtualizing the entire OS+app versus re-using the kernel (ok, that does save you the overhead of the kernel itself, but that's not a huge difference unless you run a very large number of VMs on a single machine).


But you can make guarantees to the critical jobs (up to the total size of the machine) and then let batch jobs (with less time-sensitive requirements) run best-effort in the slack.

In the event that there ends up being no best-effort resources available on a machine for a significant period of time (because all the user-facing jobs are busy and using their guaranteed resources) Borg will shift the starving batch jobs to other machines that aren't so busy.
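The policy described above can be sketched in a few lines (a toy model, not Borg's actual algorithm; the job names and capacity units are made up):

```python
# Toy model: critical jobs get hard reservations up to machine capacity;
# batch jobs run best-effort and split whatever slack remains.

def allocate(capacity, critical, batch):
    """critical: {job: guaranteed share}; batch: list of best-effort jobs.
    Returns a per-job allocation; batch jobs share the slack evenly."""
    reserved = sum(critical.values())
    assert reserved <= capacity, "cannot guarantee more than the machine has"
    slack = capacity - reserved
    alloc = dict(critical)
    for job in batch:
        alloc[job] = slack / len(batch)
    return alloc

print(allocate(100, {'web': 60, 'db': 30}, ['logs', 'reindex']))
# {'web': 60, 'db': 30, 'logs': 5.0, 'reindex': 5.0}
```

When the slack on one machine drops to (near) zero for too long, the scheduler's move is then to re-run `allocate` somewhere else, i.e. migrate the starving batch jobs.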


Isn't this only true if your container build process pulls in the same version of a library in multiple different virtual filesystems? That is, if you are using the same base image for a number of applications and the libraries are installed in the base image rather than the image the application resides in, the kernel should recognize the shared libraries being used as coming from the same place and be able to perform deduplication as normal?
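That hinges on the files really being the same file to the kernel, not just byte-identical. A small way to check (hypothetical paths; page-cache sharing happens only when device and inode match):

```python
import os
import shutil
import tempfile

def same_cached_file(path_a, path_b):
    """True if both paths resolve to the same device+inode, in which case
    the kernel's page cache (and any mapped text) is shared between them."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

# Demo with throwaway files: a hard link shares pages, a copy does not.
tmp = tempfile.mkdtemp()
orig = os.path.join(tmp, 'libdemo.so')
with open(orig, 'w') as f:
    f.write('stand-in for a shared library')
os.link(orig, os.path.join(tmp, 'link.so'))      # same inode
shutil.copy(orig, os.path.join(tmp, 'copy.so'))  # new inode, same bytes
print(same_cached_file(orig, os.path.join(tmp, 'link.so')))  # True
print(same_cached_file(orig, os.path.join(tmp, 'copy.so')))  # False
```

A shared base-image layer gives you the "same inode" case; two images that each copied in the same library version give you the "same bytes, different inode" case, which the page cache cannot de-duplicate.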


Presumably you could arrange things in such a way that several container images shared libraries and such, but that would likely interfere with the (desirable) isolation properties, and versioning will play havoc with that anyway (since all dependencies are part-and-parcel of a container and nothing stops multiple containers from shipping different versions of the same package).

Where regular virtualization runs multiple kernels (which in turn run whatever applications you assign to them), containers appear (to me, feel free to correct me) to be a way to 'share a single kernel' across multiple applications, dividing them into domains that are as isolated as possible with respect to CPU, memory, namespace and IO (including network) provisioning, while allowing multiple versions of the same software to be present at the same time without interference.

The CPU, memory and IO provisioning can be thought of as a kind of 'virtualization light' and the namespaces partitioning should (in theory) help to make things a bit harder to mess up during deployment.
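The namespace side of that is easy to poke at (Linux-only sketch): each entry under /proc/&lt;pid&gt;/ns identifies one namespace, and processes "in the same container" share these IDs while processes in different containers do not.

```python
import os

def namespace_ids(pid='self'):
    """Map namespace name -> identity link, e.g. 'pid:[4026531836]'.
    Two processes with equal links live in the same namespace."""
    ns_dir = f'/proc/{pid}/ns'
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}

for name, ident in namespace_ids().items():
    print(name, ident)
```

Compare the output for a process inside a container and one outside: the mnt, pid, net (etc.) entries will differ, which is the whole "partitioning" trick.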

Leakage from one container to another will probably put a dent in any security advantages, but containment should (again, theoretically) be a bit more robust than multiple processes on a single kernel with shared namespaces.

So I see them as a 'gain' for deployment but a definite detriment for performance, because it appears to me we get all (or at least most) of the downsides of virtualization. And of course you can expect both virtualization and containers to be used simultaneously in a single installation, with predictable (messy) results.

I'm really curious if there is an objective way to measure the overhead: take a bunch of applications installed 'as usual' on a single machine, then the same setup using containers on that same machine. That would be a very interesting benchmark, especially when machine utilization in the container-less setup nears the saturation point for CPU, memory or IO.
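A minimal sketch of that comparison (the workload and repeat count are arbitrary stand-ins): run the same CPU-bound script 'as usual' and again inside a container, and compare the best wall-clock times.

```python
import time

def workload(n=100_000):
    """Arbitrary CPU-bound task; any deterministic workload would do."""
    return sum(i * i for i in range(n))

def time_workload(repeats=5):
    """Best-of-N wall-clock time, to damp scheduling noise."""
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best

print(f'best of {5}: {time_workload():.4f}s')
```

For the interesting case (near saturation) you'd want many copies of this running at once, plus memory- and IO-bound variants; a single idle-machine number proves very little.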


Seems like you'd have to construct a pretty weird situation in order to blow up your cache, particularly if your load is enough that you can max out a server. If you're running a lot of instances of the same app, deduplication should work fine, right? The waste would only show up if you've built a bunch of different applications that use the same libraries and consume roughly similar amounts of CPU.

And that's assuming that it'd work exactly the way you're thinking.

I feel like the win over running VMs (which incur something like a 12% overhead compared to both Docker and running right on the machine for a single application), plus flexibility, plus ease of deployment is worthwhile. I mean, the current situation is running VM images anyway, right? This is a step in the right direction over that, even you must admit.


I wish VMs only incurred a 12% overhead (that's assuming an absolutely optimal configuration and a fairly static load); it can be substantially more than that, especially when people go 'enterprise' on you for what would otherwise be a relatively simple setup.

But you've made me curious enough that I'll do some benchmarks to see how virtualization compares to present-day containers for practical use cases faced by small and mid-size companies. My fooling around with this about a year ago led to nothing but frustration, and it's always a risk to argue from data older than a few months in a field moving this fast; more measurements are the preferred way to settle stuff like this anyway.


Most Borg containers were in fact single processes (or, more correctly, a shell wrapper around a single process).


It's not true, so I wouldn't bother looking too hard. ;)

The Linux kernel does not "lose track" of processes/libs inside containers; they are simply namespaced, like a more extensive chroot environment.
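Easy to confirm from the host (Linux-only sketch): containerized processes are ordinary entries in the host's process table, each one tagged with the namespaces it lives in.

```python
import os

# Every process on the machine, containerized or not, appears in the
# host's /proc; the only difference for a containerized process is which
# namespaces its /proc/<pid>/ns entries point at.
pids = [p for p in os.listdir('/proc') if p.isdigit()]
print(f'{len(pids)} processes visible to this kernel')
```

Tools like `ps` on the host happily show container processes for exactly this reason; the kernel schedules them like any others.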



