1000 nodes and beyond: updates to Kubernetes performance and scalability (kubernetes.io)
240 points by boulos on March 28, 2016 | hide | past | favorite | 58 comments


We are running Kubernetes in production at FarmLogs and LOVE it. We're a very small team with a ton of operational work to do in other facets of the company as we prep for the season to begin, but once we've got some free time there will be an in-depth blog post describing our migration and roll-out. We've also built some really neat tooling that we would like to share with the world.

Upgrading to 1.2 has been incredible. Deployments are faster and pods get scheduled almost instantly now. Our Master nodes are down to about 1/4th of what they were normally doing in terms of CPU usage.

We're really excited to be ridin' on kubelets!


I would be very interested to hear your process!

Our team is looking at using it, but we haven't found a great way to do automated deployments with our current build system (Bamboo). The best we've come up with is a series of bash scripts as the deployment step, but I'm not fully comfortable with how that would handle failed deployments yet. Basically, we need a way to handle automated deployments, and see the status of our currently deployed systems / promote environments.

If anyone is using Kubernetes in production, I'd love to hear what your deployment process looks like.
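For illustration, a deploy step of the kind described above might be sketched like this, assuming a recent kubectl configured for the target cluster (the app name and manifest path are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical CI deploy step: apply the manifest, then wait on the
# rollout and undo it on failure. Names (my-app, deploy/) are examples.
set -euo pipefail

kubectl apply -f deploy/my-app.yaml

# Blocks until the new pods are ready; exits non-zero on a bad rollout.
if ! kubectl rollout status deployment/my-app; then
  echo "Deploy failed, rolling back" >&2
  kubectl rollout undo deployment/my-app
  exit 1
fi
```

The non-zero exit from `rollout status` is what lets a CI system like Bamboo mark the deployment step as failed.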


For GitLab CI we just released deploy-to-Kubernetes support: https://about.gitlab.com/2016/03/22/gitlab-8-6-released/


We're looking at Spinnaker.io + Kubernetes right now for just this reason. K8s support was just added.


Disclosure: I work at Google on Kubernetes.

Please feel free to email me (aronchick (at) google) if you'd like to discuss this further (either P or GP post). We've seen a lot of this, and would love to help you out!


Awesome, thanks for the offer! I'll reach out in the next day or two! :)


It's exciting to see that Kubernetes is ready for basically any scale. You're more likely to run out of quota (on your cloud provider, particularly IPs) or some other resource (on-prem) before you can't schedule a container quickly enough.

Disclaimer: I work on Compute Engine and chat with the Kubernetes folks a lot.


That resource being money. I'm deploying a fairly simple app on GKE and things get out of hand quickly due to confusing pricing. Or maybe I just don't know where to look.


Disclaimer: I work at Google on Kubernetes.

Can you say more? Did you just spin up too many nodes?


I would say there's an impedance mismatch between GKE pricing and unclear requirements: how many resources, in what structure, you will need.

I was looking at the Kubernetes tutorials and couldn't even begin to figure out how much it would cost to run them. (Well, I didn't try too hard; it wasn't that important.)


You probably meant to disclose that information, not disclaim it. I suppose this is one of those cases like "literally" where persistent misuse will cause the word to be its own opposite, but I keep fighting the annoying fight anyway.


In this case, I meant both (so I chose Disclaimer). The full combo is: I work on Compute Engine (Disclosure!), but I don't actually work on Kubernetes (Disclaimer!) though I do hang out with them (both?).


Fair enough! People certainly seem to do a lot of disclaiming of their credentials these days so I guess everyone is getting the picture anyway.


It's hard to understand why so many cloud service providers and software stacks are still v4-only considering the operational and development cost of v4+NAT (complexity, management, scaling limitations etc). For most systems it would be enough for the front load balancers to speak v4.


Funny, I see it exactly the other way around. I want NAT+Firewall to have at least decent perimeter security in a private LAN, the 10/8 subnet is large enough to do anything I can ever imagine to be doing and IPv4 is so much easier to grok. For most systems it would be enough for the front load balancers to speak v6.


Wanting a firewall is good, you can have that on v4 or v6. However, after you add 3 simple firewall rules, NAT provides no additional security.

   Drop state=INVALID
   Allow state=ESTABLISHED,RELATED
   Drop all
You're now just as secure as if you had a typical NAT setup, but without the decades of kludges that is NAT.
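As a concrete sketch, those three rules translate roughly to the following iptables setup for forwarded traffic (a minimal illustration, not a complete ruleset):

```shell
# Minimal stateful firewall, roughly matching the three rules above.
# Drop packets in no valid connection state (malformed, spoofed, etc.).
iptables -A FORWARD -m conntrack --ctstate INVALID -j DROP
# Allow replies and related traffic for connections initiated from inside.
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Default-deny everything else.
iptables -P FORWARD DROP
```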


Then you just have a system that has no security advantages but is harder to reason about, due to the additional level of indirection of overloaded addressing (ambiguous addresses, management of forwarding rules, etc.) versus normal firewalling without NAT. That equates to some loss of security at the system level, because you can only effectively secure systems you understand.


I just want to give a public shout out to Wojtek on this blog post. It shows scalability for an actual scenario at levels that most users won't need (10M req/s!). Beyond that, there is a clear methodology with lots of hard data, along with a list of the work it took to get there. Very good post!

Disclaimer: I co-founded Kubernetes and help to coordinate the k8s Scalability SIG, although I'm no longer at Google. I didn't see this before it was published, though.


1.2 has a lot of really nice additions such as infrastructure containers, the new config map API, service draining for node replacements, and many more.

Unfortunately I would be running it on AWS and HA still hasn't been worked out and manual setup is a bear.


1.2 includes multi-zone support, so your nodes can be in multiple AZs. This means that a failure of a single zone shouldn't interrupt your apps: http://kubernetes.io/docs/admin/multiple-zones/

What is not yet in 1.2, but is planned for 1.3, is HA Master - so that failure of the zone which contains your master won't interrupt the control plane. (i.e. you will be able to update your apps even as zones are failing).
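For example, once nodes are spread across zones, you can check which zone each node landed in via its labels (label name as used by the 1.2 multi-zone support):

```shell
# Show each node together with its zone label
# (failure-domain.beta.kubernetes.io/zone in Kubernetes 1.2).
kubectl get nodes -L failure-domain.beta.kubernetes.io/zone
```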


Ah, nice! That wasn't super clear to me but now that you mentioned it, perhaps it should have been.


Not your fault - I was a little slow on getting the docs written up!


Oh, cool. That's what I was looking for in the docs for the last 2 days.


The community is working on what we like to call "self-hosted" Kubernetes. This will help reduce the complexity of installation on all platforms. You can see more about it from my KubeCon keynote: https://youtu.be/A49xXiKZNTQ?t=6m The target is to have this all upstream in the next (v1.3) release.

Slides here: https://speakerdeck.com/philips/pushing-kubernetes-forward?s...


FYI, the SlideShare link in that video's description is truncated. I think it's supposed to go here: http://www.slideshare.net/kubecon/kubecon-eu-2016-keynote-pu...



Cool, thanks!


Awesome, will have a look. We run VPC per-environment, and support launching environments ad-hoc. So we would need to launch a Kubernetes cluster along with the environments. Anything to reduce the switching costs is very welcome :)


kube-aws is a tool that we built at CoreOS to make installation of kubernetes on AWS easier. We just made a new release (v0.5.1) and would love feedback on that. It is what we use in production here at CoreOS. https://github.com/coreos/coreos-kubernetes/releases


kube-aws is great! When do you expect to update to Kubernetes 1.2?


Running it on AWS is supported, and the team is making strides to make it even better! (Complain loudly via GitHub issues where you find problems).

Cluster Federation (a form of HA) is coming in 1.3.


Curious to know why they are choosing to go with protobuf for intracluster communications as opposed to zero copy protocols like capn proto or flatbuffers.

No doubt protobufs are much more battle-tested in Google-scale environments, but are there any other clear benefits if the goal is to reduce CPU time spent encoding/decoding messages?

Especially in SOA deployments where many small services need to communicate with one another, I would think that the ability to quickly read any field from a message and pass it on (without first having to decode the entire message) would be a very desirable trait.


Protobuf is the Google standard, used by basically every single server at Google for the last 15 years. They have built an internal ecosystem of tools around the format. For a Google project to use something different would be weird and would face lots of internal push-back, for good reasons.

Even though FlatBuffers is technically from Google, it's from a sub-team of Android working on tools aimed at Android games. The idea was that you'd store your assets in this format. IIRC the initial release didn't do bounds checking so was totally vulnerable to malicious input (but it wasn't intended for such use cases anyhow). I doubt it is widely used on Google's servers.

Cap'n Proto is not from Google and there's simply no way they'd choose to use it. To be fair, its support for languages other than C++ remains weak, largely because Sandstorm.io doesn't currently have the resources to build it out.

FWIW the ability to read a single field from a message is less important in networking situations because sending/receiving the message is already O(n) and the messages are small-ish, so parsing in O(n) is not a huge deal. Random-access parsing really shines when the input is a massive file on disk.

(I'm the author of Cap'n Proto and also of Protobuf v2 (the first version Google open sourced).)
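To make the O(n) point concrete: protobuf's wire format is a flat sequence of tagged fields with varint-encoded numbers and lengths, so a reader has to walk the bytes sequentially to reach any given field. A minimal sketch of the varint scheme itself (the encoding is the real wire format; the helper names are mine):

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a protobuf-style varint:
    7 bits per byte, high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes, pos: int = 0):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: last byte
            return result, pos
        shift += 7

# 300 encodes to two bytes: 0xAC 0x02
assert encode_varint(300) == b"\xac\x02"
assert decode_varint(encode_varint(300)) == (300, 2)
```

Because every field's position depends on the lengths of the fields before it, there is no random access, which is exactly the trade-off zero-copy formats like Cap'n Proto and FlatBuffers are designed around.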



For those wondering what is being used for load generation in the demo: https://github.com/tsenart/vegeta
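For anyone curious, a basic vegeta invocation looks like this (the target URL, rate, and duration here are placeholders):

```shell
# Attack a target at 1000 req/s for 10s and print a latency report.
echo "GET http://localhost:8080/" | vegeta attack -rate=1000 -duration=10s | vegeta report
```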


+1. Thanks tsenart for the awesome load generator!


The frame at 2:37 shows an avg response time of 1.75 ms at 10M QPS. Which API call was measured? I'm looking at the bar charts under "Metrics from Kubernetes 1.2" and the latencies graphed there appear to be different/higher.


"Metrics from Kubernetes 1.2" is discussing the latency of requests to the Kubernetes API (for managing what's in the cluster).

The latency referred to in the demo is the latency of requests from the loadbots to the nginx containers running in the cluster.


That's the nginx response time. You can see that when he scales up the loadbots but not the backends and says that the "tail latency has gotten quite high" (about 1min in).


Correct. In addition, the source code used to run the demo is available on github at https://github.com/kubernetes/contrib/tree/master/scale-demo


Thanks, all clear now.


Does anyone have a good experience to share with a hosted Kubernetes provider outside of GCE and Tectonic? I am primarily comparing using Kubernetes to alternatives such as Rancher or Nomad.


Disclaimer: I work at Google on Kubernetes.

Do you mean GKE?


My 'GCE' reference was to Google Container Engine with Kubernetes as the cluster manager, yes.


Funny story -- Google Container Engine was the obvious name for that product but the TLA for it (GCE) conflicted with Google Compute Engine. We broke the tie by deciding the TLA for Container Engine would be GKE. The 'K' is a nod toward the Kubernetes underpinnings.

Google Compute Engine itself was difficult to name. There were those who were pushing for Google Compute Cluster, but I vetoed it, as the TLA would have been GCC or GC2. Both would have been awful.

Naming is hard.


Why not go with Alphabet Cloud, or ABC?


Hahaha this is awesome! Unfortunately, Alphabet wasn't a thing back then.


Maybe abbreviations could be more flexible, without G always having to be the first letter.


It gets a lot harder to name things world wide if you don't start with "Google". gmail in Germany was a lesson: http://techcrunch.com/2012/04/14/google-finally-gets-right-t...


Is gmail an acronym?


Rancher now supports Kubernetes and Docker Swarm if you are not already aware.


And still no way to make a simple 2-node cluster in 2 different availability zones. What if one AZ fails completely? Happens quite often.

I tried to read HA documentation on Kubernetes and it all starts with warnings like "this is fairly advanced stuff, requiring intimate knowledge of Kubernetes inner workings", and going on with pages and pages of setup process.

Basic HA is not "fairly advanced stuff"; it is a commonplace requirement in any production environment. Why do I need a 1000-node cluster if all 1000 nodes are in the same AZ, which can have an outage at any time?


Update: docs just arrived — http://kubernetes.io/docs/admin/multiple-zones/ . That's better :)


Conceptually speaking, having two nodes is not high availability, it is failover / fault tolerance. High availability is generally N + 2 where N is >= 1.

This is a better explanation than I would write on this:

https://www.quora.com/What-is-the-difference-between-a-highl...


You are right, of course. Still, I don't understand why it is so low priority in container orchestration platforms. And how it is even possible to live without it in production.


It's not low on the priority list at all. These are the same people who worked on borg (I'm a contributor, but didn't work on borg); they get stateful applications and understand that it needs to be done RIGHT. No second chances. Nailing this for 1.0 or 1.1 would have consumed a significant portion of the team, but rest assured it will work, soon.


Update2: persistent volumes are still allocated only in the same AZ as the master container. Hence, no HA databases (only manual volume provisioning is possible).

I still wonder what is the primary use case for Kubernetes (or Docker Swarm, which has similar issues) if high availability is so low on the priority list.


Wasn't there an article recently about how 99.9% availability measurements can still hide lots of bad stuff with high-volume services?

I seem to remember the article noting that 99.9995% was more useful.



