The nature of contemporary web applications seems to be getting closer and closer to Erlang's original problem domain: lots of clients, continuously pulling down small updates, with a need for consistent and low latency. Maybe it's finally Erlang's turn to shine?
For those who are perfectly comfortable with viewing PDFs directly, here's a link:
www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf
Interesting to see that they mention loving the hot code reloading/upgrades, but also that they don't use the OTP release/upgrade functionality.
I wonder what their process is... Just copying over new .beams and loading them is fine for purely functional changes, but you need a system to run code-upgrade hooks if you want to change the state being passed around.
As far as I know, you don't need releases and upgrade functionality to make use of the code-upgrade hooks. It simply works without them.
Releases and upgrades do make it simpler to synchronize what needs to be taken down in what order, restarted, left alone, etc. But simple updates are still doable without that.
But you still need a way to do that for running processes, something that release_handler does for you with real OTP releases by finding which pids are running for each module, by looking at supervisors and so on.
Right, the OTP upgrade system is best when you have few but very controlled upgrades of your application. If you're altering the production system every week, it's better to use something lighter weight.
I guess it's Gb (Gigabit).
Since the only traffic Facebook Chat has is presence, which is handled using separate C++ servers.
Out of 500M users (350M active), not many use chat, maybe because by design it feels disconnected from the rest of the site.
I wonder if FB would be willing to post some of their load numbers...
Like... how many sessions are created per day, persisted from day to day, how many messages sent per chat session, how long windows stay open, how many restores, etc.
You don't have to hire engineers with N years of experience in Erlang. Instead, hire great engineers who are programming-language polyglots; even if they don't already know Erlang, they'll get up to speed on it fast, as with any other new and useful language that comes along.
It makes sense if building a service in that language means you need to hire far fewer engineers to maintain it, because the language is much better suited to the particular demands of that application, as Erlang seems to be in this case.
Nice feature... but when I clicked the link and then clicked 'Other' (bottom center), it said 'Use SSL/TLS: no'.
I wonder if the chat messages are sent unencrypted?
Node.js is nice for evented I/O, but does it solve any of the other problems Erlang is good at, such as distributed parallel processing? How do you scale a Node.js-based service horizontally over many servers?
There is also the issue of Facebook Chat being rolled out in 2008, which I assume predates Node.js. Node.js would need to provide a significant gain to justify rewriting that system.
You could write facebook chat in pretty much any half decent language. The question is how competent the developers are rather than what tool they decide to use.
It's not rocket science, it's mainly plumbing - moving data around.
Maybe, if you have unlimited resources and huge datacenters. Try to calculate how many nodes you would need to implement a Comet-like chat system for 500M users, especially if you're using Python or Ruby with an async evented I/O framework.
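For a sense of scale, here's that back-of-envelope calculation sketched in Java. The concurrency fraction and the per-node connection count below are purely illustrative assumptions, not Facebook's (or anyone's) real numbers; the per-node figure in particular depends heavily on the runtime and framework being compared.

```java
public class ChatCapacity {
    // Back-of-envelope node count for a Comet-style chat service.
    // All inputs are illustrative assumptions supplied by the caller.
    public static long nodesNeeded(long totalUsers, double concurrentFraction,
                                   long connectionsPerNode) {
        long concurrent = (long) (totalUsers * concurrentFraction);
        // Round up so capacity is never under-provisioned.
        return (concurrent + connectionsPerNode - 1) / connectionsPerNode;
    }
}
```

With, say, 5% of 500M users connected at once and 100k mostly-idle connections held per node, that works out to 250 nodes; halve the per-node capacity and the node count doubles, which is where the choice of runtime starts to matter.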
I don't think that's the case. Facebook are the ones with 10,000 servers (probably a lot more now idk).
Some people are often too quick to try and emulate people/products by picking out some magical quality, using it, and expecting identical results.
It's just like women who latch onto fashion tips/diet tips from celebs - eg "Oooo Angelina Jolie used the X diet. If I use that, I can be like her".
Similarly, some techies think "Ooo facebook used Erlang. If I use Erlang, I won't have any problems scaling".
It's certainly more in how you approach things, and how you architect things than which particular tool you choose to use. There is no magic solution.
FWIW, I run Mibbit which handles a good number of users and messages per second on a few servers. Not facebook numbers yet, but not small either. I think we do a few billion messages a month.
There are also C++ implementations like boost::asio that would work and scale just fine too. There was an HN post discussing Alex Payne's blog on how Node.js is "scaling in the small."
An ordinary Comet server with long-poll can actually end up eating quite a lot of CPU.
I run beaconpush.com, a cloud-based service for browser push messages. We're based on Netty and Java, which works out well for us. We looked at Erlang and wrote some initial prototypes in it, but Java became our language of choice, mainly because we knew it and had some experience with Netty, an NIO library for Java. Netty also outperformed Erlang and mochiweb (both vanilla-configured).
Anyway, the problem with long-poll tends to be that users re-establish their connection to the servers each time they receive a message or navigate to another page (or reload the page).
This ends up taking a big toll on the system. We're to some extent limited by how fast we can perform accept() on our sockets.
If you use long-polling with multipart, you can get away with sending more messages on a single established connection (it becomes request/response/response/response or similar). That can reduce the system load, and the use of WebSockets can eliminate reconnections altogether (disregarding any page reloads).
Facebook's use of AJAX navigation (i.e. not reloading the entire page when the user clicks on links, etc.) also reduces this load, since a connection doesn't have to be re-established each time a user loads a page.
So yes, we're actually CPU bound by the accept() behavior (at least in how the JVM does accept()). But if our connections were more permanent in nature, then no, we would of course be more I/O bound, if not completely.
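The streaming idea above, several messages pushed over one established connection instead of one connection per message, can be sketched with the JDK's built-in com.sun.net.httpserver (used here as a stand-in for Netty; the /events endpoint and the message format are made up for illustration):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class StreamingPoll {
    // Push several messages on ONE response (length 0 => chunked transfer
    // encoding), then read them back over a single client connection.
    public static List<String> demo() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/events", exchange -> {
            exchange.sendResponseHeaders(200, 0); // 0 => chunked streaming
            try (OutputStream os = exchange.getResponseBody()) {
                for (int i = 1; i <= 3; i++) {
                    os.write(("msg" + i + "\n").getBytes());
                    os.flush(); // push each message as it happens
                }
            }
        });
        server.start();
        int port = server.getAddress().getPort();

        List<String> received = new ArrayList<>();
        URL url = new URL("http://localhost:" + port + "/events");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) received.add(line);
        }
        server.stop(0);
        return received;
    }
}
```

Each flush() pushes a chunk without closing the response, so all three messages arrive over one connection: the client never reconnects and the server never re-runs accept() between messages, which is exactly the cost being discussed.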
I'd do some more in-depth checks. I find it very hard to believe you're CPU bound within NIO itself.
I do around 1k req/s per cpu and never see load avg above 0.5
>> "re-establish their connection back to the servers each time they receive a message or navigate to another page (or reload the page). This end up taking a big toll on the system. We're to some extent limited to how fast we can perform accept() on our sockets."
I'd expect what you're seeing is the overhead of your system creating new objects, initializing requests, etc., rather than anything low-level.
(Mibbit uses keep alive long polling XHR / websocket, using custom app server in Java +NIO)
It turns out that Java synchronizes on accept(), capping the number of socket accepts that can be made each second on a single port.
But even if that were solved, or if you used something more low-level like, say, nginx, you'd still be limited by how fast you can accept().
Try using ApacheBench and hit one server with keep-alive on and off respectively. Even nginx, which I would call a performant web server, shows quite a discrepancy between having to reconnect each time and keeping a persistent connection.
I file this under the penalty of having to set up and tear down a socket each time.
Something that only can be avoided by doing less reconnecting.
I could of course be wrong, but these are the findings from performance testing and discussion on the Netty mailing list. I'd gladly accept suggestions on how to improve this situation for our service.
I don't need more CPUs.
But can you find a single-core machine nowadays?
Most servers have 16 cores, which, in the case of an evented framework, will be heavily underutilized.
> "which in case of evented framework will be heavily underutilized."
I'm not following your logic here. If something doesn't need 16 CPUs, why is splitting it over 16 cores a good idea? You're going to be adding a lot of inefficiency, complexity, etc. for no gain.
Most of mine are dual or possibly 4 core.
One core for OS stuff, other processes etc, one core to do networking IO for me.
As I say, networking IO shouldn't be using CPU. If you're doing long or blocking jobs, pass it over to some other thread. (That's a far better way to utilize multiple cores than splitting the whole networking IO over multiple cores).
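A minimal sketch of that handoff in Java, under the stated design (the class and method names are illustrative, not from any real codebase): the networking thread never runs blocking work itself, it only enqueues each job on a small worker pool and goes straight back to its selector loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IoHandoff {
    // Small pool for slow/blocking jobs, so the networking thread stays free.
    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(2);

    // Called from the single I/O thread for each decoded request.
    // Submitting returns immediately; the I/O thread can resume polling
    // its selector instead of blocking on the work.
    public static Future<String> handle(String request) {
        // Stand-in for a slow job (DB lookup, crypto, etc.).
        return WORKERS.submit(() -> new StringBuilder(request).reverse().toString());
    }

    // Collect finished results (in a real server, completions would be
    // written back out on the I/O thread instead).
    public static List<String> drain(List<Future<String>> pending) {
        List<String> out = new ArrayList<>();
        for (Future<String> f : pending) {
            try {
                out.add(f.get());
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return out;
    }

    public static void shutdown() {
        WORKERS.shutdown();
    }
}
```

The design choice is the one argued above: one thread owns all network I/O, and only genuinely slow jobs cross a thread boundary, rather than sharding the I/O itself across cores.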
What you are suggesting here is exactly what Erlang provides for you. Each connection is a lightweight process inside the VM, and the VM schedules these Erlang processes onto native threads. The difference is that Erlang has perfected and proven it can do this reliably over 20 years of development. You would be implementing this by hand, and you're guaranteed to screw it up, because it's a very hard problem and you are not the smartest person in the world.
Hybrid solutions with evented I/O and multithreading end up being the answer; Erlang is a hybrid out of the box.
I have implemented it. It works well. You don't have to be the smartest person in the world to not screw up elementary programming problems. Mibbit handles a few million users without issue on a few VPSes.
Erlang "doing it for me" isn't going to be anywhere near as efficient as me doing it myself.
I'm probably not the smartest person in the world, but I'm certainly smart enough to write a comet/web server that's fast enough.
I guess I'm not going to convince anyone here shrug. If you believe Erlang is a magic bullet great for you.
Totally agree. Since when did a work queue/thread pool thing become anything beyond intro to systems work?
Furthermore, I don't know where everyone gets off saying Erlang is so fast, either. I'm not sure where things are now, but last time I checked, their scheduling algorithms were leaving quite a lot on the table, standing to benefit substantially from even basic operations and logistics knowledge.
How much do you know about Erlang? Because, anybody competent with the tools of distributed systems would know that Erlang is one of the best tools for the job. You could spend a decade, or more, building the base level functionality that comes stock standard in Erlang.
If you're not using the right tools, for the right job, you're incompetent.