Some of this stuff is so simple and useful it's a wonder they weren't there before:
"mod_ssl can now be configured to share SSL Session data between servers through memcached"
"mod_cache is now capable of serving stale cached data when a backend is unavailable (error 5xx)."
"Translation of headers to environment variables is more strict than before to mitigate some possible cross-site-scripting attacks via header injection."
"mod_rewrite Allows to use SQL queries as RewriteMap functions."
"mod_ldap adds LDAPConnectionPoolTTL, LDAPTimeout, and other improvements in the handling of timeouts. This is especially useful for setups where a stateful firewall drops idle connections to the LDAP server."
"rotatelogs May now create a link to the current log file."
It frustrates me when people use ASCII instead of packed bitmaps for things like this (packet transmitted once a second from potentially hundreds or thousands of nodes, that each frontend proxy has to parse into a binary form anyway before using it). Maybe it's a really small amount of CPU but it's just one of many things which could easily be more efficient.
I think that's a premature optimization. If it becomes a performance problem, optimize it then. Otherwise, I doubt it's worth the cost of humans not being able to read the information on the wire, and deciding on and implementing a binary format to represent the information.
That's a fine philosophy for the just-ship-it model of startup webapp development, but this is a piece of server software which is supposed to scale high and far and reduce its resource use as much as possible (part of the emphasis of the release as noted).
If you're writing a piece of some software with one specific function, and it doesn't affect anything else, and it doesn't take any more time to make it efficient, just do it efficiently the first time. This way you don't have to come back later and rewrite it (and by the time a rewrite is required, somebody has already written something dependent on your crappy design, which now has to be fixed too, etc.).
It might be the philosophy of "the just-ship-it model of startup webapp development", but it's not why I have it. It comes from the philosophy of high performance systems research where you can spot inefficiencies all over the system, you just don't have enough time to optimize them all, and any extra complexity you put into the system better be worth it. Every "optimization" you put into your system better have performance numbers associated with it; if you can't experimentally demonstrate that the optimization is worth it, it's not.
Moreover, as someone who's spent a significant chunk of my career basically being a professional optimizer, one of the worst things to fight against is code infected with a plethora of micro-optimizations. Micro-optimizations tend to be based around fragile, case-specific assumptions and when they're used, a smarter, deeper optimization to a more generic version of the problem isn't reused because somebody decided that "every bit counts".
100% agreement on "if it's not showing up in your profiler, don't optimize it".
I agree that if you try to optimize everything before proof that you need it, you'll be wasting a lot of time. But the case I'm talking about is a function in mod_heartmonitor.c which calls apr_strtok, strchr, and ap_unescape_url (x2) for three iterations and returns the values in a table. And that function is duplicated in mod_lbmethod_heartbeat.c. For a cluster of 850 servers that's 10,200 function calls every second. When they could have just copied a raw network-order bit string into a struct. Or something.
Yeah it's probably insignificant, but they could have just done it that way the first time. There'd be no optimizing needed thereafter, and no side-effect as it's only used in two places for a limited function. It's actually smaller and easier and faster to write AND to run. You make the right choices at design and implementation time and you come out with better code which doesn't need to be optimized.
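For what it's worth, the "raw network-order bit string into a struct" approach being argued for here can be sketched in a few lines of C. The struct layout and helper names below are made up for illustration - this is not what mod_heartmonitor actually does - but it shows the receive path collapsing to a memcpy plus byte swaps:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical fixed-layout heartbeat packet (NOT the real wire format):
 * three 16-bit fields, all in network byte order. */
struct hb_packet {
    uint16_t version;
    uint16_t ready;
    uint16_t busy;
};

/* Sender side: pack the counters into wire order. */
static void hb_pack(unsigned char wire[6], uint16_t ready, uint16_t busy)
{
    struct hb_packet p = { htons(1), htons(ready), htons(busy) };
    memcpy(wire, &p, sizeof p);
}

/* Receiver side: one memcpy and three byte swaps, no string parsing. */
static void hb_unpack(const unsigned char wire[6], struct hb_packet *out)
{
    memcpy(out, wire, sizeof *out);
    out->version = ntohs(out->version);
    out->ready   = ntohs(out->ready);
    out->busy    = ntohs(out->busy);
}
```

Note that even this tiny sketch has to commit to an endianness and a fixed field layout up front, which is exactly the bookkeeping the text format sidesteps.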
It parses that string and reads it into a struct at a rate of about 3 million per second on a single core on my Macbook Pro. That means for your example there of 850 machines in a cluster, we're talking about roughly a quarter of a millisecond of one core of CPU time. Even if that's off by an order of magnitude, it doesn't matter.
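As a rough illustration of the kind of parse being timed here (a sketch with an assumed field order and a made-up helper name, not wheels' actual benchmark code), the whole text packet falls to a single sscanf call:

```c
#include <stdio.h>

/* Parse "v=1&ready=75&busy=0" in one call. The helper name is made up;
 * the field order is the one shown in the mod_heartbeat docs. */
static int hb_parse(const char *pkt, int *ver, int *ready, int *busy)
{
    /* sscanf returns the number of fields converted; 3 means success */
    return sscanf(pkt, "v=%d&ready=%d&busy=%d", ver, ready, busy) == 3;
}
```

The catch is that freezing the field order into a format string gives up the forward compatibility of treating the packet as arbitrary name/value pairs.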
THERE IS ABSOLUTELY NO REASON TO OPTIMIZE THIS CASE
The enemy of optimization is hard to maintain code, not small inefficiencies.
Using text in this case makes the wire protocol easier to debug and generally easier to maintain for forward compatibility. Need to add an extra argument? It'll be both self-documenting and backwards compatible, instead of some versioned mess of bitflags.
Could I be wrong? Sure. There's an easy way to prove such: show me the profiler output. Your eagerness for micro-optimizations significantly hints at inexperience.
Assuming that it'd perform that way in the real world, you are right that my solution would have no significant performance increase over the sscanf you use. But I don't totally agree with some of your supporting arguments.
The wire protocol you're talking about is a heartbeat protocol. This doesn't need to be human-readable because there is no input from a human, nor is it intended for a human to read. Debugging it would be as complicated as a "printf". Oh the horror. We wouldn't want to add debugging to our application - better make people read the raw data with a packet sniffer (which you can't run unless you have root, so good luck getting your application debugged quickly, developers).
Adding an extra argument would require appending an extra field and incrementing the version. Oh noes the mess! Since both solutions are versioned, forward compatibility is just updating code in the server, and if you're going to change things you have to update code anyway, so this doesn't sound like a good reason to oppose it.
Realistically, will more than a handful of shops have a big enough cluster for any of this to matter? No. But even your example of using sscanf is faster than Apache's code (http://pastie.org/3430085) while being 100% compatible with the rest of the code, and still goes to show that doing it right the first time is better than just slapping something together and waiting until you have to optimize.
So do we need to use an unreadable, simpler solution? No. But it would work just as well as anything else and take just as much time to write - if not less.
I would happily co-sign on wheels' post, but I wanted to say something about this: "Yeah it's probably insignificant."
If you find yourself saying that, then your work is done. There's nothing to optimize. Optimized code is non-idiomatic, usually clever, often obtuse, and as a result, more buggy. Sure, copying the data off the network is easier, but you still have to figure out an encoding on the sender side, document it, implement it, and probably in the future extend it. There is a very real cost to having around "optimized" code, which is why you had better be sure that the performance gains outweigh the software engineering costs.
You must be confused. I never suggested optimizing the code. I suggested using an initial design which is inherently efficient. They could have also made the heartbeat packets into XML documents. They didn't because they're not totally insane.
You are suggesting an optimization over the default "just use text." Your initial design is more complex and error prone - in my experience, people screw up simple binary formats much more often than simple text formats.
For applications which deal with users, yes, but this is like telling NFS it needs to transmit permission bits in english. Computers do not need english languages to communicate with one other - only people do.
And since the data being communicated between the heartbeat client and server is computer-generated and automatic, "text" is not only not required, it is an unnecessary step in communication. Apache is implemented in C. C is good at accessing, copying, transmitting, and inspecting raw memory. It is not good at parsing and comparing strings. Adding text to a protocol nobody will ever interface with is adding an unnecessary layer which is not only more complex than my suggestion but is itself more prone to security flaws and bugs unless proper care is taken when parsing the strings.
Really I think this is about people's comfort levels. You feel more comfortable looking at a string and knowing what's in it. Either way will work; it doesn't matter how you skin this cat. But I think it's disingenuous to suggest that comparing raw bits is somehow more likely to blow up than comparing the raw bits after you parsed them from a string.
I think you are underestimating the complexity of serializing data. Designing, implementing, documenting and extending a format for transmitting raw data is, at the least, troublesome. In my experience, people get confused more easily when they have to remember how the endianness of the processor versus the network affects what order data are in - and you have to start thinking about these things once you have decided to transmit raw data. My claim is that this is more effort than just using text.
Further, I think you are underestimating the benefits of people and other programs being able to read the messages without having to read up on, and reimplement such a format.
Once I rewrote a packet sniffer (without libpcap) to support multiple architectures for fun. I never took the time to learn about endianness until I had that project to play with, and I'm glad I did. Would I like to program like that all the time and implement every new protocol as a bitstring? Hell no. But sometimes you're handed an edge case which is just perfect.
A protocol which transmits data that dynamically changes according to the use of the application servers, which is implemented by services using a language which makes it easy to communicate raw memory and not need to parse it, passing a tiny low-latency message over an unreliable transport layer without even a concept of whether messages are ordered properly or not.
Forget the common developer. Forget compatibility. Forget usability. Forget reliability. This thing only has one function: tell the frontend proxy I'm alive and how many slots are active or busy. The only thing that matters to the backend server is how fast you can spit out a packet, and to the frontend how many machines are alive. Shit, there might not even be a valid checksum on the udp packet. And you're worried about how much effort it takes to implement this? It probably takes less time to write the code than it does to add the source into your makefiles and make a unit test. The existing function that parses these packets is one small function, which without parsing a string would be one memcpy (or the slightly-less-flexible sscanf as shown by wheels earlier).
There is no crime in writing apps in a way that fits their use. Not all apps are the same, and not every "default" design decision is appropriate. Sometimes you just do what you're familiar with. In this case the Apache guys used an HTTP query string for a heartbeat packet. I would have used something more akin to a network protocol packet. It doesn't really matter as long as it meets the application's requirements and works.
Every 1 second, this module generates a single multicast UDP packet, containing the number of busy and idle workers. The packet is a simple ASCII format, similar to GET query parameters in HTTP.
An Example Packet
v=1&ready=75&busy=0
Consumers should handle new variables besides busy and ready, separated by '&', being added in the future.
I think that last line is why they avoid a binary format. Everyone who's consuming this packet already has a function available which can parse url query parameters with arbitrary name/value pairs, and that's the format being used here. I'm also not sure that the user needs to turn this into a binary format before using it; a string comparison against "75" and "0" is just as capable and nearly as fast as a numeric comparison, especially if you're using a scripting language for your heartbeat monitor rather than C.
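That forward-compatible consumption can be sketched as a plain key/value walk (a hypothetical helper; real consumers would just reuse whatever query-string parser they already have) - the point is that unknown names are simply skipped:

```c
#include <stdlib.h>
#include <string.h>

/* Walk "name=value&name=value..." pairs, picking out the keys we know
 * and skipping the rest, so fields added in the future are harmless. */
static void hb_scan(const char *pkt, int *ready, int *busy)
{
    const char *p = pkt;
    while (*p) {
        const char *eq  = strchr(p, '=');
        const char *amp = strchr(p, '&');
        if (!eq)
            break;
        size_t klen = (size_t)(eq - p);
        if (klen == 5 && strncmp(p, "ready", 5) == 0)
            *ready = atoi(eq + 1); /* atoi stops at the next '&' */
        else if (klen == 4 && strncmp(p, "busy", 4) == 0)
            *busy = atoi(eq + 1);
        /* unknown keys (e.g. "v", or some future field) are ignored */
        p = amp ? amp + 1 : p + strlen(p);
    }
}
```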
Developers that are so obsessed with micro-optimizations that they can't stand to have a plain-text heartbeat packet don't get to have their software deployed on my watch.
I can buy more hardware, but operating and maintaining systems that have been optimized to hell and back for no compelling reason is a massive ongoing human resources cost, not even counting the initial unnecessary development time.
It actually took more time to write it plain-text than it would have to unpack a bitstring into a struct. With or without optimization it cost you more human resources (in terms of time and money) to do it their way. As you add more hardware (in cluster nodes), your CPU time on each proxy node needed to perform this operation every second will increase.
My original point was not to encourage over-optimization, but to design better. But deploy whatever you want, I don't care.
It's harder to debug and extend a bitmap, and in the real world single service instances don't scale infinitely; more layers get added. You will never have enough back-ends behind a single proxy for the time spent parsing to be measurable.
This is a textbook premature optimization. You're designing for academic purity instead of the real world.
I think you can solve this by using http://n0rp.chemlab.org/vlogger/ You can define one log which has the virtual host as the first field and split the logs after it.
It may sound trivial, but the thing I appreciate most about Nginx is its lightweight config file syntax.
It's very easy to glance over and see what's been set up compared to Apache's verbose pseudo-XML syntax which is about the worst syntax you can come up with: the verbosity of XML but without the benefit of being able to generate or parse it using standard XML tools!
Yep, it's utterly dire. People bitch about X11 config files, but this is in many ways worse because it pretends to be XML. It'd be nice to submit a patch for Apache so it can read YAML, XML, and JSON based configuration. Or even just some sort of consistent fucking format, so we can generate configs for a bunch of servers with lxml or something.
The other thing that fucks me off is that the server will reboot happily and fail to start without having first checked that it can start, and configtest doesn't pick up things like directories being missing.
Nginx config gets pretty hairy as soon as you try and do anything unforeseen.
Just read http://wiki.nginx.org/IfIsEvil for a start. Lots of things can only be done with ifs (unless you write a custom module). The "What to do instead" advice doesn't work for things that aren't simple rewrite rules.
There are seemingly arbitrary rules for what contexts you can have conditionals in, what commands can be in conditionals. There is no support for ands, ors, or nested ifs.
The biggest drawback to nginx is that changing modules requires a recompile. This is especially annoying as nginx doesn't support HTTP chunked uploads (downloads are fine). Chunked uploads are used when you don't know the size of something in advance. There is a module that hacks together a solution, but it's very annoying to have to do the recompile.
I used Apache for years and then switched to Nginx and used it for about a year and a half. I've been bitten by all sorts of weirdness just trying to figure out how to do very basic things in Nginx, where configuration bits go and weird idiosyncrasies of the syntax and how it's parsed.
I guess real world testing is going to be the best indicator of whether this is a worthy release or not, but I sure am very glad that Apache have at least attempted to up their game. Even if they are not able to deliver on their promise, the effort is noble enough - at least for now.
Personally, I'm very glad to see performance considerations being taken seriously, and even if nginx or node.js don't take over the world, it's nice to see that they're forcing others to sit up and think.
That's the wrong mindset to have, IMO. Just because the incumbent sysadmins have gotten used to the eccentricities of Apache doesn't mean having something easier to configure is not necessary. This is similar to what people said when the iPhone was announced.
There's a big difference between your two scenarios. At the time the iPhone came out, only a very tiny fraction of people had ever owned (or even used) a smartphone. The iPhone's success came because it was able to grow the market, convincing feature-phone users to upgrade to a smartphone. I don't see the same thing happening with nginx. I don't see anyone saying, "Gee whiz, I didn't want to be a sysadmin, but now that I've seen nginx's easy config file syntax, I want to be a sysadmin now."
I don't know which distribution you're used to, but in many modern Apache setups, the config has been split up into multiple smaller config files each with their own purpose.
The average user only ever has to access one small file with 10-15 lines.
A great deal of the complexity comes from OS vendors doing nigh-unspeakable things to httpd.conf. The default httpd.conf from Apache is a bit gnarly too, but compared to what Red Hat or Ubuntu do... not cool.
The mandatory configuration is actually pretty minimal, especially if you are using it as an app server. Take, for example, http://kasparov.skife.org/blog/src/wombat/httpd-conf-cool.ht... which sets up a bunch of mod_wombat (precursor to the now-bundled mod_lua) stuff.
Plan 9 is a really, really bad comparison. If you really try Plan 9, you'll find a much better reason that it isn't as popular as it "should be": Absolutely no one has done anything to make it easy to install or pleasant to use. I'd say it reminds me of Linux circa 1996, but that would be pretty damned insulting to Linux.
There has never been a concerted effort to move Plan 9 beyond its roots as a research platform. The real world doesn't run on research platforms, and those of us trying to get real work done aren't going to invest our time in a research platform that needs enormous work to bring it up to a usable state.
I think his point is that NGINX gained in popularity even when its docs were still awful because the software was supposedly "just that good." Ergo, a good first impression is an important part of getting software adopted, but really good software can overcome a bad one.
The Mapnik GIS software (and the stack around it) used to be similar: a total pain in the ass to compile and configure, but people used it anyway because it was the best thing out there.
Mind you, I think these are real exceptions, and don't agree with the implied idea that good software will thrive regardless of how hard it is to work with.
nginx doesn't exactly need much documentation, but it didn't get popular in spite of poor/absent documentation, somebody put in the effort to start writing documentation so a wider audience could put it to use.
Nobody has put effort into Plan 9's usability (which, by the way, is a much bigger problem than documentation issues), which was my whole point.
You know you have good software when people begin to procrastinate on your software by comparing the number of requests / second that your server will accept, knowing that 99.9% will never reach that.
Oh and ... Node.js is probably going to be mentioned.
If a server can handle a larger number of requests than the same hardware running a different web server could, that automatically means resource usage per connection is lower.
This is good for everyone, from small sites to large, because it reduces costs etc.
<strike>Jesus Christ. I've apparently flown into cuntytown. Cleared for landing? I HOPE SO</strike> Ruining the signal/noise ratio. Sorry. And thanks to the one person who took the time to explain.
I'm pretty sure the Apache httpd server doesn't follow semver at all, given it predates that document by 10 years or so. Any overlap is pretty coincidental or inspired in the reverse direction.
Hmm . . . I thought HTTPd used the even/odd convention for stable/development at the minor number level, which I don't see mentioned in that doc. And that would go against semver altogether.
"While it seems unlikely that NGINX could overcome Apache’s commanding lead ..." Oh ZDNet, don't you realize that actions like this release from Apache, obviously under pressure from NGINX, are precisely the indicators that you will be eating your words in a year or two?
I don't think they'll ever need to eat those words. It does indeed seem unlikely that NGINX will pass Apache. Not that it can't happen, but reasonable people would agree that such a thing does seem unlikely.
Apache is more than sufficient in handling the traffic of the majority of sites on the net. Which gives very little reason for those sites to switch (or for that matter, upgrade to the new version). Which means ZDNet is right on the money. It is extremely unlikely that anything will overtake Apache in the next few years. As for 10 years down the road? I doubt it'll be Apache in the lead. I also doubt it'll be NGINX.
Apache might very well be under pressure from NGINX, but getting out a new major release (and after so many years) is DEFINITELY NOT a sign of this.
Much less is a new release an indicator that ZDNet will be "eating their words in a year or two" regarding NGINX/Apache market share.
On the contrary, if this release DOES improve performance a lot and reduce memory usage, on top of all the other savings, it would make it even LESS likely for NGINX to win over Apache.
Besides raw performance, there are lots of reasons to still use Apache, from the fact that it's a battle-tested server with tons of documentation, known best practices, tools and module support, knowledgeable admins etc available for it. So, if performance is improved, many people who otherwise might have switched won't bother.
A counter to this is that all the people who have switched to NGINX are not going to "jump ship" back to Apache. I'm one of those converts, and while this may be great for people who NEED Apache, most people don't need it and may have already made the switch to NGINX.
Except that most people did not make the switch. Apache continues to hover around 65% of the market while its next nearest competitor, Microsoft, hovers around 15%. nginx is below 10% and climbing very slowly.
Well, I have switched to NGINX for some projects, and am considering the switch back if Apache gets a few things nailed down (this release is a move in the right direction).
The main reason is: I don't care for that much performance in the raw one-standalone-server case, and I can always put Varnish on top. But I do care for ease of installation, breadth of documentation, etc, and with Apache you get that in spades. This shows up even in very simple things that I can solve in Nginx in 15 minutes, like, say, Wordpress having specific rewrite rules and support built in for Apache and not for Nginx. It's trivial to solve, but if I used Apache, I wouldn't even have to.
Given two platforms, one of which is better than the other but less popular, I usually stick to the most popular one (within reason. Like, I won't go for PHP, but I would go with Rails, not Padrino or some even less known thing).
Fewer problems down the road, and if you get into those, people have already encountered them.
As compared to what? The mind numbing Apache.conf?
I have been using and administering it for 6 years; it seems pretty much okay to me if you use the .NET stack. If 18% of the top million sites are using it over completely free alternatives, they must be doing something right.
Maybe it's because I've been using it for 15 years, but I have never thought of Apache's httpd.conf as mind-numbing. Sometimes the terminology is a little arcane, necessitating an extra few Google searches, but it's always been completely logical to me.
And people stay with it because it would be so much pain to migrate to anything else. Just imagine rewriting all that VBScript code... That's mostly why companies kept IE6 for so long.
Other than the odd exceptions, people staying on IIS has little to do with VBScript or other legacy code at this point. In the enterprise I would wager the two largest IIS hooks are .net and sharepoint, neither of which have anything to do with VBScript.
You know how it works. You already have a shop full of people trained to work your legacy apps, with their shiny MC* certifications, comfortable in their safe lifetime dead-end jobs... What do you think they'll do? Use the tools they have been using for a long time now, or explain to management (because they really don't decide anything) that they need to learn to use better tools?
Do you really think most MC* people are eager to move on to tools that make their certifications worthless? When they move, they do along Microsoft's designated path and rarely stray from it.
>Use the tools they have been using for a long time now, or explain to management (because they really don't decide anything) that they need to learn to use better tools?
Better tools like what?
Maybe there are many people out there that think that ASP.NET/C#/MVC and Visual Studio are actually better tools for them?
I know that could be an alien concept around these parts and for you but that doesn't make it any less true.
Saying that they should discard their knowledge and move to Ruby/PHP/Node is as idiotic as saying Ruby developers should ditch Ruby and move to Visual Studio/C#/.NET since it might be one of the best platforms around.
>Do you really think most MC* people are eager to move on to tools that make their certifications worthless? When they move, they do along Microsoft's designated path and rarely stray from it.
Microsoft has been building up support for Python, PHP, Node and GitHub. Anyway, the reality is nothing close to the dystopian light that you paint them in. The jobs are no more dead-end than Java or PHP or Ruby jobs.
And no, sorry, VBScript is barely around in maybe 5% of companies running IIS; people have moved on to new technologies, unlike constant MS bashers who seem to be stuck at least a decade back in their criticisms.
> Ruby developers should ditch Ruby and move to Visual Studio/C#/.NET since it might be one of the best platforms around
Except that VS/C#/.NET isn't one of the best platforms around unless you code for Windows. And that's one more reason not to move to other platforms - because the tools they use don't support the alien technology as well as what they've been using.
> The jobs are no more dead-end than Java or PHP or Ruby jobs.
You obviously have a different idea of what constitutes a dead-end job. I imagine it's a job at a company that thinks of IT as a cost of doing business, something competitive advantages are not to be derived from. Those companies will not invest in new things until everybody else is doing it, and they hire the same kinds of professionals other companies hire. They use certifications instead of interviews because then the whole hiring process can be done within HR. I've seen a lot of them.
> VBScript is barely around in maybe 5% of companies running IIS
I know it's not a thorough review, but I see plenty of .asp URLs around within corporate confines.
That reminds me of a prank I did with a couple friends. We had a site running on Zope and we decided to use every known extension for our URLs. I think we got to a dozen different ones.
You misquoted me, I was saying that it is absurd to ask people to switch platforms on a whim.
>Except that VS/C#/.NET isn't one of the best platforms around unless you code for Windows. And that's one more reason not to move to other platforms - because the tools they use don't support the alien technology as well as what they've been using.
Code for Windows? As opposed to what? Code for the Mac or Linux? These days most of the effort is in coding for the web.
And it doesn't really matter to the user if the website is running on Windows Server, Linux or BSD.
>You obviously have a different idea of what constitutes a dead-end job. I imagine it's a job at a company that thinks of IT as a cost of doing business, something competitive advantages are not to be derived from. Those companies will not invest in new things until everybody else is doing it. I've seen a lot of them.
Sure there are, but if they ran on Ruby or PHP, they would be doing the exact same thing, I don't see how IIS is relevant here. The companies want a well supported product, with an available developer base and some of them choose the .NET stack based on MS' really long support cycles.
>I know it's not a thorough review, but I see plenty of .asp URLs around within corporate confines.
If you're seeing more ASP URLs than ASPX URLs, I would say your corporate selection is skewed. There has been, and continues to be, a massive number of migrations away from classic ASP over the past ten years.
Go compare the number of job listings on Dice for classic ASP developers vs. ASP.NET, it's not even a contest.
> it is absurd to ask people to switch platforms on a whim.
Since when is exploring and using different, possibly better, technology "on a whim"?
> Code for Windows? As oppposed to what? Code for the Mac or Linux?
I don't think Mac is a popular platform for running server applications, but I am sure Linux is a very relevant one, if not so popular in environments that reject "new" technology.
> And it doesn't really matter to the user if the website is running on Windows Server, Linux or BSD.
Actually, it does. If your choice of technology implies higher prices, longer time for bug-fixes or added features or lower reliability, it directly impacts user experience. If your choice of technology fails to attract the best developers, software quality will suffer. That will impact user experience.
> The companies want a well supported product
If they are exploring competitive differentials through the adoption of newer technologies, they'd better realize it's not possible.
> with an available developer base
This certainly impacts the price of your labor. If you choose a technology that has lots of developers readily available, you'll be able to offer them lower compensation.
> Go compare the number of job listings on Dice for classic ASP developers vs. ASP.NET, it's not even a contest.
Job listings reflect open positions, not the number of people using a given technology.
http://httpd.apache.org/docs/2.4/new_features_2_4.html