
   First, the lack of confidence that "localhost" actually resolves to
   the loopback interface encourages application developers to hard-code
   IP addresses like "127.0.0.1" in order to obtain certainty regarding
   routing.  This causes problems in the transition from IPv4 to IPv6
   (see problem 8 in [draft-ietf-sunset4-gapanalysis]).
That does remind me of the times I was dealing with weird connection issues in some critical services.

It turned out to be related to the use of "localhost" in the configuration. It resolves to IPv6 on some systems, and that breaks everything because the target app is only listening on the IPv4 address.

Went as far as removing all references to localhost and adding lint errors to the configuration system so that no one could ever pass localhost as a setting in anything.



> and that breaks everything

If you're on a POSIX system, I'd argue that this is a bug in the client. Typically, the client should call getaddrinfo(3); as part of that, the application would either specify directly that it's only interested in AF_INET results, or just filter out non-AF_INET results.

(Further, if you support IPv6 in the client, and thus request such results from getaddrinfo, you should skip to the next result if the connection fails.)

On the server, you can also bind to both the IPv4 and the IPv6 addresses. If you listen to ::, you should get IPv4 connections too. (Through this[1] mechanism.)

[1]: https://en.wikipedia.org/wiki/IPv6_address#Transition_from_I...
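As an illustration of the client-side pattern described above, here is a minimal Python sketch (Python's socket.getaddrinfo wraps getaddrinfo(3); the helper name connect_first is made up for this example):

```python
import socket

def connect_first(host, port):
    """Try each address returned by getaddrinfo until one connects.

    A client that only supports IPv4 would instead pass
    family=socket.AF_INET to restrict the results up front.
    """
    last_err = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        try:
            sock.connect(sockaddr)
            return sock  # first address that accepts the connection
        except OSError as err:
            sock.close()
            last_err = err  # skip to the next result, as argued above
    raise last_err if last_err else OSError(f"no addresses for {host}")
```

With this pattern, a backend listening only on 127.0.0.1 still gets reached even when "localhost" resolves to ::1 first: the ::1 attempt fails and the loop falls through to the IPv4 address.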


I agree about getaddrinfo(). Applications moving from gethostbyname() (which is IPv4-only) to getaddrinfo() should have either handled IPv6 correctly or restricted the hints to AF_INET to skip it.

The IPv4-mapped IPv6 address thing is a terrible idea that ends up turned off everywhere it makes a difference. Like all of these "transition" ideas, it tries to help, but it just hurts by causing admins/ops/devs to make sure IPv6 is off everywhere they run into it.

If all IPv6 had been opt-in everywhere, we wouldn't need every significant application to detect and work around broken IPv6 (in the local network, in the ISP, in the server, in the peer application). If IPv6 were only explicitly enabled, and fully consciously handled when it was, then practically everything would have opted into IPv6 years ago. As it is, things started getting IPv6 support around 2005/2006, and then started disabling it around 2007/2008 because it was fake/broken IPv6 in too many places (Windows Teredo, 6in4, etc.).

OS X had to disable IPv6 in one release and then re-enable it after implementing happy-eyeballs in everything. Firefox had to do the same. I know about these cases in particular, but I bet all significant network applications had to add a bunch of "detect broken IPv6" code. Major websites had to disable IPv6 on their main domains (and sometimes added an "ipv6." subdomain), because some people had broken IPv6 and their computers would try to use it to reach the site and fail (while IPv4-only websites worked). Things are getting better now, but it was a 10-year setback. So much time and effort could have been saved if well-intentioned people hadn't added automagic transition technologies and had just waited for IPv6 to be explicitly enabled on both ends.


Transition mechanisms suffer from the same problem as the upgrade to IPv6 itself: the need for an upgrade.

So in that sense they were all doomed to fail.

Remarkably, CGN devices are actually the best practical idea for this: fake IPv4 addresses for the dumb, ossified clients, and that's it. If something important enough to the end user doesn't work, they'll finally upgrade.


Is it fair to say IPv6 has been generally a failure? Or is it too early for that?


I'd say this growth curve looks healthy and robust to me: https://www.google.com/intl/en/ipv6/statistics.html


Yeah, you're right, it certainly does. I'd be curious whether it goes mainstream enough to replace IPv4 in the foreseeable future, though, which was its intention.


It's been quite mainstream in certain contexts and geographies for a number of years now. As an example - most handsets on modern LTE (and newer) networks have been strong-majority v6 for quite a while. The fact that this hasn't been obvious is an argument in favor of v6's success.


Hah, I didn't know precisely because in my case I've always seen an IPv4 address on mobile...


I'm not sure that's the case. Certainly not in Europe anyway, although it is seeing wider adoption now (particularly 464XLAT based solutions).


The expectation is that networks will switch to IPv6 only internally, and eventually the IPv4-only remainder of the Internet decays until it's no longer an "IPv4 Internet" but just a handful of separate IPv4 networks that are connected to the (now IPv6 only) Internet by protocol converters.

Some US corporations have done this already: rather than fuss with being "dual stack" and potentially introducing new IPv4-only services or systems, they switched wholesale to IPv6 and added converters at the edges. By doing so they get most of the benefits of a future IPv6-only Internet today. For example, internal numbering is a breeze: they can auto-number almost everything, because the address space is so vast there is no need to "plan" any of it.

Lots of other US corporations are still IPv4-only, indeed that's why the Google graph earlier has a distinct weekday vs weekends / holidays step change in it. At home a very large proportion of people in industrialised countries have IPv6, major ISPs supply it, common household routers understand how to use it, every modern OS groks it. But at work IPv6 is often disabled by policy, in favour of cumbersome IPv4 because that works and changing things at work is forbidden.


All that's needed is for Google to make it factor in search ranking and you can bet that we'll all be finally reading up on ipv6 and how to make it work well on our servers, and testing the hell out of it :-)


It's "too needed to fail" - and there's nothing to supplant it.

And it's finally starting to catch on, 10 years late: Google's primary web domains, Facebook, AWS, Comcast and Time Warner cable internet in the US, most LTE cell service in the US.


It's now embedded in huge chunks of internet so I wouldn't call it a failure. The transition could and should have been handled better, and the specification has its flaws (too machine-oriented) which unfortunately will never be fixed, but it's here to stay.


The mechanism you alluded to (dual binding: receiving IPv4-mapped IPv6 addresses on an IPv6 socket) requires explicitly disabling the IPV6_V6ONLY option on each socket. Some systems enable IPV6_V6ONLY by default; I think modern FreeBSD releases do this out of the box. I don't think many Linux distributions enable IPV6_V6ONLY by default, but administrators can enable it globally, necessitating a per-socket override.

Some systems, like OpenBSD, don't even support disabling IPV6_V6ONLY and therefore don't support dual binding at all. OpenBSD contentiously argues that dual binding is likely to lead to security exploits: applications that naively bind to "::" might not expect to dequeue IPv4-mapped IPv6 addresses, possibly breaking their local access-control logic. For example, they may set up firewall rules that restrict access to the IPv6 port but forget to apply the same restrictions to the IPv4 port.

I'm not sure I agree with OpenBSD's approach, but in any event applications that rely on dual binding should explicitly disable the IPV6_V6ONLY socket option. Ideally they should use two different sockets, one for each address family. If the application stack doesn't make it easy to listen on multiple sockets, that's a strong hint that the design is broken.
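For concreteness, here is a Python sketch of dual binding on a system that permits it (as noted above, the default value of the option, and whether it can be changed at all, varies by OS):

```python
import socket

# Create an IPv6 listening socket and explicitly disable IPV6_V6ONLY,
# so IPv4 clients reach it as IPv4-mapped addresses (::ffff:a.b.c.d).
# On OpenBSD this setsockopt call fails; elsewhere the default varies,
# so applications relying on dual binding should set it explicitly.
sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
sock.bind(("::", 0))  # port 0: let the OS pick a free port
sock.listen(5)
```

A client connecting to this port over IPv4 shows up on the server with a peer address like ::ffff:127.0.0.1, which is exactly what naive access-control logic may fail to anticipate.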


Does it matter if it's a bug in the client?

Make it work as expected.

Especially when changing infrastructure like IPv4 -> IPv6: don't break existing user code! (This is a fundamental precept of Linux development.)


Only in software development are we expected to go out of our way to support people who don't know what they're doing. Imagine a medical student complaining to the professor that the scalpel doesn't cut on the dull end, or a would-be airline pilot, upside down in their seat, complaining that he can't reach the controls.

And your comment about never breaking existing user code is as ridiculous as it is unreasonable. That would only be a reasonable expectation if only correct code could run. Since it's possible to run broken code (as this anecdote demonstrates), it must be possible to fix the system even if that breaks some poorly written but popular systems.

Does anyone want a world of e.g. Windows where decades old bugs have to be replicated because we can't dare break programs so old all the authors are retired or dead? I don't. Backwards compatibility has a cost and I'm not willing to pay it unconditionally.


There's only so much the kernel can do to protect userspace from itself. When you have an interface which returns an explicitly extensible data structure, you have to assume userspace is going to at least ignore extended data it does not understand. Otherwise you cannot have such interfaces at all.


> Make it work as expected.

Expected by the code or by the programmer(s)?

That said, strong API guarantees are the way to go: document the bug, introduce a fixed version of the API, maybe schedule deprecation in 10-15 years, and carry on with life.

It's not worth it to hunt down every client and make them fix your honest mistake.


Agreed that it's a bug in the client, but these bugs are going to be ever-present in the transition to IPv6.


Hello me from last week. Had exactly this bug: sometimes nginx couldn't connect to the backend (but very rarely, and not reproducible on demand), which I eventually tracked down to the fact that localhost sometimes resolved to ::1 instead of 127.0.0.1, which is what the backend was listening on. Still don't understand why it was only like 1 in 1000 requests, and not every or every other request. Just one more slice of IPv6 mystery.


I've had weird errors like that where two DNS servers were answering my query rather than just the one I intended. That can never happen over TCP, but over UDP it can. Every now and then the answers would arrive in a different order and I'd be paged because some app fell over. Fun times.


Is the Nginx client using the happy eyeballs algorithm?

https://en.m.wikipedia.org/wiki/Happy_Eyeballs

It can be a source of race conditions.


I had a very similar problem recently with docker + nginx. The best I could figure out was that the randomness was caused by keep-alive connection limits: if the connection was opened as IPv4 it would work until it hit the keep-alive limit, but the new connection might run into the IPv4/IPv6 lookup problem and fail. Never really figured it out for sure. It's definitely thrown some cold water on my plans to go dual stack everywhere all the time. Not sure it's worth the risk of running into these stupid bugs.


Ideally, a server binding to "localhost" should create a listening socket for each of its IP addresses (e.g. 127.0.0.1 and ::1), and a client connecting to "localhost" should try each IP address in order, until one succeeds.

But a lot of standard libraries (including parts of Java and Go) get this wrong, and pick exactly one IP address arbitrarily. When you combine a buggy client with a buggy server, and their preferences for IPv4/IPv6 disagree, then all hell breaks loose.

These are the open bugs I'm aware of:

https://bugs.openjdk.java.net/browse/JDK-8170568

https://github.com/golang/go/issues/9334
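The "one listening socket per resolved address" behaviour can be sketched in Python; listen_all is a hypothetical helper, and a production server would also want options like SO_REUSEADDR:

```python
import socket

def listen_all(host, port):
    """Bind one listening socket per address the name resolves to
    (for "localhost", typically both 127.0.0.1 and ::1), instead of
    arbitrarily picking a single address family."""
    socks = []
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM, flags=socket.AI_PASSIVE):
        try:
            sock = socket.socket(family, socktype, proto)
            if family == socket.AF_INET6:
                # Keep the v6 socket v6-only so it doesn't also claim
                # the v4 port and collide with the AF_INET socket.
                sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
            sock.bind(sockaddr)
            sock.listen(5)
            socks.append(sock)
        except OSError:
            continue  # skip address families this host can't bind
    if not socks:
        raise OSError(f"could not listen on any address of {host}")
    return socks
```

A server built this way is reachable by both a buggy IPv4-preferring client and a buggy IPv6-preferring client, which sidesteps the mismatch described above.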


Or use a single socket for both protocols (set the IPV6_V6ONLY socket option to false).


That works when binding to :: (all interfaces), but it's irrelevant when binding to "localhost", because you can only bind one address per socket.


Yes, I thought it worked for ::1 too but you're right.


An interface on linux has only one ip address.

The "localhost" interface will either designate the ipv4 127.0.0.1 interface or the ipv6 ::1 interface. That's the realm of undefined behavior and system specifics.

This whole IETF draft looks like a mess. They should reserve the names localhost4 and localhost6.


> An interface on linux has only one ip address.

Absolutely not true, neither for IPv4 nor for IPv6; with IPv6 it's even the default to have a multitude of addresses on an interface.

> The "localhost" interface will either designate the ipv4 127.0.0.1 interface or the ipv6 ::1 interface. That's the realm of undefined behavior and system specifics.

no it won't. My loopback (lo) interface has both 127.0.0.1 and ::1 as its addresses.


> An interface on linux has only one ip address.

Eh?

    $ ifconfig lo
    lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)


> An interface on linux has only one ip address.

Just to point out: while the behaviour of any one OS is a useful data point for the discussion, it's not super useful to say "because OS 'foo' does things in a certain way, a widely used programming language should limit its design to that OS's way of doing things." :)

Hmmm, I guess I'm trying to say that (at least) Go should be fairly OS agnostic about this.


Some genius at my company decided that ~180,000 Windows endpoints needed "localhost" removed from their hosts file, which has resulted in millions of requests per minute for localhost hitting our resolvers just to return 127.0.0.1.

My guess is that it was some hack they tried in order to disable IPv6. But aside from the insane load it added to the DNS infrastructure, the other result is that if these machines talk to a malicious resolver, their traffic destined for the loopback interface could end up going anywhere and being captured by anyone.

Great job!


If the machines talk to a malicious resolving proxy DNS server, then more than traffic destined for loopback is at risk.

I suspect that removing the "localhost." record had nothing to do with IPv6 and everything to do with a corporate policy of not having anything other than the Microsoft default contents in hosts files, possibly because of concerns relating to malware prevention. The problem is possibly the result of the default hosts content changing in Windows NT 6.1.

* https://support.microsoft.com/en-gb/help/972034/

As of Windows NT 6.1, lookups of "localhost." are handled internally within (as I understand) the DNS Client, and never require inspecting a hosts file or sending a query to a DNS server. So the new default hosts file content no longer contains a "localhost." record. But use the Windows NT 6.1 or later default hosts file content on earlier versions of Windows NT, and one will see "localhost." queries being sent by the DNS Client to a server.

Handling "localhost." within the DNS Client is -- reportedly -- so that the DNS Client can inspect the local machine's protocol support and only return non-empty AAAA and A resource record sets if IPv6 or IPv4 is actually enabled on the machine.


The weirdest was a co-worker who had some simple webserver, which was listening on only IPv4 or IPv6 (but not both). When he went to "localhost" on Firefox it used IPv4 and he was able to see it. On Chrome "localhost" was IPv6 (or the other way around), and he got "Could not connect" error. It confused him no end how this simple web server worked on FF but not Chrome. :)


> It turned out to be related to the use of "localhost" in the configuration. It resolves to IPv6 on some systems and that breaks everything because the target app is only listening on the IPv4 address.

You just found a major bug in the application and should complain to the developer. Applications that do not support IPv6 are simply broken and should be avoided at all cost by now.


I always have a local.mycompany.com DNS record that resolves to 127.0.0.1. I can get a valid cert that way too.


I hope you don't publish that record to the world.


It's already been done: https://git.daplie.com/Daplie/localhost.daplie.me-certificat...

Which I find to be a very practical solution for connecting to localhost over HTTPS: it frees you from having to install self-signed certificates/CAs on your machine.


Publishing private keys is a violation of the Let's Encrypt terms of service. We are revoking these certificates.

https://letsencrypt.org/documents/LE-SA-v1.1.1-August-1-2016...


Not a great idea to publish private keys for valid certificates. Anyone could probably submit a certificate revocation request to the CA, as the key would be considered compromised.


Why?


I guess anyone on 127.0.0.1 can pretend to be that address. Very unlikely to matter.



Interesting. Still, that requires the attacker to already be running a process on the victim's machine, even if with reduced privileges. Nowadays that's rare, since there's no reason not to give each user its own network namespace, at the very least.


Just a guess: CORS-related attacks


How would that work?


Lots of sites seem to be doing it now. Off the top of my head I know that Box and Spotify both do it.


Got to chuckle about this. The new generation is fearless and naked. Break it all, admit nothing, make it 'better'.

Standards, practices, tradition, culture. They mean nothing when a devops lead has commit rights to the ansible playbook and a will to deliver a fix in 5 seconds flat.



