So, how easy will it be to deliberately search for an IPv6 address to collide with a desired pseudo_ipv4 address? (Based on my very limited crypto knowledge, I might worry that there could be some novel denial-of-service or impersonation attacks in that direction if this is MD5 with a known format and salt.)
300MM is well inside brute-force range for even a single CPU (as noted in the article, md5 is cheap!). One issue is figuring out which IPv4 address you wish to impersonate; sites don't give that information out so readily. In 4chan's case, the IPv4 address only ends up in user-facing information as poster IDs, which are themselves hashes of the IPv4 address and (I assume) some thread-specific salt. I've never checked whether a cookie is also involved in these poster IDs, but that could be the case, and it would make this attack a bit harder. That said, it is quite feasible to obtain the target's IPv6 address through other means.
I think what might be kind on Cloudflare's side is to add a secret domain-specific salt to this md5 hash, but I'm by no means a crypto person.
(edit) eastdakota and billpg below both pointed out that carrying out an impersonation would require connecting to Cloudflare from the correct IPv6 address. This is probably the biggest hurdle, so feel free to ignore what I wrote above.
Anyone with an IPv4 address can use one of several 6to4 gateways to get a whole /48. This gives them access to 2^80 addresses they can originate traffic from.
The hash only takes the top 64 bits of the IPv6 address, so unless you have a wide choice of that half of the IPv6 address, you could only use the one you've been given by your ISP.
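To put numbers on the two comments above, here is a rough sketch of the 6to4 arithmetic. It assumes the standard 2002:V4ADDR::/48 mapping and the documentation address 192.0.2.1 as a stand-in for the attacker's IPv4 address; `sixto4_prefix` is just an illustrative helper name.

```python
import ipaddress

def sixto4_prefix(ipv4: str) -> ipaddress.IPv6Network:
    """Return the 2002:V4ADDR::/48 prefix that 6to4 derives from an IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    # 16 bits of 0x2002, then the 32-bit IPv4 address, then 80 free bits.
    base = (0x2002 << 112) | (v4 << 80)
    return ipaddress.IPv6Network((base, 48))

prefix = sixto4_prefix("192.0.2.1")            # example address, an assumption
addresses = prefix.num_addresses               # 2**80 addresses in the /48
distinct_top64 = 2 ** (64 - prefix.prefixlen)  # only 2**16 distinct upper halves
class_e_size = ipaddress.IPv4Network("240.0.0.0/4").num_addresses  # 2**28

print(prefix)
print(f"addresses: 2^{addresses.bit_length() - 1}")
print(f"distinct top-64-bit values: {distinct_top64}")
print(f"class E space: {class_e_size}")
print(f"rough odds of hitting one specific class E target: 1 in {class_e_size // distinct_top64}")
```

So if only the top 64 bits are hashed, a /48 yields 2^16 distinct hash inputs rather than 2^80, which is far smaller than the 2^28 class E space.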
Even that possible vulnerability (if I can even call it that) would be stopped if they (Cloudflare) included a secret salt in the hash. Then the only way to know which class E address a particular IPv6 address maps to would be to try it out and observe the connection from the other side.
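A minimal sketch of the difference a secret makes. Both functions are guesses, not Cloudflare's code: the unsalted one assumes MD5 over the top 64 bits with the first 28 bits of the digest placed under 240.0.0.0/4, and the keyed one swaps in HMAC-SHA256 with a hypothetical per-domain secret.

```python
import hashlib
import hmac
import ipaddress

def top64(ipv6: str) -> bytes:
    """Upper 64 bits of an IPv6 address as 8 big-endian bytes."""
    return (int(ipaddress.IPv6Address(ipv6)) >> 64).to_bytes(8, "big")

def to_class_e(digest: bytes) -> ipaddress.IPv4Address:
    """Fold the first 4 digest bytes into 240.0.0.0/4 (assumed layout)."""
    low28 = int.from_bytes(digest[:4], "big") & 0x0FFFFFFF
    return ipaddress.IPv4Address(0xF0000000 | low28)

def pseudo_ipv4_unsalted(ipv6: str) -> ipaddress.IPv4Address:
    # Anyone who knows the scheme can compute this offline.
    return to_class_e(hashlib.md5(top64(ipv6)).digest())

def pseudo_ipv4_keyed(ipv6: str, secret: bytes) -> ipaddress.IPv4Address:
    # Without the secret, the mapping cannot be computed offline.
    return to_class_e(hmac.new(secret, top64(ipv6), hashlib.sha256).digest())

secret = b"hypothetical-per-domain-secret"  # assumption, for illustration only
print(pseudo_ipv4_unsalted("2001:db8::1"))
print(pseudo_ipv4_keyed("2001:db8::1", secret))
```

With the keyed variant an outsider would have to connect from each candidate address and observe the result on the other side, as described above.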
Exactly. Since it looks like there's no salting going on, if you know your target's IPv6 (and from that their calculated class E), you could quite easily go through your own set of available addresses and see if any result in the same class E address as your target.
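Roughly, that search could look like the sketch below. The mapping is an assumption (MD5 over the top 64 bits, first 28 digest bits placed under 240.0.0.0/4), not necessarily what runs in production, and the /48 and the "target" are made up so that the demo is guaranteed to find a match.

```python
import hashlib
import ipaddress
from typing import Optional

def pseudo_ipv4(top64: int) -> ipaddress.IPv4Address:
    """Assumed mapping: MD5 of the top 64 bits, folded into 240.0.0.0/4."""
    digest = hashlib.md5(top64.to_bytes(8, "big")).digest()
    return ipaddress.IPv4Address(0xF0000000 | (int.from_bytes(digest[:4], "big") & 0x0FFFFFFF))

def find_colliding_prefix(own_prefix: str,
                          target: ipaddress.IPv4Address) -> Optional[ipaddress.IPv6Address]:
    """Try every /64 inside own_prefix; return the first one that maps to target."""
    net = ipaddress.IPv6Network(own_prefix)
    base = int(net.network_address) >> 64
    for i in range(2 ** (64 - net.prefixlen)):
        if pseudo_ipv4(base + i) == target:
            return ipaddress.IPv6Address((base + i) << 64)
    return None

own = "2002:c000:201::/48"  # example 6to4 prefix, an assumption
# For the demo, take the target from one of our own /64s so a match must exist.
demo_top64 = (int(ipaddress.IPv6Network(own).network_address) >> 64) + 0x1234
target = pseudo_ipv4(demo_top64)

match = find_colliding_prefix(own, target)
print("target class E:", target)
print("colliding /64 starts at:", match)
```

Against an arbitrary target, a /48 only offers 2^16 candidates in a 2^28 space, so most searches would return `None`; a larger allocation improves the odds accordingly.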
It wouldn't be a good assumption that the code we posted to the blog is exactly the same as the code that is actually in production. If we included something like a salt, we obviously wouldn't reveal it.
Either way it's an unnecessary risk. People with access to a /smallnumber might still be able to exploit it. I find it quite ironic that they are all like "Come on, guys, use this new stuff, old stuff is bad!" -- and then they go on to present a solution that makes use of md5.