I think the key thing to realize here is that when PHP says "array", they apparently mean "ordered map": if you are allowed to have 'foo' as a key, it is somewhat bothersome to call it an "array". Then, once you realize that it is actually a hashtable, it is fairly obvious that there will be a worst-case input set.
PHP uses the parlance "associative array" most often, to be fair. The confusion is really that PHP has few types covering a lot of features: it combines its numerically indexed array type with its ordered maps, so if you start out using numbers as keys, no conversion happens when you introduce a string key that makes it associative. Array functions (pop/push) will continue to work, as will dictionary key sorting, etc.
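The semantics described above can be sketched in Python, since an insertion-ordered dict behaves much like PHP's one-size-fits-all "array" (this is an analogy, not PHP's implementation; the `next_key` logic mimics how PHP picks the key for a push):

```python
# A rough Python sketch of PHP's "array" semantics: one ordered-map type
# that serves as list and dictionary at once. (Python dicts preserve
# insertion order, which makes the analogy work.)
a = {0: "zero", 1: "one"}   # starts out looking like a plain array
a["foo"] = "bar"            # adding a string key needs no conversion

# A "push" is just appending with the next integer key, as PHP does:
next_key = max((k for k in a if isinstance(k, int)), default=-1) + 1
a[next_key] = "two"

print(list(a.keys()))  # [0, 1, 'foo', 2] — insertion order is kept
```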
I would like to know if PHP's object implementation, which was introduced after their associative arrays, suffers from the same hash collision issues. I've read CPython's hash algorithm (it's quite beautiful) and I wonder if PHP internally represents objects as glorified hash maps also.
> I would like to know if PHP's object implementation, which was introduced after their associative arrays, suffers from the same hash collision issues.
I was curious myself, so I went digging. Most of this stuff appears in `Zend/zend_object_handlers.c`. For objects with std_object_handlers set up on them (which I assume is most of them), zend_std_read_property can do quite a bit of work: it tries a search of the property-info hash in zobj->ce, then the zobj->properties hash, then calls `__get()` (with protection against `__get()` loops), and finally gives up and returns null. zend_hash_quick_find(), as used for these lookups, takes a HashTable* as its first argument, which is also the type used for a hash table in zvalue_value.
Thus, object properties are associative arrays, though I don't think there's an equivalent severity of attack compared to causing collisions with crafted GET/POST data, as the latter is automatically parsed. You'd have to be able to upload code to make evil objects.
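The lookup chain described above can be sketched in Python (a loose analogy only; the class and method names here are illustrative, not PHP internals):

```python
# A loose Python sketch of the lookup order described for
# zend_std_read_property: instance attributes first, then the dynamic
# properties hash, then a __get()-style magic fallback, then "null".
class PhpishObject:
    def __init__(self):
        self.properties = {}          # plays the role of zobj->properties

    def __get(self, name):            # stand-in for PHP's magic __get()
        return None                   # give up: PHP would return null

    def read_property(self, name):
        if name in vars(self) and name != "properties":
            return vars(self)[name]   # found as a regular attribute
        if name in self.properties:
            return self.properties[name]
        return self.__get(name)       # magic fallback (loop guard omitted)

obj = PhpishObject()
obj.properties["x"] = 42
print(obj.read_property("x"))   # 42 — found in the properties hash
print(obj.read_property("y"))   # None — the __get() fallback gives up
```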
Yes, that is just one way in which PHP is annoying. I think they actually refer to it as an "associative array", which as far as I can tell is exactly a hash, but they never refer to it as one.
On a side note, Ruby does not appear to have the same problem, as similar code executes in 0.02 seconds. I think it is probably because Ruby's hash function is superior.
Having just done this research... Ruby 1.9 has apparently had randomized hash functions for years. Ruby 1.8.7 and JRuby were apparently vulnerable until yesterday, but both released fixed versions yesterday.
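The idea behind that randomization can be sketched in a few lines of Python (a toy illustration only; Ruby 1.9 actually seeds MurmurHash, and the constant and mixing below are not the real algorithm):

```python
import random

# Toy sketch of per-process hash randomization. A secret seed is picked
# once at interpreter start-up and mixed into every hash, so an attacker
# who doesn't know the seed can no longer precompute colliding keys.
SEED = random.getrandbits(32)

def seeded_hash(key: int, table_size: int) -> int:
    # Mix the seed in, multiply by a large odd constant, and take the
    # HIGH bits, which depend on every input bit.
    h = ((key ^ SEED) * 2654435761) % 2**32
    return (h >> 16) % table_size

# Under plain "key % size", these keys all collide in bucket 0:
keys = range(0, 64 * 64, 64)
print({k % 64 for k in keys})                    # {0}
# Under the seeded hash they scatter across many buckets:
print(len({seeded_hash(k, 64) for k in keys}))
```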
Meanwhile PHP will presumably do a similar set of fixes, though I haven't seen the announcement yet. There are workarounds for web apps that involve using extensions to limit the size of POST requests and/or the acceptable number of parameters in a POST request.
_In the case of the Ruby language, the 1.9.x branch is not affected by the predictable collision condition since this version includes a randomization of the hashing function._
So there is some merit to what the commenter is saying, though I doubt he knew the above.
Actual ruby arrays (which are arrays and not hashes) will obviously not exhibit this problem though.
I think for all practical purposes, unless you're doing something really weird, the likelihood of hash function collisions is rare enough that we don't need to think too much about it.
> I think for all practical purposes, unless you're doing something really weird, the likelihood of hash function collisions is rare enough that we don't need to think too much about it.
Except that, like with PHP, the worrying part is that someone can stuff malicious keys into rack.request.form_hash or rack.request.query_hash (a la PHP's $_POST and $_GET).
(Unlike PHP, though, the Ruby community can head off these particular attacks by releasing a new version of Rack, while waiting for a new 1.8.x release containing a security patch.)
> PHP already landed a change (which will ship with PHP 5.3.9) which will add a max_input_vars ini setting which defaults to 1000. This setting determines the maximum number of POST/GET variables that are accepted, so now only a maximum of 1000 collisions can be created.
Wait, where did we establish that less user input = less array insertions?
Well, the point of that change was to prevent a hostile user from DoSing your server with malicious GET/POST requests.
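For reference, the mitigation quoted above is an ordinary php.ini directive (shipping with PHP 5.3.9):

```ini
; php.ini — cap the number of accepted input variables per request.
; Variables beyond the limit are dropped, which bounds the number of
; collisions a single malicious request can force.
max_input_vars = 1000
```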
I imagine it would actually be fairly difficult to accidentally recreate this issue, or to let it slip through testing. No amount of patches in the world will protect you from idiots with access to your codebase.
It is somewhat risky to fundamentally change the hashing algorithm late in the release cycle (RC4); it is bound to cause problems. The ini option prevents the obvious threat without deep changes to the core.
If you're running a version of PHP which is patched with Suhosin (http://www.hardened-php.net/suhosin/) you'll already be protected from the DoS vector outlined in the article. By default Suhosin limits the max number of $_GET/$_POST/$_COOKIE parameters to 200.
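For anyone looking for the knobs, these are the relevant Suhosin directives (the values below are illustrative, matching the limit mentioned above; check your suhosin.ini for the shipped defaults):

```ini
; Suhosin request-variable limits, one per input source.
suhosin.get.max_vars     = 200
suhosin.post.max_vars    = 200
suhosin.request.max_vars = 200
```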
> So if we insert a total of 64 elements, the first one 0, the second one 64, the third one 128, the fourth one 192, etc., all of those elements will have the same hash (namely 0) and all will be put into the same linked list.
This doesn't seem correct, or I'm missing something.
Upon the first insertion, PHP doesn't know that you intend to insert 63 more elements. It shouldn't allocate a 2^6-element underlying array until it exceeds 2^5 elements, right? So the first 2^5 insertions would be constant time, and only the next 2^5 would be linear.
I'm not sure how PHP performs the reallocation to increase an array's capacity. Maybe it allocates a blank array, and then inserts the existing elements using the standard insertion algorithm. In that case 2^6 linear-time insertions would occur -- half during the reallocation, and half afterward. But it still bears mentioning that performance wouldn't tank until you inserted half+1 of the values.
It's not really an array, it's a hash table. When you insert an item it doesn't simply put it in a previously allocated array, but places it at the end of a linked list in the correct bucket determined by the hash. In order to check if the key has already been used the list must be traversed each time, and so insertion becomes a linear operation.
Even if the HT hasn't reached its full size yet, all elements will nonetheless collide: if the size is currently 16, then 64 will still have a hash of 0, as will 128, etc. At size 16 other values would additionally collide, like 16, 32, 48, etc., but those would stop being collisions as soon as the HT is resized, so we don't insert them.
Even if the real C array contained a pointer to the last element in the LL (which it does not, it only points to the first) it would still be worst-case O(n): PHP has to check all elements in the LL to ensure that the element does not exist yet.
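The mechanism described in this subthread is easy to instrument in Python (a sketch of the behavior, not PHP's actual zend_hash code): index by `key & (size - 1)`, chain collisions in a bucket, and scan the whole chain on every insert to check whether the key already exists.

```python
# A small model of the insertion behavior described above, counting the
# key comparisons done by the duplicate check on each insert.
def insert_all(keys, table_size=64):
    buckets = [[] for _ in range(table_size)]
    comparisons = 0
    for key in keys:
        chain = buckets[key & (table_size - 1)]
        for existing, _ in chain:        # duplicate check: O(chain length)
            comparisons += 1
            if existing == key:
                break
        else:
            chain.append((key, None))    # key not found: insert it
    return comparisons

n = 64
evil = [i * n for i in range(n)]         # 0, 64, 128, ... all hash to 0
nice = list(range(n))                    # 0, 1, 2, ... spread evenly

print(insert_all(evil))  # n*(n-1)/2 = 2016 comparisons: quadratic
print(insert_all(nice))  # 0 comparisons: every chain has length <= 1
```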
Of course! I hadn't considered that inserting an element can mean updating an already existing element.
In C you don't need a pointer to the last element though, you can just replace the pointer to the first element with the new element and put the old pointer in the "next" field of the new element. I usually implement stacks this way.
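That head-insertion trick looks like this, sketched here in Python with a minimal cons cell rather than C pointers:

```python
# Pushing onto a singly linked list needs only the head pointer: the
# new node's "next" takes over the old head.
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

head = None
for v in (1, 2, 3):
    head = Node(v, head)   # O(1) push at the head

# Walking the list yields items in reverse insertion order, which is
# why this structure doubles as a stack:
out = []
node = head
while node:
    out.append(node.value)
    node = node.next
print(out)  # [3, 2, 1]
```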
> Even if the real C array contained a pointer to the last element in the LL (which it does not, it only points to the first)
What version of PHP are you looking at? PHP 5.3 and up seem to have a sensible linked list implementation in zend_hash.[ch]. The buckets only have a pointer to the list head, but items are inserted at the head. The hash has a separate list for ordered traversal of all items in the "array" and that has pointers to both the head (for traversal) and tail (for insertion). In both cases, list insertion is O(1).
Of course, as you said, the search for our worst case is necessarily O(N). Depending on your perspective, we can say that's a trade-off with hashes, a fault with the hash algorithm, or a fault with the collision resolution strategy.
I just ran this (using the PHP CLI) on my Lion MacBook Air and it took far, far longer than 30 seconds - 172 seconds, to be exact.
Not being at all familiar with the underlying mechanisms, does anybody understand why it would take so much longer on {PHP 5.3.6, Lion, my MacBook Air, the PHP CLI}?
I know it's hard to believe, but the low-end MBA has a Core i5 processor. https://www.apple.com/macbookair/features.html Apple worked directly with Intel to get a smaller "package", and soldered it to the mainboard to make it fit.
That's interesting, I've got a Core 2 Duo which I assumed would be similar to the i3 in terms of power. I guess it shows that processor versioning is more than just marketing.
Running `php -a` will give you an interactive shell as well. However, if you do a lot of PHP development, phpsh (http://www.phpsh.org/) is a lot more feature-rich and can be a huge timesaver.
Haven't tested it yet, but I believe that if you use the Suhosin patch you can limit the number of variables with suhosin.get.max_vars and suhosin.post.max_vars, so you do not have to wait for PHP to release the patched version (I guess most people do not compile PHP from Subversion trunk for production use).
Yes. This is a trivial exploit of a designed weakness in hashtables.
I shall, arrogant as I am, scoff at people who are surprised when this happens. I mean, honestly, what kind of developer doesn't know the basic properties of their data structures?
You make it sound like you're somehow disagreeing with him, but what he says is true even of Ruby's hash algorithm. Introducing randomness into the hash function is really just a band-aid on this vulnerability. The inherent vulnerability is there either way; you just need a bit of runtime information to do the attack when runtime information is introduced into the hash function.
Yes, those are no longer values that will hit the worst-case performance here.
Did you read the article? The reason for the bad performance is that PHP hashes an integer by simply taking it mod the size of the hash table, so values which are 0 mod the table size all go into the same bucket. If the table size is a power of 2, then every multiple of that size -- in particular any power of two at least as large as the table, or any sum of such powers -- hashes to 0 and lands in the same bucket. So if you insert a whole bunch of values that increase by a large power of two, they will all hash to 0 and give you O(n^2) performance.
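The arithmetic in a few lines of Python (table size 64 is just an example; PHP implements the mod as a bitmask since the size is a power of two):

```python
# With a table of size 64, PHP-style integer hashing is just
# "key mod 64", so every multiple of 64 falls into bucket 0.
table_size = 64
keys = [i * table_size for i in range(8)]   # 0, 64, 128, ..., 448
print([k % table_size for k in keys])       # [0, 0, 0, 0, 0, 0, 0, 0]
print([k & (table_size - 1) for k in keys]) # same result via the bitmask
```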
Note that the size referred to there is being used not only as the size of the array, but also as the amount to iterate by. By changing it to something other than a power of 2, you are no longer inserting "evil" elements.
The point of this article is that it's really, really easy to do a denial of service attack on PHP arrays by picking array indices that are large powers of two. Of course you can fix the problem by not using large powers of two.
Very nice explanation. Thanks to you and a few others I now understand what is happening far better :). Interestingly, even at 256 it takes 10 times as long to insert the evil elements.
Thanks again for helping me understand and learn something new today :)