Hacker News

The mistake you are making is that you keep wanting to treat a passphrase made up of words as a 'text' in the sense that Shannon was using, but it is not a text in the Shannon sense of the word.

Shannon was analysing real world messages to arrive at that figure, not a string of random words. Here's a way of thinking it through that should help you see the problem clearly.

Let's imagine that Alice has just used a dictionary to generate a passphrase. Furthermore, let's imagine that the dictionary in question is a collection of 6 character or longer words pulled from "Pride and Prejudice". I'm pretty sure Jane Austen tops the 5000 words needed for such a dictionary.

Now, in the simple example, let's imagine that the passphrase is one word long. Bob is an attacker who knows the dictionary; he will guess the word in a maximum of 5000 tries, and after 2500 tries he has a 50% chance of having found it.
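The arithmetic here is easy to check directly, assuming Bob tries the dictionary in some fixed order and Alice's word is chosen uniformly at random:

```python
# Cost of a sequential dictionary attack on one word chosen
# uniformly at random from an N-word dictionary.
N = 5000

expected_guesses = sum(range(1, N + 1)) / N  # average over all positions = (N + 1) / 2
worst_case = N                               # the word is the last one tried
median_guess = N // 2                        # 50% chance of success by this point

print(expected_guesses)  # 2500.5
print(worst_case)        # 5000
print(median_guess)      # 2500
```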

Charlie, a second attacker, isn't going to use words as symbols; he's going to try to brute-force the word by throwing random characters at the problem. He doesn't know the length of the word, so he has to try every length, starting from one letter and working up. I trust you can see that Charlie is going to need a lot more than 5000 guesses to find the passphrase.
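To put a number on how much worse off Charlie is, here is the size of the search space he has to cover to exhaust all lowercase strings up to a given length (a simplifying assumption; a real attacker might also try other characters):

```python
# Total number of lowercase-letter strings of length 1..max_len,
# i.e. the space Charlie must exhaust to be certain of finding
# a word no longer than max_len characters.
def search_space(max_len, alphabet=26):
    return sum(alphabet ** k for k in range(1, max_len + 1))

print(search_space(6))  # 321272406 -- versus Bob's 5000 dictionary entries
```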

David is a bit smarter than Charlie. He decides to use a Markov chain with a context of 3 characters to generate his guesses, so that the generated passphrases start to resemble English words. The Markov chain was trained on text from the Sydney Morning Herald. David is going to do a lot better than Charlie, but that Markov chain has to be capable of generating all of the words in my original dictionary, plus a bunch of other words, plus a bunch of gibberish non-words. He's clearly going to need more than 5000 attempts to find Alice's passphrase.
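For concreteness, a character-level Markov generator of this kind can be sketched in a few lines. This is a minimal illustration trained on a stand-in word list (not the actual SMH corpus), with a 3-character context:

```python
import random
from collections import defaultdict

def train(words, order=3):
    # Map each 'order'-character context to the characters that can follow it.
    # '^' pads the start of a word, '$' marks its end.
    model = defaultdict(list)
    for w in words:
        padded = "^" * order + w + "$"
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=3, rng=random):
    # Walk the chain from the start context until the end marker appears.
    out = "^" * order
    while not out.endswith("$"):
        out += rng.choice(model[out[-order:]])
    return out[order:-1]
```

Trained on real text, a generator like this emits word-like strings, including many strings that never appeared in the training data, which is exactly why David's guess space is larger than the dictionary.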

Evan is smarter still. He trains his Markov chain using only words that are 6 characters or longer, and he increases the context to 6 characters. He also teaches his Markov chain that word separators exist, and the generator resets when a new word is started. Now we're talking: Evan's system will produce very few strings that are not actual words. But it will still generate a bunch of words that were in the SMH but not in "Pride and Prejudice", so he will still need more than 5000 guesses to be sure of finding Alice's passphrase. He still hasn't done better than treating the words as symbols.

There is one Markov chain attack that gets nearly equal performance: take Alice's dictionary and use it to train a Markov chain generator that knows about word separators and knows that the words have no probability links between them. But now your Markov chain generator has just become a fancy way of picking words out of the dictionary. In other words, it has degenerated into a dictionary attack, not a brute-force attack, and a dictionary attack is just another way of saying "treat the words as symbols", which means we're back to my original entropy calculation as the optimal way of determining the entropy of the passphrase.

As I said at the start, your error is that you're trying to use randomly picked dictionary words as an English text in the Shannon sense of the word. Hopefully the example scenario that I just ran through will help you see why the distinction is important.



I understand that randomly selected words are not the same as the English text Shannon is talking about. However, my point is that the entropy may be lower than it appears. I'm not saying it is 0.6 bits per character (or any other specific number). I'm saying that unless your words are long enough, it is very likely that the entropy is lower than simply taking log(dictionary_size^words_per_passphrase)/log(2). The references to Shannon and Applied Cryptography were only made as supporting evidence that the entropy of English text is lower than it appears. I'm not claiming their exact numbers apply in this situation.
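The formula being argued over, written out for the 5000-word dictionary used throughout this thread:

```python
import math

# Entropy of a passphrase of n words drawn uniformly at random from a
# dictionary, treating each word as a single symbol.
def passphrase_entropy_bits(dictionary_size, words_per_passphrase):
    return math.log(dictionary_size ** words_per_passphrase) / math.log(2)

print(passphrase_entropy_bits(5000, 1))  # ~12.3 bits per word
print(passphrase_entropy_bits(5000, 4))  # ~49.2 bits for a 4-word passphrase
```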

If we can't agree on that, then we must simply agree to disagree.


*I'm not claiming their exact numbers apply in this situation.*

You see, the thing is, you kind of are making that claim. If the entropy comes out to 3 bits per character, the one feeble point that you did have gets blown out of the water. The fact that you don't seem to understand that indicates that you really need to go back and reread a few books on information theory.

Look, for your own edification, try to come up with a scheme that will reliably beat a dictionary attack in terms of the number of attempts needed to find a password taken from the dictionary. You could even write a simple program to test the idea: take a dictionary of 5000 words of 6 characters or longer (much shorter words than your theory suggests should be secure), select 100 words from it at random, and then run any attack of your devising against those words. If you can reliably average fewer than 2500 attempts, I'll concede the point.
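The baseline for that challenge is easy to set up. Here is the proposed test run with a plain sequential dictionary attack, using a synthetic placeholder word list (any real 5000-entry dictionary would do); this is the figure any cleverer attack would have to beat:

```python
import random

# Simulate the challenge: pick 100 target words uniformly from a
# 5000-word dictionary and count attempts for a sequential attack.
dictionary = [f"word{i:04d}xx" for i in range(5000)]  # placeholder 6+ char "words"

rng = random.Random(42)
targets = [rng.choice(dictionary) for _ in range(100)]

def dictionary_attack(target, wordlist):
    for attempts, guess in enumerate(wordlist, start=1):
        if guess == target:
            return attempts
    raise ValueError("target not in dictionary")

avg = sum(dictionary_attack(t, dictionary) for t in targets) / len(targets)
print(avg)  # typically close to 2500, as the argument above predicts
```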

Until then, I'm here to tell you that you don't understand information theory as well as you seem to think you do.




