Assuming that this method for generating passwords gets popular enough, brute force tools will begin to create an optimized attack for these passwords.
As there are so little words available, if I were to write a brute-forcing tool, I would try combinations of four words in my wordlist once I failed with my one-word-dictionary attack before I start trying out all characters.
But all is not lost: Either use more words or vary the amount of spaces you put between words. This way the dumb optimization "try four words delimited by space" wouldn't work on your password and they would have to go over to plain old brute forcing at which point, I agree, the longer, the better.
Properly executed, this will protect you against brute force attacks. No need to do nonsense like adding more spaces.
Of course XKCD botched it and said an inadequate minimum length...
Notable quote from the article: "This level of unpredictability assumes that a potential attacker knows both that Diceware has been used to generate the passphrase, the particular word list used, and exactly how many words make up the passphrase."
He calculated the entropy assuming that knowledge was had, I think he botched it in saying that 44 bits were enough. Of course if he were to recalculate it without those assumptions, but make it clear that he were doing that, then it would be better.
Best to error on the side of suggesting more. If the reader is technical enough to to realize you did that, then they're technical enough to estimate a more suitable value themselves.
Here's how I use diceware:
1) throw single die to determine number of languages
2) throw dice to determine which languages (number of iterations depends on the result of #1)
3) throw dice to determine language
4) throw all dice to determine first word in language
5) repeat #3, #4 until you've reached the desired number of words
But the combinatorial explosion starts putting that out of reach fairly quickly.
Let's say you're restricting yourself to a dictionary of only 2000 common words.
A single word, up against a 1000-tries-per-second attack, would therefore fall within 2 seconds. Clearly bad. Adding a second word gives 2000^2 combinations to test against, which would require about an hour. Still not good. A third word, pushing that up to 2000^3, takes us to about three months, which is probably acceptable in many cases. The fourth word, at 2000^4 combinations, gets us to 500 years, which is well beyond what most people are ever going to need for a web-based password.
Now, if you want to bring this into the realm of passwords in other contexts, which can perhaps be brute-forced with several-billion-tries-per-second attacks, this approach still works: you just need to add an extra word or two. At 2,000,000,000 per second the four word combination might only take a couple of hours, but a fifth word takes that up to half a year, and a sixth out to the thousand years mark, whilst still falling easily within the standard 7±1 memory limit and being much much much easier to remember than m(yV7&NlxAIZNx3>&@p&8/kX.
As there are so little words available, if I were to write a brute-forcing tool, I would try combinations of four words in my wordlist once I failed with my one-word-dictionary attack before I start trying out all characters.
But all is not lost: Either use more words or vary the amount of spaces you put between words. This way the dumb optimization "try four words delimited by space" wouldn't work on your password and they would have to go over to plain old brute forcing at which point, I agree, the longer, the better.