Sunday, December 5, 2010

Spamdomness

Randomness is an interesting thing in computer science, because the numbers a computer generates are only pseudo-random. Computers are deterministic, which makes it quite difficult for them to produce anything truly random. And yet, many mathematical techniques depend on reasonably random numbers. What's a programmer to do?
Most existing pseudo-random number generators are fed from things that are difficult to detect or predict ahead of time, like keystroke timings, mouse movements, or thermal noise in the various capacitors. An "entropy pool" accumulates measurements like these until it holds enough unpredictability to, say, produce an encryption key that can't readily be guessed.
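To make the idea of an entropy pool a bit more concrete, here is a toy sketch in Python. It is not how any real operating system does it; the class name `ToyEntropyPool`, the `MIN_SAMPLES` threshold, and the choice of SHA-256 for stirring are all assumptions made up for illustration. It just shows the shape of the thing: hard-to-predict measurements go in, and output is only handed out once enough of them have accumulated.

```python
import hashlib
import time

MIN_SAMPLES = 64  # hypothetical threshold, chosen arbitrarily for this sketch


class ToyEntropyPool:
    """Illustrative only. Do not use this for real keys."""

    def __init__(self) -> None:
        self._state = bytes(32)
        self.samples = 0

    def add_sample(self, value: int) -> None:
        # Stir the new measurement into the pool by hashing it with the old state.
        sample = (value & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "big")
        self._state = hashlib.sha256(self._state + sample).digest()
        self.samples += 1

    def extract(self, n: int = 16) -> bytes:
        # Refuse to produce output until enough samples have accumulated.
        if self.samples < MIN_SAMPLES:
            raise RuntimeError("not enough samples collected yet")
        if not 0 < n <= 32:
            raise ValueError("n must be between 1 and 32")
        out = hashlib.sha256(b"out" + self._state).digest()[:n]
        # Advance the state so the same output is not handed out twice.
        self._state = hashlib.sha256(b"next" + self._state).digest()
        return out


pool = ToyEntropyPool()
for _ in range(MIN_SAMPLES):
    # Stand-in for keystroke or mouse timings: a high-resolution clock reading.
    pool.add_sample(time.perf_counter_ns())
key = pool.extract(16)
```

In real programs you would simply ask the operating system, for example with Python's `secrets.token_bytes(16)`, which draws on the system's own entropy sources.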
When sites want stronger randomness than that, they often turn to random events outside the computer, which lets them maintain a bigger and more rigorously random pool than can be built inside the computer itself. One common technique is to point a webcam at a television that is tuned to a non-broadcasting channel. The resulting static is effectively random data, which is fed to the entropy pool. One site, Games By Email, has an elaborate machine to throw dice and record their rolls for later use. The site also promises to melt down any die that produces a roll that a customer is not satisfied with, because the customer base tends to strongly anthropomorphize dice and sometimes wants to "punish" dice that don't roll the way they want. (And why not? Dice are cheap, and a satisfied customer is a returning customer.)
This all gave me an idea for a cheap source of pseudo-random numbers: Spam. Spam is unwanted email that constantly hammers servers and annoys the crap out of millions of people, all because .00001% of people take it seriously and buy stuff because of it, thus handsomely profiting the group that sent it. Most spam today is thrown away, often by automatic means before any human being ever sees it. If not, it is tossed into a special spam box to be discarded later, just in case a real message is falsely flagged as spam. Spammers go to elaborate lengths to ensure that the recipient looks at the message.
So, from now on, when a message is flagged as spam, we take a quick hash of it, like an MD5 sum, and then mix this into the entropy pool by a pseudo-random means. (Arbitrarily pick one of: Add, subtract, XOR, OR, AND, Replace, Append.) This should slightly improve the quality of the entropy pool with every spam you receive. After the hashing, the spam can be discarded, added to the spam box, or whatever else the mail receiving program was going to do with it.
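Here is a rough sketch of that mixing step in Python. The function name `mix_spam_into_pool`, the fixed 16-byte pool, and the way "Append" is folded back down to 16 bytes by hashing are my own assumptions for the sake of a runnable example; the idea above does not pin any of those details down.

```python
import hashlib
import random

POOL_SIZE = 16  # bytes; assumed here to match the MD5 digest length
MASK = (1 << (8 * POOL_SIZE)) - 1
OPS = ["add", "subtract", "xor", "or", "and", "replace", "append"]


def mix_spam_into_pool(pool, spam_message, op=None, rng=None):
    """Hash one spam message and mix the digest into a 16-byte pool."""
    digest = hashlib.md5(spam_message).digest()
    if op is None:
        # Arbitrarily pick one of the seven operations.
        op = (rng or random).choice(OPS)
    p = int.from_bytes(pool, "big")
    d = int.from_bytes(digest, "big")
    if op == "add":
        result = (p + d) & MASK
    elif op == "subtract":
        result = (p - d) & MASK
    elif op == "xor":
        result = p ^ d
    elif op == "or":
        result = p | d
    elif op == "and":
        result = p & d
    elif op == "replace":
        result = d
    elif op == "append":
        # Assumption: append the digest, then hash back down to pool size.
        return hashlib.md5(pool + digest).digest()
    else:
        raise ValueError(f"unknown operation: {op}")
    return result.to_bytes(POOL_SIZE, "big")


pool = bytes(POOL_SIZE)
for message in [b"Cheap watches!!!", b"You have won a prize", b"Act now"]:
    pool = mix_spam_into_pool(pool, message)
```

A mail filter would call this once for each message it flags, just before moving the message to the spam box or deleting it.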
I don't recommend this technique to sites in need of high quality randomness, as it leaves a gaping security hole: An attacker can spam the site with several trillion copies of the same message, thus setting the entropy pool to a known quantity, and effectively giving the attacker control of the encryption keys. But sites like that probably have TV static, nuclear decay, or some other basically impossible to control source of random numbers in the first place. They also have a lot of money to spend on ensuring the security of their randomness.
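A small simulation makes the flooding problem easier to see. This is only a sketch under assumptions: the pool is modeled as a 128-bit integer, "Append" is left out to keep it that way, and the helper names `mix` and `flood` are made up. Two pools start from different secret values, receive the same flood of identical spam, and go through the same sequence of arbitrarily chosen operations.

```python
import hashlib
import os
import random

MASK = (1 << 128) - 1
OPS = ["add", "subtract", "xor", "or", "and", "replace"]


def mix(pool, digest, op):
    """Apply one of the mixing operations to a 128-bit integer pool."""
    if op == "add":
        return (pool + digest) & MASK
    if op == "subtract":
        return (pool - digest) & MASK
    if op == "xor":
        return pool ^ digest
    if op == "or":
        return pool | digest
    if op == "and":
        return pool & digest
    if op == "replace":
        return digest
    raise ValueError(f"unknown operation: {op}")


def flood(initial_pool, message, copies, seed):
    """Mix `copies` identical messages into the pool, one operation each."""
    digest = int.from_bytes(hashlib.md5(message).digest(), "big")
    rng = random.Random(seed)  # stands in for the "arbitrary" choice of operation
    pool = initial_pool
    for _ in range(copies):
        pool = mix(pool, digest, rng.choice(OPS))
    return pool


# Two pools with unrelated secret starting states, flooded the same way.
secret_a = int.from_bytes(os.urandom(16), "big")
secret_b = int.from_bytes(os.urandom(16), "big")
pool_a = flood(secret_a, b"identical spam", 1000, seed=1)
pool_b = flood(secret_b, b"identical spam", 1000, seed=1)
same_state = pool_a == pool_b
```

As soon as a single "replace" comes up, the pool equals the digest of the attacker's message, and everything the pool knew before is gone. From then on its state depends only on the attacker's message and the sequence of operations, which is itself only pseudo-random.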