Bunhill Cemetery is just down the road from my flat in London. It’s a handsome old boneyard, a former plague pit (“Bone hill” -- as in, there are so many bones under there that the ground is actually kind of humped up into a hill). There are plenty of luminaries buried there -- John “Pilgrim’s Progress” Bunyan, William Blake, Daniel Defoe, and assorted Cromwells. But my favorite tomb is that of Thomas Bayes, the 18th-century statistician for whom Bayesian filtering is named.
Bayesian filtering is plenty useful. Here’s a simple example of how you might use a Bayesian filter. First, get a giant load of non-spam emails and feed them into a Bayesian program that counts how many times each word in their vocabulary appears, producing a statistical breakdown of the word-frequency in good emails.
Then, point the filter at a giant load of spam (if you’re having a hard time getting a hold of one, I have plenty to spare), and count the words in it. Now, for each new message that arrives in your inbox, have the filter count the relative word-frequencies and make a statistical prediction about whether the new message is spam or not (there are plenty of wrinkles in this formula, but this is the general idea).
The beauty of this approach is that you needn’t dream up “The Big Exhaustive List of Words and Phrases That Indicate a Message Is/Is Not Spam.” The filter naively calculates a statistical fingerprint for spam and not-spam, and checks the new messages against them.
This approach -- and similar ones -- are evolving into an immune system for the Internet, and like all immune systems, a little bit goes a long way, and too much makes you break out in hives.
ISPs are loading up their network centers with intrusion detection systems and tripwires that are supposed to stop attacks before they happen. For example, there’s the filter at the hotel I once stayed at in Jacksonville, Fla. Five minutes after I logged in, the network locked me out again. After an hour on the phone with tech support, it transpired that the network had noticed that the videogame I was playing systematically polled the other hosts on the network to check if they were running servers that I could join and play on. The network decided that this was a malicious port-scan and that it had better kick me off before I did anything naughty.
It only took five minutes for the software to lock me out, but it took well over an hour to find someone in tech support who understood what had happened and could reset the router so that I could get back online.
And right there is an example of the autoimmune disorder. Our network defenses are automated, instantaneous, and sweeping. But our fallback and oversight systems are slow, understaffed, and unresponsive. It takes a millionth of a second for the Transportation Security Administration’s body-cavity-search roulette wheel to decide that you’re a potential terrorist and stick you on a no-fly list, but getting un-Tuttle-Buttled is a nightmarish, months-long procedure that makes Orwell look like an optimist.
The tripwire that locks you out was fired-and-forgotten two years ago by an anonymous sysadmin with root access on the whole network. The outsourced help-desk schlub who unlocks your account can’t even spell "tripwire." The same goes for the algorithm that cut off your credit card because you got on an airplane to a different part of the world and then had the audacity to spend your money. (I’ve resigned myself to spending $50 on long-distance calls with Citibank every time I cross a border if I want to use my debit card while abroad.)
This problem exists in macro- and microcosm across the whole of our technologically mediated society. The “spamigation bots” run by the Business Software Alliance and the Music and Film Industry Association of America (MAFIAA) entertainment groups send out tens of thousands of automated copyright takedown notices to ISPs at a cost of pennies, with little or no human oversight. The people who get erroneously fingered as pirates (as a Recording Industry Association of America (RIAA) spokesperson charmingly puts it, “When you go fishing with a dragnet, sometimes you catch a dolphin.”) spend days or weeks convincing their ISPs that they had the right to post their videos, music, and text-files.
We need an immune system. There are plenty of bad guys out there, and technology gives them force-multipliers (like the hackers who run 250,000-PC botnets). Still, there’s a terrible asymmetry in a world where defensive takedowns are automatic, but correcting mistaken takedowns is done by hand.
— Cory Doctorow, Internet activist, blogger, founder of Boing Boing