(2010-11-17) The power of inference

The English Wikipedia article about Inference states that:

“Inference is a good guess heuristics (based on logic, statistics etc.) to observations or by interpolating the next logical step in an intuited pattern. The conclusion drawn is also called an inference.”

Did you get that, or do you, like Marty McFly in “Back to the Future”, want to say:
- “In English, Doc”?

Actually it is in English, even if it may not look that way. To boil it down to the most basic level, here are a few examples that require inference to solve:

(Replace X with what seems to be the most logical description or value)
Light is to darkness what X is to silence.
Picture of man walking – picture of hole in the ground – X – Picture of man in hospital with broken leg.

What may look like child’s play is actually an often overlooked aspect of IT security: the dangers of too much information. The examples above are really simple, but what about reality? In 2006, America Online released a huge list of searches that their users had done through their Internet connections. They thought it was a safe thing to do, since they had replaced the users’ identities with random numbers. [1]

Instead it became clear that many of the users could easily be identified and have their search and surf habits partially revealed. It’s not hard to guess that many people do “vanity searches” for their own name and thus reveal who they are.
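To make the vanity-search idea concrete, here is a minimal sketch in Python. It is not how the AOL data was actually analysed; the log rows, the numeric IDs and the names are all invented for illustration, and guess_identities is a hypothetical helper. It just shows how little it takes to link a "random number" back to a name once a list of candidate names is available.

```python
from collections import defaultdict

# Invented, pseudonymized search log: (random user number, query).
LOG = [
    (1001, "alice example"),
    (1001, "cheap flights stockholm"),
    (1002, "cb radio antenna"),
    (1003, "bob sample address"),
    (1003, "knee pain"),
]

# Invented list of candidate names, e.g. taken from a public directory.
KNOWN_NAMES = ["Alice Example", "Bob Sample"]


def guess_identities(log, known_names):
    """Map each user number to the known names found in that user's queries."""
    hits = defaultdict(set)
    for user_id, query in log:
        lowered = query.lower()
        for name in known_names:
            # A plain substring match is enough for this sketch.
            if name.lower() in lowered:
                hits[user_id].add(name)
    return {user_id: sorted(names) for user_id, names in hits.items()}


guesses = guess_identities(LOG, KNOWN_NAMES)
```

Once a number is tied to a name, every other query under that number (here the flights and the knee pain) is tied to the name as well, which is where the real damage is.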

I actually got firsthand experience of something like this over the radio. I own a citizen band radio that I occasionally use to listen to and chat with people nearby. Most of the time I just listen to the traffic. One time I wanted to know if any of the people/station names could be found on the internet, so I googled the name of a particular station and found my way to a radio forum. What I didn’t think about was the fact that my search string ended up in the log file of the web server I got to when I clicked on the search result. The guy operating the forum was himself an active radio enthusiast. Sure enough, a few hours later I heard him comment on my search over the radio. He didn’t know it was me, and I had not searched for anything other than the handle of the station, but it shows how easily information gets away.

Now what does that have to do with inference? Simple: CB radios are license-free, so we only get to use 4 watts of transmit power, which means that most stations can only reach 4-5 kilometers. This is also affected by antenna placement, but you get the point. The location of the station I googled is known, so you can infer that I’m very likely to be within 5 kilometers of its location.

There’s a public map where most CB broadcasters in Sweden have put their broadcast locations. Mine is on that map. If you look at the map, with this information, you can narrow it down to me and maybe two others as the most likely identities behind the Google search. Maybe that’s a bit of overkill, since most IP addresses can be used for geolocation, and I gave mine away. A bit chilling, perhaps? Well, it’s not like this particular example is problematic. It was a totally acceptable search, but it was not as anonymous as I had thought it to be. I didn’t have any questionable intent or anything to hide. CB radios are unencrypted and open for everyone. But still, it makes you think, doesn’t it?
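The narrowing-down step can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the coordinates and station names are invented (they are not taken from the real map), the 5 kilometer limit is the rough range mentioned above, and candidates_within is a hypothetical helper. The distance is computed with the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def candidates_within(known, stations, max_km=5.0):
    """Names of the stations no further than max_km from the known location."""
    lat0, lon0 = known
    return [name for name, (lat, lon) in stations.items()
            if haversine_km(lat0, lon0, lat, lon) <= max_km]


# Invented location of the station that was searched for.
KNOWN = (59.3300, 18.0600)

# Invented entries standing in for the public map.
STATIONS = {
    "station_a": (59.3400, 18.0700),
    "station_b": (59.3100, 18.0300),
    "station_c": (59.4500, 18.3000),
    "station_d": (59.3250, 18.0650),
    "station_e": (59.2000, 17.9000),
}

suspects = candidates_within(KNOWN, STATIONS)
```

With these made-up points, three of the five stations fall inside the radius, which mirrors the "me and maybe two others" situation: one harmless fact (the radio range) combined with one public list is enough to shrink the crowd to a handful.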

Facebook is a dream come true for those wanting to trace people and their friends and infer their habits. Most of the protections that Facebook offers its users are not that hard to bypass, and the information that people put on their Facebook accounts is actually less interesting than who they have set as their friends. One day when I have a little bit more time, I’ll write something about Facebook and some of the interesting situations people have gotten themselves into by using it.

To wrap it up: just remember that the pieces of information you’re withholding may be easy to find if enough “nearby” information can be found.


