These bitches, they track me!

You have probably already heard about this NSA thing. They track everybody and everyone, put evil stuff into hardware you ordered at Amazon, and have access to thousands of devices all over the internet because backdoors are included (German).

They gave RSA money to include backdoors in crypto software. They track SMS, nearly 200 million per day. They infected about 100,000 PCs. They develop special computers to break cryptography. The list is endless.

Got sensitized?

All this stuff made me aware of what I do with my data. My hard drive is encrypted. My email communication can be encrypted via GPG, if the other party has it, too. I do not have proprietary software installed (Skype for my LDR runs on a different system than my working system). I do not have Flash installed. I want to leave all so-called “social media” stuff as soon as I’m finished with my exams (probably four weeks after the release of this article). I am inclined to install Tor for my daily browsing stuff. I double-check whether I’m connected over HTTPS, even if this might not solve “the problem”.

Can we ask you something?

Well, about two days ago, an email popped up in my university email account. The university is a partner of several educational facilities which run a survey each semester on how the support for students in mathematics lessons helps them to pass the maths exam successfully. They wanted my email address and my name. But okay, this was optional. I didn’t fill in these fields. I clicked “next”. Then there was a form where I had to enter some cryptic stuff, so that my answers can be linked to my old data when I participate in such a survey the next time. The “cryptic” code was really simple:

* first two letters of my dad’s name (e.g. “Peter” => “pe”)
* first two letters of my mum’s name (e.g. “Veronica” => “ve”)
* first two letters of my name (e.g. “Jean” => “je”)
* first two letters of my birthplace (e.g. “London” => “lo”)
* both day and month of my birthday (e.g. first of January => “0101”)

They wanted this data to be able to connect my answers to the answers I give the next time I participate in this survey. But… I got stuck. I don’t want to give them so much data. I closed the tab. Later that day, I got into an argument with my girlfriend about all this. I argued that this is not anonymization anymore. They can get from this data back to me. You don’t believe this?
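To make the scheme concrete, here is a minimal sketch of how such a code could be put together. The function name `survey_code` is made up for illustration, and joining the parts in the listed order without separators is my assumption; the survey form did not say how the pieces are combined.

```python
def survey_code(father: str, mother: str, own: str, birthplace: str,
                day: int, month: int) -> str:
    """Build the survey code from the five items the form asks for.

    Assumption: the parts are lower-cased and concatenated in the listed
    order without separators. The real survey may combine them differently.
    """
    # First two letters of each of the four names / places.
    prefixes = [text.strip().lower()[:2] for text in (father, mother, own, birthplace)]
    # Day and month of birth, zero-padded to two digits each.
    return "".join(prefixes) + f"{day:02d}{month:02d}"


if __name__ == "__main__":
    # The example values from the list above.
    print(survey_code("Peter", "Veronica", "Jean", "London", 1, 1))
```

With the example values from the list, this gives `pevejelo0101`. Twelve characters, and every single one of them is derived from personal data.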

The calculation

Well, now let me show you:

But not all universities participate in this survey:

107 / ((145 + 11 + 16 + 396) / 100) = 18.83

means: 18.83 % of all relevant educational facilities participate
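If you want to re-check this step yourself, a small sketch could look like this. The helper name `participation_share` is hypothetical; the numbers are the ones from the formula above.

```python
def participation_share(participating: int, facility_counts: list[int]) -> float:
    """Return the share of participating facilities in percent."""
    total = sum(facility_counts)  # all relevant educational facilities
    return participating / total * 100


if __name__ == "__main__":
    # 107 participating facilities; the four counts are taken from the formula above.
    share = participation_share(107, [145, 11, 16, 396])
    print(f"{share:.3f} % of all relevant facilities participate")
```

The unrounded value is about 18.838 %, so 18.83 % is the truncated figure; rounding would give 18.84 %. For a rough estimate like this one, the difference does not matter.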

Now, if only 18.83 % participate, we must use only 18.83 % of all students; this also decreases the occurrence of the same name:

83,333 * 18.83 % = 15,691.6039

means: 15k same names

Well, this assumes that each university has exactly the same number of students, which is rubbish - but it’s easier this way! Now, not every student is really interested in such “Please do this survey” spam-like emails. Let’s estimate that 15 % of all students participate in such a survey:

15,691.6039 * 15 % = 2,353.741

means: 2.3k same names

But well, they do not all share the same birthday:

2,353.741 / 365 = 6.449

means: 6.449 same names with the same birthday

But they are not born in the same place, I guess. I didn’t find any data on how many cities with a maternity ward we have, but let’s estimate 1,800.

6.449 / 1,800 = 0.0035828

means: 0.0035828 names with same birthday and birth place
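The whole chain from the steps above can be sketched in a few lines, so you can play with the estimates yourself. The function name `estimate_chain` and its parameter names are invented for this sketch; the default values are the ones used in the text, and each of them is a rough guess, not measured data.

```python
def estimate_chain(same_name: float = 83_333,
                   facility_share: float = 0.1883,
                   response_rate: float = 0.15,
                   days_per_year: int = 365,
                   birthplaces: int = 1_800) -> dict[str, float]:
    """Narrow down the number of students sharing a name, step by step.

    Assumption: every step is treated as independent and evenly
    distributed, which is a simplification.
    """
    eligible = same_name * facility_share          # study at a participating facility
    responding = eligible * response_rate          # actually fill in the survey
    same_birthday = responding / days_per_year     # also share day and month of birth
    same_birthplace = same_birthday / birthplaces  # also share the birth place
    return {
        "eligible": eligible,
        "responding": responding,
        "same_birthday": same_birthday,
        "same_birthplace": same_birthplace,
    }


if __name__ == "__main__":
    for step, value in estimate_chain().items():
        print(f"{step:>16}: {value:,.7f}")
```

Changing the guesses, for example `estimate_chain(birthplaces=500)`, shows how quickly the last number drops below one even with much more cautious estimates.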

At this point, it is clear that your name can probably be mapped to your birth place. Or even better: there are 6 students in Germany who will participate in this survey, have the same birthday and the same name, and study at an educational facility which takes part in the survey system. And there are 0.003 students in Germany who share one name, birthday, birth place and so on. Ouch! 0.003 students… :-) The survey wants just the first two letters of your name. Is this really relevant? Yes, it is. But I don’t know how to calculate with it. I would probably need the average name length in Germany. Should be around five letters, I guess. And I would need the distribution of names in Germany as well. But if we consider this in our calculation, it gets really complex!

My summary

What I want to show with this small calculation (hopefully it is correct… I could bet I’ve done rubbish somewhere): If you participate in such a survey, they tell you they will anonymize your data. But they can’t! The probability that there is someone who has the same name, birthday and birth place as you is far too small. I wrote an email back to the sender of the survey, saying that I won’t participate because of the lack of anonymisation. I wonder what answer I will get.

If you find any bugs in my calculation, feel free to contact me: 20CA0F94


Update 1:

Okay, let’s do it again, quick and dirty: We have 2,500,000 students, but only 18 % can participate:

2,500,000 * 18% = 450,000

Only 15% will participate:

450,000 * 15 % = 67,500

with 30k different names:

67,500 / 30,000 = 2.25

means: there will be about two people who participate with the same name. The probability that the second of these two has the same birthday as the first is easy:

1 / 365 = 0.274 %

Means: the probability that they share the same name and birthday is less than one percent! Now … think about their birth place, dad’s and mum’s names…
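The quick and dirty version is easy to redo in code as well. This is only a sketch: `update_estimate` is a made-up name, the inputs are the rough figures from this update, and names and birthdays are assumed to be evenly distributed, which they are not in reality.

```python
def update_estimate(students: int = 2_500_000,
                    facility_share: float = 0.18,
                    response_rate: float = 0.15,
                    distinct_names: int = 30_000,
                    days_per_year: int = 365) -> dict[str, float]:
    """Redo the quick estimate: participants per name and birthday match chance."""
    eligible = students * facility_share         # students who can participate
    responding = eligible * response_rate        # students who actually participate
    per_name = responding / distinct_names       # participants sharing one name
    same_birthday_chance = 1 / days_per_year     # second person born on the same day
    return {
        "eligible": eligible,
        "responding": responding,
        "per_name": per_name,
        "same_birthday_chance": same_birthday_chance,
    }


if __name__ == "__main__":
    result = update_estimate()
    print(f"participants per name: {result['per_name']:.2f}")
    print(f"chance of the same birthday: {result['same_birthday_chance']:.3%}")
```

Every further item in the code (birth place, dad’s and mum’s names) would only multiply in more factors smaller than one.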

I think I’ve shown it…