You have probably already heard about this NSA thing. They track everybody, put evil stuff into the hardware you order at Amazon, and have access to thousands of devices all over the internet because there are backdoors included (German).
They gave RSA money to include backdoors in crypto software. They track SMS, nearly 200 million per day. They infected about 100,000 PCs. They develop special computers to break cryptography. The list is endless.
Got sensitized?
All this made me aware of what I do with my data. My hard drive is encrypted. My email communication can be encrypted via GPG if the other person uses it too. I do not have proprietary software installed (Skype for my long-distance relationship runs on a different system from my working system). I do not have Flash installed. I want to leave all the so-called “social media” stuff as soon as I'm finished with my exams (probably four weeks after this article is published). I'm leaning towards installing Tor for my daily browsing. I double-check whether I'm connected over HTTPS, even if this might not solve “the problem”.
Can we ask you something?
Well, about two days ago, an email popped up in my university email account. The university is a partner of several educational facilities that run a survey each semester on how the support for students in mathematics lessons helps them pass the maths exam.
They wanted my email address and my name. But okay, this was optional, so I didn't fill in these fields and clicked “next”. Then there was a form where I had to enter some cryptic stuff, so that my answers could be linked to my old data the next time I participate in such a survey. The “cryptic” code was really simple:
* first two letters of my dad's name (e.g. “Peter” => “pe”)
* first two letters of my mum's name (e.g. “Veronica” => “ve”)
* first two letters of my own name (e.g. “Jean” => “je”)
* first two letters of my birthplace (e.g. “London” => “lo”)
* day and month of my birthday (e.g. first of January => “0101”)
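To make the scheme concrete, here is a minimal Python sketch that builds such a code from the example values above. The function name `survey_code`, the lower-casing and the order of concatenation (no separators) are my own assumptions; the form only asked for the pieces, not how they get joined.

```python
def survey_code(dad: str, mum: str, own: str, birthplace: str,
                day: int, month: int) -> str:
    """Hypothetical reconstruction of the survey's "anonymous" code.

    Assumption: the four two-letter prefixes are lower-cased and joined
    in the order of the list above, followed by DDMM. The real form may
    join them differently.
    """
    prefixes = [dad[:2], mum[:2], own[:2], birthplace[:2]]
    return "".join(p.lower() for p in prefixes) + f"{day:02d}{month:02d}"


# The example values from the list above.
print(survey_code("Peter", "Veronica", "Jean", "London", day=1, month=1))
```

Note that nothing in this code is random or secret: anyone who knows these facts about you can recompute it.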
They wanted this data so they could connect my answers to the answers I give the next time I participate in this survey.
But... I got stuck. I don't want to give them so much data. I closed the tab.
Later that day, I got into an argument with my girlfriend about all this. I argued that this is no longer anonymization: they can trace this data back to me. You don't believe this?
The calculation
Well, now let me show you:
- We had about 2,500,000 students in Germany in 2012/13 (German source).
- The web service that offers this survey has [107 national partners](http://www.unipark.info/71-0-partner.htm).
- There are 145 universities, 11 church-affiliated and 16 private universities in Germany (German source).
- There are 396 “Hochschulen” (institutions of higher education) in Germany (German source).
- We have about 80,500,000 citizens in Germany.
- We have about 2,000 hospitals in Germany (German source).
There are no official statistics on how many children were named “Julian”, at least not in Germany. So let's estimate! There are probably 40,000 first names in the world (German source). We don't have them all in Germany. Let's take 30,000; this should be more than enough. Let's also assume three different names for mum, dad and son/daughter in our calculation.
Now we have our data. (Edit: see below for a better quick-and-dirty calculation.)
First, we calculate how many students share the same first name:
2,500,000 / 30,000 = 83,333
But not all universities participate in this survey:
107 / ((145 + 11 + 16 + 396) / 100) = 18.83
means: 18.83 % of all relevant educational facilities participate
Now, if only 18.83 % of the facilities participate, we can only count 18.83 % of all students, which also decreases the number of students sharing a name:
83,333 * 18.83 % = 15,691.6039
means: about 15.7k students with the same name
Well, this assumes that each facility has exactly the same number of students, which is rubbish, but it's easier this way!
Now, not every student is really interested in spam-like “Please do this survey” emails. Let's estimate that 15 % of all students participate in such a survey:
15,691.6039 * 15 % = 2,353.741
means: about 2.35k students with the same name
But, well, they do not all share the same birthday:
2,353.741 / 365 = 6.449
means: about 6.4 students with the same name and the same birthday
And they are probably not all born in the same place. I didn't find any data on how many cities have a delivery ward, but let's estimate 1,800:
6.449 / 1,800 = 0.003583
means: about 0.0036 students with the same name, birthday and birthplace
At this point, it is clear that name, birthday and birthplace together almost certainly point to a single person. Or, put differently: there are about 6 students in Germany who study at an educational facility that takes part in the survey system, will participate in this survey, and share the same name and birthday. And there are only about 0.0036 students in Germany who share a name, birthday and birthplace.
Ouch! 0.0036 students... :-)
The survey only asks for the first two letters of each name. Is this really relevant? Yes, it is, but I don't know how to factor it in. I would probably need the average name length in Germany (around five letters, I guess) and the distribution of names in Germany as well. If we consider this in our calculation, it gets really complex!
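The name distribution is hard to get, but one can at least count how many different codes are possible at all. The Python sketch below does that under loudly simplifying assumptions: only the letters a-z occur, every letter pair and every date is equally likely, and the participant count is the rough 67,500 from the quick-and-dirty update. It then estimates how many pairs of participants would end up with the same code (the classic "birthday problem" estimate: number of pairs divided by number of possible codes).

```python
# Simplifying assumption: only the 26 letters a-z occur (no umlauts or ß),
# and every letter pair and every date is equally likely. Real names and
# places are far from uniform, so real collisions are more common.
LETTERS = 26
PREFIX_FIELDS = 4    # dad's name, mum's name, own name, birthplace
DATES = 366          # every day/month combination, including 29 February

code_space = LETTERS ** (2 * PREFIX_FIELDS) * DATES

# Rough participant count from the quick-and-dirty update.
participants = 67_500
pairs = participants * (participants - 1) // 2

# Expected number of participant pairs sharing an identical code.
expected_shared_codes = pairs / code_space

print(f"possible codes:        {code_space:,}")
print(f"participant pairs:     {pairs:,}")
print(f"expected shared codes: {expected_shared_codes:.2e}")
```

The possible codes outnumber the participants by many orders of magnitude. This suggests that most codes are likely unique, although the skew of real names and places will shrink that margin by an amount this sketch cannot tell.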
My summary
What I want to show with this small calculation (hopefully it is correct... I bet I've made a mistake somewhere): if you participate in such a survey, they tell you they will anonymize your data. But they can't! The probability that there is someone who has the same name, birthday and birthplace as you is tiny.
I wrote an email back to the sender of the survey, saying that I won't participate because of the lack of anonymization. I wonder what answer I'll get.
If you find any bugs in my calculation, feel free to contact me: 20CA0F94
Update 1:
Okay, let's do it again, quick and dirty: we have 2,500,000 students, but only about 18 % can participate:
2,500,000 * 18 % = 450,000
Only 15 % will participate:
450,000 * 15 % = 67,500
With 30k different names:
67,500 / 30,000 = 2.25
means: on average, about two participants share each first name.
The probability that the second of these two has the same birthday as the first is easy to compute:
1 / 365 = 0.274 %
Means: the probability that they share both name and birthday is less than one percent! Now ... think about their birthplace, and their dad's and mum's names...
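The update's chain can be carried one step further in a few lines of Python, reusing the same rough estimates (30k names, 365 birthdays and the 1,800 birth cities guessed in the main calculation). It uses a slightly different framing: for one given participant, it computes the expected number of *other* participants who share the name, then also the birthday, then also the birthplace, treating all three as uniform and independent, which is a simplification.

```python
# Figures from the quick-and-dirty update; names, days and cities are estimates.
PARTICIPANTS = 67_500
FIRST_NAMES = 30_000
DAYS = 365
BIRTH_CITIES = 1_800  # the delivery-ward guess from the main calculation

# Expected number of *other* participants matching one given participant,
# assuming the attributes are uniform and independent of each other.
others = PARTICIPANTS - 1
same_name = others / FIRST_NAMES
same_name_birthday = same_name / DAYS
same_name_birthday_place = same_name_birthday / BIRTH_CITIES

print(f"same name:                       {same_name:.4f}")
print(f"same name and birthday:          {same_name_birthday:.6f}")
print(f"same name, birthday, birthplace: {same_name_birthday_place:.2e}")
```

Under these assumptions, a participant expects well under one percent of a "twin" for name and birthday, and a few millionths once the birthplace is added.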
I think I've shown it...
tags: #privacy