How anonymous is an individual in a social network dataset? That was the main subject of the hackathon organized by POPNET and Statistics Netherlands (CBS) on May 3 at Leiden University. Using only data openly available on the internet, students had to reconstruct the social networks of several volunteers. The results show what network data can be derived from public sources, and therefore how anonymous people in networks really are.
The Netherlands as a network
For several years now, POPNET and Statistics Netherlands (CBS) have been researching the Netherlands modeled as a network. As part of this collaboration, they study the properties of networks such as the Dutch family network, the colleague network, and the neighbor network. These networks are of great interest to researchers, but CBS is prohibited by law from publishing data that can be traced back to individual units such as persons, companies, or institutions.
The findings from the hackathon help clarify what information can be retrieved from the public internet, and thus help to better assess the knowledge of a potential "attacker" trying to deanonymize the data. This knowledge can be used to better estimate which data might safely be shared and to protect data more effectively. The results also offer valuable insights for the field of Statistical Disclosure Control (SDC) and the anonymization methods it uses.
The 20 students who signed up for the hackathon searched for links between people either manually or automatically. They uncovered connections based on the volunteers' work relations, family members, and friends. The winning team found over 3,000 social network links of the volunteers and won a book about hacking.