A new study set to be published in Big Data found that it doesn’t take that many Facebook likes to determine key traits about people.
For years, advertisers have used data about consumers’ online activities to target ads, drawing inferences about people without their knowledge or consent. Researchers at Columbia Business School in New York, Northeastern University in Boston, and New York University wanted to find out whether “cloaking” some of that online activity – that is, using a feature that keeps it from being used by others – could prevent analysts from making inferences about you.
They used data from 164,883 U.S. Facebook users. The researchers knew some traits of those people (such as whether they are gay, smoke, or are Muslim) from their profiles and their responses to a survey, and they knew what those users had liked.
They worked backwards to see how effective cloaking was. They built a model that used a person’s likes to estimate how likely it was that they had a certain trait. Then they blocked each user’s individual likes from being used in the model to see how many needed to be blocked to keep the model from making an inference about that user.
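To make the idea concrete, here is a minimal sketch of that kind of experiment. It is not the researchers’ model: it assumes a simple logistic scorer, and the page names, weights, intercept, and cutoff are all made-up toy values. It also assumes the user cloaks the most revealing likes first.

```python
import math

# Hypothetical per-like weights for one trait (toy numbers, not from the study).
WEIGHTS = {
    "page_a": 1.2,
    "page_b": 0.8,
    "page_c": 0.5,
    "page_d": -0.3,
    "page_e": -0.6,
}
BIAS = -0.5       # assumed intercept
THRESHOLD = 0.5   # assumed cutoff above which the model "infers" the trait


def trait_probability(likes, cloaked=frozenset()):
    """Logistic score computed only from likes that are not cloaked."""
    z = BIAS + sum(WEIGHTS.get(like, 0.0) for like in likes if like not in cloaked)
    return 1.0 / (1.0 + math.exp(-z))


def likes_to_cloak(likes):
    """Greedily cloak the highest-weight likes until the inference stops."""
    cloaked = set()
    ranked = sorted(likes, key=lambda like: WEIGHTS.get(like, 0.0), reverse=True)
    for like in ranked:
        if trait_probability(likes, cloaked) < THRESHOLD:
            break  # the model no longer infers the trait
        if WEIGHTS.get(like, 0.0) <= 0:
            break  # remaining likes do not push toward the trait
        cloaked.add(like)
    return cloaked


user_likes = ["page_a", "page_b", "page_c", "page_d"]
print(sorted(likes_to_cloak(user_likes)))
```

In this toy setup, hiding the two highest-weight likes is enough to pull the score under the cutoff, while the user keeps every like on their profile.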
Being a gay man was one of the traits they examined. The researchers found that the average gay man, if he chose the right ones, would only need to cloak eleven likes to keep their model from inferring that he’s gay. That’s only 7.4% of his total likes.
The average lesbian would need to remove fewer likes – only six, or 3.5% of her total likes.
This means that it wouldn’t take much work for a user to stop others from making inferences about them.
The paper is mostly about the statistical model the researchers were using, so it does not have lists of what those very gay and very lesbian likes were.
But since “gay” was used as an example to explain their work in part of the paper, a couple of the charts show that The Ellen DeGeneres Show, Lady Gaga, Human Rights Campaign, Glee, True Blood, Katy Perry, Skittles, Barack Obama, Harry Potter, Britney Spears, Madonna, and Michelle Obama were among the gayest likes in their data. (And liking Barack is gayer than liking Michelle.)
The researchers propose a cloaking feature for Facebook as a tool to protect privacy. This way, people could still like whatever they want to like, but they wouldn’t get ads targeted to a demographic that a retailer’s computer thinks they’re a part of. This would allow a lesbian to like Ellen’s show but prevent an LGBT travel company from targeting her for ads if she didn’t want to be outed to someone looking over her shoulder.
A few months ago other researchers put out a paper showing that facial recognition software could be used to make predictions about someone’s sexuality by examining online photos. Some people feared that the technology could be used one day by an oppressive regime to out gay people.
The thing is, we are already leaving a huge amount of information about ourselves online that can be analyzed with software that’s not sophisticated at all. It wouldn’t be too hard for a motivated oppressive regime to put together a list of gay people right now.