
TED: Your "likes" reveal far more information than you think

2018-01-16 小芳老师

 

In this second talk, the speaker Jennifer Golbeck, director of the Human-Computer Interaction Lab at the University of Maryland, shows us the secrets hidden in the data collected by the major Internet platforms.

https://v.qq.com/txp/iframe/player.html?vid=q05211k5deq&width=500&height=375&auto=0

The speaker begins with what the Internet has become, pointing out that it is now a much more interactive place, where people communicate with one another, comment, and share, rather than just reading information and opinions. She then uses Facebook as an example to raise the problem of the flood of personal information on social platforms.


So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there's less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

To illustrate this problem, the speaker shows the audience this picture and explains it.


So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, let us find out all kinds of things.
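Target's real model has never been published, but the idea of a "pregnancy score" computed from purchase patterns can be sketched with a toy classifier. In the sketch below the features, the training data, and the scikit-learn setup are all invented for illustration; the point is only that individually weak signals (more vitamins than usual, a bigger handbag) become informative once they are combined across many customers.

```python
# Illustrative sketch only: a toy "pregnancy score" built from purchase-pattern
# features, in the spirit described in the talk. Feature names and data are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [bought more vitamins than usual, bought a large handbag, bought unscented lotion]
X_train = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
])
y_train = np.array([1, 1, 0, 0, 1, 0])  # 1 = customer later turned out to be pregnant

model = LogisticRegression()
model.fit(X_train, y_train)

# "Pregnancy score" for a new customer who bought extra vitamins and a big handbag
new_customer = np.array([[1, 1, 0]])
score = model.predict_proba(new_customer)[0, 1]
print(f"pregnancy score: {score:.2f}")
```

No single purchase in this sketch is decisive on its own; the score only becomes meaningful because the same weak signals are observed across thousands of customers.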


The speaker brings up an intriguing term, the "pregnancy score". It is just one analysis derived from behavioral patterns against the backdrop of a large population; behavioral patterns are applied far more widely, and the same principle is what allows the likes and comments we leave on Internet platforms to be statistically analyzed as well.


So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.
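The homophily argument can be made concrete with a tiny simulation: build a random friendship network in which people who share a trait are more likely to be friends, let a "curly fries" like spread outward from one smart user, and then compare the share of smart people among likers with the share in the whole population. Everything below (network size, probabilities, the "smart" label) is invented for illustration, not taken from the talk or the PNAS paper.

```python
# Illustrative sketch: why liking a page can signal a trait even when the page's
# content is unrelated. Smart people tend to befriend smart people (homophily),
# so a like seeded among them spreads mostly to other smart people.
import random

random.seed(0)
N = 200
smart = [random.random() < 0.3 for _ in range(N)]  # 30% of users are "smart"

# Homophilous friendship graph: same-trait pairs are much more likely to connect.
friends = {i: set() for i in range(N)}
for i in range(N):
    for j in range(i + 1, N):
        p = 0.08 if smart[i] == smart[j] else 0.01
        if random.random() < p:
            friends[i].add(j)
            friends[j].add(i)

# Seed the like with one smart user and let it spread for a few rounds:
# each friend of a liker likes the page with some probability.
likers = {next(i for i in range(N) if smart[i])}
for _ in range(4):
    for u in list(likers):
        for v in friends[u]:
            if random.random() < 0.5:
                likers.add(v)

rate_likers = sum(smart[u] for u in likers) / len(likers)
rate_all = sum(smart) / N
print(f"share smart among likers: {rate_likers:.2f}  vs population: {rate_all:.2f}")
```

In runs of this toy model the likers skew noticeably smarter than the population, which is exactly why the like itself becomes a usable predictor.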


Under the sociological theory of homophily, liking the curly fries page becomes a marker of high intelligence, not because of the content itself, but because the act of liking reflects the shared traits of the other people who took the same action.


So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S., so users control their data.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third party services that access it, but that select users who the person who posted it want to see it have access to see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.
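The "encrypt what you upload" idea she mentions can be illustrated very simply: encrypt the post locally, give the key only to the friends who should read it, and let the platform store nothing but ciphertext. The snippet below is a minimal sketch using the Python cryptography package's Fernet symmetric scheme; it is one possible way the idea could look, not the mechanism proposed in the talk, and a real system would still need key distribution and access control on top of it.

```python
# Minimal illustration of "encrypt what you upload": the site only ever stores
# ciphertext, and only people who were given the key can read the post.
# Uses the `cryptography` package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # shared out-of-band with selected friends only
cipher = Fernet(key)

post = "Personal update I only want close friends to see".encode()
uploaded = cipher.encrypt(post)      # what the platform actually stores

print(uploaded[:40], b"...")         # opaque to the site and to third parties
print(Fernet(key).decrypt(uploaded).decode())  # readable only with the key
```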


The frightening fact is that our information can be analyzed and put to use in all sorts of ways, yet users have little power to control how this data is used. The speaker regards this as a real problem going forward and proposes several paths for protecting our data and preventing it from being abused.


One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, means that we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.


At the end of the talk, Jennifer Golbeck offers her hope: that users will have the right to refuse to let their information be used. This is, of course, what all Internet users hope for. The Internet provides a relatively free platform, but it also brings many new threats, so it is up to each of us to keep it safe and civil, and we also need the protection of laws, institutions, policies, and science.


Copy: 虞伟丽

Proofreading: 黄亦晨 杨畅

Layout: 李曼欣

Images from Baidu

English transcript from TED



