Associative Trails Around DigCiz, Fake News, and Microtargeting

Microtargeting: A Digital Citizen’s Perspective

I started writing this post about fake news and microtargeting a few days ago, and then I was reminded that #OpenLearning17 was talking about Vannevar Bush’s As We May Think this week. I began to see connections between the two. It made this post even longer but I think it was worth it.

Some background if you don’t know: Bush’s article was written in 1945 as the war was ending. He was the Director of the Office of Scientific Research and Development during this time so he was all about applying science in warfare. In the article he envisions where scientists will put their energies once the war is over.

Now, as peace approaches, one asks where they [scientists] will find objectives worthy of their best.

The article focuses on the connections we make when we build knowledge. How we associate past discoveries with current ones and tie things together. Bush advocates using technology to track the connections that we make in this process to extend memory for better reflection on those connections. Many credit this article with predicting the Internet.

He uses this term “associative trails” to describe indexing knowledge based on connections that we define. He thinks this is more powerful than typical kinds of indexing like sorting by number or alphabetizing. But I note that this is a much more personalized kind of indexing.

He is advocating for metacognition, that is, realizing what you are thinking and where your trails lie so you can better understand what you are researching, yes, but more importantly your own thought processes. What I am wondering about is what happens when you get the technology part but you leave out the metacognitive part? Bush does not seem to consider this option but I think this is often the world that we live in today.

When I start thinking about fake news and microtargeting I have to ask what if a person does not have access to their associative trails? What if they don’t even realize they are leaving a trail? What if they think that their trail is not so important? What if someone’s trail could be bought and sold? What does the record of all our connections say about us and can it be used in ways that might be exploitive?

I’m not a data scientist. I’m not a journalist. I’m not a librarian.

I am a technologist. I am an educator. I am a person. A person who lives some of her life on the web. I want to say a lot of her life on the web…. But “a lot” is a relative term.

Often it is journalists and librarians that tackle the fake news topic. I think that both of these groups add an important perspective to the conversation but I also think that there is the perspective of a digital citizen and those that advocate for such concepts; the perspective of someone using the web as a place of expression, a place to learn, and to be heard and to listen to others.

What is microtargeting?

When I bring the idea of microtargeting up I’ll start with something like “well you know they track a lot of your data from the internet to try to influence you” and most often, before I can continue, I hear “oh yes of course I know that”. Then there is the inevitable story of shopping for an item on one site and then continuing to see ads for it on other sites. But that is rather mild and not really what concerns me.

I’m not just talking about the machine realizing that you were looking at a product on another site or that you clicked on something from your email, that is cookies and web beacons, that is rudimentary stuff.

I’m talking about gathering thousands of data points, combining them, and analyzing them. Everything from shopping history to Facebook likes and what church you attend can be gathered and combined with traditional demographics to create a “personalized experience” meant to influence you with emotional and psychological messaging.

The big story around microtargeting right now has to do with a little company called Cambridge Analytica (CA) in London. They are the big story because they’ve had well-known wins with customers like the Brexit Leave and Donald Trump campaigns.

In this eleven-minute video from the Concordia Summit their CEO Alexander Nix explains how they work. Nix says that demographic and geographic information is child’s play, and that the idea of all people from one demographic getting the same message, “all women because of their gender, all African Americans because of their race, all old people because of their age”, is ridiculous. Those things are of course important but they are only part of the picture; psychographics give a much more complete picture because then you are targeting for personality.

The big shocker, where people feel a little creeped out, is when they learn that CA uses those silly little Facebook quizzes (you know, the ones that you click the “connect to Facebook” button on before you are allowed to take them) to profile your personality. What! Those quizzes are not just there for free for you to have fun with… as they say: if the service is free consider that you might be the product.

As we may forget

CA is not the only one doing this; they are just the popular story right now and the quizzing is only part of things. For me the big part is that connection to Facebook, which can give the owner of the quiz (be it CA or some other company) access to all of your account information, your likes, your posts, and often much of your friends’ information. Of course, much of your personal and consumer data can be purchased so throw that into the mix. Imagine aligning all of this data for a person. It is a lot. Often people don’t even realize what they are giving away.

You authorize the connection so that you can take the quiz or play the game or whatever and then it is over for you – you have had your fun and you move on. But the app still has that connection to your account and will continue to unless you go in and specifically delete it. This means that it can continue to gather data. Apps will vary of course and I can’t speak for any specific one but I know that all of you are reading the terms of service of each app before you connect it – right?

In this case the user is continuing to make associative trails on Facebook through friending and liking. However, they are not using those trails for metacognition. They are not using technology to extend their memory so that they may better reflect on the connections that they are making. Instead they plow forward, forgetting many of the connections and the fact that they have authorized someone/thing else to access and track their connection trails. The trails are being harvested by an outside entity and the user, more than likely, has no idea who that entity is – did I mention that they could change the terms of service, the name of, or the nature of the app at any moment?

But how much can someone really do with all that data?

I have seen the data scientist folks that I follow sort of look at the CA story a little sideways and it seems every day there is a new article downplaying the impact CA had on the Trump and Brexit campaigns. Interestingly, though, not too many are saying that the idea behind this, using big data and psychographics to personalize experiences, is invalid. Just that CA might be more hype than payoff.

This much more comprehensive story about the origins of CA in Motherboard states that Cambridge is not releasing any empirical evidence on how much or how little they are affecting the outcomes of campaigns. And though CA is more than happy to tout their wins as proof of their effectiveness I’ve yet to see anything about their losses which is a classic vendor ploy.

In this recent Bloomberg article, The Math Babe, Cathy O’Neil points out that what Trump was doing during the campaign is not uncommon and that the Hillary campaign was also doing it. Also, that U.S. companies have for decades been tracking personality. O’Neil points out that “To be sure, there’s plenty to be appalled at in this story…. It is just not specific to Trump”.  She states that Hillary had access to more data than Trump because she had access to Obama’s archive of data from the previous elections. 

But then I think about Bush. As We May Think considers information storage and to be sure the amount of data is important. However, I think the real meat is in the connections. It is here that I have a hunch that having the right context or being able to see the right connections could be more powerful than having more data – well at least if we are talking about the difference between a lot of data and a whole heck of a lot of data. Did I mention that I am not a data scientist?

Paul Olivier Dehaye has written about how CA was targeting “low information voters” for the Trump campaign. This article hypothesizes that CA used data (citing CA’s claim to have 5000 data points for every adult American) to specifically look for voters who had a low “need for cognition” for microtargeted political advertising. These are the type of folks who would be more likely to not dig too deep or question stories that were presented to them. These folks are not doing a lot of metacognition. I don’t blame them for this, but I’ll get to that in a bit.

What is real and how can we tell?

As I remember it, when the term fake news first started being thrown around during the campaign it was largely being used to define sites that were not run by major news organizations or even particular journalists. They were run by individuals who knew how to buy a domain name and hosting and throw up a WordPress site, but who were only interested in click revenue. They would come up with crazy stories and even crazier headlines just to get people to click. As these started to be called out as “fake news” some began to create lists of these sites and place parody and satire sites alongside of them.

But then it got more challenging with accusations that major news sources were in fact fake news and that we could tap into “alternative facts” to get to the truth.

Journalists receive training to be sensitive to bias and context and to not let it interfere with their reporting, so they should be more prepared to consider context and fight against bias, especially their own. However, you will never be able to completely remove bias and context; much of it can be hidden and not realized till later. It is here that education is asked to step in and create critical citizens who will hold journalists responsible for what they report, and it is here that we see the calls for greater digital and information literacy with regard to fake news.

Fake news, microtargeting, and digital citizenship

Bush envisioned people using technology to extend their memory to be more metacognitive about the connections they were making while they were building knowledge. These seem like rather “high cognition” kind of folks to me but what about those “low cognition” kind of people that Dehaye thinks CA could be after? Who are they?

I mean I’ll admit that I’m guilty myself. I don’t read every terms of service for every new app I download. I have forgotten that I’d given access to some app only later to find it hanging out in my Facebook or accessing the geolocation of my phone. But I think that it is really some of the most vulnerable among us that are at risk here.

What if you work 40/50 hours a week and care for children, parents, or grandparents? What if you have a disability or illness to manage? What if you grew up surrounded by technology and this kind of technology usage is your normal? Do you have time to build all of those literacies? 

Building critical literacies around information and digital technologies takes time. It requires more than just a list of which websites are fake, which are satire, and which are backed by trained journalists. It requires more than a diagram of which news sources lean in which direction politically.

You need the ability to look critically for the nuance of things that could be off. For instance a .com.co is different than a .com. Kin Lane talks about “domain literacy” and goes much deeper than this basic understanding of domains but I hope you see what I mean. We need to read the article and then ask: is it really reporting first hand, or is it reporting on reporting? Mike Caulfield points this out when he calls for the first step in fact checking to be not evaluating the source but rather determining who the source is!
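For anyone who likes to see the .com.co point spelled out, here is a minimal sketch of that one check and nothing more. It assumes a small hand-made list of domains you already trust (KNOWN_DOMAINS and check_domain are made-up names for illustration, not part of any real fact-checking tool), and it only tells you whether a link sits on one of those domains or on something that merely starts like one.

```python
from urllib.parse import urlsplit

# Hypothetical list for illustration only; swap in domains you actually trust.
KNOWN_DOMAINS = {"example.com", "example.org"}


def check_domain(url: str) -> str:
    """Classify a link's host as 'matches', 'lookalike', or 'unknown'."""
    # urlsplit only finds a hostname when the URL includes a scheme (https://...).
    host = (urlsplit(url).hostname or "").lower()

    # The known domain itself, or a true subdomain such as www.example.com.
    if any(host == k or host.endswith("." + k) for k in KNOWN_DOMAINS):
        return "matches"

    # A known name with something extra tacked on the end, such as
    # example.com.co, which is registered under .co and is a different site.
    if any(host.startswith(k + ".") or f".{k}." in host for k in KNOWN_DOMAINS):
        return "lookalike"

    return "unknown"


if __name__ == "__main__":
    for link in ("https://www.example.com/story", "http://example.com.co/story"):
        print(link, "->", check_domain(link))
```

A check like this only catches one kind of trick. It says nothing about who is behind a domain, which is the harder question.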

Once you determine the true source you need to evaluate it – who wrote this, what are their political leanings, are they being backed by other influences (like money) somewhere? You should click on the article’s links and/or look at its sources and read those articles to get context before you make a definitive decision about its worth. All of this takes access, and knowledge, and constant practice.

Maha Bali writes about how fake news is not our real problem. She points out how fake news is good for critical thinking and states that we need more than just a cognitive approach; what we really need is cross-cultural dialog, learning, and skills. This is where education and community need to step up to the plate.

It seems like a lot and for me it is a call for better general and liberal education. I think the first step may just be in realizing (and getting students to realize) that my internet is different from your internet. Where possible, taking ownership of our own “associative trails” and demanding that ownership when it is kept from us. Finally, simply realizing that there are political forces and companies with lots of your data… which has always been the case but maybe realizing that they are trying to influence you in increasingly intimate ways.

This article (images and words) is CC-BY Autumm Caines

When Free Beer Leaves Me Cold: Declaring Interest in #OpenLearning17

I’m super excited that some of my favorite Virginia educators have gotten together to do a cMOOC! #OpenLearning17 started today and I’m so thrilled to follow along and learn with a great community. The syllabus says this week is for introductions, blogs, and working with a connected learning coach. There is also a great reading all about the meaning of “open” which was enlightening about the history of the word. To this end the article starts with the word “free” as defined by Richard Stallman for the Free Software Definition, distinguishing “free” as in “free speech” from “free” as in “free beer”. “Free”, in the sense that will eventually grow into “Open”, is focused more on liberation than lack of price. So “free” as defined by Stallman is not really about price; it includes the ability to see and change the program itself, and also the ability to redistribute changed versions of the program. It seems to me that in this way the program is used by the person instead of the person being used by the program. It also seems that this encourages community as conversations need to arise around this kind of usage.

I am actually struggling with some “free beer” kind of software (at least I think it is free beer) in my life right now so I thought I’d talk about it as my introduction to the group.

I’ll be the first to admit I’m not the best at email management. Our institution has a size limit on faculty and staff inboxes that is like 20 MB or something. I’m always archiving stuff off because I’m getting yelled at for not having enough space in my mailbox. To make more room one of the first things that I do is sort by size and archive off the messages that are the largest. Usually these are messages that have large attachments.

A few months ago it seemed to really start filling up quickly. The thing was I would do my little trick of sorting by size and I started finding these messages from one particular professor that I was working with that had no attachment – often they were just a sentence or two of text. I checked to see if perhaps there was an image in the signature line that was taking up a bunch of room but I didn’t see anything like that. I thought it was a fluke, archived the messages, and moved on.

The thing was it kept happening and it was getting worse. The first few times I found these messages they were maybe 500 KB but then after a week I was finding that they were 1 MB – then 2 MB – and always from the same professor. What was going on?

I’d had enough and I knew there was something that I couldn’t see in the background of those emails. I asked our Instructional Designer Jim Kerr what he thought and we started a back and forth of trying to deduce what was going on. Was it only in replies? Was it every email that professor sent me? It did not seem to be happening when the professor sent from his phone… Well eventually we pulled up the source code for the emails and the cause became abundantly clear – there were about 11,000 lines of junk code in each of those emails. I don’t read or write code but one word was sticking out all over the place: Grammarly.
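For folks who do read code, here is a rough sketch of how the same detective work could be done without a printer: compare how much HTML an email carries with how much text a person would actually see. It uses only Python’s standard email and html.parser modules. The names VisibleText and hidden_markup_report are made up for this illustration, and it assumes you have saved the raw source of a message as text. It is not a reconstruction of what Grammarly put in those messages.

```python
from email import message_from_string
from email.policy import default
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect the text a reader would see, skipping <style> and <script>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside style/script elements

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def hidden_markup_report(raw_message: str) -> dict:
    """Compare the size of a message's HTML body with its visible text."""
    msg = message_from_string(raw_message, policy=default)
    body = msg.get_body(preferencelist=("html",))
    html = body.get_content() if body is not None else ""

    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so layout newlines do not inflate the count.
    visible = " ".join("".join(parser.chunks).split())

    return {
        "html_chars": len(html),
        "html_lines": html.count("\n") + 1 if html else 0,
        "visible_chars": len(visible),
    }
```

A one-sentence message whose html_chars runs into the hundreds of thousands while visible_chars stays under a hundred is the pattern described above: almost all of the size is markup nobody sees.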

Grammarly is a piece of “free” software that is supposed to help you write better. In real time it corrects spelling and grammar errors in all of your text. You can install it as a browser plug-in so that you don’t even have to go to a website – wherever you write text on the web it is there.

Grammarly says it is the “free grammar checker” but I believe this is free as in beer not free as in speech. I’m new to the open/free movement and new to Grammarly so let me know if I got this wrong. I don’t see anywhere that I can get to their code to tweak it or to see what exactly it is doing or why it is ending up in the background of very simple emails and bloating them up. Any talk of community on their site applies to those looking to talk about grammar issues, not to talk about the software, how it functions, or how users can change it directly. Grammarly is free but there is a paid tier and the volume licensing also has a cost associated with it. So, I suppose it is like free cheap beer – if you want the stuff that tastes good you have to pay.

I printed the code that was behind the email just so that I could demonstrate how much was actually going on behind the scenes. Mind you this is double sided.

Printed code behind an email message that was one sentence long. This is double sided.

After going through all of this the professor immediately removed Grammarly from his computer – he said his email box was filling every day and no one could figure out why. But it also got Jim and me thinking about how Grammarly works. It is not entirely on your computer – much of the computing process is in the cloud – it needs the internet to function. So, it seems that it is basically a keylogger. Though it is not covert (I mean you install the thing) it is recording every keystroke and sending it to their servers to check for grammar and spelling issues. It does seem that they are encrypting and such but now we are wondering if there are implications for FERPA in an educational setting. And besides, having some program record and send my every keystroke is a little creepy to me. Especially if I don’t know what is going on in the background.

To be honest, I’m not a coder and even if Grammarly did make their source code available I couldn’t make much of it. I think that there might even be security issues if it were that open. Honestly, a big reason why I’m writing this as a part of my introduction for #OpenLearning17 is because I’m trying to better understand the implications from others that might know better than me. I’m wondering if our concerns about FERPA are warranted and if anyone has any clue how the junk code got into the emails. Any feedback would be great but if not, if this is too far outside the interest of #OpenLearning17, that is okay too. Hoping that this post can still act as a way of saying hi and giving folks some idea of the type of things that I’m thinking about.

Looking forward to working with everyone in #OpenLearning17 and can’t wait to see where this takes us.