Big Data: What Happens When Politics Becomes a Matter of Social Engineering

By David Parry

“One thing should be clear, even though we live in a world in which we share personal information more freely than in the past, we must reject the conclusion that privacy is an outmoded value. It has been at the heart of our democracy from its inception, and we need it now more than ever.” - President Obama, in *Consumer Data Privacy in a Networked World*.

Over the last two years, public angst over the ability of companies to collect data about people and then use that data to target consumers directly has certainly increased. One place this anxiety has clearly manifested itself is the debate over “Do Not Track” legislation, a policy now backed by the Obama administration that would give consumers more control over how their data is used and force companies to be at least marginally more transparent than they are now.

At the center of this controversy is Behavioral Targeting. Think of Behavioral Targeting as the intersection of Big Data, Moneyball, Network Theory, Cognitive Psychology and Business. Depending on where you stand, this alliance is either the Holy Grail of marketing or the ultimate unholy alliance of consumer manipulation. The most disturbing mainstream article on this trend, published in the New York Times, outlines how Target gathers data on consumers to develop a pregnancy prediction score, so that it knows when a customer is pregnant and can use that moment to change her buying habits. The article was something of a wake-up call to the general public, a glimpse into how marketers are using all this data to effectively manipulate consumers and maximize profit margins. Fair enough, that’s what they do, and we can have the debate later about whether this type of behavioral targeting is a good idea, or to what extent we should regulate it. Instead I want to ask a more interesting, and to me more important, question: what happens when you replace businesses with political actors in the above equation? That is . . .

What do you get when you cross Big Data, Moneyball, Network Theory, Cognitive Psychology and Democracy?

To me the answer is pretty clearly something not very good for the public. Indeed, while I am generally fairly optimistic about the effect of the digital network on public formation, I think this is one area we need to be concerned about. It seems to me that mixing this type of behavioral targeting with democracy seriously undermines the democratic process, from multiple angles.

What’s Going On Now

It’s actually pretty difficult to know what kind of big-data-driven behavioral targeting campaigns are engaging in. Not surprisingly, campaigns want to keep this secret, not merely for competitive advantage but because, as with the Target story, revealing the degree to which we are tracked and marketed to could disturb some people and harm the cause. What we do know is that big data has become big business in politics, and a recent panel at the Personal Democracy Forum (PDF) gave a glimpse into what is going on. The panel consisted of one academic, Dan Kreiss, who offered some historical perspective, followed by three individuals who worked for companies that market themselves as able to help campaigns target voters through online efforts.

-Campaign Grid, one of the companies presenting, said it has been able to leverage cookies to match online viewing habits with voter records, with an 80% success rate. In other words, for 80% of voters they are able to identify online viewing habits. (Stop and think about that for a moment. For a bit on how this works, see this article about RapLeaf.) Campaign Grid also suggested that it was working towards 100%. Imagine all the information Google, Facebook and other online trackers have on you, crossed with all the public records available on you.

-Part of the idea here is to connect commercial data with voter data. Campaign Grid says it wants to cross-tabulate voter data against 15,000 other commercial data points.

-The Catalist database (one of those used by the DNC) already matches 450 points of data on 250 million people.

-This is done by taking commercial data (think of all the online trackers, plus all the information generated by monitoring in-store purchases, plus credit reports) and merging it with public data such as voter registration, tax databases, DMV records, etc. The result is a staggering amount of data, both on population segments and on individuals.

-Targeted Victory has already received $2.5 million from the Romney campaign. The name alone points to one of the problems here: the goal is victory, using Big Data to win elections rather than to create a better public discussion. One of the presenters at PDF said the goal of Big Data should be about solving problems to “win elections.”

-One of the reasons Obama won the last election was a significant advantage, in both the primary and the general election, in data and in the effective use of that data. The 2008 election was the first in which this kind of data mining of internet traffic was used. In 2008 campaigns had ten times as much data on any one voter as they did in 2004.

-Already in 2008 the Obama campaign was tracking data on donations, trying to understand how everything from shape to color to message affected which emails led to donations. This is essentially A/B testing on a sophisticated level, meant to maximize campaign donations.

-Also in 2008 the Obama campaign was targeting messages based on your browsing history. If you visited websites about parenting, you got an Obama ad on education or children’s health care (not necessarily on the parenting website, but on other websites you visited), whereas if you frequented environmental sites, the ads you saw would reflect Obama’s green policy. In other words, when you visited CNN, rather than serving you a generic Obama ad, they could tailor one directly to you based on your browsing habits at other sites. (For a primer on how this works, watch this video.)
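The email testing described above boils down to a simple comparison: send two versions of a donation email to similar groups and check whether the difference in donation rates is larger than chance would explain. The sketch below is hypothetical. The variant names and counts are invented for illustration, and the two-proportion z-test is one standard way to run such a comparison, not the Obama campaign’s documented method.

```python
import math


def conversion_rate(donations: int, sends: int) -> float:
    """Fraction of recipients who donated after receiving a given email variant."""
    return donations / sends


def two_proportion_z_test(d_a: int, n_a: int, d_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two donation rates."""
    p_a = conversion_rate(d_a, n_a)
    p_b = conversion_rate(d_b, n_b)
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (d_a + d_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via erf: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical test: variant A (red button) vs. variant B (blue button),
# each sent to 10,000 supporters.
z, p = two_proportion_z_test(d_a=200, n_a=10_000, d_b=260, n_b=10_000)
print(f"A: {conversion_rate(200, 10_000):.2%}  B: {conversion_rate(260, 10_000):.2%}")
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers variant B wins by a margin unlikely to be chance, so B would be sent to everyone else. Repeat that loop over colors, subject lines and layouts and you have the kind of optimization the campaign described.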

The goal of this type of massive data tracking is not to serve the electorate or democracy, but merely to identify the most efficient way to generate votes. In other words, this is all used to persuade voters, not to alter policy or engage the electorate. Don’t believe me? Just read their own rhetoric.

It’s Only Going to Get Worse

In some respects you could argue, although I think this would be wrong, that these efforts are just an expansion of earlier direct-mail voter outreach. It’s important to understand two things. First, this is already a type of voter targeting beyond anything we have seen before: while it is arguably similar to prior efforts, its scope and scale are much larger. Second, and perhaps more important, this voter tracking, identification and targeting is only going to get more sophisticated, more complicated, and more powerful. Just as Target profiling families to find pregnant mothers is an astronomical leap over the Mad Men days of crafting a company message, this type of voter targeting goes far beyond anything we have seen in prior political campaigns.


One only has to look at the Obama campaign’s shift in strategy over the last four years to see how much this has changed in that short time. In 2008 the Obama campaign used myBarackObama as the centerpiece of voter organization. We could argue about the degree to which this is true, but that platform was largely seen as a horizontal structure: it connected like-minded voters and at times targeted them with data, but it was limited in the kinds of targeting it could do. This election cycle, however, the Obama campaign made a significant choice to use Facebook, promoting it as the way to connect with voters rather than an independent platform of its own. Why? The answer lies in this page (taken from an article Micah Sifry wrote on this subject).


This is the page you see when you sign up for the Barack Obama app on Facebook. Why would the campaign choose to use another platform rather than its own? Because Facebook lets it harvest a whole range of data points it might otherwise not have access to. By connecting to the campaign through Facebook, you give it access to a bunch of data it wouldn’t normally have, or would at least have to go to great pains to get. I am not going to go into the details here; you should just read about it here. But as a way of summarizing how Facebook is able to engineer decisions by changing its platform: Facebook recently increased organ donor registration by a factor of 23 . . . 23! That kind of power is worth a great deal of money to campaigns.

I’m not going to make the long-form argument here, but what a chunk of these data scientists are really trying to do is understand aggregate human behavior with an eye toward changing it, programming it if you will. Big Data is useful, but it can also be dangerous. Again, just think carefully about the Target article and how the company is looking to subtly manipulate and produce customer responses below the threshold of conscious decision making. In short, as the Technology Review article summarizes, the goal is to make social science an engineering discipline. And you don’t have to believe this is 100% possible to start getting worried.

Why This Should Concern You

A good deal of work has already been done highlighting some of these issues. I encourage you to read the work of Dan Kreiss (who was on the PDF panel) and Philip Howard on this subject (here, here, & here). As Howard and Kreiss point out, there are four reasons you should be concerned:


  1. Potential for Data Breach. These databases store a tremendous amount of data, and the market in this data is currently a fairly unregulated free-for-all. Commercial database breaches have become somewhat commonplace, but the data these companies store and sell to campaigns is both more extensive and more sensitive. If we believe that freedom of assembly is fundamental to a democracy (read the 1st Amendment), we should also be concerned about creating an atmosphere where an individual’s political and commercial associations are stored together, fostering a sense that one’s political activity is always being monitored. The extent to which this business is unregulated is, as Kreiss and Howard point out, disturbing to say the least. This may be the most loosely regulated data collection there is. (For an interesting film on data collection, see Erasing David, although it doesn’t address political data collection.)

  2. Economic Asymmetry. Political campaigns are already expensive endeavors. This type of data analysis is not cheap, and if it becomes standard fare, the price of campaigning will increase, with data mining favoring the wealthier campaigns.

  3. Voter Disengagement. There is data to suggest that targeted campaigning hurts the general electorate by appealing to likely voters or targeting select influential groups, rather than addressing the public as a whole. Think of how voters in some states feel disenfranchised by the Electoral College system (my vote in Texas doesn’t really count), while voters in “swing states” are exposed to a much higher degree of political messaging. This type of campaigning would be that on steroids.

  4. Democratic Debate. One of the goals here is to narrowcast, targeting voters based on individual interests rather than engaging in a larger public debate that builds consensus and conversation. In effect, this works against the very public-sphere ideal that democracy rests upon. (To me this is the biggest concern; more on this in a minute.)

The fact that companies trying to maximize “data profiles” to sell products are often the same ones trying to sell us politicians should be troubling. It’s one thing to be comfortable with companies tracking online behavior to market effectively to customers; it’s another thing entirely to think about political organizations using cookies to track behavior and then using that data to “sell” ideas to the public.

Why This Should Scare the Crap out of You

While I think Howard’s and Kreiss’s work is a good place to start, it doesn’t even scratch the surface of how bad this can (and, I fear, likely will) get without legislation to prevent it. The really dangerous piece is number four above: the degree to which this is likely to undermine public discussion of issues, replacing it with micro-targeted voter manipulation. I can already imagine some readers thinking that manipulation is too strong a word here, that it overstates the case, but I don’t think it does. If anything, it may understate what this could evolve into.


The idea of a representative democracy is that individuals, informed by public discourse, make rational decisions about which candidates they want to represent them, thus electing a set of representatives who reflect the collective interests of the public. But voter targeting and tracking are not about producing public discourse, engaging voters, or explaining policy positions; rather, they are about socially engineering individuals to maximize the vote. In short, the idea behind voter targeting is maximizing votes for a specific candidate, not maximizing discussion about a candidate or an upcoming election. Indeed, discussion is precisely the opposite of what is being aimed at. Philosophically, this type of voter “engagement” works by fragmenting the public rather than building a conversation. Instead of policy positions or statements, data collection becomes the new commodity for generating votes. At a minor level this can involve targeted pitching (selling different agendas to different interest groups); at another level, it means ignoring “content” altogether and using other means to persuade or manipulate voters. And there is no real way to opt out of this either.

Not convinced this is a problem? Let me sketch a few scenarios, none of them far removed from what can be done now or what might already be in use.

-Manipulate presentation, not message. The Obama campaign has already indicated that in the previous election cycle it A/B tested the presentation of emails to maximize donor response, even changing colors to see what was most effective. Candidates could easily track your favorite color and send you a piece of mail in it, or customize online ads to increase click-through rates. Forget microtargeting the message; they could microtarget “style” to make sure voters form a positive impression of the candidate. And don’t tell me this stuff doesn’t matter, that voters will choose to ignore presentation and vote on issues instead. We know from data that people are easily swayed by presentation; this stuff matters. And campaigns are already spending resources on it.

-Merge credit reports with voter records. Your credit score is already used to determine a range of things about you (higher credit score = better insurance rates). Credit scores probably also correlate with a range of factors politicians would be interested in, especially campaign donations. Or campaigns could correlate donation requests with spending habits, timing the ask to each individual to yield the greatest possible donation: figuring out when you have extra money to spend and sending the request then.

-Health Care. Let’s imagine that they can scrape Facebook, your search history, etc. to determine that you, a family member, or a friend is terminally ill. Let’s even say they can determine the particular illness with a high degree of likelihood. Now the politician or interest group can mail you, or serve you, an ad about how their candidate or cause would better serve the terminally ill patient. If a direct solicitation would be too creepy, the data would let the campaign embed the information among a range of other “points” so that it seems natural you received it. (This is what happened in the Target case: customers were creeped out by getting pregnancy coupon booklets, so Target created booklets that “appeared” neutral but were really aimed at expectant families.)

-Microtargeting messages. What is to prevent a candidate or campaign from sending one message to one group of voters and the opposite message to another? Sending conflicting messages might be risky, but tailoring language is probably effective. In the same way that evangelicals were targeted with heavily coded language in political speeches that passed as “natural” to other voters, campaigns could tailor language to individual voters for maximum gain. In one sense this already happens (coal messages in West Virginia, immigration messages in Texas), but the targeting could be much sharper. Imagine a candidate using subtly racist language to label Obama a terrorist or a Muslim for audiences already predisposed to that message, without ever having to distribute it nationally. The ability to create a “target customized pitch,” which fractures discourse rather than produces it, seems counterproductive to public conversation.

-Microtarget voters on non-voting issues. If you sign up for the Obama app on Facebook, the Obama campaign then has access to a shit load (that’s a technical term) of data about you and your friends. Imagine customized emails on people’s birthdays or anniversaries asking for donations, etc.

-Friend Targeting. One efficient use of this type of data is figuring out which node in the network (i.e., which friend) is most influential in determining how people vote. So rather than targeting you individually, campaigns could target your friends to get you to switch your vote.

-Manipulating people below the level of perception. This is where it gets really scary, where campaigns could start treating the voting populace as something to be engineered, not persuaded. Given the large amounts of data Facebook and trackers have on people, it is already possible to determine, perhaps not yet with a high degree of accuracy, an individual’s mental state from their online activity, i.e., whether the person is happy, sad, or depressed. Now imagine crossing this with political messaging. What if you knew that a depressed voter is easy to convince, or worse, that a depressed voter is easily convinced not to vote for the opposition?
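The friend-targeting scenario is, at bottom, a graph problem: rank the people around a voter and pick the one most likely to sway them. The sketch below assumes a small, made-up friendship graph and uses degree centrality (how many direct connections someone has) as a crude stand-in for “influence.” That measure is a simplifying assumption chosen for illustration; the source does not describe what models campaigns actually use, and real ones are likely far more elaborate.

```python
from collections import defaultdict

# Hypothetical friendship graph: each pair is a mutual friendship.
FRIENDSHIPS = [
    ("ana", "ben"), ("ana", "cara"), ("ana", "dev"),
    ("ben", "cara"), ("cara", "dev"), ("cara", "eli"),
    ("dev", "eli"), ("eli", "fay"),
]


def build_graph(edges):
    """Build adjacency sets for an undirected graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Each person's direct connections as a share of everyone else in the graph."""
    n = len(graph)
    return {person: len(friends) / (n - 1) for person, friends in graph.items()}


def most_influential_friend(graph, voter):
    """Return the voter's best-connected friend (ties go to the alphabetically first)."""
    centrality = degree_centrality(graph)
    return max(sorted(graph[voter]), key=lambda friend: centrality[friend])


graph = build_graph(FRIENDSHIPS)
print(most_influential_friend(graph, "ben"))
```

Here a campaign wanting to reach “ben” would, under this toy measure, spend its effort on his best-connected friend rather than on him directly.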

I think we all like to pretend that voters are rational, that when someone goes into the voting booth and pulls the lever they are making a rational, controlled decision. But I think we also pretty much know at this point that people aren’t rational, and that in the same way Nike can convince you its products are hip, or that Apple products are shiny, or that McDonald’s is the place you want to eat, politicians and political interest groups work to persuade voters on an irrational level. True, this has always gone on; the issue is that the tools for this type of voter manipulation have become drastically more powerful. Big data social scientists are rapidly moving to make social science an engineering discipline, treating human subjects as a group of actors to be engineered. **And before you quickly dismiss this premise, realize that companies already spend billions on exactly this principle: engineering the public.** What happens when politics becomes a matter of engineering? We are moving there, and what is worse, we are moving there with little or no regulation and little or no discussion about the costs of this kind of shift. I am not saying that engaging the public through social media is bad, or even that campaigns should not be using data analysis. What I am saying is that we don’t want an unregulated industry here, and that, absent regulation, things are getting worse, not better. That makes now the time to have a serious conversation about this, and to demand transparency in how exactly these campaigns store and use data about us.