
  phil mora
  • The Big Picture
  • Butchsonic Forge
  • About


The Big Picture
San Francisco. Philadelphia. Paris. Denver.

About

Twitter IPO and the pitfalls of using social data in big data analysis

7/30/2013

New research suggests that big data analysis, particularly of social media data, can produce a biased picture of human behavior, because who uses these platforms, and how, is shaped by societal factors.

Striking new research out of Princeton University’s Center for Information Technology Policy and the University of North Carolina at Chapel Hill suggests that inferences based on how people use social media platforms like Twitter and Facebook should be reconsidered. The reason? These platforms represent skewed samples from which it is difficult to draw accurate conclusions.

[Thank you, MIT Sloan Management Review]
[By Renee Boucher Ferguson | 07.17.13]


In her draft paper, Big Data: Pitfalls, Methods and Concepts for an Emergent Field, UNC professor and Princeton CITP fellow Zeynep Tufekci (@zeynep) compares the methodological challenges of deriving social insights from Twitter-based big data to biological testing on Drosophila flies, better known as fruit flies. Drosophila are usually chosen because they’re relatively easy to use in lab settings, easy to breed, have rapid and “stereotypical” life cycles, and are small as adults. The problem? They’re not necessarily representative of non-lab (read: real-life) scenarios. Tufekci posits that the dominance of Twitter as the “model organism” for social media in big data analyses skews analysis in a similar way:
Each social media platform carries with it certain affordances which structure its social norms and interactions and may not be representative of other social media platforms, or general human social behavior …

Twitter is used by about 10% of the U.S. population, which is certainly far, far from a representative sample. While Facebook has a wider diffusion rate, its rates of use are structured by race, gender, class and other factors and are not representative. Using these sources as “big data” model organisms raises important questions of representation and visibility as demographic or social groups may have different behavior — online and offline — and may not be fully represented or even sampled via current methods.
Tufekci says that one of the biggest methodological dangers of big data analysis is “insufficient understanding of the underlying samples.” In her words,

It’s not enough to understand how many people have “liked” a Facebook status update, clicked on a link, or “retweeted” a message, without having a sense of how many people saw and chose to — or not to — take that option. That kind of normalization is rarely done, or may even be actively decided against because the results start appearing more complex or more trivial.
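The normalization Tufekci describes amounts to dividing a count of actions (likes, clicks, retweets) by the number of people who were actually exposed to the post. The Python sketch below assumes a per-post `impressions` count is available, which platforms do not always expose; `PostStats`, `engagement_rate`, `rank_raw_and_normalized` and the sample numbers are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PostStats:
    """Counts for one post. Field names are hypothetical, not any platform's API."""
    post_id: str
    impressions: int  # people who were shown the post (assumed to be known)
    actions: int      # likes, clicks or retweets


def engagement_rate(stats: PostStats) -> float:
    """Share of people who saw the post and chose to act on it."""
    if stats.impressions <= 0:
        raise ValueError(f"{stats.post_id}: impressions must be positive")
    return stats.actions / stats.impressions


def rank_raw_and_normalized(posts: list[PostStats]) -> tuple[list[str], list[str]]:
    """Return post ids ranked by raw action count and by normalized rate."""
    raw = sorted(posts, key=lambda p: p.actions, reverse=True)
    normalized = sorted(posts, key=engagement_rate, reverse=True)
    return [p.post_id for p in raw], [p.post_id for p in normalized]


if __name__ == "__main__":
    # Made-up numbers: "a" reached far more people than "b".
    posts = [
        PostStats("a", impressions=100_000, actions=500),
        PostStats("b", impressions=2_000, actions=200),
    ]
    raw, normalized = rank_raw_and_normalized(posts)
    print(raw)         # ['a', 'b']
    print(normalized)  # ['b', 'a']
```

With these illustrative numbers the two rankings disagree: post “a” gathers more raw actions only because far more people saw it, while a larger share of the people who saw post “b” chose to act on it.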

On the conceptual side of the big data analysis challenge, Tufekci posits that more in-depth research is needed to understand exactly what social media footprints mean, and what can legitimately be inferred from big data analysis of those footprints.

A case in point: while retweets or mentions are often treated as a measure of “influence,” a retweet could actually mean something far different from influence, ranging from “affirmation to denunciation to sarcasm to approval to disgust.”

Tufekci makes three additional points regarding conceptual analysis of big data that can be applied in a business setting:

  • Not all networks operate the same way.
Are social media networks similar to airline networks? Methodologies need to rely on more than “they’re both networks” as a basis of comparison; it’s crucial to examine the specific properties of nodes, edges, connectivity, flow, interaction and structure in different networks to understand which methods can be carried over from one type of network to another.

  • Humans do not interact only in networks.
Human social information flows do not occur only through node-to-node networks, but also through field effects — large-scale societal events that impact a large group … through changes within whole social, cultural and political fields — that must be taken into consideration.

  • You name it, humans will game it.
People will create false hashtag trends. They will ‘subtweet’ as a way of talking about a topic or person, and will deliberately misspell something or leave out the @ sign in order to avoid being visible in a measurable way. They will game algorithms and metrics. This should be expected in all analysis.

When I asked Tufekci how she thinks her research applies to business managers using online and social media data, she said it’s important to keep in mind that more data does not necessarily mean more insight.

“A lot of big data research is done in an isolated, one-shot, single-method manner with no way to assess, interpret or contextualize the findings,” she said. “There is great potential for error and misunderstanding; worse, with a lot of money flowing into this space, there is a lot of pressure to produce ‘results’ and to overlook the fact that methods that were not developed to study humans, and do not necessarily work the same way, are being applied widely.

“The online imprints that create these large, aggregate datasets are not just mere ‘mirrors’ of human activity; rather, they are partial, filtered, distorted and complex reflections.”

More Reading: http://sloanreview.mit.edu/big-ideas/data-analytics/


    Product Builder in Colorado. travel 🚀 work 🌵 weights 🍔 music 💪🏻 rocky mountains, tech and dogs 🐾


    Categories

    All
    Change Agents
    Experiences
    Fitness
    Hacking Work
    Projects
    Technology
    Thoughts

    Archives

    May 2025
    April 2025
    March 2025
    February 2025
    July 2024
    June 2024
    December 2022
    November 2022
    February 2022
    January 2022
    December 2021
    November 2021
    October 2021
    September 2021
    July 2021
    June 2021
    May 2021
    April 2021
    March 2021
    February 2021
    January 2021
    December 2020
    November 2020
    October 2020
    September 2020
    August 2020
    July 2020
    June 2020
    May 2020
    April 2020
    March 2020
    February 2020
    January 2020
    December 2019
    November 2019
    October 2019
    July 2019
    June 2019
    May 2019
    April 2019
    March 2019
    February 2019
    January 2019
    December 2018
    November 2018
    October 2018
    September 2018
    July 2018
    June 2018
    May 2018
    April 2018
    March 2018
    February 2018
    January 2018
    December 2017
    November 2017
    July 2017
    June 2017
    May 2017
    April 2017
    March 2017
    February 2017
    January 2017
    December 2016
    November 2016
    October 2016
    September 2016
    August 2016
    July 2016
    June 2016
    May 2016
    April 2016
    March 2016
    January 2016
    October 2015
    August 2015
    July 2015
    June 2015
    May 2015
    April 2015
    March 2015
    February 2015
    January 2015
    December 2014
    November 2014
    October 2014
    September 2014
    August 2014
    July 2014
    June 2014
    May 2014
    April 2014
    March 2014
    February 2014
    January 2014
    December 2013
    November 2013
    October 2013
    September 2013
    August 2013
    July 2013
    June 2013
    May 2013
    April 2013
    February 2013
    January 2013
    December 2012
    August 2012
    July 2012
    June 2012
    May 2012
    April 2012
    March 2012
    January 2012
    December 2011
    October 2011
    September 2011
    August 2011
    June 2011
    May 2011
    April 2011
    March 2011
    February 2011
    January 2011
    December 2010
    November 2010
    October 2010
    September 2010
    August 2010
    July 2010

Phil Mora
San Francisco. Rennes. Fort Collins. Philadelphia.
Phone: (408) 242-9222 . [email protected] . Discord | X | Linked In


Copyright © 1999-2025 Topp Studio All Rights Reserved