“Big Data” Defined

On February 21st I moderated the inaugural webinar in a series of sessions about big ideas in market research. The series is being co-produced and sponsored by Research Access and GreenBook (look for announcements of upcoming installments in the series on Research Access and GreenBook Blog soon).

The first webinar was on the topic of “Big Data,” a term which is quite a buzzword in the market research community these days. I find people are generally pretty confused about what Big Data is and what tools are available for analyzing it.

We assembled an expert panel for this event: Steve Cohen, Romi Mahajan, Leonard Murphy, and Charlie Wardell.

I started the webinar by asking each of the expert panelists for their definition of Big Data.

Dana Stanley: I’m going to ask each of our expert panelists a simple question. What is “Big Data”?

Steve Cohen: Big Data is an interesting question. I’ve heard several definitions. The first is what we call the three-V definition. One of the Vs is variety, the many kinds of data you may get. That could be coming from social media. It could be coming as clickstream information. It could be coming from TV zapping and remote-control information. It could even be coming from the CERN Large Hadron Collider in Switzerland. The second V is velocity, how quickly the data comes in, and the third is volume, the amount of information. That’s probably the most common definition.

If I can go to a second definition that I happen to like, it’s what I call the VAST definition, V-A-S-T, which stands for Variables, Attributes, Subjects (or people), and Time, where one or more of those number in the thousands, the tens of thousands, or even the millions.

But if I could give one more definition, it comes from the Berkeley AMPLab, where AMP stands for Algorithms, Machines, and People. They basically say Big Data is any data set that is expensive to manage and hard to extract value from. So you don’t need something like the CERN Large Hadron Collider, which generates a petabyte of data every second, to have Big Data. I’ll stop there and let somebody else chime in.

Romi Mahajan: As a marketer, I tend to think in terms of metaphors. When I think of Big Data, I think of the fabled Roman god Janus, who had the faces of both the creator and the destroyer. He was a two-faced god. On the creator side, Big Data means huge sets of data from which you can extract intelligence, make meaning, and find wisdom. On the destroyer side, these tracts of data are so vast, so huge, and, it might seem to the naked eye, so impenetrable that we can get caught in the analysis paralysis that comes from simply having too much information to deal with. So for me, Big Data is an opportunity for wisdom-making, but it’s also a potential peril in terms of analysis paralysis. That’s the metaphor I use to frame my view of Big Data.

Dana Stanley:  Lenny, how do you define Big Data?

Leonard Murphy: It’s hard to follow Romi and Steve there, Dana, so maybe I’d turn to a slightly different context and say that Big Data is the future of how enterprises will more effectively deliver information internally and value to customers. That’s certainly the business context. As we look at these definitions, with massive data sets available via social media, CRM, passive mobile applications, point-of-sale information, and so on, Big Data is the process whereby we aggregate that information, extract information from it, and look for value for clients and consumers.

Charlie Wardell: For me, Big Data is a pretty simple term. It’s an overused term. There are about two and a half quintillion bytes of data being generated daily. If you look at the real hardcore definition of Big Data, you’re probably looking at petabytes to exabytes of data. But I’m a more practical kind of guy, and I think it can be anything south of that. I’ve had clients with Big Data problems involving less than a terabyte. It really depends on what your capabilities are and what your need is. There is one aspect of Big Data that is not really being addressed, though it was touched on earlier: the velocity of the data, the speed at which it comes in and, beyond that, the speed at which you can process it. That’s a whole new angle on Big Data that is starting to emerge, and it’s very important. Most of the Big Data solutions out there attempt to deliver your analytics and insights through batch-based processing.

If you think about it, that is good up to a point, but there is a need for real time, because the Web is real time and insights need to be real time. So there’s a new aspect of Big Data that I strongly believe is tied to looking at the data in real time. But for me, Big Data, in a practical sense, is anything that’s not manageable with traditional technology, meaning relational databases like Oracle, SQL Server, or MySQL. It could be any variety of data, from free text to structured text to binary data to YouTube videos. As long as it has bits and bytes, it’s data. And if it can’t be handled by conventional means in real time, for me it falls into the Big Data category.

Note: A special “thank you” goes out to Focus Forward for transcribing the webinar.

About Dana Stanley

Dana is the Editor-in-Chief of Research Access.
