We’re counting down our Top 10 blog posts of 2013. Coming in at #4 is this post originally published July 31.
Over the last couple years, I’ve been looking at Twitter’s potential in survey research. Why Twitter? Because it’s vast, it’s fast, and it’s cheap. Recently, at the 2013 FedCASIC Workshops, I presented ten things survey researchers should know about Twitter.
1. Twitter is like a giant opt-in survey with one question.
Twitter started in 2006 with a simple prompt for its users: “what are you doing?” From a survey methodologist’s perspective, this isn’t really optimal question design. How people actually use Twitter is so varied, there might as well be no question at all. We aren’t used to working with answers to a question no one asked, and Twitter is a good example of what has been described as “organic” data – it just appears without our having designed for it. Tweets are limited to 140 characters in length. Pretty short, but a Tweet can capture a lot of information, and include links to other websites, photos, videos, and conversations.
2. Twitter is massive.
Every day, half a billion Tweets are posted. Half a billion! That means by the time you finish reading this, there will be approximately one million new Tweets. And the pace is only growing. With Twitter’s application programming interface (API) you can pull from a random 1% of Tweets. To get at all Tweets, known as the Firehose (100% of Tweets), you need to pay a fee and go through one of a few vendors, though the Library of Congress is working on providing access in the future.
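To put those volumes in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes the half-a-billion-per-day figure is spread evenly across the day, and the variable names are made up for illustration.

```python
# Back-of-the-envelope Tweet volume, using the post's 2013 figure.
# Assumption: Tweets arrive at a constant rate throughout the day.
TWEETS_PER_DAY = 500_000_000   # "half a billion" Tweets per day (from the post)
SECONDS_PER_DAY = 24 * 60 * 60
SAMPLE_RATE = 0.01             # the random 1% available through the API

tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY
seconds_to_one_million = 1_000_000 / tweets_per_second
sample_tweets_per_day = TWEETS_PER_DAY * SAMPLE_RATE

print(f"Tweets per second: {tweets_per_second:,.0f}")
print(f"Minutes to one million new Tweets: {seconds_to_one_million / 60:.1f}")
print(f"Tweets per day in a 1% sample: {sample_tweets_per_day:,.0f}")
```

Under that assumption, a million new Tweets takes just under three minutes, and even the 1% sample adds up to about five million Tweets a day.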
3. Twitter is increasingly popular on mobile devices like smartphones and tablets.
You’ll see people tweeting at events, as news is happening right in front of them, or where you don’t really expect or want to see them tweeting, like while they’re driving. Many use Twitter on mobile devices with another screen on at the same time. That’s called multiscreening, like when people tweet while watching television in a backchannel discussion with friends and fans of their favorite shows.
4. The user-base is large, but it doesn’t exactly reflect the general population.
It would be kind of weird if it did, honestly. There are surely many factors that influence the likelihood of adoption, and wouldn’t it be surprising if we saw no differences by demographics? The Pew Research Center estimates 16% of online Americans now use Twitter, and about half of those do so on a typical day. Users are younger, more urban, and disproportionately black non-Hispanic compared to the general population. This is interesting when thinking about new approaches for sometimes hard-to-reach populations.
5. It is made up of more than just people.
Twitter is not cleanly defined with one account per person or even just one person behind every account. Some people have multiple accounts and some accounts are inactive. Groups and organizations use Twitter to promote products and inform followers. They can purchase “promoted Tweets” that show up in users’ streams like a commercial. And watch out for robots! Some software applications run automated tasks to query or Retweet content, which makes interpreting the data extra challenging.
6. There are research applications beyond trying to supplant survey estimates.
Think about the survey lifecycle and where there may be needs for a large, cheap, timely source of data on behaviors and opinions or a standing network of users to provide information. In the design phase of a survey, can we use Twitter to help identify items to include? Can we identify and recruit subjects for a study using Twitter? How about a diary study when we need more continuous data collection and want to let people work with a system they know instead of trying to train them to do something unfamiliar? Can Twitter be used to disseminate study results? What about network analysis? Is there information that can be gleaned from someone’s network of friends and followers, or the spread of tweets from one (or a few) users to many? We often think of public opinion as characterizing sentiment at a specific place and time, but are there insights to be had from Twitter on opinion formation and influence?
7. Twitter is cheap and fast, but making sense of it may not be.
What’s the unit of analysis? Can we apply or adapt the total survey error framework when looking at Twitter? What does it mean when someone tweets as opposed to giving a response in a survey? Beyond demographics, how do Twitter users differ from other populations? How can we account for Twitter’s exponential growth when analyzing the data? The best answer to each right now is “it depends” or “more research is needed.” We need a more solid understanding and some common metrics as we look to use Twitter for research. Work on this front is beginning but has a long way to go.
8. Naïve and general text mining methods for tweets can be severely lacking in quality.
The brevity of tweets, along with their misnomers, misspellings, slang, and sarcasm, makes sentiment analysis a real challenge. We’ve found the off-the-shelf systems pretty bad and inconsistent when coding sentiment on tweets. If you’re going to do automated sentiment analysis, be sure to account for nuances of your topic or population as much as possible and have a human coding component for validation. One approach we’ve found to be promising is to use crowdsourcing for human coding of tweet content.
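One way to make the validation step concrete is to compare automated codes against human codes on the same tweets and compute chance-corrected agreement. The sketch below uses Cohen's kappa, which is a common choice but not necessarily the metric we used; the function name and the tiny set of labels are invented purely for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(labels_a)
    # Share of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical sentiment codes for eight tweets (not real data).
human_codes = ["pos", "neg", "neu", "neg", "pos", "neu", "neg", "pos"]
auto_codes = ["pos", "neu", "neu", "neg", "neg", "neu", "pos", "pos"]

kappa = cohens_kappa(human_codes, auto_codes)
print(f"Cohen's kappa: {kappa:.2f}")
```

A kappa near 1 means the automated system codes tweets much as a person would; a value near 0 means it agrees with the human coder about as often as chance alone would produce.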
9. Beware of the curse of Big Data and the file cabinet effect.
Searching for patterns in trillions of data points, you’re bound to find coincidences that have no predictive power or that can’t be replicated. The file cabinet effect is when researchers publish exciting results about Twitter but hide away their null or negative findings.
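A small simulation shows how easily coincidences turn up. This is a toy sketch, not an analysis of real Twitter data: every series below is pure random noise, and the sizes (20 days, 1,000 candidate series) and the seed are arbitrary assumptions.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(2013)  # fixed seed so the run is repeatable
N_DAYS = 20                # length of each made-up daily series
N_SERIES = 1000            # number of unrelated "keyword" series to search

# A made-up outcome and many made-up predictors, all independent noise.
target = [rng.gauss(0, 1) for _ in range(N_DAYS)]
candidates = [[rng.gauss(0, 1) for _ in range(N_DAYS)] for _ in range(N_SERIES)]

correlations = [abs(pearson_r(series, target)) for series in candidates]
best_r = max(correlations)
n_strong = sum(r > 0.5 for r in correlations)

print(f"Strongest |r| among {N_SERIES} noise series: {best_r:.2f}")
print(f"Series with |r| > 0.5: {n_strong}")
```

None of these series has any real relationship to the target, yet searching through enough of them reliably turns up a few that look strongly related. Checking a finding against fresh data is the obvious safeguard.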
10. Surveys aren’t perfect either.
Surveys are getting harder to complete with issues like declining response rates and reduced landline coverage. Twitter isn’t a fix-all but it may be able to fill some gaps. It’ll take some focused study and creative thinking to get there.
Joe Murphy is an RTI International survey methodologist investigating the role of new technologies and social media in the collection and analysis of social data. His training as a demographer led him into the world of survey research where he has led projects and published on topics including hospital quality, substance use and mental health, uncounted ballots in the 2000 presidential election, and the effects of exposure to the events of 9/11. He writes frequently for Survey Post.