This information is useful for people who use panel sample for online surveys, and who want to make sure their survey data is truly clean.
Online Survey Panels Tell Us Their Panelists Are Clean
It’s hard to open a marketing magazine without seeing an ad from an online survey panel company proclaiming how clean and high quality their panel is. A few years ago, this claim was a big deal – it was the Wild West of online survey panels, and buyers of sample had to be very careful as to who they worked with. Today, however, most major online survey sample companies have adopted measures to get rid of professional respondents, prevent over-surveying, and make sure that respondents are who they say they are. So, whether the sample is “true” or “pure”, or there’s “attention to detail”, most reputable panel companies are doing a decent job of giving those of us who field surveys a good product.
But Survey Data is Still Dirty
However, and here’s a big however, the data from most online surveys using panel sample still comes in with some dirty responses. My research shows that between 1% and 5% of survey data from panel sample is garbage. Garbage – throw it out; don’t bring it into your final dataset to analyze. Sure, one can blame some of these dirty responses on frustrated respondents dealing with poor survey writing (bad questions, too long, etc.), but the fact remains that you had better clean that survey data before it goes in for analysis.
So, How Do I Clean the Data?
Here’s a plan you can use to clean your data.
When I say “flag” below, I mean that you create a new variable in your dataset next to the variable you are examining, and you place a “1” in a cell if the respondent’s case is flagged.
- Flag speeders. Look at time to completion and flag those respondents who took the survey in an unrealistically short time. Check the median time to completion and establish rules that you feel comfortable with. I often flag those taking less than 1/3 of the median time with a “1” (“speeder”), and those taking less than 1/4 of the median time with a “2” (“super speeder”). You might consider removing outliers (at the slow end) before calculating your median.
- Flag straightliners. If you have any grid/matrix questions, flag those respondents who gave the same response to every item (unless it makes sense that they could do so).
- Flag gibberish or garbage responses. If you have any open-ended responses, look for text such as “asdf” or “…..”; flag these responses, and any other “colorful, yet meaningless” responses you find.
- Flag incongruent combinations. If a respondent says their company size is 1000 and the number of PCs in the company is 5, something’s wrong here. Flag it.
- Check trap questions. Did you include any questions such as “Please choose the third response below”, or “Please type the word ‘attention’ below”? If you did, check them, and flag those respondents who didn’t follow the directions.
- Sum up your flags. Compute a new variable that sums all the flags.
- Sort your dataset by the summed variable. Bring cases to the top that have suspicious answers on a number of your checks.
- Inspect and delete cases with flags. Delete those cases that are too “dirty” to be included. Review with key stakeholders to agree on deletions.
- Notify your vendor of any bogus respondents. All the vendors I work with do not charge for any respondents I have flagged for deletion. Show them the IDs of the respondents you threw out, and they’ll take action on their side to warn and/or remove these panelists from their database.
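For those who script their cleaning, here is a minimal sketch of the flagging, summing, and sorting steps above, using only the Python standard library. The field names (`seconds`, `grid`, `open_end`, `employees`, `pcs`, `trap`), the toy data, the gibberish heuristic, and the 5% PCs-to-employees cutoff are all assumptions made for illustration; substitute the rules that fit your own survey.

```python
from statistics import median

# Toy data with hypothetical field names; replace with your own export.
respondents = [
    {"id": "r1", "seconds": 600, "grid": [4, 2, 5, 3], "open_end": "Good value for the price",
     "employees": 1000, "pcs": 800, "trap": 3},
    {"id": "r2", "seconds": 190, "grid": [3, 3, 3, 3], "open_end": "asdf",
     "employees": 1000, "pcs": 5, "trap": 1},
    {"id": "r3", "seconds": 140, "grid": [5, 5, 5, 5], "open_end": ".....",
     "employees": 50, "pcs": 40, "trap": 3},
    {"id": "r4", "seconds": 580, "grid": [2, 4, 4, 1], "open_end": "Support was slow",
     "employees": 200, "pcs": 150, "trap": 3},
    {"id": "r5", "seconds": 640, "grid": [1, 2, 2, 3], "open_end": "No comment",
     "employees": 20, "pcs": 25, "trap": 3},
]

TRAP_EXPECTED = 3  # "Please choose the third response below"


def speed_flag(seconds, median_seconds):
    """2 = super speeder (< 1/4 of median), 1 = speeder (< 1/3), 0 = fine."""
    if seconds < median_seconds / 4:
        return 2
    if seconds < median_seconds / 3:
        return 1
    return 0


def straightline_flag(grid):
    """1 if every item in a multi-item grid received the same response."""
    return int(len(grid) > 1 and len(set(grid)) == 1)


def gibberish_flag(text):
    """Crude heuristic (an assumption): no letters at all, or keyboard mashing.
    Blank answers are left unflagged; a manual read is still needed."""
    cleaned = text.strip().lower()
    if not cleaned:
        return 0
    no_letters = not any(ch.isalpha() for ch in cleaned)
    keyboard_mash = cleaned.startswith(("asdf", "qwer", "zxcv"))
    return int(no_letters or keyboard_mash)


def incongruent_flag(employees, pcs):
    """Assumed rule: fewer than 5 PCs per 100 employees looks implausible."""
    return int(pcs < employees * 0.05)


def trap_flag(answer):
    """1 if the respondent did not follow the trap question's instruction."""
    return int(answer != TRAP_EXPECTED)


median_seconds = median(r["seconds"] for r in respondents)

for r in respondents:
    r["flag_speed"] = speed_flag(r["seconds"], median_seconds)
    r["flag_straightline"] = straightline_flag(r["grid"])
    r["flag_gibberish"] = gibberish_flag(r["open_end"])
    r["flag_incongruent"] = incongruent_flag(r["employees"], r["pcs"])
    r["flag_trap"] = trap_flag(r["trap"])
    # Sum the flags; a super speeder's "2" counts double in this total.
    r["flag_total"] = sum(v for k, v in r.items() if k.startswith("flag_"))

# Sort so the most suspicious cases come first for inspection.
ranked = sorted(respondents, key=lambda r: r["flag_total"], reverse=True)

for r in ranked:
    print(r["id"], r["flag_total"])
```

With this toy data, r2 and r3 sort to the top with totals of 5 and 4, and the other three cases carry no flags. The script only ranks cases; the decision to delete still belongs to you and your stakeholders.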
Following the steps above will ensure that the data you analyze is as clean as possible. Yes, it takes a bit of time, but the effort is clearly worth it when compared to making decisions based on the analysis of data that includes bogus responses.
One last note: if you really need your final sample size to hit a specific number, and you can’t go below that number, you can over-sample, in anticipation of throwing out some respondents.
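As a rough sketch of the arithmetic, you can divide your target by the share of respondents you expect to keep. The function name `completes_to_field` and the 5% discard rate in the example are assumptions for illustration; 5% is simply the upper end of the range I mentioned earlier, and your own discard rate may differ.

```python
import math
from fractions import Fraction


def completes_to_field(target_n, expected_discard_rate):
    """Completes to collect so that about target_n remain after cleaning.

    expected_discard_rate is a proportion, e.g. 0.05 for 5%.
    """
    if not 0 <= expected_discard_rate < 1:
        raise ValueError("expected_discard_rate must be in [0, 1)")
    # Use exact fractions so rounding up is not thrown off by float error.
    rate = Fraction(expected_discard_rate).limit_denominator(1000)
    return math.ceil(Fraction(target_n) / (1 - rate))


print(completes_to_field(400, 0.05))
```

This is an expectation, not a guarantee: if you cannot fall below the target, add a few extra completes as a cushion on top of the computed number.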
Feel free to contact me for more details about some of the specific techniques I have found useful to clean data, or follow me on Twitter @NicoPeruzziPhD to hear about other marketing research topics.