On March 19, 2015, the esteemed Annie Pettit of Peanut Labs led us through a brief but insightful presentation on data quality lessons from a survey design perspective. As researchers actively engaged in survey implementation, we want to ensure that the responses we collect are as accurate as possible, which requires a keen eye for data quality.
Let’s look at a few QC (Quality Control) questions we can easily work into our surveys.
- Red herrings can take the form of single selects, multi-selects, or rating scales. The key to implementing this question is to drop a few fake names into, for example, a brand list. Annie stresses that only respondents who select two or more of the fake names should be considered for deletion. This technique also requires the researcher to do due diligence with a search engine of choice to confirm that the fake names really are fake, or at least of extremely low incidence.
- The high-incidence multi-select is another question type to consider: a multi-select with a long list of common, high-incidence behaviors. Flag respondents who select only a few of these behaviors. More precisely, you can calculate the average number of responses selected and then flag respondents who underclick, that is, who select fewer than the average by some chosen factor.
- The low-incidence multi-select is its converse. Populate the answer list with rare behaviors, such as rare illnesses, and flag respondents who select more than one of them. This captures respondents who are trying to avoid being screened out so that they qualify for the incentive. The example below leverages rare medical conditions.
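The three question-based rules above reduce to simple counting checks once the data is in hand. Here is a minimal Python sketch, assuming each respondent's answers to one multi-select question are stored as a set of option labels keyed by a respondent ID. The function names, the brand and condition labels, and the `factor=0.5` underclick cutoff are hypothetical illustrations, not values from the presentation; only the "two or more fake names" and "more than one rare item" thresholds come from the rules described above.

```python
from statistics import mean

# Hypothetical data layout: respondent ID -> set of options selected on one question.
Responses = dict[str, set[str]]


def red_herring_flags(responses: Responses, fake_items: set[str], min_fakes: int = 2) -> set[str]:
    """Flag respondents who picked at least `min_fakes` of the fake (red herring) options."""
    return {rid for rid, chosen in responses.items() if len(chosen & fake_items) >= min_fakes}


def underclick_flags(responses: Responses, factor: float = 0.5) -> set[str]:
    """Flag respondents on a high-incidence list whose selection count is below
    `factor` times the sample average (`factor` is an assumed cutoff)."""
    avg = mean(len(chosen) for chosen in responses.values())
    return {rid for rid, chosen in responses.items() if len(chosen) < factor * avg}


def low_incidence_flags(responses: Responses, rare_items: set[str], max_allowed: int = 1) -> set[str]:
    """Flag respondents who claim more than `max_allowed` of the rare behaviors or conditions."""
    return {rid for rid, chosen in responses.items() if len(chosen & rare_items) > max_allowed}


# Usage with made-up answers.
fake_brands = {"Zorvex", "Quillara", "Mendrovo"}  # hypothetical fake names
brand_q = {
    "r1": {"Brand A", "Brand B"},
    "r2": {"Brand A", "Zorvex", "Quillara"},  # two fake names -> flagged
    "r3": {"Zorvex"},                         # only one fake name -> kept
}
print(red_herring_flags(brand_q, fake_brands))  # {'r2'}

habits_q = {
    "r1": {"shower", "brush teeth", "drink water", "check phone", "eat breakfast"},
    "r2": {"shower", "brush teeth", "drink water", "check phone"},
    "r3": {"shower"},  # 1 selection vs. an average of about 3.3
}
print(underclick_flags(habits_q))  # {'r3'}

rare_conditions = {"Condition X", "Condition Y", "Condition Z"}  # placeholder labels
conditions_q = {
    "r1": {"Condition X"},
    "r2": {"Condition X", "Condition Y", "Condition Z"},  # claims three rare conditions
    "r3": set(),
}
print(low_incidence_flags(conditions_q, rare_conditions))  # {'r2'}
```

Each check returns a set of flagged IDs rather than deleting anyone, so the flags can be reviewed together later.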
Next up are behaviors to monitor:
- Over-clicking on multiple-response questions is a potential concern. For example, if the average number of items selected is 7, you may wish to flag respondents who select all or nearly all of the possible responses.
- Failing to follow instructions should be flagged. A common example, using our close friend the multi-select, is to instruct respondents to select only two or three responses and then flag those who select any other number. Instruct your programming team not to validate this number, as we want to track respondents’ real, not forced, behavior.
- Overusing NA/DK (Not Applicable/Don’t Know) is a concern, so we can flag respondents who overuse the “Don’t Know” option. While good questionnaire design includes a “Don’t Know” option, participants who use it liberally likely do not understand the topic you are surveying.
- Straight-lining (giving the same answer to every item in a grid) is a common tactic among respondents who tire of a lengthy grid question or a series of such questions. We should include a few negatively worded statements in the grid, randomly placed, to make respondents pause and think about their responses. We should also keep the number of grid items to the minimum needed to answer the overarching research question, which puts less strain on respondents.
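The behaviors above can likewise be checked after fieldwork. The following Python sketch is one possible reading of those rules, assuming multi-select answers are stored as sets, a respondent's answers across questions as a list of strings, and grid ratings as a list of integers. The function names, the `"DK"` answer code, the `slack=1` meaning of "close to all", and the `max_share=0.3` "Don't Know" cutoff are all hypothetical choices, not figures from the presentation.

```python
# Hypothetical data layouts; every cutoff below is an illustrative assumption.


def overclick_flags(responses: dict[str, set[str]], n_options: int, slack: int = 1) -> set[str]:
    """Flag respondents who select all, or all but `slack`, of the `n_options` choices."""
    return {rid for rid, chosen in responses.items() if len(chosen) >= n_options - slack}


def instruction_flags(responses: dict[str, set[str]], min_count: int = 2, max_count: int = 3) -> set[str]:
    """Flag respondents who ignored a 'select two or three only' instruction
    (the survey itself does not enforce the count)."""
    return {rid for rid, chosen in responses.items() if not min_count <= len(chosen) <= max_count}


def dont_know_flags(answers: dict[str, list[str]], dk_code: str = "DK", max_share: float = 0.3) -> set[str]:
    """Flag respondents whose share of 'Don't Know' answers exceeds `max_share`."""
    return {
        rid
        for rid, given in answers.items()
        if given and given.count(dk_code) / len(given) > max_share
    }


def straightline_flags(grids: dict[str, list[int]]) -> set[str]:
    """Flag respondents who gave the identical rating to every item in a grid.
    If the grid mixes positive and negatively worded statements, identical
    ratings across all of them are also internally contradictory."""
    return {rid for rid, ratings in grids.items() if len(ratings) > 1 and len(set(ratings)) == 1}


# Usage with made-up answers.
ten_option_q = {
    "r1": {f"opt{i}" for i in range(10)},  # all 10 options
    "r2": {f"opt{i}" for i in range(9)},   # 9 of 10
    "r3": {"opt0", "opt4", "opt7"},        # 3 of 10
}
print(sorted(overclick_flags(ten_option_q, n_options=10)))  # ['r1', 'r2']

pick_two_or_three_q = {
    "r1": {"A", "B"},
    "r2": {"A", "B", "C"},
    "r3": {"A", "B", "C", "D", "E"},  # too many
    "r4": {"A"},                      # too few
}
print(sorted(instruction_flags(pick_two_or_three_q)))  # ['r3', 'r4']

answers = {
    "r1": ["Yes", "No", "DK", "Yes"],  # 25% DK
    "r2": ["DK", "DK", "DK", "No"],    # 75% DK
}
print(sorted(dont_know_flags(answers)))  # ['r2']

grids = {
    "r1": [4, 4, 4, 4, 4],  # same rating on every item
    "r2": [4, 2, 5, 1, 4],
}
print(sorted(straightline_flags(grids)))  # ['r1']
```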
The key takeaway is that respondents’ natural behavior when answering a survey can be the best indicator of quality. Use multiple QC questions throughout the survey to flag respondents, and apply those flags after the fact rather than weeding respondents out while they are still taking the survey.
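Because the flags are applied after the fact, one way to combine several QC checks is to count how many of them each respondent tripped and review only those with multiple flags. The Python sketch below illustrates that idea under stated assumptions: it accepts any sets of flagged respondent IDs, and the function names and the `min_flags=2` cutoff are hypothetical, not part of the presentation.

```python
from collections import Counter


def tally_flags(*flag_sets: set[str]) -> Counter:
    """Count how many separate QC checks flagged each respondent."""
    counts: Counter = Counter()
    for flagged in flag_sets:
        counts.update(flagged)  # each respondent ID in the set adds 1
    return counts


def review_list(counts: Counter, min_flags: int = 2) -> list[str]:
    """Respondents flagged by at least `min_flags` checks (an assumed cutoff),
    sorted for manual review after fieldwork."""
    return sorted(rid for rid, n in counts.items() if n >= min_flags)


# Usage: flag sets as produced by individual checks run after the survey closes.
red_herring = {"r2"}
straightlining = {"r2", "r3"}
dont_know = {"r2", "r5"}
flag_counts = tally_flags(red_herring, straightlining, dont_know)
print(review_list(flag_counts))  # ['r2']
```

Keeping the tally separate from the decision makes it easy to try different cutoffs before removing anyone.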