Use Data Imputation When Modularizing Mobile Surveys


The Problem

The market research industry is at a crossroads: client demands push surveys longer while respondents increasingly take surveys on smartphones (see Brian Jones’ previous blog on mobile market research trends and best practices). Many advanced analytical techniques require ~15-25 minutes (or more) of questioning to function, but many smartphone survey-takers drop out after ~5-10 minutes of content. Personally, I’ve seen drop-out rates more than 5x higher on smartphones than among the same profile of people taking the exact same survey on a computer or a tablet (note: I don’t see much of a difference between computer and tablet survey-takers; it’s smartphone survey-takers that are different). This can happen even with the most mobile-optimized survey user experience possible (e.g., no banks of rating scales requiring horizontal scrolling, minimal text on each screen, no vertical scrolling, no images or watermarks).

This trajectory is unsustainable, but most people I know, quite understandably, want it to just go away.

The bad news is: it won’t. As an industry, we need to adapt to this new reality, kicking and screaming.

The good news is: we’re actually well equipped to do this, and we’ve been forced to make major adaptations before. Many of the sampling approaches and analytical tools we’ve been using for a long time can be applied in new ways to the new situations we will increasingly find ourselves in with mobile market research. The key for us is testing and learning which tools work for which situations and which don’t, and updating our “conventional wisdom” about research best practices to be relevant in an increasingly mobile and increasingly modularized research world.

The Experiment

With this in mind, CMB partnered with Research Now to self-sponsor a research-on-research study where we purposefully sampled blocks of smartphone survey-takers alongside computer survey-takers and modularized our research design so that different “nodes” of smartphone survey-takers only answer specific parts of the overall questionnaire. We then experimented with different approaches for imputing all the missing data we ended up with to see which worked and which didn’t.

The Topic: We examined the purchase journey for:

  (a) Recent tablet buyers in the U.S., and
  (b) Recent hotel bookers (for personal travel) in the U.S.

For each category, we examined the original purchase triggers, how buyers became aware of different brands and options, their research and evaluation, the final purchase decision, and the channel through which they ultimately purchased their tablet or booked their hotel.

Sampling & Weighting

Whenever doing a modularized survey design, it is important to balance the various smartphone survey-taker nodes so they are comparable to one another on the key dimensions that could impact their attitudes or behaviors. All the rules we have used for doing this in longitudinal tracker studies come into play here as well. For this study, we ensured that our four different respondent nodes (one group of respondents who took the full survey on a computer, and three separate groups of respondents who took parts of the same survey on smartphones) were identical to one another on the core demographics (age, gender, and household income).

We first fielded all the computer survey-takers who answered the entire questionnaire, doing census-balanced click-throughs so we had a reliable read on the true demographic composition of recent tablet buyers and personal hotel bookers. We then fielded the smartphone survey-takers so that each node matched the demographic composition of the computer survey-taker node. We used rake weighting to correct any fielding imperfections.
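
For readers who have not used rake weighting, here is a minimal sketch of the general idea (iterative proportional fitting), not the code we ran on this study. The function names, the six-person node, and the target shares are all made up for illustration; in practice the targets would come from the computer survey-taker node.

```python
def rake(respondents, targets, max_iter=50, tol=1e-6):
    """Return one weight per respondent so weighted shares match the targets.

    respondents: list of dicts, e.g. {"gender": "F", "age": "18-34"}
    targets: {variable: {category: target share}}, shares sum to 1 per variable
    """
    weights = [1.0] * len(respondents)
    for _ in range(max_iter):
        biggest_adjustment = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total for each category of this variable.
            current = {}
            for r, w in zip(respondents, weights):
                current[r[var]] = current.get(r[var], 0.0) + w
            # Scale each respondent so this variable's margin hits its target.
            for i, r in enumerate(respondents):
                factor = shares[r[var]] * total / current[r[var]]
                weights[i] *= factor
                biggest_adjustment = max(biggest_adjustment, abs(factor - 1.0))
        if biggest_adjustment < tol:  # every margin already matches: stop
            break
    return weights


def weighted_share(respondents, weights, var, category):
    """Weighted share of respondents falling in one category."""
    hit = sum(w for r, w in zip(respondents, weights) if r[var] == category)
    return hit / sum(weights)


# Hypothetical smartphone node that came back too male and too young.
node = [
    {"gender": "F", "age": "18-34"},
    {"gender": "F", "age": "35+"},
    {"gender": "M", "age": "18-34"},
    {"gender": "M", "age": "18-34"},
    {"gender": "M", "age": "35+"},
    {"gender": "M", "age": "35+"},
]
# Hypothetical targets standing in for the computer node's composition.
targets = {
    "gender": {"F": 0.5, "M": 0.5},
    "age": {"18-34": 0.4, "35+": 0.6},
}
weights = rake(node, targets)
```

The loop adjusts one demographic at a time and repeats until no adjustment is needed, which is why raking only needs the marginal targets (age overall, gender overall) rather than a full age-by-gender cross-tab.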


Each smartphone survey-taker answered a core set of questions that all respondents answered (e.g., screeners, which brands they were aware of, considered and ultimately purchased), then was routed to one purchase journey node to answer in detail, skipping the others.
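
To make the resulting "planned missing data" pattern concrete, here is a rough sketch of this kind of routing. The question names, the module names, and the shuffle-then-rotate assignment are assumptions for illustration only; they are not the actual questionnaire or the fielding logic used in the study.

```python
import random

# Hypothetical question identifiers; the real questionnaire is not shown here.
CORE_QUESTIONS = ["screener", "brands_aware", "brands_considered", "brand_purchased"]
JOURNEY_MODULES = {
    "module_1": ["m1_q1", "m1_q2"],
    "module_2": ["m2_q1", "m2_q2"],
    "module_3": ["m3_q1", "m3_q2"],
}


def questions_for(device, module=None):
    """Questions a respondent sees: everything on a computer, core + one module on a phone."""
    if device == "computer":
        every_module = [q for qs in JOURNEY_MODULES.values() for q in qs]
        return CORE_QUESTIONS + every_module
    return CORE_QUESTIONS + JOURNEY_MODULES[module]


def assign_modules(smartphone_ids, seed=0):
    """Shuffle respondents, then deal them out to modules in turn (an assumed scheme)."""
    rng = random.Random(seed)
    ids = list(smartphone_ids)
    rng.shuffle(ids)
    names = sorted(JOURNEY_MODULES)
    return {rid: names[i % len(names)] for i, rid in enumerate(ids)}


assignment = assign_modules(range(9))
```

Every question outside a smartphone respondent's assigned module comes back blank by design, and those blanks are what the imputation step has to fill.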


Data Imputation

The real fun began once we got the data back. There were two primary techniques we compared for imputing data:

1)     Fully conditional imputation

  1. What it is: an iterative Markov Chain Monte Carlo (MCMC) method
  2. How it works: for every variable with missing data, it builds a predictive model that uses all other available variables with real data as predictors.
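
A stripped-down sketch of the idea follows, under some loud assumptions: every variable is treated as numeric, each predictive model is an ordinary linear regression, and randomness comes only from adding normally distributed residual noise. Production implementations also draw the model parameters themselves, handle categorical answers, and produce several imputed datasets. The function names and the tiny dataset are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, residual sd)."""
    Xd = [[1.0] + row for row in X]
    k = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(Xd, y)]
    sd = (sum(e * e for e in resid) / max(len(y) - k, 1)) ** 0.5
    return beta, sd


def others(row, j):
    """All values in the row except column j (the predictors for column j)."""
    return row[:j] + row[j + 1:]


def fcs_impute(rows, n_iter=10, seed=0):
    """rows: list of lists of floats, None = missing. Returns a completed copy."""
    rng = random.Random(seed)
    n_vars = len(rows[0])
    missing = [[v is None for v in row] for row in rows]
    data = [row[:] for row in rows]
    for j in range(n_vars):  # starting point: fill gaps with the observed mean
        seen = [r[j] for r in rows if r[j] is not None]
        for r in data:
            if r[j] is None:
                r[j] = sum(seen) / len(seen)
    for _ in range(n_iter):  # cycle through the variables repeatedly
        for j in range(n_vars):
            if not any(m[j] for m in missing):
                continue
            train = [(others(r, j), r[j]) for r, m in zip(data, missing) if not m[j]]
            beta, sd = fit_ols([x for x, _ in train], [y for _, y in train])
            for r, m in zip(data, missing):
                if m[j]:  # predict from the other variables, plus noise
                    pred = beta[0] + sum(b * v for b, v in zip(beta[1:], others(r, j)))
                    r[j] = pred + rng.gauss(0.0, sd)
    return data


# Hypothetical columns: [core rating, module A rating, module B rating]
rows = [
    [1.0, 2.0, 5.0], [2.0, 3.0, 4.0], [3.0, 5.0, 4.0],  # computer: complete
    [4.0, 6.0, 2.0], [5.0, 8.0, 1.0],
    [2.0, 4.0, None], [4.0, 7.0, None],                  # phone node A
    [1.0, None, 5.0], [5.0, None, 2.0],                  # phone node B
]
completed = fcs_impute(rows, seed=1)
```

Note that the complete computer respondents are what let the models learn how module A and module B relate to each other, since no smartphone respondent in this toy example answered both.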

2)     Hot decking

  1. Replaces missing values with values from a similar respondent
  2. Sorts the file by the key criteria to match (e.g., age, gender, income)
  3. Imposes randomness on the sort with a random number (deck)
  4. For each respondent that has missing values (the “recipient”), looks to the next respondent that matches on all key (deck) measures (the “donor”)
  5. If the two respondents match on the specified criteria, fills the recipient’s missing data with the donor’s data
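
The hot-deck steps above can be sketched roughly as follows. This is an illustration rather than our production code: the `hot_deck` function, the field names, and the sample respondents are hypothetical, and two details are my own assumptions, namely that the search wraps around to the top of the file and that a donor must have actually answered the question (it never passes along an imputed value).

```python
import random


def hot_deck(respondents, deck_vars, seed=0):
    """Fill None answers from a donor who matches on every deck variable."""
    rng = random.Random(seed)
    # Steps 2-3: sort by the deck variables; a random number decides the
    # order of respondents who tie on all of them.
    ordered = sorted(
        (dict(r) for r in respondents),
        key=lambda r: (tuple(r[v] for v in deck_vars), rng.random()),
    )
    observed = [dict(r) for r in ordered]  # snapshot of the real answers only
    n = len(ordered)
    for i, recipient in enumerate(ordered):
        deck = tuple(recipient[v] for v in deck_vars)
        for question, answer in observed[i].items():
            if answer is not None:
                continue
            # Step 4: walk forward (wrapping around) to the next respondent
            # in the same deck who really answered this question.
            for step in range(1, n):
                donor = observed[(i + step) % n]
                same_deck = tuple(donor[v] for v in deck_vars) == deck
                if same_deck and donor[question] is not None:
                    recipient[question] = donor[question]  # step 5
                    break
    return ordered


# Hypothetical respondents; None marks a module the respondent never saw.
respondents = [
    {"id": 1, "age": "18-34", "gender": "F", "mod_a": 4, "mod_b": 2},
    {"id": 2, "age": "18-34", "gender": "F", "mod_a": 5, "mod_b": None},
    {"id": 3, "age": "18-34", "gender": "F", "mod_a": None, "mod_b": 3},
    {"id": 4, "age": "35+", "gender": "M", "mod_a": 1, "mod_b": 5},
    {"id": 5, "age": "35+", "gender": "M", "mod_a": None, "mod_b": 4},
    {"id": 6, "age": "35+", "gender": "F", "mod_a": None, "mod_b": 1},
]
filled = {r["id"]: r for r in hot_deck(respondents, ["age", "gender"], seed=42)}
```

In this toy file, respondent 6 has no one else in her deck, so her gap stays empty. That is the practical trade-off with hot decking: the more criteria you match on, the more similar the donor, but the more often no donor exists.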


What We Learned

Join us on July 9 for a webinar where we will review lessons learned from this project and implications for modularizing surveys in an increasingly mobile world.  There is simply too much detail and good stuff to spill the beans all in one blog post. Suffice it to say for now that it was a learning journey for everyone involved (which was indeed our mission), and we are sharing everything we learned along the way to further the industry’s collective knowledge base of how to adapt to this sea change in consumer research participation.

Chris Neal leads CMB’s Tech Practice. He enjoys spending time with his two kids and rock climbing.

