Here are the three sorts:
1. The Data
The data is the actual numbers, codes or open-ended text that the respondent enters into the survey. There should probably be a better way of describing this than “the data”; maybe “raw data” is a better term to use?
2. The Metadata
Metadata describes the raw data.
For instance, the raw data for question 2 may be a “1” or a “2”. The metadata would say that the value “1” means male and the value “2” means female. Or the metadata may say that the values for question 3 must fall within a range, such as 1-100 or 32-78.
Metadata gives meaning to the raw data, so it must be present for any analysis of the raw data. Otherwise the raw data is just a collection of numbers with no meaning.
One of the problems with metadata is keeping it connected to the right raw data. Pairing raw data with the wrong metadata can be a disaster.
Raw data with no metadata is just a load of junk.
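To make the relationship concrete, here is a minimal sketch of metadata being applied to raw data. The `CODEBOOK` layout, the question ids `q2` and `q3`, and the `decode` function are all made up for illustration; real survey packages store metadata in their own formats.

```python
# Hypothetical metadata: question id -> code labels or an allowed numeric range.
CODEBOOK = {
    "q2": {"labels": {1: "male", 2: "female"}},
    "q3": {"range": (1, 100)},
}


def decode(question, value, codebook=CODEBOOK):
    """Return the meaning of a raw value, or raise if the metadata rejects it."""
    meta = codebook[question]
    if "labels" in meta:
        if value not in meta["labels"]:
            raise ValueError(f"{question}: {value!r} is not a defined code")
        return meta["labels"][value]
    low, high = meta["range"]
    if not low <= value <= high:
        raise ValueError(f"{question}: {value!r} is outside {low}-{high}")
    return value


# Raw data for one respondent: meaningless until the codebook is applied.
raw_response = {"q2": 1, "q3": 45}
decoded = {q: decode(q, v) for q, v in raw_response.items()}
print(decoded)  # {'q2': 'male', 'q3': 45}
```

Swap in a codebook where “1” means female and the same raw data quietly tells a different story, which is why the two need to stay together.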
3. The Paradata
Paradata is the least well known of the data triplets. In the past decade or so it has become much more important for the survey research world.
Paradata is data which describes something about the way the raw data was collected.
It is data about the process of collecting data.
The most commonly used form of paradata at the moment is data about questionnaire and question timings, that is, the time a respondent takes to complete a question or questionnaire.
This type of data is now one of the cornerstones of quality measurements for web surveys.
Obviously there can be many different sorts of paradata. For open-ended text questions, the length of the text entered by the respondent can be measured, as well as the “level of vocabulary” contained in the text.
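As a rough sketch of what that might look like, the function below computes a few simple measures for an open-ended answer. The function name `text_paradata` is invented here, and the share of distinct words is only an assumed stand-in for “level of vocabulary”; it is not an established measure from this post.

```python
import re


def text_paradata(answer):
    """Return simple length and vocabulary measures for one open-ended answer."""
    # Treat runs of letters (and apostrophes) as words, ignoring case.
    words = re.findall(r"[a-z']+", answer.lower())
    return {
        "chars": len(answer),
        "words": len(words),
        # Assumed proxy for vocabulary: distinct words divided by total words.
        "distinct_ratio": len(set(words)) / len(words) if words else 0.0,
    }


print(text_paradata("The staff were friendly"))
print(text_paradata("good good good"))
```

A very short answer, or one that repeats the same word over and over, may be a sign that the respondent was not really engaging with the question.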
One metric used for web surveys is that of “speeders,” that is, the number of people who complete the survey extremely quickly. The paradata for time taken to complete the questionnaire is used here.
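A minimal sketch of a speeder check is shown below. The cut-off rule (anything under a third of the median completion time) is an assumption chosen for illustration, as are the function name `flag_speeders` and the sample timings; each project will want to set its own threshold.

```python
from statistics import median


def flag_speeders(durations, fraction=0.33):
    """Return ids of respondents who finished faster than fraction * median.

    `durations` maps respondent id -> seconds taken to complete the questionnaire.
    The fraction-of-median rule is an illustrative assumption, not a standard.
    """
    cutoff = fraction * median(durations.values())
    return sorted(rid for rid, secs in durations.items() if secs < cutoff)


# Made-up timing paradata for five respondents.
durations = {"r1": 610, "r2": 580, "r3": 95, "r4": 720, "r5": 640}
speeders = flag_speeders(durations)
print(speeders)  # ['r3']
```

The median is used rather than the mean so that one or two people who left the survey open overnight do not drag the cut-off upwards.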
Paradata can also be useful in revealing hidden biases; for instance, using paradata in the gamification of surveys is a rising trend. The time taken to do something in a gamified survey, as well as the action itself, can have a great deal of meaning. Some researchers claim that hidden racism, sometimes unknown to the subjects themselves, can be revealed by measuring someone’s reaction time to specific questions.
In a future post we will delve more into exactly how paradata can be used for quality control of web surveys.