Nepa and data quality

There has been a lot of talk in the industry recently about poor data quality. Many market research companies across the globe have seen high reversal rates (the share of project deliveries that are deemed fraudulent), and this has affected consumer confidence. So what’s causing these reversals? The simple answer is cheaters: people taking part in these projects who try to cheat the system, usually for monetary gain. A cheater can be an individual acting alone or a professional survey cheater using bots.

Here at Nepa we have combined a whole host of technical solutions and business policies to help discover and reconcile these fraudulent entries. This ensures that our data and insights are among the best in the business, and helps our data suppliers fight the ever-increasing trend of survey fraud.

What types of cheaters are there?

We have recently completed a deep-dive into the quality of our global data suppliers, and evaluated 7 key markers that could indicate that a response is invalid:

Duplicates
This is simple. These are people or bots who repeatedly enter the same data. Knowing IP information and other digital footprint characteristics can also help identify these respondents. When using multiple supply sources, the risk of duplicates is higher and should be accounted for.

Speedracers
These are respondents who go through the questions far quicker than the average participant. This implies that they are not reading the questions fully or responding to them honestly. Nowadays, many survey takers are aware of this marker and are careful about their survey pace, which makes speed racing alone a poor marker of invalidation.

Inconsistent repliers
These are people who give conflicting information throughout the survey. An easy way of identifying these is to ask them their age at the beginning, and then ask them their year of birth at the end to see if they match. This can be a powerful marker of bots.

Straightliners
These are easy to identify, as the consumer has simply used the same answer option throughout the survey.

Randomisers
Survey responses usually fit trends, so those that don’t appear to do so suggest the respondent has simply clicked random answers throughout. Where the survey design allows for it, this can be a powerful method of identifying non-engaged respondents.

Open-enders
Sometimes respondents give non-serious answers to open-ended questions. These can be analysed automatically in surveys, although the most powerful control comes from manual inspection. When a survey contains open-ended questions about brands, manual coding and correction of brand spelling is imperative.

Third-partiers
Reputable outside companies can be used to assess the data, and give recommendations for those they believe are cheaters. They use AI and proprietary algorithms to take into account all of the points above, and more.

Once respondents have been identified as fraudulent, we go back to our suppliers with the information, so they can be removed from the panels. This ensures that neither you nor Nepa is paying for dirty data.

With suggestions that 5-15% of market research data might come from cheaters failing more than one of the above markers, how do you ensure that your research is giving you usable insights?
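To make the idea of failing more than one marker concrete, here is a minimal Python sketch. It is an illustration only, not Nepa's actual implementation: the Response fields, the fingerprint value, and the one-year tolerance on age are all assumptions made for the example. It checks three of the markers above (duplicates, inconsistent repliers, and straightliners) and lists respondents who fail at least two of them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Response:
    respondent_id: str
    fingerprint: str         # hypothetical device/IP fingerprint
    stated_age: int          # asked at the beginning of the survey
    birth_year: int          # asked at the end of the survey
    grid_answers: list[int]  # answers to one grid question


def failed_markers(responses: list[Response], survey_year: int) -> dict[str, set[str]]:
    """Return the names of the failed markers for each respondent_id."""
    fingerprint_counts = Counter(r.fingerprint for r in responses)
    result: dict[str, set[str]] = {}
    for r in responses:
        failed: set[str] = set()
        # Duplicates: the same fingerprint appears on more than one response.
        if fingerprint_counts[r.fingerprint] > 1:
            failed.add("duplicate")
        # Inconsistent repliers: stated age must match the birth year.
        # The implied age can be one year too high if the birthday
        # has not yet occurred in the survey year (assumed tolerance).
        implied_age = survey_year - r.birth_year
        if r.stated_age not in (implied_age, implied_age - 1):
            failed.add("inconsistent")
        # Straightliners: every answer in the grid is identical.
        if len(r.grid_answers) > 1 and len(set(r.grid_answers)) == 1:
            failed.add("straightliner")
        result[r.respondent_id] = failed
    return result


def suspects(responses: list[Response], survey_year: int, min_failed: int = 2) -> list[str]:
    """IDs of respondents failing at least `min_failed` markers."""
    flags = failed_markers(responses, survey_year)
    return sorted(rid for rid, failed in flags.items() if len(failed) >= min_failed)


if __name__ == "__main__":
    sample = [
        Response("a", "fp1", 34, 1990, [3, 1, 4, 2]),
        Response("b", "fp2", 25, 1970, [5, 5, 5, 5]),
        Response("c", "fp2", 40, 1984, [2, 3, 2, 4]),
    ]
    print(suspects(sample, survey_year=2024))
```

In this made-up sample, respondent "c" shares a fingerprint with "b" but is otherwise consistent, so a single failed marker does not put them on the suspect list. Requiring several markers reflects the point made above that one signal alone, such as speed, can be a poor basis for invalidating a response.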

How to counteract fraud

Identify

There’s a large toolbox that can be used to identify cheaters in your market research. Some of those we use here at Nepa include:

Automated statistical analysis
Applying specific statistical techniques, preferably built into the survey flow, can aid in identifying cheaters. Working out the median length of interview and removing responses that are far quicker is one way to target speedracers. Maxdiff analysis can identify responses that are more random than the average. The variance of grid question responses gives insight into straight-lining behaviour.

Manual feedback
Although it is more resource-heavy than statistical analysis, setting up manual flags for questionable data is a great way of weeding out responses that automated analysis may miss.

3rd party software
Trusted third parties can be used to automatically identify dupes, fraudsters, bots, and survey farms. Their proprietary algorithms are great at recognising professional survey cheaters in particular.

CAPTCHA check
A CAPTCHA test is designed to determine if an online user is really a human and not a bot. It is easy for humans to solve, but hard for bots and other malicious software to figure out.
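Two of the automated statistical checks described above, the median length of interview and the variance of grid responses, can be sketched in a few lines of Python. This is a rough illustration under stated assumptions rather than a description of our production tooling: the function names are hypothetical, and the cutoff of half the median length is an arbitrary example value that would need tuning per survey.

```python
from statistics import median, pvariance


def speeder_ids(durations: dict[str, float], fraction: float = 0.5) -> list[str]:
    """IDs whose interview length is below `fraction` of the median length.

    `fraction` is an assumed example threshold, not a recommended value.
    """
    cutoff = fraction * median(durations.values())
    return sorted(rid for rid, seconds in durations.items() if seconds < cutoff)


def grid_variance(answers: list[int]) -> float:
    """Population variance of one respondent's answers to a grid question.

    A variance of zero means every row received the same answer,
    which is the straight-lining pattern.
    """
    return pvariance(answers)


if __name__ == "__main__":
    # Interview lengths in seconds for four made-up respondents.
    lengths = {"a": 600, "b": 540, "c": 120, "d": 660}
    print(speeder_ids(lengths))
    print(grid_variance([4, 4, 4, 4]), grid_variance([1, 5, 2, 4]))
```

Using the median rather than the mean keeps the cutoff stable when a few respondents leave the survey open for hours. As with any single marker, a flag from either check is best treated as one signal to combine with others.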

Counteract

Understanding your suppliers, your markets, and your data is key to counteracting fraud. Automated reconciliation processes support data suppliers in keeping their online panels at a high quality, and may help turn the trend of increasing fraud. Our recent deep-dive into our suppliers has shown us that samples differ across regions, suppliers, and industries due to location-specific variations. Understanding these differences makes it easy for us here at Nepa to apply adjustments that negate these issues and ensure that your insights are as clean as possible.

“Nepa’s focus on data quality continues to be a core of our business, in order for us to be able to deliver reliable insights to our clients. We ensure sampling consistency in combination with valid and engaged respondents, by well-established end-to-end processes and a toolbox of methodologies.”

Fredrik Olsson, Head of Data Procurement

For more information, please contact hello@nepa.com.