INSIGHT on INSIGHT: Controlling Errors in Analysis

Errors in analysis are dangerous because they can go undetected.  Fortunately, unlike design errors, most analysis errors are relatively easy to correct once they are recognized.

These errors can surface due to inexperience, laziness, or sloppiness.  Like respondent errors, analysis errors are rarely intentional, but circumstances like rushed timelines or little incentive to dig deeper and double-check results can create an environment where they occur more easily.

A lack of quality control can also allow more errors to persist.  Over my 15 years of experience, I’ve developed a very detailed process to catch errors in analysis.  Even so, these quality checkpoints still periodically reveal mistakes introduced through the necessary manual manipulation of imperfect humans.  In practice, it seems impossible to engineer a flawless system for analysis.

Regardless of how simple or complex your project is, find ways to include quality control steps, and consider using the following checklist to make sure no hidden analysis errors survive:


SOURCES OF ANALYSIS ERRORS

Poorly defined analysis groups:  Not defining a group properly, or not understanding how a group is defined, can have a huge impact on how that group’s information is interpreted and acted on.  Poor definitions can range from making groups too large (like calling everyone age 35 and older “middle age”) to making groups too small (like Hipsters living in rural markets) to not separating groups at the proper points (like doing a poor job of determining what separates a high-value shopper from a low-value shopper).
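As a minimal sketch (in Python with pandas, using hypothetical column names and cut points), defining groups with explicit, documented boundaries keeps every definition visible and reviewable:

```python
import pandas as pd

# Hypothetical survey data; column names and cut points are assumptions.
df = pd.DataFrame({
    "age": [22, 37, 45, 58, 71],
    "annual_spend": [120, 850, 300, 1500, 60],
})

# Explicit, documented cut points -- anyone reviewing the code can see
# exactly where "middle age" begins and ends.
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 34, 54, 200],
    labels=["under 35", "35-54 (middle age)", "55+"],
)

# Shopper value split at a stated threshold, not an unstated judgment call.
df["shopper_value"] = pd.cut(
    df["annual_spend"],
    bins=[-1, 499, float("inf")],
    labels=["low-value", "high-value"],
)
print(df)
```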

Forgetting or misinterpreting the definition of groups:  Without access to definitions, different people might make different assumptions about how a group labeled “middle age” or “middle income” is defined.  Different definitions could produce very different data and insights, and the meaning or implications of those insights could also change dramatically.

Mislabeling groups:  Depending on how data is processed and manipulated, there are many ways to attach the wrong label to a group.  Quality control points should always exist to confirm that labels are accurate.  This could be as simple as tabulating gender within the group labeled “male” to confirm it shows 100% male and 0% female, as sketched below.  Labeling issues can also arise when testing similar concepts, tracking respondents through different legs of a survey, or keeping interview comments associated with the correct individual who said them.
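A minimal version of that label check, assuming a pandas DataFrame with hypothetical gender and segment_label columns:

```python
import pandas as pd

# Hypothetical respondent-level data; column names are illustrative.
df = pd.DataFrame({
    "gender": ["male", "male", "female", "female", "male"],
    "segment_label": ["male", "male", "female", "female", "male"],
})

# Cross-tabulate the assigned label against the source variable.
# Off-diagonal cells should all be zero; anything else is a mislabel.
print(pd.crosstab(df["segment_label"], df["gender"]))

# Fail loudly if any label disagrees with the underlying data.
assert (df["segment_label"] == df["gender"]).all(), "Label mismatch found"
```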

Overlooking analysis groups:  The biggest insight is often the one you never notice.  This can easily happen if the right groups are not included in the analysis.  For example, there are many ways to study sources of growth for a category.  While current buyers and prospective buyers may be obvious groups, lapsed buyers could also provide significant insight into how to attract and retain more people.
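One way to make sure such a group isn’t overlooked is to derive it explicitly from the data.  A sketch, assuming hypothetical purchase-history columns and a 365-day lapse cutoff:

```python
import pandas as pd

# Hypothetical purchase history; the 365-day cutoff is an assumption.
df = pd.DataFrame({
    "respondent": ["A", "B", "C", "D"],
    "ever_bought": [True, True, False, True],
    "days_since_purchase": [30, 500, None, 800],
})

def buyer_group(row):
    """Classify each respondent so lapsed buyers become a visible group."""
    if not row["ever_bought"]:
        return "prospective"
    return "current" if row["days_since_purchase"] <= 365 else "lapsed"

df["group"] = df.apply(buyer_group, axis=1)
print(df["group"].value_counts())
```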

Not understanding scale points:  Purchase intent or preference scales may run from one to five or from five to one.  Confusion about the direction of the scale can produce inverted results that go unnoticed for a long time.  Recognizing that a scale starts at one and not zero can also help you interpret extreme values correctly.
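A small sketch of reversing a scale explicitly, assuming a one-to-five purchase-intent scale (the ratings themselves are hypothetical):

```python
import pandas as pd

# Hypothetical 5-point purchase-intent ratings where 5 = "definitely would buy".
ratings = pd.Series([5, 4, 2, 1, 3])

# If the source actually coded 1 = "definitely would buy" (reversed),
# flip it explicitly: for a 1..5 scale, reversed = 6 - value.
reversed_ratings = 6 - ratings

# Sanity checks: a 1..5 scale should never contain 0, and the
# endpoints should map back onto each other after reversal.
assert ratings.between(1, 5).all(), "Value outside the 1-5 scale"
assert (6 - reversed_ratings).equals(ratings)
print(reversed_ratings.tolist())  # [1, 2, 4, 5, 3]
```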

Miscalculating groups:  More complex analysis moves beyond discrete groups easily recognized from individual questions or data points, and starts to create groups defined by multiple dimensions or basic algorithms.  Creating these new groups is typically a hands-on and sometimes iterative process.  The more keystrokes and formulas involved in the calculations, the greater the risk that one of them is done wrong.
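Building composite groups from named, individually checkable conditions (rather than one long formula) reduces that risk.  A sketch with hypothetical segment rules:

```python
import pandas as pd

# Hypothetical data; the segment rules below are illustrative assumptions.
df = pd.DataFrame({
    "visits_per_month": [8, 2, 12, 1],
    "avg_basket": [60.0, 15.0, 80.0, 95.0],
})

# Name each condition so it can be verified on its own.
frequent = df["visits_per_month"] >= 4
big_spender = df["avg_basket"] >= 50

df["segment"] = "other"
df.loc[frequent & big_spender, "segment"] = "core"
df.loc[frequent & ~big_spender, "segment"] = "frequent-light"
df.loc[~frequent & big_spender, "segment"] = "occasional-heavy"

# QC: every respondent lands in exactly one segment and none are missed.
assert df["segment"].notna().all()
print(df["segment"].value_counts())
```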

Cleaning up dirty data:  Most data sets require some amount of cleaning.  This may include removing incomplete records, correcting data labels, merging redundant entries, or recognizing aspects of the data that may require special handling (like diminishing the prominence of comments from certain participants due to bias or groupthink).
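A minimal cleaning pass in pandas (the raw data and the specific fixes are illustrative assumptions):

```python
import pandas as pd

# Hypothetical raw export with duplicates, a missing record, and messy labels.
raw = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "region": ["North", "north ", "north ", None, "South"],
    "score": [4, 5, 5, None, 3],
})

clean = (
    raw
    .drop_duplicates(subset="id")          # merge redundant entries
    .dropna(subset=["score"])              # remove incomplete records
    .assign(region=lambda d: d["region"].str.strip().str.title())  # fix labels
)

# Keep a record of what was removed so the cleaning itself can be audited.
print(f"Dropped {len(raw) - len(clean)} of {len(raw)} records")
print(clean)
```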

Basing insights on data with small base sizes:  It can be dangerous to focus too much on the comments of a single interview participant or to include results from an analysis group with a small base size (less than 70).  While the insight may be accurate, the fact that it is based on such a limited group increases the risk that it is inaccurate, represents more noise than signal, or can’t be extrapolated to a larger population.
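A simple guard is to flag any reported result whose base falls below your threshold (the 70 here comes from the text; the tabulation itself is hypothetical):

```python
import pandas as pd

MIN_BASE = 70  # threshold from the text; adjust to your own standard

# Hypothetical tabulation of a result by analysis group.
results = pd.DataFrame({
    "group": ["current buyers", "lapsed buyers", "rural hipsters"],
    "pct_interested": [42.0, 55.0, 80.0],
    "base": [412, 160, 23],
})

# Flag (rather than silently report) any result built on a thin base.
results["caution"] = results["base"] < MIN_BASE
print(results)
```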

Messy data manipulation or presentation:  Analysts without a complete understanding of a project can make subjective decisions that unintentionally alter the data.  This can include truncating data, abbreviating data labels, making pie charts out of data that is not mutually exclusive (it doesn’t add up to 100%), or sorting/filtering results in a misleading manner.  None of this has to be done with ill intent, but it can still be just as damaging.
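The pie-chart case is easy to guard against in code.  A sketch with hypothetical category shares:

```python
# Hypothetical shares for a pie chart; the values are assumptions.
shares = {"blue": 40.0, "red": 30.0, "green": 20.0, "other": 10.0}

# A pie chart only makes sense for mutually exclusive categories that
# account for the whole: verify the total before plotting.
total = sum(shares.values())
if abs(total - 100.0) > 0.5:
    raise ValueError(
        f"Shares sum to {total}%, not 100% -- these categories overlap "
        "or are incomplete; use a bar chart instead of a pie chart."
    )
print("OK to plot as a pie chart")
```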

Relying on fuzzy math:  This most often occurs when two groups that are not mutually exclusive (they share duplicate data) are summed or combined in a way that double counts the duplicates.  For example, seeing that 40% prefer blue and 30% prefer red in a “select all that apply” question and concluding that a blue and red option addresses 70% of preference ignores the overlap of people who preferred both blue and red; the true reach is 40% plus 30% minus that overlap.
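Computed at the respondent level, the overlap takes care of itself.  A sketch with hypothetical responses in which 40% chose blue, 30% chose red, and 10% chose both:

```python
import pandas as pd

# Hypothetical 'select all that apply' responses; one row per respondent.
# 40% chose blue, 30% chose red, and one respondent (10%) chose both.
df = pd.DataFrame({
    "prefers_blue": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "prefers_red":  [1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
})

pct_blue = df["prefers_blue"].mean() * 100   # 40%
pct_red = df["prefers_red"].mean() * 100     # 30%

# Wrong: summing overlapping groups double counts the respondent who chose both.
naive_total = pct_blue + pct_red             # 70%

# Right: count each respondent once (equivalent to blue + red - both).
actual_reach = ((df["prefers_blue"] == 1) | (df["prefers_red"] == 1)).mean() * 100

print(f"naive sum: {naive_total:.0f}%, actual reach: {actual_reach:.0f}%")  # 70% vs 60%
```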


Analysis errors can take otherwise accurate data and produce very inaccurate insight.  Fortunately, most analysis errors can be corrected from the raw data once they are recognized.

Have you seen past projects that allowed analysis errors to creep in and alter results?  If so, it might be worth revisiting recent projects to see what opportunity exists to pull new insight out of existing data before investing in new research.