Data seemed to betray us in 2020, most notably in important arenas including healthcare, science, and government. This article explores some of the reasons behind data mishaps and recommends remedies every healthcare marketer can incorporate into their research protocols.
Alleged Data Betrayals
How did data lead us astray in 2020?
In public health, data concerning COVID-19 was both misinterpreted and mis-prioritized. In September, multiple parties misread CDC data to claim that only 6% of deaths linked to COVID were actually caused by the virus. In reality, the 6% figure simply reflected death certificates that listed COVID as the sole cause; in the other 94% of cases, COVID was still the underlying cause of death, accompanied by contributing conditions.
Policy makers pinned their focus on the overall COVID mortality rate to drive public health measures to prevent COVID infections. Many now believe they consequently underprotected some highly vulnerable segments of the population and overprotected others. Had they acted on the data at a population-segment level, they might have prevented more deaths and done less damage to the economy.
In politics, many observers are still wondering how the 2020 election pollsters could have gotten the electorate’s intentions so wrong. How could that many Trump voters come out of the woodwork on Election Day?
In healthcare, artificial intelligence (AI) and machine learning can now diagnose skin cancer, detect a stroke on a CT scan, review colonoscopy images for cancers, and help manage coronavirus patients in highly strained hospitals. Unfortunately, even this sophisticated technology is subject to human biases in the data used to “train” the software. Data scientists are asking hard questions about how to root that bias out and ensure healthcare remains fair and equitable.
As marketers operating in a world of increasing misinformation, we like to think that our marketing research data gets to the truth. But we’re in the same boat as the pollsters, the policy makers, and the medical scientists. When we’re measuring human attitudes, intentions, or reactions, we can’t get close to the truth without accounting for human vicissitudes in research design, for numerous subject biases both conscious and unconscious, and for faulty data interpretation. It’s humanity, not mathematics, that we must blame. Let’s look at these mortal undoings and how we can mitigate them.
Complex Responses — Subject Biases
Research often involves asking people questions either directly or through a survey — a trickier proposition than you might think. Those of us who conduct consumer research have often seen people say one thing in a research forum or in an online survey, yet say something different in another. This is particularly true of medical patients, among the most complex of human subjects.
If a voter’s response to a pollster’s question is subject to her concerns about what the pollster might think of her choice for office, imagine the forces at work when a patient is asked about her health behaviors by a researcher or her own doctor. Has she been completely compliant in taking her medication? Has she been eating or exercising in accordance with her healthcare provider’s advice?
A study covered by Science News indicates that 60% to 80% of people have not been forthcoming with their doctors about their health behaviors. Patients explained that they were fearful of being judged or lectured, or wanted to avoid embarrassment. Others sought to please their doctor or be viewed as compliant and responsible. This effect is called “social desirability bias.” The example also hints at “demand characteristics,” a bias in which a respondent provides the answer she thinks is wanted. These biases, and several others, can occur even on anonymous digital surveys.
Human psychology also affects studied behaviors: research subjects often modify an aspect of their behavior in response to their awareness of being observed. This is called the “Hawthorne effect,” and it can occur either consciously or subconsciously.
Experimenter Biases
You, yes you, as the marketer may introduce what’s called “experimenter bias” into your research. Your own expectations or preconceived beliefs may taint the project’s structure, process, or final interpretation. But forgive yourself — this bias is almost always unintentional.
Other data pitfalls include information bias, also called measurement bias. This arises when key study variables (for example, exposure to a marketing test factor or a therapy) are inaccurately measured or classified. And then there’s selection bias — a data distortion caused by a sample selection that does not accurately reflect the target population.
The internal validity of a medical or marketing study depends greatly on the extent to which biases have been accounted for and the necessary steps taken to diminish their impact. In a poor-quality study, bias may be the primary reason the results are, or are not, statistically “significant”! Bias may preclude finding a true effect, or it may lead to an inaccurate estimate (an underestimate or an overestimate) of the true association between an exposure and an outcome. So when we’re testing marketing variables, we’re at risk, too.
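To see how stark this can be, here is a minimal simulation sketch (Python, with entirely invented numbers) of a hypothetical brand study in which there is no true campaign effect, yet a selection-biased sample still produces a statistically “significant” result:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Hypothetical population: two segments with different baseline
# enthusiasm for the product, but NO true effect of the campaign.
enthusiasts = rng.normal(loc=7.0, scale=1.5, size=50_000)  # rating, 0-10
skeptics    = rng.normal(loc=5.0, scale=1.5, size=50_000)
population  = np.concatenate([enthusiasts, skeptics])

# Unbiased design: both arms sampled at random from the whole population.
control = rng.choice(population, 500)
test    = rng.choice(population, 500)
print(ttest_ind(test, control).pvalue)  # large p-value: no effect, correctly

# Biased design: the "exposed" arm is recruited mostly from enthusiasts
# (e.g., an opt-in panel of brand followers) -- selection bias.
biased_test = rng.choice(enthusiasts, 500)
print(ttest_ind(biased_test, control).pvalue)  # tiny p-value: spurious "effect"
```

The biased arm “wins” not because the campaign worked, but because the sample never reflected the target population in the first place.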
How to Bake Humanity Into Your Data
How can we possibly mitigate the effects of bias, observed behavior, and the anonymity of the internet in our own marketing investigations? Here are some strategies to shore up your data.
Respect the science
While it is tempting to take shortcuts by designing your own research or even writing the questionnaires or surveys to save time or money, resist! Even seasoned marketers lack the design expertise to mitigate all the biases and research vulnerabilities that threaten your results. Respect the complex science of acquiring actionable truths through research and let the experts do their thing. They will address every point in the process where bias can creep in, wield bias-busting techniques, and manage for language and cultural differences in your target audience population.
Diversify your data
There is no better insurance policy in the data game than diversification of sources and methodologies. The best marketing researchers love it when data sources corroborate one another. If findings across sources are contradictory, it signals something may be awry, and more or different data may be needed to draw solid conclusions.
Dodge the setting trap
Social media research analyzes social platform data to understand how audiences relate to topics, events, and other stimuli. Social researchers use social listening and audience intelligence tools, as well as advanced data extraction techniques. They often aggregate data from social media platforms, web forums, news, and blogs.
Sounds wonderful, right? Large data sets reflecting people acting routinely — hooray!
But beware. There are booby traps here as well. Depending on the social media setting, people can behave differently. For example, Twitter can be a means of unabashed self-expression, often with a hostile edge. But the same fierce Twitter human engaged in an online forum with fellow hobby enthusiasts or disease peers on another platform may exhibit empathy and foster fellowship. Which reveals the real person? Both. Don’t throw any data set out or crown it king before understanding the setting. This principle also applies to other research domains and settings.
Throw it out
Inoculate your data against the Hawthorne effect by throwing out the first waves of data in your study, when subjects are most likely to “remember” they are being studied. Over time, they forget. Some of the most successful research right now uses digital behavioral data generated by subjects over weeks or months (see the next section).
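As a simple illustration, here is a minimal pandas sketch (column names and cutoff invented) of discarding the burn-in window at the start of a study:

```python
import pandas as pd

# Hypothetical longitudinal study data: one row per subject per study day.
df = pd.DataFrame({
    "subject_id":  [1, 1, 1, 1, 2, 2, 2, 2],
    "study_day":   [1, 2, 3, 4, 1, 2, 3, 4],
    "daily_steps": [9800, 9100, 7600, 7500, 11200, 9400, 7900, 8100],
})

# Drop the burn-in period, when subjects still "remember" they are observed.
BURN_IN_DAYS = 2  # choose this by watching when your metrics stabilize
settled = df[df["study_day"] > BURN_IN_DAYS]
print(settled)
```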
Consider continual collection of “digital life” data
Some of the most revolutionary studies focused on understanding and detecting subject behavior and intent use data collected from individuals’ digital communications and actions over time.
Subjects opt in to these studies. They generate data passively and unconsciously as they go about their digital lives. This makes for highly accurate data.
Passive digital data collection is slated to grow as a research technique. In best practice, data is collected from large numbers of people (as many as 100,000 subjects) and analyzed by interest group. The data is aggregated: no individual is ever identified, and individual privacy issues are nearly eliminated. Brands should explore the viability of this innovation to meet the strategic learning objectives on their docket.
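One common safeguard, sketched below with toy data and an invented minimum-group rule, is to report only aggregate counts and to suppress any group small enough that individuals could be singled out:

```python
from collections import Counter

# Hypothetical opt-in event stream: (subject_id, topic) pairs drawn from
# subjects' digital activity. IDs are used only for de-duplication.
events = [
    (101, "diabetes"), (102, "diabetes"), (103, "diabetes"),
    (104, "diabetes"), (105, "diabetes"), (106, "rare_disease_x"),
]

MIN_GROUP = 5  # suppress topics with too few subjects to report safely

subjects_per_topic = Counter()
for subject_id, topic in set(events):  # de-duplicate before counting
    subjects_per_topic[topic] += 1

report = {t: n for t, n in subjects_per_topic.items() if n >= MIN_GROUP}
print(report)  # {'diabetes': 5} -- 'rare_disease_x' is suppressed
```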
Monitor benchmarks and probe historical data
Get a sense of what your data might look like before you even begin research by reviewing other available research and data in the same area. If there is historical data to review, or experts who will share benchmarks with you, all the better. Continuously monitor benchmarks and establish your own so that you can recognize a potential problem more quickly.
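A lightweight way to operationalize this, sketched here with hypothetical metrics and thresholds, is to compare each incoming result to its benchmark and flag large deviations for investigation:

```python
# Hypothetical benchmarks gathered from prior studies or industry experts.
benchmarks = {"email_open_rate": 0.22, "survey_completion_rate": 0.70}

# Results from the current study (invented numbers).
observed = {"email_open_rate": 0.31, "survey_completion_rate": 0.68}

TOLERANCE = 0.25  # flag anything more than 25% off its benchmark

for metric, bench in benchmarks.items():
    deviation = (observed[metric] - bench) / bench
    status = "INVESTIGATE" if abs(deviation) > TOLERANCE else "ok"
    print(f"{metric}: observed {observed[metric]:.2f} "
          f"vs benchmark {bench:.2f} ({deviation:+.0%}) -> {status}")
```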
Go big
Large data sets offer higher levels of confidence: random sampling error shrinks as samples grow (though systematic bias does not). If you can’t afford a large study, look to share costs with another party interested in the same insights you are. Even large companies take this approach to research. You can also use secondary, open-access data where appropriate.
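A quick back-of-the-envelope sketch (hypothetical numbers) makes the point: the 95% margin of error for a measured proportion shrinks roughly as 1/√n, but a systematic bias does not shrink at all:

```python
import math

TRUE_SHARE = 0.50  # hypothetical true preference share in the population
BIAS = 0.03        # a constant 3-point systematic bias (e.g., bad sampling)

for n in (100, 1_000, 10_000, 100_000):
    # 95% margin of error for a proportion: 1.96 * sqrt(p * (1 - p) / n)
    moe = 1.96 * math.sqrt(TRUE_SHARE * (1 - TRUE_SHARE) / n)
    # Random error shrinks with n; the systematic bias never does.
    print(f"n={n:>7,}: margin of error ±{moe:.3f}, bias still {BIAS:+.3f}")
```

Bigger samples buy you precision; only better design buys you accuracy.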
Prioritize your learning objectives
Research data can fall down when investigators pursue too many objectives. Limiting the hypotheses you wish to test and the populations you engage allows you to better focus on mitigating bias and the other factors that undermine data accuracy.
Become a perpetual student of bias and behavior
Over time, build your own sense of whether a pattern in your data is random, an anomaly, or systematic. Develop some healthy cynicism and probe what you suspect. Pew did just that with the 2020 election data. Their probe offered this insight into last year’s polling errors:
“Historically, polls tend to be highly accurate when measuring public attitudes, but less accurate when measuring public behaviors. The national polling error of 2020 appears to be similar to the average errors for election polls over the past 12 presidential elections. The fact that the polling errors were not random, and that they almost uniformly involved underestimates of Republican rather than Democratic performance, points to a systematic cause or set of causes.”
Data has immense power to help us solve problems, but we must understand how to protect its integrity. Here’s to ingenious ways to use and safeguard data for a better year in 2021!
Sources: Forbes, Harvard Business Review, Becker’s Hospital Review, USA Today, CMI, CDC, Pew, Ipsos, Science News, Symphony, Relative Insight, PMC
This article is also featured on PharmaLive.