Common Types of Data Bias (With Examples)

Data biases affect the collection, analysis, and interpretation of data. Learn about 5 common types of data bias, and how you can avoid those pitfalls in your work.

Summary:

Data bias occurs when incomplete or inaccurate data fails to accurately reflect the overall population.
We use heuristics, or mental shortcuts, to quickly make sense of the world. Those shortcuts can become cognitive biases, or skewed ways of thinking about the world around us. Those biases can affect how data is collected, analyzed, and interpreted.
5 common types of data bias include confirmation, historical, selection, survivorship, and availability biases.

We know that data is important to businesses of all sizes. It’s also essential to understand that we are human. Therefore, our human biases can impact how we understand and respond to the world around us.

In the digital age, data is an essential part of how businesses understand their customers, develop new products, and respond to the market. However, our analyses and conclusions are only as good as the data that powers them. As humans, we are prone to cognitive biases that influence how we think about the world around us. When those cognitive biases impact how we collect and analyze data, data bias can occur.

Understanding what data bias is, diagnosing its causes, and learning how to recognize and avoid common types of cognitive biases is an essential first step in equitably using data.

What is data bias?

Data bias refers to data that is incomplete or inaccurate and therefore fails to paint an accurate picture of the population it is supposed to represent. The data in question can be anything from standardized test scores of college students to customer satisfaction feedback or population health data.

Examples of data bias

Data bias can manifest in many different forms. One high-profile example of biased data is an AI-based candidate evaluation tool that Amazon developed in the mid-2010s. In 2018, the tool was scrapped because, having been trained on data from past hiring decisions, it had learned to exclude women from the pool of qualified candidates.

Another example of data bias comes from Philadelphia’s SEPTA security system. When algorithms learn patterns in criminal behavior from datasets that reflect biases in crime, policing, or incarceration trends that disproportionately affect people of color, those algorithms may predict that people of color are more likely to be criminals. This could risk racial profiling and discrimination.

These are serious examples of the possible impacts of data bias, and they illustrate the effect that the misuse or misinterpretation of data can have on regular people.

What are the types of data bias?

Humans have implicit and explicit biases. We have our cognition to thank for that. Researchers estimate that adults make 35,000 decisions a day, ranging from “who should I vote for?” to “what color socks should I wear?”

Humans need to think and react quickly, so over time our brains have evolved to take shortcuts that help us quickly reach conclusions based on information we’ve learned in the past.

These shortcuts are called heuristics, and they help our brains simplify information processing and reach decisions faster. However, these heuristics can give way to cognitive biases. Cognitive biases are systematic errors in how we respond to new information based on our past experiences.

Here are 5 common types of cognitive biases that can become data biases.

Confirmation Bias

Confirmation bias occurs when we favor information that confirms our existing attitudes and beliefs. This often happens subconsciously. Confirmation bias causes us to focus on information that supports our arguments or way of thinking.

Confirmation bias influences data analysis when we gather data or perform analyses in a way that unconsciously supports the original hypothesis.

For example, confirmation bias might manifest in classrooms. If a teacher believes that boys are naturally better than girls at math and science, that teacher may be more likely to call on boys to answer questions about those topics. This creates a self-fulfilling prophecy, supporting the teacher’s belief that girls are not as naturally talented at STEM subjects.

How to avoid confirmation bias:

Before beginning data collection, clearly state the research question, hypothesis, and goals of data analysis. Challenge and question the data at hand. It’s important to seek evidence that may contradict preconceptions. After the analysis is complete, evaluate the results and compare them to your original hypothesis and questions.

Historical Bias

Historical data bias occurs when systematic cultural prejudices and beliefs influence decisions. This can not only influence data that was collected in the past but also impact present-day data collection.

For example, research by the National Highway Traffic Safety Administration (NHTSA) found women are 17% more likely than men to be killed in car crashes, despite being safer drivers. When NHTSA examined crash testing protocols, it found that car manufacturers either excluded female crash test dummies from their tests completely or only placed the female dummies in the passenger seat. Furthermore, the female crash test dummy represented the “smallest 5th percentile of the female population” and was more representative of a young teenager than the average adult woman.

This demonstrates historical bias because vehicle safety testing had systematically excluded female representation for decades.

Historical bias might occur when data:

Involves bias already (such as human discrimination and prejudice)
Is incorrect or incomplete
No longer represents reality

Historical bias can make it difficult to train machine learning models. When AI learns from biased data, it will generate biased answers.

How to avoid historical bias:

Historical bias can be avoided by regularly auditing incoming data. Additionally, it’s important to ensure inclusivity is established within frameworks for underrepresented groups. We must acknowledge and identify biases in our historic and contemporary datasets.
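One way to make "regularly auditing incoming data" concrete is to compare outcome rates across groups in each new batch of records. The following is a minimal sketch in Python, not a definitive method. The field names (`group`, `hired`), the sample records, and the function name `outcome_rates` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def outcome_rates(records, group_key, outcome_key):
    """Return the share of positive outcomes for each group in the records."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical historical hiring records (made-up values for illustration).
records = [
    {"group": "men", "hired": True},
    {"group": "men", "hired": True},
    {"group": "men", "hired": False},
    {"group": "men", "hired": True},
    {"group": "women", "hired": True},
    {"group": "women", "hired": False},
    {"group": "women", "hired": False},
    {"group": "women", "hired": False},
]

rates = outcome_rates(records, "group", "hired")
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} positive outcomes")
```

A large gap between groups does not prove bias on its own, but it flags a dataset that deserves a closer look before it is used to train a model or inform a decision.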

Selection Bias

Selection bias is an error that occurs when the sample does not accurately represent the entire target group and therefore produces skewed insights. This means that the data is selected subjectively rather than objectively.

Selection bias can arise from poor study design, for example when the sample is too small or is not randomized.

An example of selection bias is a study on the health effects of alcohol on the general population. If researchers recruit participants exclusively from bars and nightclubs, they are selecting participants whose behavior may not represent the full population.

Here are three types of selection bias to keep in mind:

Sampling bias: Occurs when data collection is not randomized
Coverage bias: Occurs when data is not collected in a representative way
Participation bias: Occurs when participants voluntarily place themselves in groups, thereby skewing the results of those groups

How to avoid selection bias:

Address historical bias in data sources to increase inclusivity and seek opportunities to improve data models. Expanding samples and encouraging participation from diverse groups (when relevant) is important. Additionally, researchers should find opportunities to correct selection bias in ongoing and future research.
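A simple way to check whether a sample is representative is to compare each group's share of the sample with its share of the target population. The sketch below assumes the population shares are already known from a trusted source. The age bands, the numbers, and the function name `representation_gaps` are hypothetical and exist only to illustrate the idea.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares):
    """Return sample share minus population share for each group."""
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Hypothetical population shares by age band (assumed, not real statistics).
population_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

# Hypothetical sample recruited only at bars and nightclubs.
sample_groups = ["18-34"] * 14 + ["35-54"] * 5 + ["55+"] * 1

gaps = representation_gaps(sample_groups, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.0%} versus population")
```

Positive gaps mark groups that are overrepresented in the sample and negative gaps mark groups that are underrepresented. Either can signal that recruitment needs to be broadened.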

Survivorship Bias

Survivorship bias is a cognitive error that causes us to focus on data points that survive the selection process while ignoring data points that did not survive. This typically occurs because the data points that did not survive are no longer visible.

An example of survivorship bias might involve focusing on stories of entrepreneurs who didn’t finish college, as a way of showing that a college degree doesn’t guarantee success. That excludes the population of people who didn’t complete college degrees and didn’t become successful entrepreneurs. It also excludes the population of successful businesspeople who did complete a college degree.

The two main ways data analysts reach conclusions through survivorship bias are:

Inferring causality: Believing that an outcome was directly caused by a specific variable when there was no direct relationship between them.
Inferring a norm: Believing that the data that survives represents a past norm, rather than looking at the data that did not survive over time.

How to avoid survivorship bias:

To avoid survivorship bias, we must be extremely selective with the data sources being utilized. Additionally, we must ensure the data sources we are using have not omitted observations or sets that no longer exist.
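The entrepreneur example can be sketched in a few lines of Python to show why the omitted observations matter. The records and counts below are made up purely for illustration, and the function name `success_rate` is hypothetical.

```python
# Hypothetical founder records as (finished_college, succeeded) pairs.
# The counts are made up purely for illustration.
founders = (
    [(False, True)] * 2      # left college, succeeded
    + [(False, False)] * 18  # left college, did not succeed
    + [(True, True)] * 6     # finished college, succeeded
    + [(True, False)] * 24   # finished college, did not succeed
)


def success_rate(records, finished_college):
    """Share of founders in one education group who succeeded."""
    outcomes = [succeeded for finished, succeeded in records
                if finished == finished_college]
    if not outcomes:
        raise ValueError("no records for this group")
    return sum(outcomes) / len(outcomes)


# Survivor-only view: only the success stories are visible.
visible_dropout_stories = sum(
    1 for finished, succeeded in founders if succeeded and not finished
)

# Full view: include the founders who did not succeed.
dropout_rate = success_rate(founders, finished_college=False)
graduate_rate = success_rate(founders, finished_college=True)

print(f"Visible dropout success stories: {visible_dropout_stories}")
print(f"Success rate without a degree: {dropout_rate:.0%}")
print(f"Success rate with a degree: {graduate_rate:.0%}")
```

In this invented dataset the success stories alone say nothing about the odds. Only when the founders who did not succeed are included can the two groups be compared.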

Availability Bias

Availability bias occurs when we rely too heavily on the information that is easiest to recall, such as recent or vivid events that are readily accessible in our working and short-term memory.

For example, when plane crashes are in the news, airline passengers might overestimate the likelihood that their plane will crash. This is because frequently covered news topics are more accessible in our working memory.

How to avoid availability bias:

One way to overcome availability bias is to seek opposing viewpoints and data that contradicts our existing beliefs and ideas. Although this information may be harder to find, it makes us less reliant on more recent and accessible information.
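In data work, one practical guard against leaning on recent and accessible information is to compare an estimate based on the latest observations with the long-run base rate. This is a minimal sketch under assumed inputs. The weekly flags are invented, and the function name `event_rate` is hypothetical.

```python
def event_rate(observations):
    """Share of observations in which the event occurred."""
    if not observations:
        raise ValueError("observations must not be empty")
    return sum(observations) / len(observations)


# Hypothetical weekly flags: True means the event of interest occurred
# in that week. The values are made up for illustration.
history = [False] * 48 + [True, False, True, True]

recent_rate = event_rate(history[-4:])  # only the four most recent weeks
full_rate = event_rate(history)         # the full 52-week history

print(f"Rate over the last 4 weeks: {recent_rate:.0%}")
print(f"Rate over all 52 weeks: {full_rate:.1%}")
```

When the two figures diverge sharply, as they do here, it is worth asking whether something has genuinely changed or whether the most memorable observations are driving the conclusion.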

Impacts for businesses

Businesses of all sizes can and should interrogate possible bias in their data collection, analysis, and interpretation. This not only helps businesses adhere to data ethics principles but also makes their data more accurate and reflective of the world.

Author

Pragmatic Editorial Team

The Pragmatic Editorial Team comprises a diverse team of writers, researchers, and subject matter experts. We are trained to share Pragmatic Institute’s insights and useful information to guide product, data, and design professionals on their career development journeys. Pragmatic Institute is the global leader in Product, Data, and Design training and certification programs for working professionals. Since 1993, we’ve issued over 250,000 product management and product marketing certifications to professionals at companies around the globe.