Disclosure Risk

TL;DR

Risk that confidential or sensitive information is revealed inadvertently when collecting, processing, or releasing data.
Can occur through direct identifiers (e.g., names and addresses) or by linking unique combinations of variables.
Mitigation includes anonymizing data, using statistical masking techniques, and restricting data access.

Definition

Disclosure risk is the risk that confidential or sensitive information about individuals is unintentionally revealed when data are collected, analyzed, or released.

Explanation

Disclosure risk arises when data about individuals can be exposed either directly (through identifying fields such as names) or indirectly (through combinations of variables that are not identifying on their own but together single someone out). Consequences of disclosure may include embarrassment or discrimination for the affected individuals. Reducing disclosure risk requires deliberate action by data collectors and researchers to protect confidentiality and privacy throughout data handling and release.

Examples

Release of personal information

If a dataset that includes personal information such as names and addresses is released without proper protections, individuals can be identified directly. For example, a study that collects the income levels of individuals in a particular city could reveal who earns what if the released data are not properly anonymized.

Identification via a combination of variables

Even without direct identifiers, a unique combination of variables can identify an individual. For example, a study that collects the height and weight of individuals in a particular city might allow a person to be identified if no one else in the data shares that person's height and weight combination, potentially leading to embarrassment or discrimination.
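The uniqueness check described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete risk assessment: the records and the function name unique_combinations are hypothetical and invented for this example. The sketch counts how often each height and weight combination occurs and lists the combinations that appear only once.

```python
from collections import Counter

# Hypothetical records: (height_cm, weight_kg) pairs with no names attached.
records = [
    (170, 65),
    (170, 65),
    (182, 90),
    (165, 58),
    (165, 58),
    (201, 140),
]


def unique_combinations(rows):
    """Return the combinations that occur exactly once in rows."""
    counts = Counter(rows)
    return [combo for combo, n in counts.items() if n == 1]


# Each combination listed here belongs to a single record, so anyone who
# knows that person's height and weight could pick the record out.
at_risk = unique_combinations(records)
print(at_risk)  # [(182, 90), (201, 140)]
```

Records that share their combination with at least one other record are harder to single out, which is why the count per combination is a useful first signal.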

Notes or pitfalls

Common mitigation measures include anonymizing data before release, applying statistical techniques to mask individual identities, and limiting access to data to those with a legitimate need.
Researchers should assess potential implications and risks to individuals before collecting and analyzing data to help reduce disclosure risk.
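As an illustration of statistical masking, the sketch below coarsens exact values into bins before release. The entry does not name a specific technique, so generalization (binning) is an assumption chosen for this example, and the records, bin width, and function names generalize and smallest_group are hypothetical. The smallest group size before and after masking shows whether any record is still unique.

```python
from collections import Counter


def generalize(value, width):
    """Replace an exact value with the lower bound of its bin of the given width."""
    return (value // width) * width


def smallest_group(rows):
    """Return the size of the smallest group of identical rows."""
    return min(Counter(rows).values())


# Hypothetical (height_cm, weight_kg) records; every exact pair is unique.
records = [(170, 65), (172, 68), (178, 61), (174, 69), (181, 92), (189, 98)]

# Assumed bin width of 10 for both variables, chosen only for illustration.
masked = [(generalize(h, 10), generalize(w, 10)) for h, w in records]

print(smallest_group(records))  # 1: at least one record is unique
print(smallest_group(masked))   # 2: every masked record shares its values
```

Wider bins lower the chance of singling someone out but also make the data less precise, so the bin width is a trade-off between privacy and analytic usefulness.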
