Data Masking
The need to provide confidentiality protection for data in cloud environments is a serious concern for organizations.
Encryption is not always a realistic option, for reasons including performance, cost, and technical capability.
As a result, additional mechanisms need to be employed to ensure that data confidentiality can be achieved. Masking, obfuscation, anonymization, and tokenization can be used in this regard.
Data masking, or data obfuscation, is the process of hiding, replacing, or omitting sensitive information from a specific data set.
Data masking is typically used to protect specific data sets, such as personally identifiable information (PII) or commercially sensitive data, or to comply with regulations such as HIPAA or PCI DSS.
Data masking or obfuscation is also widely used for test platforms where suitable test data is not available.
Both techniques are typically applied when migrating test or development environments to the cloud, or when protecting production environments from threats such as data exposure by insiders or outsiders.
Common Data Masking Approaches
- Random substitution: The value is replaced (or appended) with a random value.
- Algorithmic substitution: The value is replaced (or appended) with an algorithm-generated value. (This typically allows for two-way substitution, meaning the original value can be recovered.)
- Shuffle: Values are shuffled within the data set, usually within the same column.
- Masking: Specific characters are used to hide certain parts of the data. It is commonly applied to credit card numbers, for example XXXX XXXX XX65 5432.
- Deletion: This simply uses a null value or deletes the data.
The two primary methods of masking data are:
- Static: In static masking, a new copy of the data is created with the masked values. Static masking is well suited to creating clean nonproduction environments.
- Dynamic: Dynamic masking, sometimes referred to as on-the-fly masking, adds a layer of masking between the application and the database.
The masking layer masks the information in the database on the fly when the presentation layer accesses it. This type of masking is well suited to protecting production environments.
For example, it can hide the full credit card number from customer service representatives while the data remains available for processing.
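The masking approaches and the static versus dynamic distinction can be illustrated with a minimal Python sketch. Every function name, the `payment_processor` and `support_agent` roles, and the sample card number are hypothetical. The digit shift used for algorithmic substitution is a toy reversible transform chosen only to show two-way substitution; it is not a secure algorithm.

```python
import random

def random_substitution(value: str, rng: random.Random) -> str:
    """One-way: replace every digit with a random digit, keeping separators."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)

def algorithmic_substitution(value: str, key: int, reverse: bool = False) -> str:
    """Two-way toy transform: shift each digit by a secret key (mod 10).
    Illustrative only; this is NOT a secure algorithm."""
    shift = -key if reverse else key
    return "".join(str((int(ch) + shift) % 10) if ch.isdigit() else ch for ch in value)

def shuffle_column(rows: list[dict], column: str, rng: random.Random) -> list[dict]:
    """Shuffle one column's values across rows; returns new row dicts."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: v} for row, v in zip(rows, values)]

def mask_digits(value: str, keep_last: int = 6, mask_char: str = "X") -> str:
    """Hide all but the last `keep_last` digits, keeping spaces in place."""
    total = sum(ch.isdigit() for ch in value)
    out, seen = [], 0
    for ch in value:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total - keep_last else mask_char)
        else:
            out.append(ch)
    return "".join(out)

def delete_field(row: dict, column: str) -> dict:
    """Deletion: replace the value with None (a null)."""
    return {**row, column: None}

def static_mask(rows: list[dict]) -> list[dict]:
    """Static masking: build a separate masked copy, e.g. for a test environment."""
    return [{**row, "card": mask_digits(row["card"])} for row in rows]

def read_card(row: dict, role: str) -> str:
    """Dynamic masking: mask at read time depending on the (hypothetical) caller role."""
    if role == "payment_processor":
        return row["card"]            # full value still available for processing
    return mask_digits(row["card"])   # e.g. customer service sees a masked value

production = [{"name": "Alice", "card": "4111 1111 1165 5432"}]
test_copy = static_mask(production)                 # production rows are left untouched
print(test_copy[0]["card"])                         # XXXX XXXX XX65 5432
print(read_card(production[0], "support_agent"))    # XXXX XXXX XX65 5432
```

The static copy is computed once and stored separately, whereas `read_card` masks on every access and never changes the stored value.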
Direct identifiers and indirect identifiers are the two primary components used to identify individuals, users, or other personal information.
Direct identifiers are fields that uniquely identify the subject (usually name, address, and so on) and are usually referred to as PII. Masking solutions are typically used to protect direct identifiers.
Indirect identifiers typically consist of demographic or socioeconomic information, dates, or events.
Although no single indirect identifier can identify the individual on its own, combining several indirect identifiers with external data can expose the subject of the information.
For example, users might combine search engine data with online streaming recommendations to tie posts and recommendations back to individual users on a website.
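The linkage risk can be sketched as a simple join on indirect identifiers. The records, names, and field names below are made up for illustration, and choosing ZIP code, birth year, and gender as the indirect identifiers is an assumption.

```python
from collections import defaultdict

# Hypothetical, made-up data: the released set has no names (direct identifiers removed)...
released = [
    {"zip": "02139", "birth_year": 1985, "gender": "F", "watched": "show_a"},
    {"zip": "02139", "birth_year": 1990, "gender": "M", "watched": "show_b"},
    {"zip": "02140", "birth_year": 1985, "gender": "F", "watched": "show_c"},
]
# ...but an external data set (e.g. public profiles) pairs names with the same indirect identifiers.
external = [
    {"name": "Dana", "zip": "02139", "birth_year": 1985, "gender": "F"},
    {"name": "Eli", "zip": "02139", "birth_year": 1990, "gender": "M"},
]

INDIRECT = ("zip", "birth_year", "gender")

def link(released, external, keys=INDIRECT):
    """Return {name: watched} for released rows whose indirect identifiers
    match exactly one person in the external data set."""
    index = defaultdict(list)
    for person in external:
        index[tuple(person[k] for k in keys)].append(person["name"])
    matches = {}
    for row in released:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the subject
            matches[names[0]] = row["watched"]
    return matches

print(link(released, external))            # {'Dana': 'show_a', 'Eli': 'show_b'}
print(link(released, external, ("zip",)))  # {} (ZIP alone is not unique here)
```

Neither data set names what Dana watched, yet the combination of three indirect identifiers does.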
Anonymization is the process of removing the indirect identifiers to prevent data analysis tools or other intelligent mechanisms from collating or pulling data from multiple sources to identify individual or sensitive information.
The process of anonymization is similar to masking and includes identifying the relevant information to anonymize and choosing a relevant method for obscuring the data.
The challenge with indirect identifiers is that this type of data can be embedded in free-text fields, which tend to be less structured than direct identifiers, complicating the process.
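A minimal anonymization sketch, under stated assumptions: which fields count as direct or indirect identifiers, the choice to generalize ZIP codes and birth years rather than drop them, and the regex for free text are all illustrative choices, not a prescribed method. The regex only catches 4- and 5-digit numbers, which shows why free text is hard to anonymize reliably.

```python
import re
from collections import Counter

DIRECT = {"name", "email"}   # assumed direct identifiers, dropped outright
DROP_INDIRECT = {"gender"}   # assumed indirect identifier removed entirely

def anonymize(row: dict) -> dict:
    out = {k: v for k, v in row.items() if k not in DIRECT | DROP_INDIRECT}
    out["zip"] = row["zip"][:3] + "**"                      # generalize ZIP to a prefix
    out["birth_year"] = f"{row['birth_year'] // 10 * 10}s"  # generalize to a decade
    # Free text is unstructured, so identifiers can hide in it; this crude
    # regex only catches 4- or 5-digit numbers such as years or ZIP codes.
    out["notes"] = re.sub(r"\b\d{4,5}\b", "[REDACTED]", row.get("notes", ""))
    return out

def smallest_group(rows: list[dict], keys: tuple[str, ...]) -> int:
    """Size of the smallest set of records sharing the same indirect identifiers."""
    return min(Counter(tuple(r[k] for k in keys) for r in rows).values())

# Hypothetical, made-up records.
rows = [
    {"name": "Dana", "zip": "02139", "birth_year": 1985, "gender": "F", "notes": "Moved here in 2019"},
    {"name": "Eli", "zip": "02138", "birth_year": 1981, "gender": "M", "notes": "Prefers email"},
]
anon = [anonymize(r) for r in rows]
print(anon[0])  # {'zip': '021**', 'birth_year': '1980s', 'notes': 'Moved here in [REDACTED]'}
print(smallest_group(anon, ("zip", "birth_year")))  # 2
```

Before anonymization each record is unique on ZIP code and birth year (smallest group of 1); afterwards both records share the same generalized values, so neither can be singled out on those fields alone.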
Data Masking and Tokenization
Tokenization is the process of substituting a sensitive data element with a non-sensitive equivalent, referred to as a token.
The token is usually a collection of random values with the same shape and format as the original data, and it is mapped back to the original data by the tokenization application or solution.
Tokenization is not encryption and presents different challenges and different benefits.
Encryption uses a key to obfuscate data, while tokenization removes the data entirely from the database, replacing it with a token that the tokenization solution can use to identify and retrieve the original value.
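The difference from encryption can be sketched with a minimal in-memory token vault: the token is random, has the same shape as the original value, and can only be mapped back through the vault's lookup table, not by applying a key. The `TokenVault` class and its methods are hypothetical; a real tokenization solution would be a hardened, access-controlled service with persistent storage.

```python
import secrets

class TokenVault:
    """Hypothetical in-memory token vault, for illustration only."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def _random_token(self, value: str) -> str:
        # Same shape as the original: digits become random digits, separators are kept.
        return "".join(str(secrets.randbelow(10)) if ch.isdigit() else ch for ch in value)

    def tokenize(self, value: str) -> str:
        if value in self._value_to_token:  # same input -> same token
            return self._value_to_token[value]
        if not any(ch.isdigit() for ch in value):
            raise ValueError("this sketch only tokenizes values containing digits")
        for _ in range(100):
            token = self._random_token(value)
            if token != value and token not in self._token_to_value:
                break
        else:
            raise RuntimeError("could not generate a unique token")
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault's mapping recovers the value; raises KeyError for unknown tokens.
        return self._token_to_value[token]

vault = TokenVault()
token = vault.tokenize("4111 1111 1165 5432")
print(token)                    # a random 16-digit value in the same 4-4-4-4 layout
print(vault.detokenize(token))  # 4111 1111 1165 5432
```

Because the token has no mathematical relationship to the original value, stealing the tokenized database reveals nothing unless the vault is also compromised.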
Tokenization is used to safeguard sensitive data in a secure, protected, or regulated environment.
Tokenization can be implemented internally, where there is a need to secure sensitive data centrally, or externally, using a tokenization service.
Tokenization can assist with the following:
- Complying with regulations or laws
- Reducing the cost of compliance
- Mitigating risks of storing sensitive data and reducing attack vectors on that data
Keep the following tokenization and cloud considerations in mind:
- When using tokenization as a service, it is imperative to verify the provider's and solution's ability to protect your data. Note that you cannot outsource accountability.
- When using tokenization as a service, special attention should be paid to the process of authenticating the application when storing or retrieving sensitive data.
- Where external tokenization is used, appropriate encryption of communications should be applied to data in motion.
- As always, evaluate your compliance requirements before considering a cloud-based tokenization solution. You need to weigh the risks of having to interact with different jurisdictions and different compliance requirements.
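Authenticating the application when it stores or retrieves sensitive data can be sketched as follows. The `AuthenticatedTokenService` class, the API-key scheme, and the per-application detokenize permission are all hypothetical design choices for illustration. Tokens here are random hex strings for brevity rather than format-preserving values, and the encryption of data in motion (for example TLS between the application and an external service) is not shown.

```python
import hashlib
import hmac
import secrets

class AuthenticatedTokenService:
    """Hypothetical tokenization front end: every call must present an
    application ID and API key; only approved apps may detokenize."""

    def __init__(self) -> None:
        self._key_hashes: dict[str, bytes] = {}  # app_id -> SHA-256 of its API key
        self._may_detokenize: set[str] = set()
        self._vault: dict[str, str] = {}

    def register_app(self, app_id: str, can_detokenize: bool = False) -> str:
        api_key = secrets.token_urlsafe(32)
        self._key_hashes[app_id] = hashlib.sha256(api_key.encode()).digest()
        if can_detokenize:
            self._may_detokenize.add(app_id)
        return api_key  # handed to the application once; only the hash is kept

    def _authenticate(self, app_id: str, api_key: str) -> None:
        expected = self._key_hashes.get(app_id)
        presented = hashlib.sha256(api_key.encode()).digest()
        # compare_digest avoids leaking information through comparison timing.
        if expected is None or not hmac.compare_digest(expected, presented):
            raise PermissionError("application authentication failed")

    def tokenize(self, app_id: str, api_key: str, value: str) -> str:
        self._authenticate(app_id, api_key)
        token = secrets.token_hex(8)
        self._vault[token] = value
        return token

    def detokenize(self, app_id: str, api_key: str, token: str) -> str:
        self._authenticate(app_id, api_key)
        if app_id not in self._may_detokenize:
            raise PermissionError(f"{app_id} may not retrieve sensitive data")
        return self._vault[token]

svc = AuthenticatedTokenService()
web_key = svc.register_app("web-checkout")                   # may store, not retrieve
billing_key = svc.register_app("billing", can_detokenize=True)
tok = svc.tokenize("web-checkout", web_key, "4111 1111 1165 5432")
print(svc.detokenize("billing", billing_key, tok))           # 4111 1111 1165 5432
```

Separating the right to store from the right to retrieve limits which applications can ever see the original value, even if all of them can create tokens.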