Introduction
- Protecting sensitive information is an important step, especially in Business Intelligence work: operators reading logs may come across PII, and support staff may get their hands on customer data.
- To comply with the many data protection regulations out there, we need to set up a process that takes input data and makes sure all PII is anonymized, pseudonymized, or generalized.
- In this blog, we will see how we can use Java to achieve pseudonymization through hashing.
Use Case
- Let's say we have the information below as a List<List<String>> that we would like to de-identify.
customer_name, age, customer_id, account_no, account_balance
john doe, 32, 1234, 8890, $12400
- We have many options for de-identifying these records, such as masking, bucketing, hashing (pseudonymization), etc.
- Our goal is to de-identify the account number using a hashing strategy. Hashing replaces (tokenizes) the actual value with another value.
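- As a minimal sketch of how one column of such a List<List<String>> could be de-identified, the helper below applies a pluggable transform to a single column. The class name ColumnDeidentifier, the method transformColumn, and the placeholder transform are assumptions made for illustration; a real pipeline would plug in the HMAC-based hash instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of the original post.
class ColumnDeidentifier {

    // Returns a copy of the records in which the value at columnIndex
    // of every row has been passed through the given transform.
    static List<List<String>> transformColumn(List<List<String>> records,
                                              int columnIndex,
                                              UnaryOperator<String> transform) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> row : records) {
            List<String> copy = new ArrayList<>(row); // leave the input untouched
            copy.set(columnIndex, transform.apply(row.get(columnIndex)));
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> records = Arrays.asList(
                Arrays.asList("john doe", "32", "1234", "8890", "$12400"));
        // Column 3 is account_no; the transform here is only a placeholder.
        List<List<String>> out = transformColumn(records, 3, v -> "TOKEN(" + v + ")");
        System.out.println(out);
    }
}
```

- Keeping the transform as a parameter means the same loop can later be reused for masking or bucketing other columns.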
Hashing Logic
- If you don't know, hashing is a one-way transformation: the input is turned into a fixed-size output (our token) that cannot be reversed to recover the input. (FYI, reversible tokenization schemes also exist, but they rely on encryption or a lookup table rather than hashing.)
- We will use HMAC (hash-based message authentication code) based hashing.
- We will use SHA-256 as the hash function and pass a secret key that plays the role of a salt, protecting our hashed data against rainbow table attacks.
- We can use both types of salt, dynamic or static. Dynamic is recommended since it's hard for anyone to precompute and cache the hashes.
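- To make the static versus dynamic distinction concrete, here is a small sketch that generates a random HMAC key with the JDK's KeyGenerator and hashes the same value under two different keys. The class name DynamicKeyDemo and its method names are assumptions for illustration, and how such a key would be stored or rotated is outside the scope of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import javax.crypto.SecretKey;

// Hypothetical demo class, not part of the original post.
class DynamicKeyDemo {

    // Generates a fresh random key suitable for HmacSHA256.
    static SecretKey newRandomKey() throws GeneralSecurityException {
        return KeyGenerator.getInstance("HmacSHA256").generateKey();
    }

    // Computes HMAC-SHA256 of the text under the given key, as lowercase hex.
    static String hmacHex(SecretKey key, String text) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        byte[] digest = mac.doFinal(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey first = newRandomKey();
        SecretKey second = newRandomKey();
        // Same key, same input: the token repeats.
        System.out.println(hmacHex(first, "889076543").equals(hmacHex(first, "889076543")));
        // Different key, same input: the token changes.
        System.out.println(hmacHex(first, "889076543").equals(hmacHex(second, "889076543")));
    }
}
```

- The trade-off is that tokens produced under different keys can no longer be matched against each other, so the key has to stay fixed for as long as records need to be joined.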
Java Implementation
- Java provides the javax.crypto package, whose Mac class we can use to hash our input records.
- We can get a Mac instance of type HmacSHA256 and initialize it with a secret key that works as the salt for our input.
- Here we are using “secretkey” as a static salt, but in a real situation we should use a dynamic salt, for example a randomly generated value such as a UUID.
- Once our Mac instance is initialized, we can convert the input text into a byte array and perform the hash operation.
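import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hashing
public static void hashMyText(String text) throws NoSuchAlgorithmException, InvalidKeyException {
    Mac mac = Mac.getInstance("HmacSHA256");
    // Static demo key; explicit UTF-8 avoids platform-dependent encodings
    SecretKeySpec secretKeySpec = new SecretKeySpec("secretkey".getBytes(StandardCharsets.UTF_8), "HmacSHA256");
    mac.init(secretKeySpec);
    byte[] bytes = mac.doFinal(text.getBytes(StandardCharsets.UTF_8));
    // Hex-encode the raw HMAC bytes directly (no intermediate Base64 step)
    System.out.println(text + " -> " + convertToHex(bytes));
}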
- To convert the resulting bytes to a hex string, we can use the logic below.
public static String convertToHex(byte[] digest) {
    StringBuilder builder = new StringBuilder();
    for (byte b : digest) {
        // Two lowercase hex characters per byte
        builder.append(String.format("%02x", b));
    }
    return builder.toString();
}
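- One way to sanity-check an HMAC-SHA256 routine like this is to compare it against a published test vector. The sketch below uses test case 2 from RFC 4231 (key "Jefe", data "what do ya want for nothing?"). The class name HmacVectorCheck and the method hmacSha256Hex are assumptions for illustration, and the method returns the hex string instead of printing it so the result can be compared.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical check class, not part of the original post.
class HmacVectorCheck {

    // Expected HMAC-SHA-256 value for RFC 4231 test case 2.
    static final String EXPECTED =
            "5bdcc146bf60754e6a042426089575c75a003f089d2739839dec58b964ec3843";

    // Computes HMAC-SHA256 of the text under the given key, as lowercase hex.
    static String hmacSha256Hex(String key, String text) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        byte[] digest = mac.doFinal(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        String actual = hmacSha256Hex("Jefe", "what do ya want for nothing?");
        System.out.println("matches RFC 4231 vector: " + EXPECTED.equals(actual));
    }
}
```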
Client
- Now let's use our hash logic to hash customer account numbers.
- We have a list of account numbers here that we would like to hash.
- Once we execute the client code, we can see the hashed account numbers in the output.
import java.util.Arrays;
import java.util.List;

public static void main(String[] args) throws NoSuchAlgorithmException, InvalidKeyException {
    List<String> input = Arrays.asList("889076543", "989076543", "389076543", "589076543");
    for (String accountNo : input) {
        hashMyText(accountNo);
    }
    System.out.println("==================");
    // Second pass over the same input, to compare against the first
    for (String accountNo : input) {
        hashMyText(accountNo);
    }
}
Test
- Our output shows each account number along with its hashed value.
- Since our salt is static, running the hash operation again produces the same output.
- With a dynamic salt, the output would differ whenever the salt changes, so the same account number would not map to the same token across runs.
Conclusion
- In this blog, we used the javax.crypto utilities to hash customer PII.
- We can extend this logic to build methods that mask n characters of a credit card number or convert customer age into a bucket.
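- As a rough sketch of those two extensions, the helpers below mask the first n characters of a value and map an age to a fixed-width range. The class name MaskAndBucket, the method names, the '*' mask character, and the bucket width are all assumptions chosen for illustration.

```java
// Hypothetical helpers, not part of the original post.
class MaskAndBucket {

    // Replaces the first n characters of value with '*'.
    // If n exceeds the length, the whole value is masked.
    static String maskFirst(String value, int n) {
        int maskCount = Math.max(0, Math.min(n, value.length()));
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            sb.append(i < maskCount ? '*' : value.charAt(i));
        }
        return sb.toString();
    }

    // Maps a non-negative age to a range of the given width,
    // e.g. width 10 puts 32 into "30-39".
    static String ageBucket(int age, int width) {
        int lower = (age / width) * width;
        return lower + "-" + (lower + width - 1);
    }

    public static void main(String[] args) {
        System.out.println(maskFirst("4111111111111111", 12)); // ************1111
        System.out.println(ageBucket(32, 10));                 // 30-39
    }
}
```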