How to Protect PII With Java

  • Post last modified: December 15, 2022

Introduction

  • Protecting sensitive information is an important step, especially when we run Business Intelligence workloads, when operators access logs that might contain PII, or when support staff get their hands on customer data.
  • To comply with the many privacy regulations out there, we need to set up a process that takes input data and makes sure all PII is anonymized, pseudonymized, or generalized.
  • In this blog, we will see how we can use Java to achieve pseudonymization through hashing.

Use Case

  • Let's say we have the information below as a List<List<String>> that we would like to de-identify.
  • We have many options for these records, such as masking, bucketing, hashing (pseudonymization), etc.
customer_name, age, customer_id, account_no, account_balance
john doe, 32, 1234, 8890, $12400
  • Our goal is to de-identify the account number using a hashing strategy. Hashing replaces (tokenizes) the actual value with another value.
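To make the use case concrete, here is a minimal sketch of how one column of a List<List<String>> could be swapped for a token. The class name ColumnDeIdentifier, the method replaceColumn, and the assumption that row 0 is the header are all hypothetical and not part of the original example; the tokenizer is a placeholder that a real pipeline would replace with the keyed hash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical helper for illustration only.
class ColumnDeIdentifier {

    // Returns a copy of the rows in which the given column of every data row
    // is replaced by tokenizer.apply(value). Row 0 is assumed to be the header
    // and is copied unchanged. The input rows are not modified.
    static List<List<String>> replaceColumn(List<List<String>> rows, int column,
                                            UnaryOperator<String> tokenizer) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            List<String> copy = new ArrayList<>(rows.get(i));
            if (i > 0) {
                copy.set(column, tokenizer.apply(copy.get(column)));
            }
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("customer_name", "age", "customer_id", "account_no", "account_balance"),
                Arrays.asList("john doe", "32", "1234", "8890", "$12400"));

        // Placeholder tokenizer; in practice this would call the keyed hash.
        List<List<String>> out = replaceColumn(rows, 3, value -> "TOKEN(" + value + ")");
        out.forEach(System.out::println);
    }
}
```

Passing the tokenizer as a function keeps the column handling separate from the de-identification strategy, so masking or bucketing could be plugged in the same way.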

Hashing Logic 

  • Hashing is basically a tokenizing process in which the input is transformed into an output, called a token, that cannot be reversed. (FYI, other tokenization techniques, such as encryption or vault-based lookup, are reversible; cryptographic hashing is not.)
  • We will use HMAC (hash-based message authentication code) hashing.
  • We will use SHA-256 to hash our input and pass a secret that plays the role of a salt, protecting our hashed data against rainbow table attacks. (Strictly speaking, the secret is an HMAC key rather than a salt, so it must be kept confidential.)
  • We can use either type of salt, dynamic or static. Dynamic is recommended, since it is hard for anyone to precompute and cache the results.
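The difference between a plain hash and a keyed one can be sketched roughly as follows. The class PlainVsKeyedHash and the keys "key-A" and "key-B" are made up for illustration. An unkeyed SHA-256 of a short account number can be recomputed by anyone who guesses the input, while the HMAC result also depends on the secret.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical comparison class for illustration only.
class PlainVsKeyedHash {

    // Unkeyed SHA-256: the same input always gives the same digest for everyone.
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return toHex(md.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Keyed HMAC-SHA256: the digest depends on both the input and the key.
    static String hmacSha256Hex(String text, String key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return toHex(mac.doFinal(text.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Renders each byte as two lowercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("sha256      : " + sha256Hex("8890"));
        System.out.println("hmac (key A): " + hmacSha256Hex("8890", "key-A"));
        System.out.println("hmac (key B): " + hmacSha256Hex("8890", "key-B"));
    }
}
```

All three lines print different 64-character hex strings; only someone holding the key can reproduce the HMAC values.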

Java Implementation

  • Java provides cryptography classes that we can use to hash our input records. They are part of the javax.crypto package.
  • We can get a Mac instance of type HmacSHA256 and initialize it with a secret key that works as the salt for our input.
  • Here we use “secretkey” as a static salt, but in a real situation we should use a dynamic salt, for example a UUID or similar.
  • Once our Mac instance is initialized, we can convert the input text into a byte array and perform the hash operation.
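    import java.nio.charset.StandardCharsets;
    import java.security.InvalidKeyException;
    import java.security.NoSuchAlgorithmException;
    import java.util.Base64;
    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;

    // Hashing: computes HMAC-SHA256 of the text, Base64-encodes it, and prints those bytes as hex
    public static void hashMyText(String text) throws NoSuchAlgorithmException, InvalidKeyException {
        Mac mac = Mac.getInstance("HmacSHA256");
        // "secretkey" is a static demo key; an explicit charset keeps the bytes platform-independent
        SecretKeySpec secretKeySpec = new SecretKeySpec("secretkey".getBytes(StandardCharsets.UTF_8), "HmacSHA256");
        mac.init(secretKeySpec);
        byte[] bytes = mac.doFinal(text.getBytes(StandardCharsets.UTF_8));
        System.out.println(text + " ->" + convertToHex(Base64.getEncoder().encode(bytes)));
    }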
  • To convert the encoded bytes to a hex string, we can use the logic below.
public static String convertToHex(byte[] digest) {
        StringBuilder builder = new StringBuilder();
        for (byte b:digest){
            builder.append(String.format("%02x",b));
        }
        return builder.toString();
}
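Calling String.format for every byte is fine for a demo but can get slow on large batches. As an alternative, here is a sketch of the same conversion using a lookup table; the class name HexEncoder is hypothetical, and the output format (two lowercase hex digits per byte) matches convertToHex.

```java
// Hypothetical alternative to convertToHex, for illustration only.
class HexEncoder {

    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    // Converts each byte to two lowercase hex characters without String.format.
    static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;                 // treat the byte as unsigned
            out[i * 2] = HEX_DIGITS[v >>> 4];        // high nibble
            out[i * 2 + 1] = HEX_DIGITS[v & 0x0F];   // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[] {0x00, 0x1a, (byte) 0xff})); // prints 001aff
    }
}
```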

Client

  • Now let's use our hash logic to hash customer account numbers.
  • We have a list of account numbers here that we would like to hash.
  • Once we execute the client code, we can see the hashed account numbers in the output.
public static void main(String[] args) throws NoSuchAlgorithmException, InvalidKeyException {
        List<String> input = Arrays.asList("889076543", "989076543", "389076543", "589076543");

        for(String accountNo: input) {
            hashMyText(accountNo);
        }

        System.out.println("==================");

        for(String accountNo: input) {
            hashMyText(accountNo);
        }
}

Test

  • Our output shows each account number along with its hashed value.
  • Since our salt is static, running the hash operation again produces the same output.
  • With a dynamic salt, the output would be different each time we perform hashing, so tokens from different runs cannot be matched against each other.
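The behaviour described above can be checked with a small sketch. It assumes the dynamic secret is 32 random bytes from SecureRandom; that choice, the class name KeyChoiceDemo, and the helper names are illustrative and not the original implementation. Unlike hashMyText, this variant returns the Base64 token instead of printing a hex string.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class for illustration only.
class KeyChoiceDemo {

    // Returns the Base64-encoded HMAC-SHA256 of the text under the given key.
    static String token(String text, byte[] key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] digest = mac.doFinal(text.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumption: a "dynamic" secret is modelled as 32 fresh random bytes.
    static byte[] randomKey() {
        byte[] key = new byte[32];
        new SecureRandom().nextBytes(key);
        return key;
    }

    public static void main(String[] args) {
        byte[] staticKey = "secretkey".getBytes(StandardCharsets.UTF_8);
        String account = "889076543";

        // Same key, same input: the tokens match.
        System.out.println(token(account, staticKey).equals(token(account, staticKey)));

        // A fresh key per call: the tokens differ (barring an astronomically unlikely key collision).
        System.out.println(token(account, randomKey()).equals(token(account, randomKey())));
    }
}
```

The trade-off is worth keeping in mind: a static key lets the same account number be joined across datasets, while a fresh key per run gives up that linkability in exchange for tokens that cannot be precomputed.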

Conclusion

  • In this blog, we used the javax.crypto utilities to hash customer PII.
  • We can extend this logic to build methods that mask n characters of a credit card number or convert a customer's age into a bucket.
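As a starting point for those extensions, masking and bucketing might look something like the sketch below. The class MaskAndBucket, both method names, the '*' mask character, and the fixed-width bucket format are assumptions made for illustration; ages are assumed non-negative and the bucket width positive.

```java
// Hypothetical helpers for illustration only.
class MaskAndBucket {

    // Replaces the first n characters with '*'; n is clamped to [0, value.length()].
    static String maskFirst(String value, int n) {
        int count = Math.max(0, Math.min(n, value.length()));
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < count; i++) {
            sb.append('*');
        }
        sb.append(value.substring(count));
        return sb.toString();
    }

    // Maps a non-negative age to a fixed-width range, e.g. width 10 turns 32 into "30-39".
    static String ageBucket(int age, int width) {
        int lower = (age / width) * width;
        return lower + "-" + (lower + width - 1);
    }

    public static void main(String[] args) {
        System.out.println(maskFirst("4111111111111111", 12)); // prints ************1111
        System.out.println(ageBucket(32, 10));                 // prints 30-39
    }
}
```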
