How to Generate Sample Data using Regex In Java

  • Post last modified: May 3, 2023

Using Generex to Generate Sample Data using Regex

Introduction

  • Having test data available is a common requirement in most projects.
  • However, realistic test data based on production is tedious to obtain, so we often end up creating mock data that suffices for development needs.
  • In the previous article, we discussed one approach to generating mock data.
  • The goal of this article is to use another approach to generate mock data using regex.

Pre-requisite

  • We need the Generex dependency, which parses a regex and generates strings that match it.
  • Add it to the pom.xml file:
<dependency>
      <groupId>com.github.mifmif</groupId>
      <artifactId>generex</artifactId>
      <version>1.0.2</version>
</dependency>

How does it work?

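  • We initialize a Generex instance by passing our regex to the constructor.
  • Once we have the instance, we can use helper methods to generate values: random() returns a random matching string, getFirstMatch() returns the first match, and getAllMatchedStrings() returns every match (practical only when the regex matches a finite set of strings).
Generex generex = new Generex("[REGEX_HERE]"); // replace the placeholder with your regex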
generex.random(); // generate data

Fields

  • username
  • age
  • zipcode
  • phoneNumber
  • cardNumber

Generate username

  • To generate a username field, we use the following regex, which produces an 18-character alphanumeric string:
    “[a-zA-Z0-9]{18}”
Generex username = new Generex("[a-zA-Z0-9]{18}");
System.out.println(username.random()); 
// output : 991Pod3a3ZyB87c6ni
  • We can also combine a fixed list of names with a regex, so that each username is a name followed by an underscore and a short alphanumeric sequence.
List<String> names = List.of("sam", "jam", "tam");
String namesString = names.stream().map(n -> n + "_").collect(Collectors.joining("|"));
Generex username1 = new Generex("("+namesString+")([a-z0-9]{5})");
System.out.println(username1.random()); // jam_b64z5
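  • To make the string concatenation above easier to follow, here is a minimal sketch that builds the same alternation with plain JDK classes and checks the sample value against it using java.util.regex.Pattern. The class name UsernamePatternSketch and the method buildUsernameRegex are illustrative names, not part of Generex.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustrative sketch: class and method names are hypothetical, not part of Generex.
class UsernamePatternSketch {

    // Joins the names into an alternation and appends the alphanumeric suffix group.
    static String buildUsernameRegex(List<String> names) {
        String alternation = names.stream()
                .map(n -> n + "_")
                .collect(Collectors.joining("|"));
        return "(" + alternation + ")([a-z0-9]{5})";
    }

    public static void main(String[] args) {
        String regex = buildUsernameRegex(List.of("sam", "jam", "tam"));
        System.out.println(regex); // (sam_|jam_|tam_)([a-z0-9]{5})

        // The sample value from the article matches the full pattern.
        System.out.println(Pattern.matches(regex, "jam_b64z5")); // true
    }
}
```

  • Note that this sketch assumes the names contain no regex metacharacters; names such as "a.b" would need escaping before being joined.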

Generate age

  • For age, we use “(1[89]|[2-9]\\d)”, which generates an age between 18 and 99: the first alternative covers 18 and 19, and the second covers 20 to 99.
Generex age = new Generex("(1[89]|[2-9]\\d)");
System.out.println(age.random()); // 20
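  • One way to convince ourselves that the age regex covers exactly 18 to 99 is to test it with the JDK's own regex engine. The sketch below does that; AgePatternSketch and isValidAge are hypothetical names used only for this check, and it assumes Generex interprets the pattern the same way java.util.regex does.

```java
import java.util.regex.Pattern;
import java.util.stream.IntStream;

// Illustrative sketch: verifies the boundaries of the age regex with java.util.regex.
class AgePatternSketch {

    // Same pattern that is passed to Generex in the article.
    static final Pattern AGE = Pattern.compile("(1[89]|[2-9]\\d)");

    // True only if the whole string form of the number matches the pattern.
    static boolean isValidAge(int age) {
        return AGE.matcher(String.valueOf(age)).matches();
    }

    public static void main(String[] args) {
        // Every value from 18 to 99 inclusive should match.
        boolean allInRange = IntStream.rangeClosed(18, 99)
                .allMatch(AgePatternSketch::isValidAge);
        System.out.println(allInRange);      // true

        // Values just outside the range should not match.
        System.out.println(isValidAge(17));  // false
        System.out.println(isValidAge(100)); // false
    }
}
```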

Generate ZipCode

  • We use a regex for the US ZIP+4 code format (five digits, a hyphen, and four digits).
Generex zipCode = new Generex("\\d{5}(-)\\d{4}");
System.out.println(zipCode.random()); 
//84042-2198

Generate Phone Number

  • We use a regex for US phone numbers in the NNN-NNN-NNNN format.
Generex phoneNumber = new Generex("([0-9]{3})-([0-9]{3})-([0-9]{4})");
System.out.println(phoneNumber.random());
// 658-101-3783

Generate Card Number

  • We generate card numbers using a 16-digit pattern. Not all cards have 16 digits, and the digits here are purely random, so the pattern could be refined, but it is sufficient for demo purposes.
Generex cardNumber = new Generex("\\d{4}(-)\\d{4}(-)\\d{4}(-)\\d{4}");
System.out.println(cardNumber.random());
// 4737-4046-5951-7119
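  • One possible refinement, offered here only as a sketch: many card validators apply the Luhn checksum, which a random 16-digit string will usually fail. Assuming we generate 15 random digits (for example with a regex such as \\d{15}) and compute the last digit ourselves, the result passes that check. The class LuhnSketch and its methods are hypothetical helpers, not part of Generex.

```java
// Illustrative sketch: Luhn check digit for a numeric payload (hypothetical helper).
class LuhnSketch {

    // Computes the check digit that makes payload + digit pass the Luhn check.
    static int checkDigit(String payload) {
        int sum = 0;
        boolean doubleIt = true; // the rightmost payload digit is doubled first
        for (int i = payload.length() - 1; i >= 0; i--) {
            int d = payload.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as adding the two digits of the product
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return (10 - sum % 10) % 10;
    }

    // Appends the computed check digit to the payload.
    static String withCheckDigit(String payload) {
        return payload + checkDigit(payload);
    }

    // Validates a full number by recomputing its last digit.
    static boolean isValid(String digits) {
        String payload = digits.substring(0, digits.length() - 1);
        int last = digits.charAt(digits.length() - 1) - '0';
        return checkDigit(payload) == last;
    }

    public static void main(String[] args) {
        String payload = "411111111111111"; // 15 digits; stands in for generated data
        String card = withCheckDigit(payload);
        System.out.println(card);          // 4111111111111111
        System.out.println(isValid(card)); // true
    }
}
```

  • The hyphen grouping used in the article can be applied afterwards, since the checksum only concerns the digits.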

Helper method to generate records

  • Now that we have the regexes ready, let’s put them together in a method generateRecord() that returns one mock record as a list of field values.
// Requires: java.util.ArrayList, java.util.List, java.util.stream.Collectors,
// and com.mifmif.common.regex.Generex
public static List<String> generateRecord() {
    // Build "sam_|jam_|tam_" so each username starts with a known name
    List<String> names = List.of("sam", "jam", "tam");
    String namesString = names.stream().map(n -> n + "_").collect(Collectors.joining("|"));

    // One Generex instance per field
    Generex username = new Generex("(" + namesString + ")([a-z0-9]{5})");
    Generex age = new Generex("(1[89]|[2-9]\\d)");
    Generex zipCode = new Generex("\\d{5}(-)\\d{4}");
    Generex phoneNumber = new Generex("([0-9]{3})-([0-9]{3})-([0-9]{4})");
    Generex cardNumber = new Generex("\\d{4}(-)\\d{4}(-)\\d{4}(-)\\d{4}");

    // Field order: username, age, zipcode, phoneNumber, cardNumber
    List<String> fields = new ArrayList<>();
    fields.add(username.random());
    fields.add(age.random());
    fields.add(zipCode.random());
    fields.add(phoneNumber.random());
    fields.add(cardNumber.random());
    return fields;
}
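  • Since each field is produced from a regex, a generated record can be sanity-checked against the same patterns. The following is a minimal sketch using only the JDK; RecordCheckSketch and isValidRecord are hypothetical names, and the sample values are the ones shown earlier in this article.

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch: checks a record's fields against the patterns used to generate them.
class RecordCheckSketch {

    // Same regexes as in generateRecord(), in field order:
    // username, age, zipcode, phoneNumber, cardNumber.
    static final List<Pattern> FIELD_PATTERNS = List.of(
            Pattern.compile("(sam_|jam_|tam_)([a-z0-9]{5})"),
            Pattern.compile("(1[89]|[2-9]\\d)"),
            Pattern.compile("\\d{5}(-)\\d{4}"),
            Pattern.compile("([0-9]{3})-([0-9]{3})-([0-9]{4})"),
            Pattern.compile("\\d{4}(-)\\d{4}(-)\\d{4}(-)\\d{4}"));

    // A record is valid when it has the expected size and every field fully matches.
    static boolean isValidRecord(List<String> fields) {
        if (fields.size() != FIELD_PATTERNS.size()) {
            return false;
        }
        for (int i = 0; i < fields.size(); i++) {
            if (!FIELD_PATTERNS.get(i).matcher(fields.get(i)).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "jam_b64z5", "20", "84042-2198", "658-101-3783", "4737-4046-5951-7119");
        System.out.println(isValidRecord(sample)); // true
    }
}
```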

Generating records in bulk

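  • Now we can use IntStream from the Java streams API to generate the range we need and invoke generateRecord(), which returns a list of fields, once per element. Note that IntStream.range(1, 100) excludes the upper bound, so the code below produces 99 records.
  • We then join each list of fields with commas to get a CSV line, but any mapper could be used instead to produce another format such as JSON.
  • Here we also print each record to the console to verify the output. Stream.toList() requires Java 16 or later; on older versions, collect(Collectors.toList()) can be used instead.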
    public static void main(String[] args) {
        List<String> records = IntStream.range(1, 100)
                .mapToObj(i -> generateRecord())
                .map(a -> String.join(",", a))
                .peek(System.out::println)
                .toList();
        // write the records to the desired sink (file, database, etc.)
    }
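  • As an example of swapping the CSV mapper for another format, the sketch below turns one record into a JSON object using the field names listed earlier. JsonMapperSketch and toJson are hypothetical names, and the code does no JSON escaping; that is an assumption that holds here only because the generated values contain just letters, digits, underscores, and hyphens.

```java
import java.util.List;

// Illustrative sketch: maps one generated record to a JSON object string.
class JsonMapperSketch {

    // Field names in the same order that generateRecord() adds the values.
    static final List<String> FIELD_NAMES =
            List.of("username", "age", "zipcode", "phoneNumber", "cardNumber");

    // Builds {"name":"value",...}; all values are emitted as JSON strings.
    static String toJson(List<String> fields) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < FIELD_NAMES.size(); i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append('"').append(FIELD_NAMES.get(i)).append("\":\"")
              .append(fields.get(i)).append('"');
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "jam_b64z5", "20", "84042-2198", "658-101-3783", "4737-4046-5951-7119");
        System.out.println(toJson(sample));
    }
}
```

  • In the stream pipeline, this would replace the String.join step, for example .map(JsonMapperSketch::toJson). For values that may contain quotes or backslashes, a JSON library would be the safer choice.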

Output

  • Running the program prints one comma-separated record per line, in the format we wanted for testing.

Conclusion

  • In this article, we learned how we can use regex to quickly generate test/mock data.
  • Generex accepts a regex and provides helper methods to generate random strings that match it.
