How to filter distinct elements in Java

  • Post last modified: August 15, 2022

Introduction

  • Duplicate entries are a very common problem when data is exchanged between machines.
  • As a client consuming these records, we have to implement logic to handle the duplicates.
  • In this blog we will see how to handle duplicates in Java.

Use Case

  • As an example, we will take a list of email IDs and a list of customers that we have been provided with. Our goal is to keep only the distinct records from each list.
// list of emailId
List<String> emailList = List.of("abc@gmail.com", "pqr@gmail.com", "tuf@gmail.com", "lth@gmail.com", "abc@gmail.com");

// list of customers
List<Customer> customers = List.of(new Customer(1, 23, "abc@gmail.com"),
        new Customer(2, 25, "xbc@gmail.com"),
        new Customer(1, 23, "abc@gmail.com"));

Using HashSet

  • First we will use the HashSet data structure. A HashSet contains only unique elements: if you try to add the same element twice, the second call to add() returns false and the set is left unchanged.
  • So we are guaranteed to have unique elements in a HashSet.
  • In the example below, we iterate over each emailId and add it to our set.
    Once that is finished, we create a distinct list from the set.
// Requires: java.util.ArrayList, java.util.HashSet, java.util.List, java.util.Set
List<String> emailList = List.of("abc@gmail.com", "pqr@gmail.com", "tuf@gmail.com", "lth@gmail.com", "abc@gmail.com");
emailList.forEach(System.out::println); // original list, including the duplicate

System.out.println("==========================");

// using HashSet: add() leaves the set unchanged if the element is already present
Set<String> set = new HashSet<>();
for (String emailId : emailList) {
    set.add(emailId);
}
List<String> distinctEmailList = new ArrayList<>(set);
distinctEmailList.forEach(System.out::println);
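  • Now we have a list of unique email IDs, which we can verify by printing it.
  • The duplicate emailId has been filtered out of the original email list. Note that HashSet does not guarantee iteration order, so the elements may be printed in a different order than in the original list.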
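The return value of add() mentioned above can also be used to report which entries were duplicates. The following is a minimal sketch; the class name DuplicateEmailFinder and the helper findDuplicates are made up for illustration and are not part of the original example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateEmailFinder {

    // Hypothetical helper: returns every repeated occurrence found in the input list.
    public static List<String> findDuplicates(List<String> emails) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String email : emails) {
            // add() returns false when the set already contains the element
            if (!seen.add(email)) {
                duplicates.add(email);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> emailList = List.of("abc@gmail.com", "pqr@gmail.com",
                "tuf@gmail.com", "lth@gmail.com", "abc@gmail.com");
        System.out.println(findDuplicates(emailList)); // prints [abc@gmail.com]
    }
}
```

This can be handy when the duplicates should be logged rather than silently dropped.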

Using the distinct() method

  • Another option is the built-in distinct() method from the Streams API.
  • Let’s create a Stream from the list and apply distinct() to keep only the unique elements.
List<String> emailList = List.of("abc@gmail.com", "pqr@gmail.com", "tuf@gmail.com", "lth@gmail.com", "abc@gmail.com");
emailList.forEach(System.out::println);

System.out.println("==========================");

// Streams API: distinct() keeps the first occurrence of each element
List<String> distinctEmailList = emailList.stream().distinct().collect(Collectors.toList());
distinctEmailList.forEach(System.out::println);
  • Now we can verify the result by printing our distinct email list.
  • So far we have been filtering a built-in type (String). Next we will use a custom object type and remove duplicate objects.
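Unlike the HashSet approach, distinct() on a stream created from a List keeps the elements in their original order and retains the first occurrence of each. A small self-contained sketch of this, where the class name DistinctOrderDemo and the helper distinctEmails are illustrative names only:

```java
import java.util.List;
import java.util.stream.Collectors;

public class DistinctOrderDemo {

    // Hypothetical helper: removes duplicates while keeping the encounter order of the list.
    public static List<String> distinctEmails(List<String> emails) {
        return emails.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> emailList = List.of("abc@gmail.com", "pqr@gmail.com",
                "tuf@gmail.com", "lth@gmail.com", "abc@gmail.com");
        System.out.println(distinctEmails(emailList));
        // prints [abc@gmail.com, pqr@gmail.com, tuf@gmail.com, lth@gmail.com]
    }
}
```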

Customer POJO

  • Let’s first create a custom Customer class, which is nothing but a POJO.
import java.util.Objects;

public class Customer {

    private int id;
    private int age;
    private String emailId;

    public Customer(int id, int age, String emailId) {
        this.id = id;
        this.age = age;
        this.emailId = emailId;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public String getEmailId() {
        return emailId;
    }

    public void setEmailId(String emailId) {
        this.emailId = emailId;
    }

    @Override
    public String toString() {
        return "Customer{" +
                "id=" + id +
                ", age=" + age +
                ", emailId='" + emailId + '\'' +
                '}';
    }
}
  • Now let’s create a sample customer list by creating Customer objects.
  • Then let’s use the Streams API distinct() method and check whether it works on objects as well.
List<Customer> customers = List.of(new Customer(1, 23, "abc@gmail.com"),
        new Customer(2, 25, "xbc@gmail.com"),
        new Customer(1, 23, "abc@gmail.com"));

customers.forEach(System.out::println);

System.out.println("==========================");

customers.stream().distinct().forEach(System.out::println);
  • But when we print the result, we find that the duplicate customer is still there.
  • Does that mean distinct() does not work on custom types?
  • The problem is not with the distinct() method itself but with the equals() and hashCode() methods. distinct() compares elements using equals(), and because Customer does not override it, the default from Object is used, which treats two objects as equal only if they are the same instance.
  • We have to override the equals() and hashCode() methods in the Customer POJO that we created earlier.
  • As shown below, equals() compares each property of the Customer class, and hashCode() uses the same properties to calculate the hash code.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Customer customer = (Customer) o;
        // Objects.equals is null-safe, so a null emailId does not cause a NullPointerException
        return id == customer.id && age == customer.age && Objects.equals(emailId, customer.emailId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, age, emailId);
    }
  • Once we have added the hashCode() and equals() methods, distinct() works on the stream of customers.
  • The result now contains each customer only once.
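To see the effect of equals() and hashCode() in isolation, the sketch below compares two stripped-down classes side by side. PlainCustomer and ValueCustomer are hypothetical stand-ins for the Customer POJO (with fewer fields to keep the example short), and countDistinct is an illustrative helper.

```java
import java.util.List;
import java.util.Objects;

public class DistinctEqualsDemo {

    // No equals/hashCode: inherits identity-based comparison from Object
    static class PlainCustomer {
        final int id;
        final String emailId;

        PlainCustomer(int id, String emailId) {
            this.id = id;
            this.emailId = emailId;
        }
    }

    // Same fields, but equality is based on the field values
    static class ValueCustomer {
        final int id;
        final String emailId;

        ValueCustomer(int id, String emailId) {
            this.id = id;
            this.emailId = emailId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            ValueCustomer other = (ValueCustomer) o;
            return id == other.id && Objects.equals(emailId, other.emailId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, emailId);
        }
    }

    // Hypothetical helper: number of elements that survive distinct()
    public static long countDistinct(List<?> items) {
        return items.stream().distinct().count();
    }

    public static void main(String[] args) {
        List<PlainCustomer> plain = List.of(
                new PlainCustomer(1, "abc@gmail.com"),
                new PlainCustomer(1, "abc@gmail.com"));
        List<ValueCustomer> value = List.of(
                new ValueCustomer(1, "abc@gmail.com"),
                new ValueCustomer(1, "abc@gmail.com"));

        System.out.println(countDistinct(plain)); // prints 2: different instances are never equal
        System.out.println(countDistinct(value)); // prints 1: equal field values count as duplicates
    }
}
```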

HashSet

  • We can also use a HashSet, as we did for the email list. This works for the same reason: HashSet relies on hashCode() and equals() to detect duplicates.
Set<Customer> set = new HashSet<>();
customers.forEach(set::add);

List<Customer> distinctCustomers = new ArrayList<>(set);
distinctCustomers.forEach(System.out::println);
  • Printing the list shows each customer only once.
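If the original order of the customers matters, one possible variation is to swap HashSet for LinkedHashSet, which iterates in insertion order. This is a different class from the one used above, shown here only as a sketch; the nested Customer class is a shortened copy of the POJO and distinctInOrder is an illustrative helper name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

public class OrderedDistinctCustomers {

    // Shortened copy of the Customer POJO, with equals/hashCode based on all fields
    static class Customer {
        final int id;
        final int age;
        final String emailId;

        Customer(int id, int age, String emailId) {
            this.id = id;
            this.age = age;
            this.emailId = emailId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Customer other = (Customer) o;
            return id == other.id && age == other.age && Objects.equals(emailId, other.emailId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, age, emailId);
        }

        @Override
        public String toString() {
            return "Customer{id=" + id + ", age=" + age + ", emailId='" + emailId + "'}";
        }
    }

    // Hypothetical helper: LinkedHashSet drops duplicates and remembers insertion order
    public static <T> List<T> distinctInOrder(List<T> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(new Customer(1, 23, "abc@gmail.com"),
                new Customer(2, 25, "xbc@gmail.com"),
                new Customer(1, 23, "abc@gmail.com"));

        // Customer 1 is printed first, then customer 2; the repeated entry is dropped
        distinctInOrder(customers).forEach(System.out::println);
    }
}
```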

Conclusion

  • In this blog we have seen various approaches in Java for filtering the unique elements out of a list.
  • We discussed how to use HashSet and the Streams API to achieve this, and why custom classes must override equals() and hashCode() for either approach to work.
