Introduction
- Duplicate entries are a very common problem when data is exchanged between machines.
- As a client consuming these records, we have to implement logic to handle the duplicates.
- In this blog we will see how to handle duplicates in Java.
Use Case
- As an example, we will take a list of email IDs and a list of customers that we have been given. Our main goal is to filter out the distinct records from each.
// list of emailId
List<String> emailList = List.of("abc@gmail.com", "pqr@gmail.com", "tuf@gmail.com", "lth@gmail.com", "abc@gmail.com");
// list of customers
List<Customer> customers = List.of(new Customer(1, 23,"abc@gmail.com"),new Customer(2, 25,"xbc@gmail.com"),new Customer(1, 23,"abc@gmail.com"));
Using HashSet
- First we will use the HashSet data structure. A HashSet contains only unique elements; if you try to add the same element twice, add() returns false and the set is left unchanged.
- So we are guaranteed to have unique elements in the HashSet.
- In the example below, we iterate over each emailId and add it to our set.
Once that is finished, we create a distinct list from the set.
List<String> emailList = List.of("abc@gmail.com", "pqr@gmail.com", "tuf@gmail.com", "lth@gmail.com", "abc@gmail.com");
emailList.stream().forEach(a-> System.out.println(a));
System.out.println("==========================");
// using hashset
HashSet<String> set = new HashSet<>();
for ( String emailId: emailList ) {
set.add(emailId);
}
emailList = new ArrayList<>(set);
emailList.stream().forEach(a-> System.out.println(a));
- Now we have a list of unique email IDs, which we can check by printing it.
- As we can see, the duplicate emailId has been filtered out of the original email list.
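The behaviour of add() described above can be sketched as a small standalone program. The class name EmailDedupDemo and the helper distinctInOrder are hypothetical names chosen for illustration. The sketch also swaps HashSet for LinkedHashSet, because a plain HashSet does not guarantee iteration order, so the list built from it may not match the order of the original list.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class EmailDedupDemo {

    // Hypothetical helper: returns the distinct elements in first-seen order.
    static List<String> distinctInOrder(List<String> emails) {
        // LinkedHashSet keeps insertion order; HashSet makes no such promise.
        Set<String> seen = new LinkedHashSet<>();
        for (String email : emails) {
            // add() returns false when the element is already in the set
            if (!seen.add(email)) {
                System.out.println("Duplicate skipped: " + email);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> emailList = List.of("abc@gmail.com", "pqr@gmail.com",
                "tuf@gmail.com", "lth@gmail.com", "abc@gmail.com");
        System.out.println(distinctInOrder(emailList));
    }
}
```

Running main reports the second "abc@gmail.com" as a skipped duplicate and then prints the four remaining addresses in their original order.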
Using the distinct() method
- Another option is the built-in distinct() method from the Streams API.
- Let’s create a Stream from the list and apply distinct() to keep only the unique elements.
List<String> emailList = List.of("abc@gmail.com", "pqr@gmail.com", "tuf@gmail.com", "lth@gmail.com", "abc@gmail.com");
emailList.stream().forEach(a-> System.out.println(a));
System.out.println("==========================");
// streams api
List<String> distinctEmailList = emailList.stream().distinct().collect(Collectors.toList());
distinctEmailList.stream().forEach(a-> System.out.println(a));
- Now we can test by printing our distinct email list.
- So far we have been filtering a built-in type (String); next we will use a custom object type and remove duplicate objects.
Customer POJO
- Let’s first create a custom class Customer, which is a plain POJO.
import java.util.Objects;
public class Customer {
private int id;
private int age;
private String emailId;
public Customer(int id, int age, String emailId) {
this.id = id;
this.age = age;
this.emailId = emailId;
}
public int getId() {
return id;
}
public void setId(int id) {
this.id = id;
}
public int getAge() {
return age;
}
public void setAge(int age) {
this.age = age;
}
public String getEmailId() {
return emailId;
}
public void setEmailId(String emailId) {
this.emailId = emailId;
}
@Override
public String toString() {
return "Customer{" +
"id=" + id +
", age=" + age +
", emailId='" + emailId + '\'' +
'}';
}
}
- Now let’s create a sample customer list by creating Customer objects.
- Then let’s use the Streams API distinct() method and check whether it works on objects as well.
List<Customer> customers = List.of(new Customer(1, 23,"abc@gmail.com"),
new Customer(2, 25,"xbc@gmail.com"),
new Customer(1, 23,"abc@gmail.com"));
customers.stream().forEach(a-> System.out.println(a));
System.out.println("==========================");
customers.stream().distinct().forEach(a-> System.out.println(a));
- But when we print the result, we find that it does not. It still prints duplicate customers.
- Does that mean distinct() does not work on custom types?
- The cause is not distinct() itself but the equals() and hashCode() methods. distinct() compares elements using equals(), and without an override Customer inherits the identity-based version from Object, so two separate objects with the same field values are never considered equal.
- We have to override equals() and hashCode() in the Customer POJO that we created earlier.
- As we can see, equals() compares each property of the Customer class, and hashCode() uses the same properties to calculate the hash code.
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Customer customer = (Customer) o;
return id == customer.id && age == customer.age && Objects.equals(emailId, customer.emailId);
}
@Override
public int hashCode() {
return Objects.hash(id, age, emailId);
}
- Once we have added the hashCode() and equals() methods, distinct() on the stream works as expected.
- The result now has the duplicate customers filtered out.
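The effect of overriding equals() and hashCode() can be checked side by side with a minimal sketch. The class names EqualsHashCodeDemo, PlainCustomer and ValueCustomer are hypothetical, trimmed-down stand-ins for the Customer POJO (only id and emailId are kept); they are assumptions made for illustration, not part of the code above.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class EqualsHashCodeDemo {

    // No overrides: equals()/hashCode() come from Object and compare identity.
    static class PlainCustomer {
        final int id;
        final String emailId;
        PlainCustomer(int id, String emailId) { this.id = id; this.emailId = emailId; }
    }

    // Value-based equals()/hashCode(), built from the same fields.
    static class ValueCustomer {
        final int id;
        final String emailId;
        ValueCustomer(int id, String emailId) { this.id = id; this.emailId = emailId; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            ValueCustomer other = (ValueCustomer) o;
            return id == other.id && Objects.equals(emailId, other.emailId);
        }

        @Override
        public int hashCode() { return Objects.hash(id, emailId); }
    }

    // Two objects with identical field values, no overrides.
    static int distinctPlainCount() {
        List<PlainCustomer> result = Stream.of(
                new PlainCustomer(1, "abc@gmail.com"),
                new PlainCustomer(1, "abc@gmail.com"))
            .distinct().collect(Collectors.toList());
        return result.size();
    }

    // Same data, but with value-based equality.
    static int distinctValueCount() {
        List<ValueCustomer> result = Stream.of(
                new ValueCustomer(1, "abc@gmail.com"),
                new ValueCustomer(1, "abc@gmail.com"))
            .distinct().collect(Collectors.toList());
        return result.size();
    }

    public static void main(String[] args) {
        System.out.println("without overrides: " + distinctPlainCount()); // 2
        System.out.println("with overrides:    " + distinctValueCount()); // 1
    }
}
```

Both streams hold the same data; only the class with value-based equality collapses to a single element.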
HashSet
- We can also use the HashSet approach that we used for the email list. It relies on the same equals() and hashCode() overrides.
HashSet<Customer> set = new HashSet<>();
customers.stream().forEach(a->set.add(a));
ArrayList<Customer> distinctCustomers = new ArrayList<>(set);
distinctCustomers.stream().forEach(a-> System.out.println(a));
- Here is the result.
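As a possible shortcut, the forEach loop above can be replaced by passing the list straight to the HashSet constructor. The sketch below is self-contained, so it carries its own copy of a Customer class with equals() and hashCode(); the wrapper class CustomerSetDemo and the helper distinctCustomers are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;

class CustomerSetDemo {

    // Minimal copy of the Customer POJO, kept here so the sketch compiles alone.
    static class Customer {
        final int id;
        final int age;
        final String emailId;

        Customer(int id, int age, String emailId) {
            this.id = id;
            this.age = age;
            this.emailId = emailId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Customer c = (Customer) o;
            return id == c.id && age == c.age && Objects.equals(emailId, c.emailId);
        }

        @Override
        public int hashCode() { return Objects.hash(id, age, emailId); }
    }

    // Hypothetical helper: the HashSet constructor copies the list and drops
    // elements that are equal according to equals()/hashCode().
    static List<Customer> distinctCustomers(List<Customer> customers) {
        return new ArrayList<>(new HashSet<>(customers));
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
                new Customer(1, 23, "abc@gmail.com"),
                new Customer(2, 25, "xbc@gmail.com"),
                new Customer(1, 23, "abc@gmail.com"));
        System.out.println(distinctCustomers(customers).size());
    }
}
```

For the sample list this leaves two customers. As with any HashSet, the order of the resulting list is not guaranteed.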
Conclusion
- In this blog we have seen various approaches in Java to filter out the unique elements from a list.
- We discussed the use of HashSet and the Streams API to achieve it.