Internal code exploration with sample dataset, examples, and more
Introduction
- Spring Data JPA provides handy features to interact with databases for all kinds of operations.
- One of the much-used operations is the deletion of the records in the database. But deleting large numbers of records one by one is not resource efficient, hence doing it them in batches makes much sense.
- Spring Data JPA provides helper methods that allow us to delete records in batches. In this article, we will deep dive into them, we will explore the internal code of these methods, and write logic along with the execution.
For the best java and spring-related content please subscribe to the newsletter.
Sample Table/Model
- First of all, let’s create a sample table Campaign with some mock data.
CREATE TABLE Campaign (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
budget DECIMAL(10, 2) NOT NULL
);
postgres=# \d campaign
Table "public.campaign"
Column | Type | Collation | Nullable | Default
------------+------------------------+-----------+----------+--------------------------------------
id | integer | | not null | nextval('campaign_id_seq'::regclass)
name | character varying(255) | | not null |
start_date | date | | not null |
end_date | date | | not null |
budget | numeric(10,2) | | not null |
Indexes:
"campaign_pkey" PRIMARY KEY, btree (id)
Inserting Sample Data
- Now let’s insert 1M records to the Campaign table. The below script (plpgsql) does the job.
-- Create a function to generate and insert mock data
CREATE OR REPLACE FUNCTION generate_mock_data()
RETURNS VOID AS $$
DECLARE
counter INT := 1;
BEGIN
WHILE counter <= 1000000 LOOP
INSERT INTO Campaign (name, start_date, end_date, budget)
VALUES (
'Campaign ' || counter,
CURRENT_DATE + (counter * INTERVAL '1 day'),
CURRENT_DATE + ((counter + 30) * INTERVAL '1 day'),
ROUND(RANDOM()::numeric * 10000 + 1000, 2)
);
counter := counter + 1;
END LOOP;
END;
$$ LANGUAGE plpgsql;
-- Execute the function to generate and insert 1 million rows
SELECT generate_mock_data();
- Now our table contains 1M records that we can delete from Spring Data JPA.
postgres=# select count(*) from campaign;
count
---------
1000000
(1 row)
JpaRepository
- JpaRepository is one of the common repositories that we extend when we define our repository interface for managing our entities.
- JpaRepository provides 3 helper methods that can delete records in batches:
1. deleteAllInBatch()
2. deleteAllInBatch(Iterable<T> entities)
3. deleteAllByIdInBatch(Iterable<ID> ids) - Since JpaRepository is an interface, its implementation exists in the SimpleJpaRepository class.
Now lets deep dive into these methods and write some code along with execution.
deleteAllInBatch()
- This method deletes all the records in the database in one batch.
- If we look at the internal implementation of this method in the SimpleJpaRepository.java class, we see that all it’s doing is getting a delete query, adding query hints if it’s passed, and then applying executedUpdate().
@Transactional
public void deleteAllInBatch() {
Query query = this.entityManager.createQuery(this.getDeleteAllQueryString());
this.applyQueryHints(query);
query.executeUpdate();
}
private String getDeleteAllQueryString() {
return QueryUtils.getQueryString("delete from %s x",
this.entityInformation.getEntityName());
}
- If we use this method in our logic class, we just need to extend JpaRepository to campaignRepository and we will get access to deleteAllInBatch() since this method is an interface method and implemented by SimpleJpaRepository.java.
public interface CampaignRepository extends JpaRepository<Campaign, Long> { }
public void deleteAllCampaigns(){
System.out.println("deleting campaigns");
long l = System.currentTimeMillis();
campaignRepository.deleteAllInBatch();
System.out.println("exec time(ms) : "+(System.currentTimeMillis()-l));
}
- Below are the generated logs for our logic execution. As we can see the generated query is
“delete from campaign” .
Executing SQL Query: delete from campaign
Hibernate: delete from campaign
exec time(ms) : 1819
23:56:43.529 INFO o.h.e.i.StatisticalLoggingSessionEventListener - Session Metrics {
7791152 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
41620 nanoseconds spent preparing 1 JDBC statements;
1806712607 nanoseconds spent executing 1 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
4662 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
}
- One thing we should keep in mind is that this method removes all the records from the database in one batch, so if we have a large table it might take too much of the time to finish the operation. In that case, the truncate operation may be faster to implement.
- This method is not useful if we want to delete certain batches of records but not all.
If we want to delete certain batches of records, then we can use deleteAllInBatch(Iterable<T> entities) and deleteAllByIdInBatch(Iterable<ID> ids) methods.
deleteAllInBatch(Iterable<T> entities)
- deleteAllInBatch(Iterable<T> entities) takes a collection of entities that we want to delete and builds a delete query for that many records.
- If we see the internal code of the method, we will see the base query is “delete from %s x” but the applyAndBind method builds another part of the query that includes the target ids of the query.
@Transactional
public void deleteAllInBatch(Iterable<T> entities) {
Assert.notNull(entities, "Entities must not be null");
if (entities.iterator().hasNext()) {
QueryUtils.applyAndBind(
QueryUtils.getQueryString(
"delete from %s x",
this.entityInformation.getEntityName()),
entities,
this.entityManager
).executeUpdate();
}
}
public static <T> Query applyAndBind(String queryString, Iterable<T> entities, EntityManager entityManager) {
Assert.notNull(queryString, "Querystring must not be null");
Assert.notNull(entities, "Iterable of entities must not be null");
Assert.notNull(entityManager, "EntityManager must not be null");
Iterator<T> iterator = entities.iterator();
if (!iterator.hasNext()) {
return entityManager.createQuery(queryString);
} else {
String alias = detectAlias(queryString);
StringBuilder builder = new StringBuilder(queryString);
builder.append(" where");
int i = 0;
// building other part of the query with target ids
while(iterator.hasNext()) {
iterator.next();
Object[] var10002 = new Object[]{alias, null};
++i;
var10002[1] = i;
builder.append(String.format(" %s = ?%d", var10002))
if (iterator.hasNext()) {
builder.append(" or");
}
}
Query query = entityManager.createQuery(builder.toString());
iterator = entities.iterator();
i = 0;
while(iterator.hasNext()) {
++i;
query.setParameter(i, iterator.next());
}
return query;
}
}
Now we can write a simple method to delete a batch of entities. In the below example, we first query certain campaigns and then pass them to deleteAllInBatch(campaign) to delete those many campaigns.
public void deleteAllCampaignsInDefinedBatch(){
LocalDate currentDate = LocalDate.now();
List<Campaign> campaigns = campaignRepository.findByStartDateBetween(currentDate.plusMonths(1), currentDate.plusDays(10));
System.out.println(campaigns.size());
long l = System.currentTimeMillis();
campaignRepository.deleteAllInBatch(campaigns);
System.out.println("millisecond: "+(System.currentTimeMillis()-l));
}
- Generated logs for the above method are listed below. As we can see deleteAllInBatch(campaign) generates a query following query
“delete from campaign where id=? or id=? or id=?” - Here is ids are the ids of the campaign entities that we passed.
Executing SQL Query: delete from campaign where id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=? or id=?
millisecond: 53
00:16:01.733 INFO o.h.e.i.StatisticalLoggingSessionEventListener - Session Metrics {
6932689 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
9986550 nanoseconds spent preparing 2 JDBC statements;
78803811 nanoseconds spent executing 2 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
1400175 nanoseconds spent executing 1 flushes (flushing a total of 30 entities and 0 collections);
7571121 nanoseconds spent executing 1 partial-flushes (flushing a total of 30 entities and 30 collections)
}
- deleteAllInBatch(Iterable<T> entities) is a useful method when we want to delete certain batches of records. If we have to delete let’s say 500k records then we can certainly build sub batches and loop over them and delete 10k records or 100k records at a time so that we don’t overwhelm our database.
deleteAllByIdInBatch(Iterable<ID> ids)
- Like deleteAllInBatch(Iterable<T> entities), we can also use deleteAllByIdInBatch(Iterable<ID> ids) which takes a collection of IDs instead of entities but does the same job.
- This method internally calls to deleteAllInBatch(Iterable<T> entities). Below is the internal code implementation for deleteAllByIdInBatch(Iterable<ID> ids).
@Transactional
public void deleteAllByIdInBatch(Iterable<ID> ids) {
Assert.notNull(ids, "Ids must not be null");
if (ids.iterator().hasNext()) {
if (this.entityInformation.hasCompositeId()) {
List<T> entities = new ArrayList();
ids.forEach((id) -> {
entities.add(this.getReferenceById(id));
});
// calls to deleteAllInBatch(entities)
this.deleteAllInBatch(entities);
} else {
String queryString = String.format("delete from %s x where %s in :ids", this.entityInformation.getEntityName(), this.entityInformation.getIdAttribute().getName());
Query query = this.entityManager.createQuery(queryString);
if (Collection.class.isInstance(ids)) {
query.setParameter("ids", ids);
} else {
Collection<ID> idsCollection = (Collection)StreamSupport.stream(ids.spliterator(), false).collect(Collectors.toCollection(ArrayList::new));
query.setParameter("ids", idsCollection);
}
this.applyQueryHints(query);
query.executeUpdate();
}
}
}
- Our simple logic queries the database for a range of campaigns gets the IDs and targets for deletion by passing them to the deleteAllByIdInBatch(ids).
public void deleteAllCampaignsByIds(){
LocalDate currentDate = LocalDate.now();
currentDate = currentDate.plusMonths(1);
List<Campaign> campaigns = campaignRepository.findByStartDateBetween(currentDate, currentDate.plusDays(10));
System.out.println(campaigns.size());
List<Long> ids = campaigns.stream().map(Campaign::getId).toList();
System.out.println("ids: "+ids);
campaignRepository.deleteAllByIdInBatch(ids);
}
- Generated logs show that the below query was executed.
“delete from campaign where id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)”
Executing SQL Query: select c1_0.id,c1_0.budget,c1_0.end_date,c1_0.name,c1_0.start_date from campaign c1_0 where c1_0.start_date between ? and ?
[select c1_0.id,c1_0.budget,c1_0.end_date,c1_0.name,c1_0.start_date from campaign c1_0 where c1_0.start_date between ? and ?]
Hibernate: select c1_0.id,c1_0.budget,c1_0.end_date,c1_0.name,c1_0.start_date from campaign c1_0 where c1_0.start_date between ? and ?
11
ids: [23000102, 23000103, 23000104, 23000105, 23000106, 23000107, 23000108, 23000109, 23000110, 23000111, 23000112]
Executing SQL Query: delete from campaign where id in (?,?,?,?,?,?,?,?,?,?,?)
[select c1_0.id,c1_0.budget,c1_0.end_date,c1_0.name,c1_0.start_date from campaign c1_0 where c1_0.start_date between ? and ?, delete from campaign where id in (?,?,?,?,?,?,?,?,?,?,?)]
Hibernate: delete from campaign where id in (?,?,?,?,?,?,?,?,?,?,?)
00:06:45.019 INFO o.h.e.i.StatisticalLoggingSessionEventListener - Session Metrics {
11824981 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
22250505 nanoseconds spent preparing 2 JDBC statements;
189448302 nanoseconds spent executing 2 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
1401303 nanoseconds spent executing 1 flushes (flushing a total of 11 entities and 0 collections);
8131438 nanoseconds spent executing 1 partial-flushes (flushing a total of 11 entities and 11 collections)
}
- If we have to delete large batch-size records then we can always paginate them and delete them in sub-batches instead of sending let’s say 500k records in one query.
Conclusion
- We should always keep in mind that operating over batches instead of doing all the work in one go makes sense for databases.
- Spring Data JPA provides multiple helper methods that can delete records from the database. Depending on the use case we can choose one or the other.
- Apart from using these helper methods, we can always write our own custom queries since Spring Data JPA allows that too.
Become Better Dev:
To upgrade your developer skills checkout below resources:
- [NEW] Master Spring Boot 3 & Spring Framework 6 with Java
- Upskill SQL by Practicing it ( 20 % off with code: ABNEW20OFF )
- Get Ready For Spring Boot Interview. | Java Interview.