Spring Data JPA: How to bulk insert data

  • Post last modified:March 21, 2024
  • Reading time:5 mins read

Discussing 3 Options to bulk insert data in RDBMS

Introduction

  • As a Software engineer who works primarily as a backend engineer or full-stack engineer, we often come across use cases where inserting data as bulk into the database is paramount.
  • Some of the use cases will be like backfilling newly modified tables with old table data, populating secondary database tables with the primary database, or even taking back up of actual data, etc.
  • In this article, we will discuss how we can insert data in bulk into the database using Spring Data JPA.

Entity

  • For this example, we are using the Accounts table as shown below. This table gets a unique primary key Id from the accounts_seq generator.
@Entity
@Table(name="ACCOUNTS")
public class Account {
    @Id @GeneratedValue(strategy= GenerationType.SEQUENCE, generator = "accounts_seq")
    @SequenceGenerator(name = "accounts_seq", sequenceName = "accounts_seq", allocationSize = 1)
    @Column(name = "user_id")
    private int userId;
    private String username;
    private String password;
    private String email;
    private Timestamp createdOn;
    private Timestamp lastLogin;

// getters , setters
  • The repository interface extends from JpaRepository which provides many out-of-the-box methods to deal with the database easily.
public interface AccountRepository extends JpaRepository<Account, Integer> 

Option 1

  • JpaRepository extends from ListCrudRepository which provides saveAll method.
public interface ListCrudRepository<T, ID> extends CrudRepository<T, ID> {
    <S extends T> List<S> saveAll(Iterable<S> entities);
  • In this option, we will save a list of accounts using this method that will persist in the list of accounts in the accounts table.
public boolean addAccounts(List<Account> accounts){
     List<Account> persistedAccounts = repository.saveAll(accounts);
     return persistedAccounts.size() == accounts.size();
}
  • Our controller receives a list of accounts as @RequestBody.
@PostMapping("/add")
public boolean addAccounts(@RequestBody List<Account> accounts) throws SQLException {
   accounts.stream().forEach(a -> {
       a.setCreatedOn(Timestamp.from(Instant.now()));
       a.setLastLogin(Timestamp.from(Instant.now()));
    });
   return notificationLogic.addAccounts(accounts);
}
  • We can send an array of accounts as JSON in the request body.
  • Once we hit the endpoint, our insert logic gets executed, I have enabled JPA logs to see the query.
  • Our database is updated with recent insertion of records.
  • As we can see in the query logs saveAll executes one query for each record and does not insert everything in one query using INSERT INTO VALUES.

Option 2

  • In this option, we will take the Session object in our hand, so that we can get the Connection instance.
    public boolean addAccounts2(List<Account> accounts) throws SQLException {
        Session session = entityManager.unwrap(Session.class);

        int insertedRecords = session.doReturningWork(new ReturningWork<>() {
                           @Override
                           public Integer execute(Connection connection) throws SQLException {
                               int x = executeBatch(connection, accounts).length;
                               connection.commit();
                               return x;
                           }
                       }
        );
        return insertedRecords == accounts.size();
    }
  • Once we have a hands-on Connection instance we can use prepareStatement to add each record as a batch to our insert query.
  • But keep in mind we are not using Spring Data JPA functionality here, instead, we are using a RAW connection to connect and execute the query.
 private int[] executeBatch(Connection connection,List<Account> accounts) throws SQLException {
        PreparedStatement preparedStatement = connection.prepareStatement("insert into accounts (username, password, email, created_on, last_login) values(?, ?,?,?,?)");
        connection.setAutoCommit(false);
        accounts.forEach(a->
                {
                    try {
                        setFields(preparedStatement, a);
                        preparedStatement.addBatch();
                    } catch (SQLException e) {
                        throw new RuntimeException(e);
                    }
                }
        );
        return preparedStatement.executeBatch();
    }

    private void setFields(PreparedStatement preparedStatement, Account a) throws SQLException {
        preparedStatement.setString(1,a.getUsername());
        preparedStatement.setString(2,a.getPassword());
        preparedStatement.setString(3,a.getEmail());
        preparedStatement.setTimestamp(4,a.getCreatedOn());
        preparedStatement.setTimestamp(5,a.getLastLogin());
    }
}

Postman Request

  • records are inserted into the database as bulk inserts. 

Option 3

  • In this option, we build our INSERT INTO VALUES query string by ourselves.
  • We will need a hold on EntityManager which we can inject into the service class.
    @PersistenceContext
    private EntityManager entityManager;
  • Once we have the query built we can use entityManager.createNativeQuery and pass our built query as an argument.
  • Once the query instance is built we can perform executeUpdate() that will hit this query on the database and bulk insert the data.
 @PersistenceContext
    private EntityManager entityManager;    
    
    @Transactional
    public boolean addAccounts3(List<Account> accounts) throws SQLException {
        String base_query = "insert into accounts (username, password, email, created_on, last_login) values ";
        String values = accounts.stream().map(a -> "( " + "'" +a.getUsername()+ "'" + "," + "'" + a.getPassword()+ "'" + ","+ "'" + a.getEmail()+ "'" + ","+ "'" + a.getCreatedOn()+ "'" + ","+ "'" + a.getLastLogin()+ "'" + ")").collect(Collectors.joining(","));
        System.out.println(base_query+values);
        Query query = entityManager.createNativeQuery(base_query + values);
        int x = query.executeUpdate();
        System.out.println(x);
        return true;
    }

Postman Request

  • records have persisted in the database.
  • below is the log of the executed query.

Note:

One best practice here is that we should avoid inserting too many records all at once and insert using batch size like 10k records and add sleep of 1 sec in between each insert so that we don’t hold connection too long, donot send too many records on wire and give the database breathing room while iterating bulk insert iteratively.

Conclusion

  • In this article, we have discussed 3 options to bulk insert data in the Spring boot app.
  • We also discussed that we can create a sub batch from a big batch to insert data into the database iteratively.

Before You Leave

Leave a Reply