How to Bulk Load Data With Copy Command (Using JDBC and Postgres)

  • Post last modified:August 6, 2023
  • Reading time:4 mins read

Using JDBC and Postgres Copy API to bulk load data into the Postgres database

Introduction

  • COPY command copies data between file and table, while COPY TO copies data from table to file, and COPY FROM does the opposite.
  • Copy command is very efficient when we want to insert bulk data into the table.
  • We have covered bulk insert data in previous article, in this article we cover bulk insert using the COPY command.

Table

  • We have a book_reviews table that records the review from different users for different books.
postgres=# \d book_reviews
                                   Table "public.book_reviews"
   Column    |  Type   | Collation | Nullable |                     Default
-------------+---------+-----------+----------+--------------------------------------------------
 reviews_id  | integer |           | not null | nextval('book_reviews_reviews_id_seq'::regclass)
 user_id     | text    |           |          |
 isbn        | text    |           |          |
 book_rating | text    |           |          |
Indexes:
    "book_reviews_pkey" PRIMARY KEY, btree (reviews_id)
  • Total records in book_reviews table.
postgres=# select count(*) from book_reviews;
  count
---------
 2469085
(1 row)

Business Logic

Building Connection Object

  • At first, we will initialize the connection object.
   private static Connection getConnection() throws SQLException {
        Connection connection = DriverManager.getConnection("jdbc:postgresql://localhost/postgres",
                USERNAME, PASSWORD);
        if(connection != null){
            System.out.println("Connected to database");
        }

        connection.setAutoCommit(false);
        return connection;
    }

Copy Logic

  • Postgres driver provides Copy API that we can use to execute copy command that takes read input from STDIN.
  • We can copy values either from a file or from in-memory objects/entities.

From Input File

  • First, we need to get the CopyManager instance, which will provide copyIn method. The CopyIn method takes copy command as the first parameter and InputStream as the second parameter.
  • We can open InputStream from the input file.
private static void copy() throws SQLException, IOException {
        Connection connection = getConnection();
        CopyManager copyManager = new CopyManager((BaseConnection) connection);

         // file input stream
        InputStream inputStream = Files.newInputStream(Path.of("/Users/surajmishra/Desktop/Book-Ratings.csv"));
        
        long l = copyManager
                .copyIn("COPY book_reviews(user_id, isbn, book_rating) FROM STDIN delimiter ';' csv header encoding 'UTF-8'",
                        inputStream);

        connection.commit();
        connection.close();
    }
  • We ingested almost 3M records.
postgres=# select count(*) from book_reviews;
  count
---------
 2962898
(1 row)

From In Memory Object

  • We can also build a book_review object
private static BookReview getBookReview() {
        BookReview bookReview = new BookReview();
        String seed = UUID.randomUUID().toString();
        bookReview.setIsbn("isbn"+seed);
        bookReview.setBookRating("2.3");
        bookReview.setUserId(seed);
        return bookReview;
    }
  • And convert to csv record since we are reading input as CSV in COPY FROM command.
private static String toCsv(BookReview a) {
        return a.getUserId() + ", " +
                a.getIsbn() + ", " +
                a.getBookRating();
    }
  • Let’s create few book_review objects and convert them to CSV.
  • Now we need to convert the String of csvRows to InputStream since copyIn method takes InputStream as parameter.
  • We can convert String to bytes and construct ByteArrayInputStream which extends InputStream.
   public static void copy2() throws SQLException, IOException {
        List<BookReview> bookReviews = IntStream.rangeClosed(1, 10)
                .mapToObj(i -> getBookReview())
                .toList();

        String csvRows = bookReviews.stream()
                .map(LoadDataWithCopy::toCsv)
                .collect(Collectors.joining("\n"));

        InputStream is = new ByteArrayInputStream(csvRows.getBytes());

        Connection connection = getConnection();
        CopyManager copyManager = new CopyManager((BaseConnection) connection);
        long l = copyManager.copyIn("COPY book_reviews(user_id, isbn, book_rating) FROM STDIN delimiter ',' csv encoding 'UTF-8'",
                is);
   }

Conclusion

Before You Leave

Leave a Reply