Introduction
CSV files are one of the most common ways to store and exchange structured data between systems, alongside other popular structured formats. Many libraries can read CSV files, and OpenCSV is one of the most popular. In this article we will use the OpenCSV library to read a CSV file in Java.
Input File
I am using the Netflix daily top 10 dataset from Kaggle as my input file.
Open CSV
OpenCSV is one of the most popular CSV parsers. It provides both basic and advanced functionality, which is more than sufficient to read most CSV files. For example, if we want to skip the header of a CSV file, we can do so with the withSkipLines method of CSVReaderBuilder. If we need to validate a row before processing it, we can implement the RowValidator interface and reject any row that fails validation.
Reading CSV File
Reading Using CSVReader
The simplest client can be built with the CSVReader class. We need to pass it a Reader, such as a FileReader:
FileReader fileReader = new FileReader(INPUT_CSV_FILE);
CSVReader csvReader = new CSVReader(fileReader);
After building the csvReader, we can either read the entire file into memory or read it one record at a time.
Reading entire file in memory
// Needs com.opencsv.CSVReader and com.opencsv.exceptions.CsvException
public static List<String[]> readAll(Reader reader) throws IOException, CsvException {
    try (CSVReader csvReader = new CSVReader(reader)) {
        // Loads every record into memory at once
        return csvReader.readAll();
    }
}
This approach does not work well when memory is limited and can end up throwing an exception. When I tried reading a 1 GB file this way, it threw an exception.
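One way to choose between the two approaches is to compare the file size with the JVM's maximum heap before calling readAll(). The sketch below is a rough heuristic and not part of OpenCSV. The class name ReadStrategy, the method name likelyFitsInHeap, and the overhead factor of 4 are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadStrategy {

    // Assumption: parsed rows (String[] holding String objects) need several
    // times the space of the raw file bytes. 4 is an illustrative guess.
    public static final long ASSUMED_OVERHEAD_FACTOR = 4;

    // True if a file of the given size is likely to fit in the given heap.
    public static boolean likelyFitsInHeap(long fileSizeBytes, long maxHeapBytes) {
        return fileSizeBytes < maxHeapBytes / ASSUMED_OVERHEAD_FACTOR;
    }

    // Convenience overload that uses the real file size and the JVM heap limit.
    public static boolean likelyFitsInHeap(Path csvFile) throws IOException {
        return likelyFitsInHeap(Files.size(csvFile), Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ReadStrategy <csv-file>");
            return;
        }
        Path csvFile = Paths.get(args[0]);
        System.out.println(likelyFitsInHeap(csvFile)
                ? "readAll() is probably fine"
                : "prefer readNext() and process one record at a time");
    }
}
```

If the check fails, reading one record at a time keeps memory use roughly constant regardless of file size.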
Reading file one line at a time
Using the readNext() method we can read one record at a time. It parses the record into a String[].
private static void readOneByOne(FileReader fileReader) throws IOException, CsvValidationException {
    // Closing the CSVReader also closes the underlying FileReader
    try (CSVReader csvReader = new CSVReader(fileReader)) {
        String[] csvRecord;
        while ((csvRecord = csvReader.readNext()) != null) {
            System.out.println(Arrays.toString(csvRecord));
        }
    }
}
Configuring CSVReader Object
The CSVReader class is configurable through a builder. For example, in the code snippet below we configure it to skip line 1, since it is the header, and add a row validator that checks the number of columns in each row.
CSVReader csvReader = new CSVReaderBuilder(fileReader)
        .withSkipLines(1)
        .withRowValidator(rowValidator)
        .build();
Validating Each Record with Row Validator
// The dataset header has 10 columns
private static final int COLUMN_LENGTH = 10;
RowValidator rowValidator = new RowValidator() {
    @Override
    public boolean isValid(String[] line) {
        return line != null && line.length == COLUMN_LENGTH;
    }
    @Override
    public void validate(String[] line) throws CsvValidationException {
        if (!isValid(line)) {
            throw new CsvValidationException("column count should be " + COLUMN_LENGTH);
        }
    }
};
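When a row fails validation, readNext() throws a CsvValidationException. The sketch below wires a column-count validator into the builder and stops at the first invalid row. It assumes OpenCSV 5.x is on the classpath. The class name ValidationExample, the EXPECTED_COLUMNS constant, and the three-column sample data are made up for illustration.

```java
import com.opencsv.CSVReader;
import com.opencsv.CSVReaderBuilder;
import com.opencsv.exceptions.CsvValidationException;
import com.opencsv.validators.RowValidator;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ValidationExample {

    // Hypothetical column count for the made-up sample data below.
    static final int EXPECTED_COLUMNS = 3;

    // Rejects any row whose column count differs from EXPECTED_COLUMNS.
    static class ColumnCountValidator implements RowValidator {
        @Override
        public boolean isValid(String[] row) {
            return row != null && row.length == EXPECTED_COLUMNS;
        }

        @Override
        public void validate(String[] row) throws CsvValidationException {
            if (!isValid(row)) {
                throw new CsvValidationException("column count should be " + EXPECTED_COLUMNS);
            }
        }
    }

    // Skips the header, then counts rows until the end of input
    // or until the first row that fails validation.
    public static int countRowsUntilInvalid(Reader reader) throws IOException {
        try (CSVReader csvReader = new CSVReaderBuilder(reader)
                .withSkipLines(1)
                .withRowValidator(new ColumnCountValidator())
                .build()) {
            int count = 0;
            try {
                while (csvReader.readNext() != null) {
                    count++;
                }
            } catch (CsvValidationException e) {
                System.err.println("Stopped at invalid row: " + e.getMessage());
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up data: a header, one valid row, one row with a missing column.
        Reader reader = new StringReader("a,b,c\n1,2,3\n4,5\n6,7,8\n");
        System.out.println("valid rows read: " + countRowsUntilInvalid(reader));
    }
}
```

Stopping at the first bad row is only one possible policy. Depending on the data, logging the row and aborting the whole import may be more appropriate.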
Reading CSV File Into Java Beans
Most of the time when we read a CSV file, we want to convert each record into a Java object. For example, if we are reading customer account information, we would write a mapper that maps the string values we read into a Java POJO named CustomerAccount. OpenCSV provides annotations to bind a CSV record to a POJO. All we have to do is annotate the fields of our Java POJO.
// HEADER : As of,Rank,Year to Date Rank,Last Week Rank,Title,Type,Netflix Exclusive,Netflix Release Date,Days In Top 10,Viewership Score
public class NetflixTo10Mvs {
    @CsvBindByName(column = "As of", required = true)
    private String asOf;
    @CsvBindByName(column = "Rank", required = true)
    private String rank;
    // The column name must match the header text
    @CsvBindByName(column = "Year to Date Rank", required = true)
    private String yearToDate;
}
// Parameterizing the builder avoids a raw-type (unchecked) assignment
List<NetflixTo10Mvs> moviesList = new CsvToBeanBuilder<NetflixTo10Mvs>(fileReader)
        .withType(NetflixTo10Mvs.class)
        .build()
        .parse();
moviesList.forEach(movie -> System.out.println(movie));
Reading with Position
Not all files come with a header. To handle files without one, we can bind by column position instead, as shown below.
public class NetflixTo10Mvs {
    @CsvBindByPosition(position = 0)
    private String asOf;
    @CsvBindByPosition(position = 1)
    private String rank;
}
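A position-bound bean is parsed with the same CsvToBeanBuilder call as a name-bound one. The following is a minimal sketch, assuming OpenCSV 5.x is on the classpath. The class names PositionBindingExample and RankEntry, the getters, and the two sample rows are made up for illustration and are not taken from the Netflix dataset.

```java
import com.opencsv.bean.CsvBindByPosition;
import com.opencsv.bean.CsvToBeanBuilder;

import java.io.Reader;
import java.io.StringReader;
import java.util.List;

public class PositionBindingExample {

    // Bean bound by column index. It relies on the implicit public
    // no-argument constructor so OpenCSV can instantiate it.
    public static class RankEntry {
        @CsvBindByPosition(position = 0)
        private String asOf;

        @CsvBindByPosition(position = 1)
        private String rank;

        public String getAsOf() { return asOf; }
        public String getRank() { return rank; }
    }

    // Parses header-less CSV input into a list of RankEntry beans.
    public static List<RankEntry> parse(Reader reader) {
        return new CsvToBeanBuilder<RankEntry>(reader)
                .withType(RankEntry.class)
                .build()
                .parse();
    }

    public static void main(String[] args) {
        // Made-up rows without a header line, for illustration only.
        Reader reader = new StringReader("2020-04-01,1\n2020-04-01,2\n");
        for (RankEntry entry : parse(reader)) {
            System.out.println(entry.getAsOf() + " -> rank " + entry.getRank());
        }
    }
}
```

Because there is no header line, the first line of input is treated as data, so no withSkipLines call is needed here.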
Conclusion
In this article, we used the OpenCSV library to read a CSV file in Java. This article only scratches the surface. To learn about all the supported features for more complex use cases, see the OpenCSV documentation listed under Further Reading.
Further Reading
- http://opencsv.sourceforge.net/#csvparser