
Java – Validate CSV using SuperCSV with custom CsvColumnProcessor

Validating CSV files is easy with SuperCSV, and it can save your day.

Assume that management requires us to validate new employee data coming from different parts of the United States against the following constraints.

New Employee Validation Rules

| Field Order | Field Name | Description | Rules |
|---|---|---|---|
| 1 | EMPLOYEE ID | Employee ID provided by HQ for a particular office branch | Must be unique within the CSV file; maximum length is 10 characters; format ##########; mandatory |
| 2 | LAST NAME | Last name of the new employee | Letters only; minimum length 1, maximum length 50; mandatory |
| 3 | FIRST NAME | First name of the new employee | Letters only; minimum length 1, maximum length 50; mandatory |
| 4 | SSN | Social Security Number of the new employee | Format ###-##-####; mandatory |
| 5 | HOME STATE | Current home state of the employee; most likely the branch office's state | 2-letter state code; mandatory |
| 6 | COUNTRY | US only. If empty, defaults to US | US only; if empty, defaults to US; mandatory |
| 7 | HIRE DATE | Employee's hire date | Format MM/DD/YYYY; mandatory |
| 8 | COMMENT | Any comments about the new employee from the hiring manager | Maximum length is 100 characters; optional |
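To make the rules concrete, here is a minimal, JDK-only sketch with one check per column. It is not part of the original project: the class EmployeeFieldChecks and its methods are hypothetical, and the patterns are our own reading of the table (for example, ########## is taken as exactly ten digits, and HOME STATE is only checked for shape, not against the real list of state codes). Uniqueness of EMPLOYEE ID is left out because it needs state across rows.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.regex.Pattern;

// Hypothetical helper (not from the original project): one check per rule in the table.
final class EmployeeFieldChecks {
    private static final Pattern EMPLOYEE_ID = Pattern.compile("\\d{10}");    // ##########
    private static final Pattern NAME = Pattern.compile("\\p{Alpha}{1,50}");   // ASCII letters only, 1 to 50
    private static final Pattern SSN = Pattern.compile("\\d{3}-\\d{2}-\\d{4}"); // ###-##-####
    private static final Pattern STATE = Pattern.compile("[A-Z]{2}");          // shape only, not a real state list
    // STRICT rejects impossible dates such as 02/30/2020. Under STRICT, "yyyy" (year-of-era)
    // would also need an era field, so the proleptic year "uuuu" is used instead.
    private static final DateTimeFormatter HIRE_DATE =
            DateTimeFormatter.ofPattern("MM/dd/uuuu").withResolverStyle(ResolverStyle.STRICT);

    private EmployeeFieldChecks() {}

    static boolean isEmployeeId(String s) { return s != null && EMPLOYEE_ID.matcher(s).matches(); }
    static boolean isName(String s)       { return s != null && NAME.matcher(s).matches(); }
    static boolean isSsn(String s)        { return s != null && SSN.matcher(s).matches(); }
    static boolean isState(String s)      { return s != null && STATE.matcher(s).matches(); }

    // An empty COUNTRY defaults to "US"; any other value must be exactly "US".
    static String normalizeCountry(String s) {
        String value = (s == null || s.isEmpty()) ? "US" : s;
        if (!value.equals("US")) {
            throw new IllegalArgumentException("Country must be US: " + s);
        }
        return value;
    }

    static boolean isHireDate(String s) {
        if (s == null) {
            return false;
        }
        try {
            LocalDate.parse(s, HIRE_DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // COMMENT is optional: null or empty is accepted, otherwise at most 100 characters.
    static boolean isComment(String s) { return s == null || s.length() <= 100; }
}
```

Keeping each rule as a small, separately testable check mirrors the one-processor-per-column approach used in the rest of this post.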

Implementation

First, we need to add the SuperCSV Maven dependency (groupId net.sf.supercsv, artifactId super-csv) to the project.

Cell Processors

SuperCSV provides cell processors, which are used when reading and writing CSV files. They automate type conversions and can enforce constraints on each cell.

There are 4 types of cell processors – Reading, Writing, Reading/Writing, and Constraints. Please see http://super-csv.github.io/super-csv/cell_processors.html for more information.
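To illustrate how chained processors combine a constraint with a type conversion, here is a small JDK-only sketch. The Step, ChainedStep, Required and ToDate types are hypothetical stand-ins written for this example, not SuperCSV classes; they only mimic the way CellProcessorAdaptor passes a value on to its next processor.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;

// Hypothetical stand-in for a cell processor (NOT the SuperCSV API):
// takes a cell value and returns the processed value.
interface Step {
    Object apply(Object value);
}

// Forwards its result to the next step, loosely mirroring how SuperCSV's
// CellProcessorAdaptor hands a value on to its "next" processor.
abstract class ChainedStep implements Step {
    private final Step next;

    ChainedStep(Step next) {
        this.next = next;
    }

    ChainedStep() {
        this(value -> value); // end of the chain: return the value unchanged
    }

    protected Object passOn(Object value) {
        return next.apply(value);
    }
}

// Constraint: rejects null or empty cells, then passes the value on.
final class Required extends ChainedStep {
    Required(Step next) {
        super(next);
    }

    @Override
    public Object apply(Object value) {
        if (value == null || value.toString().isEmpty()) {
            throw new IllegalArgumentException("Value is mandatory");
        }
        return passOn(value);
    }
}

// Type conversion: turns an MM/dd/yyyy string into a LocalDate (strict, so 02/30 fails).
final class ToDate extends ChainedStep {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM/dd/uuuu").withResolverStyle(ResolverStyle.STRICT);

    @Override
    public Object apply(Object value) {
        return passOn(LocalDate.parse(value.toString(), FORMAT));
    }
}
```

Here new Required(new ToDate()) rejects an empty cell and otherwise turns "01/15/2020" into a LocalDate. With the real library, a comparable chain would be built from its own processors, for example new NotNull(new ParseDate("MM/dd/yyyy")), which yields a java.util.Date.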

In this post, we’ll use custom constraint cell processors.

Interfaces and Classes

Keeping in mind that other CSV-related tasks may be handed to us in the future, we’ll try to write code that is easy to extend.

So, we start with an interface.

Then, we implement the interface.

You’ll notice there are 8 “rules”, one for each CSV column (each cell, actually), to validate against.
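The interface and its implementation are not reproduced in this post. As a rough sketch under our own assumptions, the idea of “one rule per column, in column order” could look like the following. CsvColumnRules and NewEmployeeColumnRules are hypothetical names, and plain java.util.function.Predicate objects stand in for SuperCSV cell processors so the example runs with only the JDK (Java 9+ for List.of).

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical interface: supplies one rule per CSV column, in column order.
interface CsvColumnRules {
    List<Predicate<String>> rules();

    default int columnCount() {
        return rules().size();
    }
}

// Hypothetical implementation for the new-employee file: 8 rules, one per column.
final class NewEmployeeColumnRules implements CsvColumnRules {
    private static final List<Predicate<String>> RULES = List.of(
            matches("\\d{10}"),                              // 1 EMPLOYEE ID (uniqueness checked elsewhere)
            matches("\\p{Alpha}{1,50}"),                     // 2 LAST NAME
            matches("\\p{Alpha}{1,50}"),                     // 3 FIRST NAME
            matches("\\d{3}-\\d{2}-\\d{4}"),                 // 4 SSN
            matches("[A-Z]{2}"),                             // 5 HOME STATE (shape only)
            v -> v == null || v.isEmpty() || v.equals("US"), // 6 COUNTRY (empty means US)
            matches("\\d{2}/\\d{2}/\\d{4}"),                 // 7 HIRE DATE (shape only)
            v -> v == null || v.length() <= 100              // 8 COMMENT (optional)
    );

    // Builds a rule that accepts non-null values fully matching the given regex.
    private static Predicate<String> matches(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return value -> value != null && pattern.matcher(value).matches();
    }

    @Override
    public List<Predicate<String>> rules() {
        return RULES;
    }
}
```

Supporting another CSV format would then mean writing another implementation of the interface, while the code that reads and validates the file stays unchanged.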

Let’s look at one of them. The other files are available in the GitHub repository linked below.

EmployeeIdCsvRule

The employee id “rule” looks like this.

We extend CellProcessorAdaptor and override the execute method. Notice that we use a Set object to track the unique Employee IDs in the CSV file. Once a duplicate is detected, we throw an exception.

Also notice that we use a regular expression to ensure Employee IDs are exactly 10 digits long.
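The rule’s actual source is in the repository; the following JDK-only sketch only reproduces the two ideas described above, a Set for uniqueness and a regular expression for the 10-digit format. EmployeeIdRuleSketch and its execute(Object, int) signature are hypothetical: the real class extends CellProcessorAdaptor and overrides execute(Object, CsvContext), and a SuperCSV constraint processor would normally throw SuperCsvConstraintViolationException rather than the IllegalArgumentException used here.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical, JDK-only version of the employee ID rule. A plain method stands in
// for CellProcessorAdaptor.execute(Object, CsvContext) so the logic runs without SuperCSV.
final class EmployeeIdRuleSketch {
    private static final Pattern TEN_DIGITS = Pattern.compile("\\d{10}");

    // IDs seen so far in the current file; use one instance per file.
    private final Set<String> seenIds = new HashSet<>();

    String execute(Object value, int lineNumber) {
        if (value == null) {
            throw new IllegalArgumentException("Line " + lineNumber + ": employee ID is mandatory");
        }
        String id = value.toString();
        if (!TEN_DIGITS.matcher(id).matches()) {
            throw new IllegalArgumentException(
                    "Line " + lineNumber + ": employee ID must be 10 digits: " + id);
        }
        // Set.add returns false when the element is already present, i.e. a duplicate.
        if (!seenIds.add(id)) {
            throw new IllegalArgumentException(
                    "Line " + lineNumber + ": duplicate employee ID: " + id);
        }
        return id;
    }
}
```

Because the Set lives in the rule instance, one instance should be used per file; reusing it across files would flag IDs from the previous file as duplicates.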

CsvValidator

We create another class that reads the CSV file and validates its contents at the same time.
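As a hedged, JDK-only sketch of a class that validates while it reads, the following SimpleCsvValidator (a hypothetical name) walks the input line by line and applies one check per column. It splits naively on commas, so it does not handle quoted fields; with SuperCSV that parsing is done by the library, for example by passing the processors to CsvListReader.read(CellProcessor...).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical JDK-only validator: reads CSV rows and checks each cell while reading.
// Simplification: cells are split on commas, so quoted fields containing commas are NOT supported.
final class SimpleCsvValidator {
    private final List<Predicate<String>> columnRules;

    SimpleCsvValidator(List<Predicate<String>> columnRules) {
        this.columnRules = new ArrayList<>(columnRules);
    }

    // Returns one message per problem found; an empty list means the input is valid.
    List<String> validate(Reader source, boolean hasHeader) throws IOException {
        List<String> errors = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (hasHeader && lineNumber == 1) {
                    continue; // skip the header row
                }
                // A limit of -1 keeps trailing empty cells, e.g. an empty optional last column.
                String[] cells = line.split(",", -1);
                if (cells.length != columnRules.size()) {
                    errors.add("Line " + lineNumber + ": expected " + columnRules.size()
                            + " columns but found " + cells.length);
                    continue;
                }
                for (int i = 0; i < cells.length; i++) {
                    if (!columnRules.get(i).test(cells[i])) {
                        errors.add("Line " + lineNumber + ", column " + (i + 1)
                                + ": invalid value '" + cells[i] + "'");
                    }
                }
            }
        }
        return errors;
    }
}
```

Collecting every problem instead of stopping at the first one is a design choice that lets a whole file be fixed in one pass; failing fast on the first bad cell is an equally valid option.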

Download

To look at the other files, download the source code from https://github.com/Turreta/turreta-supercsv-validation-example