[wp_ad_camp_1]
Validating CSV is super easy with SuperCSV and it will save your day.
Assume that we have a requirement from the management to validate new employee data coming from different parts of the United States with the following constraints.
New Employee Validation Rules
FIELD ORDER | FIELD NAME | DESCRIPTION | RULES |
---|---|---|---|
1 | EMPLOYEE ID | Employee Id provided by HQ for a particular office branch. | Must be unique within the CSV file. Maximum length is 10 characters Format ########## Mandatory |
2 | LAST NAME | Last name of the new employee | Minimum and maximum length are 1 and 50 letters only, respectively. Mandatory |
3 | FIRST NAME | First name of the new employee | Minimum and maximum length are 1 and 50 letters only, respectively. Mandatory |
4 | SSN | SSN | Format: ###-##-#### Mandatory |
5 | HOME STATE | Current home state of the employee. Most like be the branch office's state | 2-letter state name Mandatory |
6 | COUNTRY | US only. If empty, defaults to US | US only. If empty, defaults to US Mandatory |
7 | HIRE DATE | Employee's hire date | Format MM/DD/YYYY Mandatory |
8 | COMMENT | Any comments about the new employee from the hiring manager | Max 100 Optional |
Implementation
First, we need to include a Maven dependency to the SuperCSV binaries.
1 2 3 4 5 | <dependency> <groupId>net.sf.supercsv</groupId> <artifactId>super-csv</artifactId> <version>2.4.0</version> </dependency> |
Cell Processors
[wp_ad_camp_3]
SuperCSV has these Cell processors that are used for reading and writing CSV files. They automate type conversions, and can enforce constrains on each cell.
There are 4 types of cell processors – Reading, Writing, Reading/Writing, and Constraints. Please see http://super-csv.github.io/super-csv/cell_processors.html for more information.
On this post, we’ll use custom Constraints cell processors.
Interfaces and Classes
Keeping in mind that they may be other CSV-related tasks will be handed to you in the future, we’ll try to write good codes as much as possible so that things are easier to extend.
[wp_ad_camp_4]
So, we start with an interface.
1 2 3 4 5 6 7 | package com.turreta.supercsv; import org.supercsv.cellprocessor.ift.CellProcessor; public interface CsvColumnProcessor { CellProcessor[] getProcessors(); } |
Then, we implement the interface.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | package com.turreta.supercsv; import org.supercsv.cellprocessor.ift.CellProcessor; import com.turreta.supercsv.custom.EmployeeCommentCsvRule; import com.turreta.supercsv.custom.EmployeeCountryCsvRule; import com.turreta.supercsv.custom.EmployeeHireDateCsvRule; import com.turreta.supercsv.custom.EmployeeHomeStateCsvRule; import com.turreta.supercsv.custom.EmployeeIdCsvRule; import com.turreta.supercsv.custom.EmployeeNameCsvRule; import com.turreta.supercsv.custom.EmployeeSSNCsvRule; public class NewEmployeeCsvProcessor implements CsvColumnProcessor { @Override public CellProcessor[] getProcessors() { // @formatter:off return new CellProcessor[] { // Employee Id new EmployeeIdCsvRule(), // Last name new EmployeeNameCsvRule(), // // First name new EmployeeNameCsvRule(), // // SSN new EmployeeSSNCsvRule(), // Home state new EmployeeHomeStateCsvRule(), // Country new EmployeeCountryCsvRule(), // Hire date new EmployeeHireDateCsvRule(), // Comment new EmployeeCommentCsvRule() // @formatter:on }; } } |
You’ll notice there are 8 “rules” for each CSV column (each cell actually) to validate against.
Let’s look at one of them. The other files will be available in github.com link posted below.
[wp_ad_camp_2]
EmployeeIdCsvRule
The employee id “rule” looks like this.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | package com.turreta.supercsv.custom; import java.util.HashSet; import java.util.Set; import org.supercsv.cellprocessor.CellProcessorAdaptor; import org.supercsv.exception.SuperCsvConstraintViolationException; import org.supercsv.util.CsvContext; public class EmployeeIdCsvRule extends CellProcessorAdaptor { private final Set<Object> encounteredElements = new HashSet<Object>(); @Override public <T> T execute(Object value, CsvContext context) { validateInputNotNull(value, context); // throws an Exception if the input is null String regex = "\\d{10}"; if (value instanceof String && !value.toString().matches(regex)) { throw new SuperCsvConstraintViolationException(String.format("mismatched format value '%s' encountered - '%s'", value, regex), context, this); } if (!encounteredElements.add(value)) { throw new SuperCsvConstraintViolationException(String.format("duplicate value '%s' encountered", value), context, this); } return next.execute(value, context); } } |
We extend CellProcessorAdaptor and overriden the execute method. Notice, we used a Set object to track the list of unique Employee IDs in the CSV file. Once duplicate is detected, we throw
1 | SuperCsvConstraintViolationException |
Also notice, we used a regular expression to ensure Employee IDs are 10-digit long.
CsvValidator
We create another class to read the CSV file and at the same time validate its contents.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | package com.turreta.supercsv; import java.io.File; import java.io.FileReader; import java.util.Map; import org.supercsv.cellprocessor.ift.CellProcessor; import org.supercsv.io.CsvMapReader; import org.supercsv.io.ICsvMapReader; import org.supercsv.prefs.CsvPreference; /** * The Class CsvValidator. */ public class CsvValidator { /** * Validate CSV. * * @param fileName * the file name * @param processors * the processors * @throws Exception * the exception */ public static void validateNewEmployee(File file, CellProcessor[] processors) throws Exception { ICsvMapReader mapReader = null; try { mapReader = new CsvMapReader(new FileReader(file), CsvPreference.STANDARD_PREFERENCE); final String[] header = mapReader.getHeader(true); final CellProcessor[] processor = processors; Map<String, Object> subMap; while ((subMap = mapReader.read(header, processor)) != null) { // logger.info(String.format("lineNo=%s, rowNo=%s, master=%s", mapReader.getLineNumber(), // mapReader.getRowNumber(), subMap)); } } finally { if (mapReader != null) { mapReader.close(); } } } } |
Download
If you want to look at the other files, please download the source code files from https://github.com/Turreta/turreta-supercsv-validation-example
[wp_ad_camp_5]