Java - Validate CSV using SuperCSV with custom CsvColumnProcessor

[wp_ad_camp_1]

Validating CSV is super easy with SuperCSV and it will save your day.

Assume that we have a requirement from the management to validate new employee data coming from different parts of the United States with the following constraints.

New Employee Validation Rules

FIELD ORDER	FIELD NAME	DESCRIPTION	RULES
1	EMPLOYEE ID	Employee Id provided by HQ for a particular office branch.	Must be unique within the CSV file. Maximum length is 10 characters Format ########## Mandatory
2	LAST NAME	Last name of the new employee	Minimum and maximum length are 1 and 50 letters only, respectively. Mandatory
3	FIRST NAME	First name of the new employee	Minimum and maximum length are 1 and 50 letters only, respectively. Mandatory
4	SSN	SSN	Format: ###-##-#### Mandatory
5	HOME STATE	Current home state of the employee. Most like be the branch office's state	2-letter state name Mandatory
6	COUNTRY	US only. If empty, defaults to US	US only. If empty, defaults to US Mandatory
7	HIRE DATE	Employee's hire date	Format MM/DD/YYYY Mandatory
8	COMMENT	Any comments about the new employee from the hiring manager	Max 100 Optional

Implementation

First, we need to include a Maven dependency to the SuperCSV binaries.

	<dependency>
	    <groupId>net.sf.supercsv</groupId>
	    <artifactId>super-csv</artifactId>
	    <version>2.4.0</version>
	</dependency>

<groupId>net.sf.supercsv</groupId>

<artifactId>super-csv</artifactId>

</dependency>

Cell Processors

[wp_ad_camp_3]

SuperCSV has these Cell processors that are used for reading and writing CSV files. They automate type conversions, and can enforce constrains on each cell.

There are 4 types of cell processors – Reading, Writing, Reading/Writing, and Constraints. Please see http://super-csv.github.io/super-csv/cell_processors.html for more information.

On this post, we’ll use custom Constraints cell processors.

Interfaces and Classes

Keeping in mind that they may be other CSV-related tasks will be handed to you in the future, we’ll try to write good codes as much as possible so that things are easier to extend.

[wp_ad_camp_4]

So, we start with an interface.

package com.turreta.supercsv;

import org.supercsv.cellprocessor.ift.CellProcessor;

public interface CsvColumnProcessor {
    CellProcessor[] getProcessors();
}

package com.turreta.supercsv;

import org.supercsv.cellprocessor.ift.CellProcessor;

public interface CsvColumnProcessor {

CellProcessor[] getProcessors();

}

Then, we implement the interface.

package com.turreta.supercsv;

import org.supercsv.cellprocessor.ift.CellProcessor;

import com.turreta.supercsv.custom.EmployeeCommentCsvRule;
import com.turreta.supercsv.custom.EmployeeCountryCsvRule;
import com.turreta.supercsv.custom.EmployeeHireDateCsvRule;
import com.turreta.supercsv.custom.EmployeeHomeStateCsvRule;
import com.turreta.supercsv.custom.EmployeeIdCsvRule;
import com.turreta.supercsv.custom.EmployeeNameCsvRule;
import com.turreta.supercsv.custom.EmployeeSSNCsvRule;

public class NewEmployeeCsvProcessor implements CsvColumnProcessor {

    @Override
    public CellProcessor[] getProcessors() {

        // @formatter:off
        return new CellProcessor[] {

                // Employee Id
                new EmployeeIdCsvRule(),

                // Last name
                new EmployeeNameCsvRule(),

                // // First name
                new EmployeeNameCsvRule(),
                //
                // SSN
                new EmployeeSSNCsvRule(),

                // Home state
                new EmployeeHomeStateCsvRule(),

                // Country
                new EmployeeCountryCsvRule(),

                // Hire date
                new EmployeeHireDateCsvRule(),

                // Comment
                new EmployeeCommentCsvRule()

                // @formatter:on

        };
    }
}

package com.turreta.supercsv;

import org.supercsv.cellprocessor.ift.CellProcessor;

import com.turreta.supercsv.custom.EmployeeCommentCsvRule;

import com.turreta.supercsv.custom.EmployeeCountryCsvRule;

import com.turreta.supercsv.custom.EmployeeHireDateCsvRule;

import com.turreta.supercsv.custom.EmployeeHomeStateCsvRule;

import com.turreta.supercsv.custom.EmployeeIdCsvRule;

import com.turreta.supercsv.custom.EmployeeNameCsvRule;

import com.turreta.supercsv.custom.EmployeeSSNCsvRule;

public class NewEmployeeCsvProcessor implements CsvColumnProcessor {

@Override

public CellProcessor[] getProcessors() {

// @formatter:off

return new CellProcessor[] {

// Employee Id

new EmployeeIdCsvRule(),

// Last name

new EmployeeNameCsvRule(),

// // First name

new EmployeeNameCsvRule(),

// SSN

new EmployeeSSNCsvRule(),

// Home state

new EmployeeHomeStateCsvRule(),

// Country

new EmployeeCountryCsvRule(),

// Hire date

new EmployeeHireDateCsvRule(),

// Comment

new EmployeeCommentCsvRule()

// @formatter:on

};

}

You’ll notice there are 8 “rules” for each CSV column (each cell actually) to validate against.

Let’s look at one of them. The other files will be available in github.com link posted below.

[wp_ad_camp_2]

EmployeeIdCsvRule

The employee id “rule” looks like this.

package com.turreta.supercsv.custom;

import java.util.HashSet;
import java.util.Set;

import org.supercsv.cellprocessor.CellProcessorAdaptor;
import org.supercsv.exception.SuperCsvConstraintViolationException;
import org.supercsv.util.CsvContext;

public class EmployeeIdCsvRule extends CellProcessorAdaptor {

    private final Set<Object> encounteredElements = new HashSet<Object>();

    @Override
    public <T> T execute(Object value, CsvContext context) {
        validateInputNotNull(value, context); // throws an Exception if the input is null

        String regex = "\\d{10}";

        if (value instanceof String && !value.toString().matches(regex)) {
            throw new SuperCsvConstraintViolationException(String.format("mismatched format value '%s' encountered - '%s'", value, regex), context, this);
        }

        if (!encounteredElements.add(value)) {
            throw new SuperCsvConstraintViolationException(String.format("duplicate value '%s' encountered", value), context, this);
        }

        return next.execute(value, context);
    }

}

package com.turreta.supercsv.custom;

import java.util.HashSet;

import java.util.Set;

import org.supercsv.cellprocessor.CellProcessorAdaptor;

import org.supercsv.exception.SuperCsvConstraintViolationException;

import org.supercsv.util.CsvContext;

public class EmployeeIdCsvRule extends CellProcessorAdaptor {

private final Set<Object> encounteredElements = new HashSet<Object>();

@Override

public <T> T execute(Object value, CsvContext context) {

validateInputNotNull(value, context); // throws an Exception if the input is null

String regex = "\\d{10}";

if (value instanceof String && !value.toString().matches(regex)) {

throw new SuperCsvConstraintViolationException(String.format("mismatched format value '%s' encountered - '%s'", value, regex), context, this);

}

if (!encounteredElements.add(value)) {

throw new SuperCsvConstraintViolationException(String.format("duplicate value '%s' encountered", value), context, this);

}

return next.execute(value, context);

}

We extend CellProcessorAdaptor and overriden the execute method. Notice, we used a Set object to track the list of unique Employee IDs in the CSV file. Once duplicate is detected, we throw

SuperCsvConstraintViolationException

1	SuperCsvConstraintViolationException

Also notice, we used a regular expression to ensure Employee IDs are 10-digit long.

CsvValidator

We create another class to read the CSV file and at the same time validate its contents.

package com.turreta.supercsv;

import java.io.File;
import java.io.FileReader;
import java.util.Map;

import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvMapReader;
import org.supercsv.io.ICsvMapReader;
import org.supercsv.prefs.CsvPreference;

/**
 * The Class CsvValidator.
 */
public class CsvValidator {

    /**
     * Validate CSV.
     *
     * @param fileName
     *            the file name
     * @param processors
     *            the processors
     * @throws Exception
     *             the exception
     */
    public static void validateNewEmployee(File file, CellProcessor[] processors) throws Exception {
        ICsvMapReader mapReader = null;
        try {
            mapReader = new CsvMapReader(new FileReader(file), CsvPreference.STANDARD_PREFERENCE);

            final String[] header = mapReader.getHeader(true);
            final CellProcessor[] processor = processors;

            Map<String, Object> subMap;
            while ((subMap = mapReader.read(header, processor)) != null) {
                // logger.info(String.format("lineNo=%s, rowNo=%s, master=%s", mapReader.getLineNumber(),
                // mapReader.getRowNumber(), subMap));
            }
        } finally {
            if (mapReader != null) {
                mapReader.close();
            }
        }
    }
}

package com.turreta.supercsv;

import java.io.File;

import java.io.FileReader;

import java.util.Map;

import org.supercsv.cellprocessor.ift.CellProcessor;

import org.supercsv.io.CsvMapReader;

import org.supercsv.io.ICsvMapReader;

import org.supercsv.prefs.CsvPreference;

/**

* The Class CsvValidator.

public class CsvValidator {

/**

* Validate CSV.

* @param fileName

* the file name

* @param processors

* the processors

* @throws Exception

* the exception

public static void validateNewEmployee(File file, CellProcessor[] processors) throws Exception {

ICsvMapReader mapReader = null;

try {

mapReader = new CsvMapReader(new FileReader(file), CsvPreference.STANDARD_PREFERENCE);

final String[] header = mapReader.getHeader(true);

final CellProcessor[] processor = processors;

Map<String, Object> subMap;

while ((subMap = mapReader.read(header, processor)) != null) {

// logger.info(String.format("lineNo=%s, rowNo=%s, master=%s", mapReader.getLineNumber(),

// mapReader.getRowNumber(), subMap));

}

} finally {

if (mapReader != null) {

mapReader.close();

}

Download

If you want to look at the other files, please download the source code files from https://github.com/Turreta/turreta-supercsv-validation-example

[wp_ad_camp_5]

New Employee Validation Rules

Implementation

Cell Processors

Interfaces and Classes

EmployeeIdCsvRule

CsvValidator

Download

Related Articles

You Might Also Like

Search the site