Very Slow XML Validation against XSDs with <xs:unique> and <xs:key> elements

This post demonstrates that validating XML against an XSD that uses <xs:unique> is very slow for very large lists of elements, e.g., 300,000 items. For lists of that size, we are better off enforcing distinct values in code.

This also applies to <xs:key>.

We will test <xs:unique> using an XML file with a list of 300,000 elements.
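As a reminder of what <xs:unique> enforces, here is a minimal sketch that validates two tiny documents against an inline schema using the standard javax.xml.validation API. The element names (items, item, id) and the class name are hypothetical and chosen for illustration only; they are not taken from the schema used in this post.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class UniqueConstraintDemo {
    // Hypothetical schema: every <item> under <items> must have a distinct <id>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='items'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='item' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='id' type='xs:string'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:sequence></xs:complexType>"
      + "<xs:unique name='uniqueItemId'>"
      + "<xs:selector xpath='item'/>"
      + "<xs:field xpath='id'/>"
      + "</xs:unique>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true if the XML is valid against XSD, false if validation fails.
    static boolean isValid(String xml) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema;
        try {
            schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is broken", e);
        }
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String duplicated = "<items><item><id>a</id></item><item><id>a</id></item></items>";
        String distinct = "<items><item><id>a</id></item><item><id>b</id></item></items>";
        System.out.println("duplicated ids valid? " + isValid(duplicated));
        System.out.println("distinct ids valid?   " + isValid(distinct));
    }
}
```

With two items the check is instantaneous; the slowdown described in this post only shows up when the list grows very large.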

Set Up

For testing purposes, we use two files: a schema file and an XML file.

Schema File

XML File

This XML file is validated against the schema above.

The actual file is available here – myxml.zip.

Java Code

JAXB Class

This class was written by hand, and its JAXB annotations are very important.

Code to Validate XML

Testing and Profiling with JProfiler

The validation takes more than 1 minute, which is unacceptable for most use cases.

At 3+ minutes

The following objects consume the most memory.

Stuff in memory

After 12 minutes of validation

After 12 minutes, the application is still running and the validation has not completed.

Faster Validation

To improve the performance of our application, we need to do away with schema-based validation of element uniqueness and perform that check in our own code instead.
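The core of an in-code uniqueness check can be sketched with a HashSet, whose add method returns false when the value is already present. This is a minimal sketch and not necessarily the code used in this post; the class and method names are hypothetical, and it assumes the values to check have already been read into a list of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateFinder {
    // Returns every value that was already seen earlier in the list.
    // Each lookup is a hash probe, so the whole pass is roughly linear in the list size.
    static List<String> findDuplicates(List<String> ids) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String id : ids) {
            if (!seen.add(id)) {   // add() returns false when id is already in the set
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(findDuplicates(ids)); // prints [a, b]
    }
}
```

An empty result means all values are distinct, which is the same guarantee <xs:unique> was providing in the schema.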

Modified Schema File

We removed the <xs:unique> element.

XML File

No changes to the file.

New Java Code

Testing

Now that is fast for 300k elements! Basically, we moved the uniqueness check out of the schema and into Java code. Imagine what it can do with 1 million elements!
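For even larger files, one option is to check uniqueness while streaming, without building the whole object graph first. The sketch below is not the code from this post; it swaps in the standard StAX API (javax.xml.stream) and assumes, hypothetically, that the values to compare live in text-only elements named id.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StreamingUniqueCheck {
    // Returns the first repeated <id> value, or null if all values are distinct.
    static String firstDuplicateId(Reader xml) {
        Set<String> seen = new HashSet<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            try {
                while (reader.hasNext()) {
                    // Only react to the start tag of the (hypothetical) id element.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "id".equals(reader.getLocalName())) {
                        String id = reader.getElementText(); // text-only content
                        if (!seen.add(id)) {
                            return id;
                        }
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML", e);
        }
    }

    public static void main(String[] args) {
        // Build a document with 300,000 distinct ids plus one repeated value.
        StringBuilder xml = new StringBuilder("<items>");
        for (int i = 0; i < 300_000; i++) {
            xml.append("<item><id>").append(i).append("</id></item>");
        }
        xml.append("<item><id>42</id></item></items>");

        long start = System.nanoTime();
        String duplicate = firstDuplicateId(new StringReader(xml.toString()));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("First duplicate: " + duplicate + ", Time (ms): " + elapsedMs);
    }
}
```

The set of seen values still has to fit in memory, so this trades the validator's identity-constraint bookkeeping for one hash set of strings.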

Validation in 538 milliseconds for 300,000 items

Correction: It should display “Time (ms): 538”. The time is in milliseconds.

Download the Code

https://github.com/Turreta/demo-Very-Slow-XML-Validation-against-XSDs-with-xs-unique-and-xs-key-elements