
Very Slow XML Validation against XSDs with unique and key elements

This post shows that validating XML against an XSD that uses <xs:unique> is very slow, especially for a huge list of elements, e.g., 300,000 items. We are better off enforcing distinct values in Java code. The same problem applies to <xs:key>. Worse, we can even reproduce the performance issue in Xerces C++.

Set-Up

We have the following files for testing purposes: a schema and an XML file. We will test <xs:unique> using an XML file with a list of 300,000 elements.

Schema File With Unique

We use the following XSD for XML validation. While the XSD looks simple enough, it will cripple our Java application when we validate huge XML files.

Then, we have the following XML file to validate against the schema above.

Please download the actual file from this link – myxml.zip.
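If the download is not at hand, a file of similar size can be generated. The sketch below is only an approximation: the element names `items`, `item`, and `id` are hypothetical stand-ins, not the names used in the actual myxml file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class XmlGeneratorSketch {

    // Writes an <items> root with `count` <item> children, each holding a distinct <id>.
    // The element names are assumptions made for this sketch.
    static void writeItems(Path target, int count) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<items>\n");
            for (int i = 1; i <= count; i++) {
                out.write("  <item><id>" + i + "</id></item>\n");
            }
            out.write("</items>\n");
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file keeps the sketch from overwriting anything in the project.
        Path target = Files.createTempFile("myxml", ".xml");
        writeItems(target, 300_000);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```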

Java Code To Validate XML Against XSD

First, we create the following class manually and use the annotations.

Then, we write Java code that reads both the XML and the XSD to reproduce the slow validation.
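As a rough illustration of this step, here is a minimal sketch that uses the JDK's standard `javax.xml.validation` API. It is not the code from the GitHub repository: the inline schema, the element names (`items`, `item`, `id`), and the constraint name `uniqueId` are assumptions, and the XML is built in memory so the example stays self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdUniqueValidationSketch {

    // Hypothetical schema: a list of <item> elements whose <id> values must be unique.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"items\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"item\" maxOccurs=\"unbounded\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"id\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:unique name=\"uniqueId\">"
        + "<xs:selector xpath=\"item\"/>"
        + "<xs:field xpath=\"id\"/>"
        + "</xs:unique>"
        + "</xs:element>"
        + "</xs:schema>";

    // Returns true when the XML passes validation, false on any validation error.
    static boolean isValid(String xml) throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Without a custom ErrorHandler, validation errors surface as exceptions.
            return false;
        }
    }

    // Builds an in-memory document with `count` items and distinct ids.
    static String buildXml(int count) {
        StringBuilder sb = new StringBuilder("<items>");
        for (int i = 1; i <= count; i++) {
            sb.append("<item><id>").append(i).append("</id></item>");
        }
        return sb.append("</items>").toString();
    }

    public static void main(String[] args) throws Exception {
        // Start small; pass a larger count as the first argument to watch the time grow.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 1_000;
        long start = System.nanoTime();
        boolean valid = isValid(buildXml(count));
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Valid: " + valid + ", Time (ms): " + millis);
    }
}
```

Running it with increasing counts is a simple way to see how the validation time behaves on a given JDK. The timings will differ between parser versions.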

Testing and Profiling with JProfiler

The validation runs for more than 1 minute without finishing, which is unacceptable in most cases.

Slow XML XSD Validation - At 3+ minutes mark

We have the following objects consuming the most memory.

Slow XML XSD Validation - Stuff in memory

After 12 minutes, the application is still running.

Slow XML XSD Validation - At 12 minutes, the validation has not completed yet.

Fix Slow XSD XML Validation

To improve the performance of our application, we need to drop schema-based validation of element uniqueness and perform that check in our own code instead.

First, we remove the <xs:unique> element from our XSD file.

The XML file stays unchanged.

New Java Code To Fix Slow XSD XML Validation

Our Java code now ensures that we only deal with unique elements. The XSD validation still runs, but it no longer checks for unique values.
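One way to do the uniqueness check in code is to collect the values in a `HashSet` while reading the document. The sketch below uses the JDK's StAX reader for that, which may differ from the approach in the GitHub repository. The element name `id` and the helper `firstDuplicateId` are assumptions for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class UniqueIdCheckSketch {

    // Returns the first <id> value that occurs twice, or null when all values are distinct.
    static String firstDuplicateId(Reader xml) throws XMLStreamException {
        Set<String> seen = new HashSet<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "id".equals(reader.getLocalName())) {
                    // getElementText() reads the text content and stops at the end tag.
                    String value = reader.getElementText();
                    // Set.add returns false if the value was already present.
                    if (!seen.add(value)) {
                        return value;
                    }
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<items>"
                + "<item><id>1</id></item>"
                + "<item><id>2</id></item>"
                + "<item><id>1</id></item>"
                + "</items>";
        String duplicate = firstDuplicateId(new StringReader(xml));
        System.out.println(duplicate == null
                ? "All ids are unique"
                : "Duplicate id found: " + duplicate);
    }
}
```

Each lookup in the set takes roughly constant time on average, so the check needs only a single pass over the document.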

Testing Faster XSD XML Validation

Now that is fast for 300,000 elements! Basically, we moved the uniqueness check for the elements into our Java code. Imagine what it can do with 1 million elements!

Validation in 538 milliseconds for 300,000 items

Correction: It should display “Time (ms): 538”. The time is in milliseconds.

For more details on why this XML XSD validation is so slow, please check out the Java code, which is available on GitHub.

 
