We encountered a performance issue in Java when validating an XML file with 300,000 elements against an XSD that uses <xs:unique> and <xs:key>. This post uses Xerces C++ to parse and validate a huge XML file against an XSD with those elements, to check whether the performance issue also occurs outside the Java ecosystem.
Requirements
We use the following items for this post. Note that we will not write any C/C++ code.
- Xerces C++ binaries
  - Download from this URL
- An XSD file that uses the unique and key constraints
- A huge XML file to parse and validate
- Windows 10
- C/C++ IDE, e.g., CLion 2021.3.4
- PilotEdit Lite
  - Used to edit the huge XML file so that it references a local XSD file.
Download Xerces C++ Source Code And Compile It
First, we need to download Xerces C++ binaries for Windows.
Test Faulty XSD File With Xerces C++ PParse.exe
Then, we test our XML file against a faulty XSD using the Xerces C++ PParse.exe executable. For example, consider the following XSD, which triggers the performance issue.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="names">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" maxOccurs="unbounded" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
    <xs:unique name="uniqueName">
      <xs:selector xpath="name"/>
      <xs:field xpath="."/>
    </xs:unique>
  </xs:element>
</xs:schema>
```
Note that this XSD uses a unique constraint element. The key constraint element causes the same problem. Next, place the XML and XSD files in the Xerces C++ bin directory.
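For reference, the key-constraint variant mentioned above might look like the following. This is only a minimal sketch: it assumes the same `names`/`name` structure as the XSD shown earlier, and the constraint name `keyName` is made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="names">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" maxOccurs="unbounded" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
    <!-- Hypothetical key constraint: every <name> value must be present and unique -->
    <xs:key name="keyName">
      <xs:selector xpath="name"/>
      <xs:field xpath="."/>
    </xs:key>
  </xs:element>
</xs:schema>
```

Unlike xs:unique, xs:key also requires the selected field to exist for every selected element, but the uniqueness check it performs is the same kind of work.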
Then, we modify the XML to refer to the XSD file in the same directory explicitly.
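A minimal sketch of what such an XML file could look like is shown here. The file name `names.xsd` and the sample values are assumptions for illustration. Because the XSD has no target namespace, the reference uses `xsi:noNamespaceSchemaLocation`.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- names.xsd is an assumed file name; it sits in the same directory as this XML file -->
<names xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="names.xsd">
  <name>name-1</name>
  <name>name-2</name>
  <!-- ... many more unique <name> elements ... -->
</names>
```

A relative path like this is resolved against the location of the XML file, which is why both files are kept in the same directory.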
Next, we use the PParse.exe file to parse and validate our XML file.
Testing Xerces C++ With The Unique Constraint
With the unique constraint in place, the application had still not completed the validation after about 5 to 6 minutes, which is already far too long for this operation.
Testing Xerces C++ Without The Unique Constraint
This time, with the unique constraint removed from the XSD, the validation took only 323 ms. Great!
Did We Validate The XML Against the XSD?
Yes! We used an invalid element, “nameXX,” and the XML validation failed.
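As an illustration of that check, an invalid document could look like the sketch below. The post does not state where the `nameXX` element was placed, so the position, the sample values, and the file name `names.xsd` are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<names xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="names.xsd">
  <name>name-1</name>
  <!-- Not declared in the XSD, so schema validation should reject it -->
  <nameXX>name-2</nameXX>
</names>
```

Since the schema only allows `name` children inside `names`, a validating parser is expected to report an error for this element.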
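Using <xs:unique> or <xs:key> elements for validation is fine in general. However, when a document contains a very large number of elements, these constraints can severely hurt an application’s performance: in our test, validation ran for minutes with the unique constraint versus 323 ms without it.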