
Co-constraints and XML, part 3

Co-constraints and XML, part 1
Co-constraints and XML, part 2

Checking co-constraints in Schematron

Schematron is a very simple, but oh so powerful, rule-based validation language for making assertions about the presence or absence of patterns in XML trees.

An example.

<rule context="para">
<assert test="@name">element para misses attribute name.</assert>
</rule>
Two important points:
  • You define a rule for a certain context. The context is defined using an XPath expression. In the example, the context is every element named "para".
  • For this context you define a test. In the example, the context node (i.e. a para element) is tested for having an attribute "name". This is an assertion: if it does not hold, the message "element para misses attribute name" is reported.
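To actually run, a rule has to sit inside a pattern in a schema document. A minimal sketch of the rule above as a complete schema follows. It assumes ISO Schematron and the XSLT 2.0 query binding; neither is stated for the fragment above.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Assumption: ISO Schematron namespace and an XSLT 2.0 based implementation -->
  <pattern>
    <!-- The context: every element named "para" -->
    <rule context="para">
      <!-- The assertion: the message is reported when @name is absent -->
      <assert test="@name">element para misses attribute name.</assert>
    </rule>
  </pattern>
</schema>
```

Against this schema, <para name="intro">text</para> passes, while <para>text</para> triggers the message.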
The power of XPath, and especially of version 2.0, lets you express even the most demanding co-constraints.

In our use case, the requirement is the following: the attribute scheme refers to a controlled vocabulary of allowed terms that is published as XML, and the value of the element must be checked against those published values.

<creator scheme="province">Brussel</creator>
http://www.site.be/cv/province.xml returns:

<cv>
<cvvalue>Antwerpen</cvvalue>
<cvvalue>Limburg</cvvalue>
<cvvalue>Vlaams-Brabant</cvvalue>
<cvvalue>Oost-Vlaanderen</cvvalue>
<cvvalue>West-Vlaanderen</cvvalue>
</cv>
The question is whether "Brussel" is an allowed value. It does not appear in the list above, so the check should fail.

Our schematron rule is as follows:

<rule context="*[@scheme]">
<let name="basepath" value="string('http://www.site.be/cv/')"/>
<let name="cv" value="@scheme"/>
<let name="path" value="concat($basepath, $cv, '.xml')"/>
<assert test=". = document($path)/cv/cvvalue">The value "<value-of select="."/>" does not appear in the controlled vocabulary "<value-of select="$path"/>".</assert>
</rule>
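One thing the rule above leaves open is what happens when the vocabulary document cannot be retrieved, for instance when scheme holds an unknown name. A possible variant reports that case separately. It assumes an XPath 2.0 query binding (queryBinding="xslt2"), so that doc-available() and doc() can be used. This is a sketch, not the rule used in the project.

```xml
<rule context="*[@scheme]">
  <let name="path" value="concat('http://www.site.be/cv/', @scheme, '.xml')"/>
  <!-- First check: can the controlled vocabulary be loaded at all? -->
  <assert test="doc-available($path)">The controlled vocabulary "<value-of select="$path"/>" could not be loaded.</assert>
  <!-- Second check: only compare values when the vocabulary is available.
       The if/then/else ensures doc() is never called on a missing document. -->
  <assert test="if (doc-available($path)) then . = doc($path)/cv/cvvalue else true()">The value "<value-of select="."/>" does not appear in "<value-of select="$path"/>".</assert>
</rule>
```

With this split, a misspelled scheme and a disallowed value produce two different messages instead of one ambiguous failure.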

The same technique is described in a recent blog entry by Rick Jelliffe on O'Reilly, "Validating Code Lists with Schematron".

Schematron or XSD


While Schematron can express the same constraints as a grammar-based schema language such as XSD, most of the time it is used as an adjunct that addresses the intrinsic weak points of XSD.

In fact, with a schema-aware XPath 2.0 processor your Schematron rules can become much more concise and easier to manage.

In our case we have, for example, the following rule:
<rule context="*[namespace-uri(.) = 'http://purl.org/dc/terms/' and local-name(.)=
('conformsTo','hasFormat','hasPart','hasVersion','isFormatOf',
'isPartOf','isReferencedBy','isReplacedBy','isRequiredBy','isVersionOf','references',
'relation','replaces','requires','source','tableOfContents')]">
<assert test="string-length(.) > 1 or string-length(string(@resourceIdentifier)) > 1">
The element with name <name path="."/> needs to have content,
or needs to have a value for attribute @resourceIdentifier.
At least one of the two needs to be filled in. </assert>
</rule>
As you can see, the context lists a whole series of elements from the Dublin Core namespace that need to be tested in the same way.
In the background, however, all of these elements use the same XSD-defined datatype.
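To make that concrete, here is a hypothetical sketch of what such a shared type could look like in the XSD. The type name InformationObject is taken from the schema-aware rule. Everything else is an assumption for illustration and not the actual schema: the target namespace of the type, its string content model and the type of resourceIdentifier.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dcterms="http://purl.org/dc/terms/"
           targetNamespace="http://purl.org/dc/terms/"
           elementFormDefault="qualified">
  <!-- Hypothetical shared type: text content plus an optional attribute -->
  <xs:complexType name="InformationObject">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="resourceIdentifier" type="xs:string" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <!-- Each element in the list simply refers to that one type -->
  <xs:element name="isPartOf" type="dcterms:InformationObject"/>
  <xs:element name="references" type="dcterms:InformationObject"/>
</xs:schema>
```

Because every element points at the same type, a rule that selects by type covers all of them at once.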

So with a schema-aware XPath processor this test can be rewritten as:

<rule context="element(*, dcterms:InformationObject)">
<assert test="string-length(.) > 1 or string-length(string(@resourceIdentifier)) > 1">
The element with name <name path="."/> needs to have content,
or needs to have a value for attribute @resourceIdentifier.
At least one of the two needs to be filled in. </assert>
</rule>
If a new element using the same datatype is added, the non-schema-aware version requires me to add it to the rule context; with the schema-aware version I can relax.

Conclusion

Schematron rules, thanks to Rick Jelliffe.
If you also have an XSD, it becomes even more powerful with a schema-aware XPath 2.0 processor, thanks to Michael Kay.

Co-constraints and XML, part 4
