
XProc and Calabash: part 1

Context

Some time ago I developed an XML validation framework in which two schema technologies are used in conjunction. First an XSD schema validation is done, followed by an ISO Schematron validation whose rules use XSD schema-aware XPath expressions. The tests are done on the datatypes, not on the elements themselves, to assert additional constraints on the XML file that cannot be checked by XSD version 1.0.
An example of such a Schematron rule:

<rule context="element(*, dcterms:PeriodOfTime)[end]">
  <assert test="start lt end">Startdate must be before enddate.</assert>
</rule>
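To show where such a rule lives, here is a minimal sketch of a complete ISO Schematron schema around it. The namespace URI bound to the dcterms prefix is an assumption (the Dublin Core terms namespace); the real framework may bind the prefix to a different schema namespace. The queryBinding value "xslt2" is what lets the rule use XPath 2.0 constructs such as element(*, type) and the lt operator.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Assumption: dcterms is bound to the Dublin Core terms namespace -->
  <ns prefix="dcterms" uri="http://purl.org/dc/terms/"/>
  <pattern>
    <!-- Matches any element whose schema type is dcterms:PeriodOfTime
         (or a type derived from it) and that has an end child -->
    <rule context="element(*, dcterms:PeriodOfTime)[end]">
      <assert test="start lt end">Startdate must be before enddate.</assert>
    </rule>
  </pattern>
</schema>
```

The type test only works when the document carries type annotations, which is why the XSD validation has to run first.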

This framework calls for a pipeline approach in which the two validation processes can be chained together.

Since XProc: An XML Pipeline Language is approaching final W3C Recommendation status, I thought it would be a good exercise to write the two-stage validation XProcwise.

XProc is an XML vocabulary for which a RELAX NG compact-syntax schema is referenced in the spec. The first thing we did was to pull that schema into our favourite XML editor to start our first XProc file.

Loading the RelaxNG file into Oxygen

The result is an empty pipeline file.

Empty pipeline file

Validating with an XSD schema

What we want to do in the first phase is to validate an XML file against an XSD file.

In the spec one finds all kinds of steps that can be used in an XML pipeline. One of the optional ones is "p:validate-with-xml-schema".

This step is declared as follows.

validate-with-xml-schema

This means the step takes two inputs, the XML document itself ('source') and the XSD schema ('schema'), and generates one output, 'result'. In addition, you can pass some options.
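For reference, the step's signature as it appears in the XProc 1.0 Recommendation looks roughly like the following. This is reproduced from the final spec as a sketch; the working draft current at the time of writing may differ in option names or defaults.

```xml
<p:declare-step type="p:validate-with-xml-schema"
                xmlns:p="http://www.w3.org/ns/xproc">
  <!-- The document to validate -->
  <p:input port="source" primary="true"/>
  <!-- Zero or more schema documents (a sequence is allowed) -->
  <p:input port="schema" sequence="true"/>
  <!-- The validated document -->
  <p:output port="result"/>
  <p:option name="use-location-hints" select="'false'"/>
  <p:option name="try-namespaces" select="'false'"/>
  <p:option name="assert-valid" select="'true'"/>
  <p:option name="mode" select="'strict'"/>
</p:declare-step>
```

Because the schema port accepts a sequence, an empty sequence is a legal binding for it.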

Which gives us for example:

<pipeline xmlns="http://www.w3.org/ns/xproc">
  <validate-with-xml-schema>
    <input port="source">
      <document href="test.xml"/>
    </input>
    <input port="schema">
      <document href="test1.xsd"/>
    </input>
  </validate-with-xml-schema>
</pipeline>

Both the source and the schema port are bound explicitly here, each to a document referenced by its href.
If the XML source itself contains a pointer to the schema file, using the xsi:noNamespaceSchemaLocation or xsi:schemaLocation attribute, the following code will do:

<pipeline xmlns="http://www.w3.org/ns/xproc">
  <validate-with-xml-schema>
    <input port="source">
      <document href="test.xml"/>
    </input>
    <input port="schema">
      <empty/>
    </input>
  </validate-with-xml-schema>
</pipeline>

Important: the schema port needs to be explicitly bound, hence the use of the element "empty".
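One caveat, offered as a hedged sketch rather than something tested against the Calabash version used here: in the final XProc 1.0 Recommendation the use-location-hints option defaults to false, so a processor following that text may ignore the xsi:schemaLocation pointer unless the option is switched on. Setting it as an attribute on the step would look like this:

```xml
<pipeline xmlns="http://www.w3.org/ns/xproc">
  <!-- use-location-hints="true" asks the processor to honour the
       xsi:schemaLocation / xsi:noNamespaceSchemaLocation hints -->
  <validate-with-xml-schema use-location-hints="true">
    <input port="source">
      <document href="test.xml"/>
    </input>
    <input port="schema">
      <!-- Still bound explicitly, to an empty sequence -->
      <empty/>
    </input>
  </validate-with-xml-schema>
</pipeline>
```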

Running XProc pipelines

For running the pipelines, I'm using Calabash.

Calabash is a command-line application to run XProc pipelines.

The usage is:

Usage: com.xmlcalabash.drivers.Main [switches] [pipeline.xpl] [options]
Where switches are:
-a, --schema-aware Turn on schema-aware processing
-b, --binding prefix=uri Specify namespace binding
-c, --config configfile Specify a particular configuration file
-d, --debug Turn on debugging
-E, --entity-resolver className Specify a resolver class for URI resolution
-i, --input port=uri Bind the specified input port
-L, --log-level level Specify the default logging level
-l, --library library.xpl Load the specified library
-o, --output port=uri Bind the specified output port
-p, --with-param [port@]param=value Specify a parameter
-S, --safe-mode Request "safe" execution
-s, --step-name stepname Run the step named 'stepname'
-U, --uri-resolver className Specify a resolver class for URI resolution

In my case this leads to the following command:

# The paths are quoted: an unquoted & (as in R&D) would be interpreted by the shell
java -cp "/Applications/calabash-0.9.3/lib/calabash.jar:\
/Applications/saxonsa9-1-0-5j/saxon9sa.jar:\
/Applications/saxonsa9-1-0-5j/saxon9-s9api.jar:\
/Applications/saxonsa9-1-0-5j/" \
com.xmlcalabash.drivers.Main -a "/Users/paul/Desktop/R&D/calabash/test.xpl"

Note the use of the -a switch for turning on schema-aware processing.

Note also that we can bind the input on the command line:

com.xmlcalabash.drivers.Main -a -isource="/Users/paul/Desktop/R&D/calabash/test.xml" \
"/Users/paul/Desktop/R&D/calabash/test.xpl"

which then leads to the following XProc file:

<pipeline xmlns="http://www.w3.org/ns/xproc">
  <validate-with-xml-schema>
    <input port="schema">
      <empty/>
    </input>
  </validate-with-xml-schema>
</pipeline>
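Why this works without any source binding in the file: per the XProc 1.0 Recommendation, pipeline is shorthand for a declare-step with implicit source, parameters and result ports, and the first step's primary input defaults to the pipeline's own source port. A roughly equivalent longhand form, given as a sketch in the same unprefixed style, would be:

```xml
<declare-step xmlns="http://www.w3.org/ns/xproc">
  <!-- The port that -isource=... binds on the command line -->
  <input port="source"/>
  <input port="parameters" kind="parameter"/>
  <!-- Receives the primary output of the last step by default -->
  <output port="result"/>
  <validate-with-xml-schema>
    <!-- No source binding: it defaults to the pipeline's source port -->
    <input port="schema">
      <empty/>
    </input>
  </validate-with-xml-schema>
</declare-step>
```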

An example error message generated by this pipeline:

Validation error on line 5 column 17 of test.xml:
XTTE1510: Required attribute @scheme is missing (See
http://www.w3.org/TR/xmlschema-1/#cvc-complex-type clause 4)
Error : Pipeline failed:
err:XC0053: null
It is a dynamic error if the assert-valid option is true and the input document is not valid.
Process ended with exit code: 0
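The last line of that message points at the assert-valid option. As a sketch, assuming the option behaves as described in the XProc 1.0 Recommendation, setting it to false should keep the step from failing on an invalid document and let the document flow on to the result port, which can be useful when a later step is meant to report the problems:

```xml
<pipeline xmlns="http://www.w3.org/ns/xproc">
  <!-- assert-valid="false": an invalid document should no longer raise err:XC0053 -->
  <validate-with-xml-schema assert-valid="false">
    <input port="schema">
      <empty/>
    </input>
  </validate-with-xml-schema>
</pipeline>
```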

Next time, we move on to the Schematron part.
