XProc for testing XSD2SCH

Last week Rick Jelliffe posted a beta version of an XML Schema (XSD) to Schematron converter.

This converter is a pipeline of several consecutive XSLT transformations.

The idea is to run this converter against the XML Schema test suite published by the W3C and check whether the results of Schematron validation agree with the results of validating against the XML Schemas from which the Schematron schemas were derived.

So the first thing to do is to convert the schemas from the test suite to their Schematron equivalents.

It looked like a perfect opportunity to enhance my XProc pipelining skills.

What is needed is:
  1. read a directory from the W3C test suite
  2. filter out all non-XSD files
  3. for every XSD file
     a. run the converter, which is a sequence of XSLT transformations
     b. write the result of the transformation to the file system.
This is what my first attempt looks like:

<?xml version="1.0" encoding="UTF-8"?>
<?oxygen RNGSchema="file:/Applications/oxygen%2010/frameworks/xproc/xproc.rnc" type="compact"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step">
  <p:input port="source"/>
  <p:output port="result" sequence="true"/>
  <p:directory-list name="list" include-filter=".*\.xsd"
    path="file:///Users/paul/Desktop/R&amp;D/xmlschema2006-11-06/msData/simpleType/"/>
  <p:make-absolute-uris match="c:file/@name">
    <p:with-option name="base-uri" select="/c:directory/@xml:base"/>
  </p:make-absolute-uris>
  <p:for-each>
    <p:output port="result">
      <p:pipe port="result" step="write2file"/>
    </p:output>
    <p:iteration-source select="/c:directory/c:file"/>
    <p:variable name="file" select="/c:file/@name"/>
    <p:load>
      <p:with-option name="href" select="$file"/>
    </p:load>
    <p:xslt name="xsd-include">
      <p:input port="source"/>
      <p:input port="stylesheet">
        <p:document href="xslt/include.xsl"/>
      </p:input>
      <p:input port="parameters">
        <p:empty/>
      </p:input>
    </p:xslt>
    <p:xslt name="xsd-flatten">
      <p:input port="source"/>
      <p:input port="stylesheet">
        <p:document href="xslt/flatten.xsl"/>
      </p:input>
      <p:input port="parameters">
        <p:empty/>
      </p:input>
    </p:xslt>
    <p:xslt name="xsd-expand-ref">
      <p:input port="source"/>
      <p:input port="stylesheet">
        <p:document href="xslt/expand.xsl"/>
      </p:input>
      <p:input port="parameters">
        <p:empty/>
      </p:input>
    </p:xslt>
    <p:xslt name="xsd-to-sch">
      <p:input port="source"/>
      <p:input port="stylesheet">
        <p:document href="xslt/xsd2sch.xsl"/>
      </p:input>
      <p:input port="parameters">
        <p:empty/>
      </p:input>
    </p:xslt>
    <p:xslt name="compress">
      <p:input port="source"/>
      <p:input port="stylesheet">
        <p:document href="xslt/compress.xsl"/>
      </p:input>
      <p:input port="parameters">
        <p:empty/>
      </p:input>
    </p:xslt>
    <p:store name="write2file">
      <p:with-option name="href" select="concat($file,'.sch')"/>
      <p:input port="source"/>
    </p:store>
  </p:for-each>
</p:declare-step>

Here is the same pipeline again, this time with comments:

<?xml version="1.0" encoding="UTF-8"?>

<!-- I have been using the RelaxNG compact schema from within Oxygen to edit the pipeline,
hence the Oxygen specific processing instruction -->
<?oxygen RNGSchema="file:/Applications/oxygen%2010/frameworks/xproc/xproc.rnc" type="compact"?>

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step">
<p:input port="source"/>
<p:output port="result" sequence="true"/>

<!-- The pipeline has an input and an output port; the input is not used,
the output has multiple output documents, hence sequence = 'true' -->
<p:directory-list name="list" include-filter=".*\.xsd"
path="file:///Users/paul/Desktop/R&amp;D/xmlschema2006-11-06/msData/simpleType/"/>

<!-- The directory-list step lists the files and directories found at a path.
The list can be filtered using a regular expression.
The output is XML and looks like this:
<c:directory xmlns:c="http://www.w3.org/ns/xproc-step" name="simpleType"
xml:base="/Users/paul/Desktop/R&amp;D/xmlschema2006-11-06/msData/simpleType/">
<c:file name="stA001.xsd"/>
<c:file name="stA002.xsd"/>
<c:file name="stA003.xsd"/>
<c:file name="stA003b.xsd"/>
...
</c:directory>
-->
<p:make-absolute-uris match="c:file/@name">
<p:with-option name="base-uri" select="/c:directory/@xml:base"/>
</p:make-absolute-uris>

<!--
The file names are turned into absolute URIs, as the following snippet shows.
<c:directory xmlns:c="http://www.w3.org/ns/xproc-step" name="simpleType"
xml:base="/Users/paul/Desktop/R&amp;D/xmlschema2006-11-06/msData/simpleType/">
<c:file
name="file:/Users/paul/Desktop/R&amp;D/xmlschema2006-11-06/msData/simpleType/stA001.xsd"/>
<c:file
name="file:/Users/paul/Desktop/R&amp;D/xmlschema2006-11-06/msData/simpleType/stA002.xsd"/>
<c:file
name="file:/Users/paul/Desktop/R&amp;D/xmlschema2006-11-06/msData/simpleType/stA003.xsd"/>
<c:file
name="file:/Users/paul/Desktop/R&amp;D/xmlschema2006-11-06/msData/simpleType/stA003b.xsd"/>
...
</c:directory>
-->
<p:for-each>
<!-- Now we loop over the files.
The iteration-source element below selects what is iterated over,
i.e. each c:file element of the incoming XML
-->

<p:output port="result">
<p:pipe port="result" step="write2file"/>
</p:output>
<p:iteration-source select="/c:directory/c:file"/>
<p:variable name="file" select="/c:file/@name"/>
<!-- we capture the absolute path of the file from @name -->
<p:load>
<p:with-option name="href" select="$file"/>
</p:load>
<!-- we explicitly load the xsd file -->
<!-- here starts the sequence of XSLT transformations where the output
of the preceding step serves as input for the next -->

<p:xslt name="xsd-include">
<p:input port="source"/>
<p:input port="stylesheet">
<p:document href="xslt/include.xsl"/>
</p:input>
<p:input port="parameters">
<p:empty/>
</p:input>
</p:xslt>
<p:xslt name="xsd-flatten">
<p:input port="source"/>
<p:input port="stylesheet">
<p:document href="xslt/flatten.xsl"/>
</p:input>
<p:input port="parameters">
<p:empty/>
</p:input>
</p:xslt>
<p:xslt name="xsd-expand-ref">
<p:input port="source"/>
<p:input port="stylesheet">
<p:document href="xslt/expand.xsl"/>
</p:input>
<p:input port="parameters">
<p:empty/>
</p:input>
</p:xslt>
<p:xslt name="xsd-to-sch">
<p:input port="source"/>
<p:input port="stylesheet">
<p:document href="xslt/xsd2sch.xsl"/>
</p:input>
<p:input port="parameters">
<p:empty/>
</p:input>
</p:xslt>
<p:xslt name="compress">
<p:input port="source"/>
<p:input port="stylesheet">
<p:document href="xslt/compress.xsl"/>
</p:input>
<p:input port="parameters">
<p:empty/>
</p:input>
</p:xslt>
<p:store name="write2file">
<p:with-option name="href" select="concat($file,'.sch')"/>
<p:input port="source"/>
</p:store>
<!-- the result of the transformations is written to the file system
under the same path and name as the original xsd, with '.sch' appended. -->

<!-- the output of the store step is explicitly piped to the output of the iteration.
see above.
<p:output port="result">
<p:pipe port="result" step="write2file"/>
</p:output>
-->

</p:for-each>
</p:declare-step>
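To see what the first two steps emit before the loop starts, the pipeline can be cut down to just those steps. The following is a minimal sketch, not part of the original pipeline: the path is a placeholder, and version="1.0" is added on the assumption that the processor follows the final XProc 1.0 Recommendation (the listings above do not carry it).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: prints the c:directory document after the URIs are made absolute. -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0">
  <!-- No input port is declared because nothing is read from one. -->
  <p:output port="result"/>

  <!-- Placeholder path (assumption): point it at any directory containing .xsd files. -->
  <p:directory-list include-filter=".*\.xsd"
                    path="file:///path/to/schemas/"/>

  <!-- Same step as in the full pipeline: rewrite each c:file/@name
       to an absolute URI, using the directory's xml:base as base. -->
  <p:make-absolute-uris match="c:file/@name">
    <p:with-option name="base-uri" select="/c:directory/@xml:base"/>
  </p:make-absolute-uris>
</p:declare-step>
```

Because the last step's primary output is connected to the pipeline's result port automatically, running this should write the expanded c:directory document to the processor's output.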
So far, so good.
Now that we have the Schematron schemas, we need to compare their validation results with the expected results from the test suite.
That is the next thing to do.
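As a rough idea of what that next step might look like for a single document, here is a hedged sketch using the standard p:validate-with-schematron step. It makes several assumptions: the two file names are hypothetical, the processor must support this optional step, and the generated schema must be accepted as-is by the processor's Schematron implementation. It is not the comparison against the test suite itself, only the Schematron half of it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: validate one instance against one generated Schematron schema
     and return the validation report instead of failing the pipeline. -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- The report port may carry zero or more documents, hence sequence="true". -->
  <p:output port="result" sequence="true">
    <p:pipe port="report" step="validate"/>
  </p:output>

  <!-- assert-valid="false" keeps the step from raising an error on invalid input,
       so the report can be inspected for both valid and invalid instances. -->
  <p:validate-with-schematron name="validate" assert-valid="false">
    <p:input port="source">
      <!-- hypothetical instance document from the test suite -->
      <p:document href="stA001.xml"/>
    </p:input>
    <p:input port="schema">
      <!-- hypothetical output of the conversion pipeline above -->
      <p:document href="stA001.xsd.sch"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:validate-with-schematron>

  <!-- The step's primary output (the validated document) is not needed here. -->
  <p:sink/>
</p:declare-step>
```

The report could then be compared with the outcome the test suite expects for the same instance; how that expected outcome is read from the suite's metadata is left open here.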

Comments

Tony (unauthenticated)
Aug 5, 2009

Thanks so much for this! I was having trouble understanding how directory-list was supposed to work.

I don't suppose it's possible to get your XML example code formatted according to nesting? It looks like some indentation was messed up when copying the code to (X)HTML.

Keep up the great work!

Living in the XML and RDF world
Aug 8, 2009

Tony,

If you send me (paul@proxml.be) your email address I can send you the code.

Paul