Data constraints and quality in a linked open data world

This talk, held at the Europeana Tech conference in Vienna on 4 October, focused on the difficulties encountered when trying to check data constraints in an open world where anyone can say anything about any topic.
It elaborates on the data quality approaches taken 5 years ago in the Belgian ErfgoedPlus (HeritagePlus) project, including the lessons learned.
Since then new approaches have appeared: SPIN, RIF, and OWL used with a closed world assumption.

Slides used: Download file "Europeana_SemWeb_DataQuality.pdf"

0 comments

From Traditional Dublin Core to Linked Data: Lessons Learned

This is the title of a talk held at the Semantic Tech & Business Conference in London on 27 September.
The talk described the experience gained during the Linked Data-isation of the OWMS (government-wide metadata standard) project at the Dutch Government, under the team leadership of Hans Overbeek (ICTU).

The slides used: Download file "owms.pdf"

A more thorough description of the project was published in the August issue of the magazine Informatie.

The article itself in Dutch: Download file "1106-40Ove.pdf"

0 comments

DCMI Linked Data tutorial

The slides used during the Linked Data and Dublin Core tutorial at The Hague are available at http://dcevents.dublincore.org/index.php/IntConf/index/pages/view/tutorials-2011#hermans.


0 comments

Informatie aan Zee 2011

Slides of the "Linked Library Data: step by step" session (in Dutch) at Informatie aan Zee 2011 are available at: Slides Linked Data.

0 comments

Linked Data for the cultural heritage sector

For those who read Dutch: as a follow-up to the 'Publiceren van Linked Data' workshop held at the Dutch Culture Linked Open Data Event 2011, I have been working on a series of articles (still work in progress) describing the steps needed to get your cultural heritage data onto the Linked Data web. These articles can be found at the DEN Lab website.
A request has come in to translate this series into English. I am considering this, but it has to wait until I'm sure the flaws and omissions in the Dutch version have been fixed.

0 comments

Bad content again

I have been reading the book "A Developer's Guide to the Semantic Web" by Liyang Yu.

I was mildly positive until I got to the chapter on OWL. In my opinion it contains a lot of content which is simply wrong.
Some examples follow.

owl:someValuesFrom


This is how the class ExpensiveDSLR has been defined.

<owl:Class rdf:about="http://www.liyangyu.com/camera#ExpensiveDSLR_1">
<rdfs:subClassOf rdf:resource="http://www.liyangyu.com/camera#DSLR"/>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="http://www.liyangyu.com/camera#owned_by"/>
<owl:someValuesFrom rdf:resource="http://www.liyangyu.com/camera#Professional"/>
</owl:Restriction>
</rdfs:subClassOf>
</owl:Class>

Then we have an instance defined (p. 168) as follows.

<myCamera:ExpensiveDSLR_1 rdf:about="http://dbpedia.org/resource/Canon_EOS-1D_1">
<myCamera:owned_by rdf:resource="http://www.liyangyu.com/people#Tom"/>
</myCamera:ExpensiveDSLR_1>

The author then claims that the following fact will be inferred.

<http://www.liyangyu.com/people#Tom> rdf:type <http://www.liyangyu.com/camera#Professional>

I checked this with Pellet and other reasoners, but none of them infers this.

The author forgets that we are working under the Open World Assumption. The restriction only states that every ExpensiveDSLR_1 has at least one owner who is a Professional, and at any time someone else can come along with new information, such as another owner. So at this stage one cannot be certain that the only known owner, Tom, is a Professional; hence no inference.

owl:someValuesFrom is used to infer facts about the subject from its objects, not the other way around as the author claims.

owl:hasValue

ExpensiveDSLR defined using owl:hasValue:

<owl:Class rdf:about="http://www.liyangyu.com/camera#ExpensiveDSLR_2">
<rdfs:subClassOf rdf:resource="http://www.liyangyu.com/camera#DSLR"/>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="http://www.liyangyu.com/camera#cost"/>
<owl:hasValue rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>expensive</owl:hasValue>
</owl:Restriction>
</rdfs:subClassOf>
</owl:Class>
The related instance:

<myCamera:DSLR rdf:about="http://dbpedia.org/resource/Canon_EOS-1D_2">
<myCamera:owned_by rdf:resource="http://www.liyangyu.com/people#Tom"/>
<myCamera:cost rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>expensive</myCamera:cost>
</myCamera:DSLR>

The claimed inference:

<http://dbpedia.org/resource/Canon_EOS-1D_2> rdf:type  <http://www.liyangyu.com/camera#ExpensiveDSLR_2>

Checking this again with Pellet et al.: no inference at all.

This is because the condition in the model reads as follows:

If a resource is an instance of http://www.liyangyu.com/camera#ExpensiveDSLR_2, then its property cost has the value "expensive".

Not the other way around.

If you change the model to

<owl:Class rdf:about="http://www.liyangyu.com/camera#ExpensiveDSLR_3">
<rdfs:subClassOf rdf:resource="http://www.liyangyu.com/camera#DSLR"/>
<owl:equivalentClass>
<owl:Restriction>
<owl:onProperty rdf:resource="http://www.liyangyu.com/camera#cost"/>
<owl:hasValue rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>expensive</owl:hasValue>
</owl:Restriction>
</owl:equivalentClass>
</owl:Class>

then you get both directions:
If a resource is an instance of http://www.liyangyu.com/camera#ExpensiveDSLR_3, then its property cost has the value "expensive".
AND
If the property cost of a resource has the value "expensive", then it is an instance of http://www.liyangyu.com/camera#ExpensiveDSLR_3.
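The difference between the two models can be sketched with a toy rule engine in Python. This is only an illustration of the direction of the implication, not how Pellet or any real OWL reasoner works; the function name `infer` and the shortened prefixed names are made up for the example.

```python
# Toy illustration only; a real OWL reasoner does far more than this.
# Facts are (subject, predicate, object) triples with shortened, made-up names.

def infer(facts, cls, prop, value, equivalent):
    """Apply a hasValue restriction on `cls` to a set of triples.

    rdfs:subClassOf     (equivalent=False): member of cls => prop has value
    owl:equivalentClass (equivalent=True):  also, prop has value => member of cls
    """
    inferred = set(facts)
    for s, p, o in facts:
        # Necessary condition: holds for subClassOf and for equivalentClass.
        if p == "rdf:type" and o == cls:
            inferred.add((s, prop, value))
        # Sufficient condition: only holds for equivalentClass.
        if equivalent and p == prop and o == value:
            inferred.add((s, "rdf:type", cls))
    return inferred


camera = "dbpedia:Canon_EOS-1D_2"
facts = {
    (camera, "rdf:type", "camera:DSLR"),
    (camera, "camera:cost", "expensive"),
}

with_subclass = infer(facts, "camera:ExpensiveDSLR_2", "camera:cost", "expensive", equivalent=False)
with_equivalent = infer(facts, "camera:ExpensiveDSLR_3", "camera:cost", "expensive", equivalent=True)

print((camera, "rdf:type", "camera:ExpensiveDSLR_2") in with_subclass)    # False
print((camera, "rdf:type", "camera:ExpensiveDSLR_3") in with_equivalent)  # True
```

With the subClassOf model the camera is never classified, which matches what the reasoners report; only the equivalentClass model yields the type.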

Conclusion

I lost my confidence in the book completely; it is the first time this has happened with a book from Springer.
I suggest money is better spent on the second edition of "Semantic Web for the Working Ontologist" by Dean Allemang and Jim Hendler, which is much, much better edited than the first edition (Missed Opportunity).

And based on the slides I have seen from the 2011 Semantic Technology Conference tutorial "Seven Things You Didn't Know About OWL", given by Dave McComb and Simon Robe, I would like to see a "how to ontology" book from them soon.





0 comments

Installing and populating triple stores Part 3: Virtuoso

I'm on Windows 7 32-bit and I'm using the commercial OpenLink Virtuoso Universal Server, Single Server Edition (Desktop/Workstation Operating System). You can still get it at the special price of $499 at http://virtuoso.openlinksw.com/pricing/, which is a bargain for such a Swiss Army knife as Virtuoso Universal Server.
The RDF file I want to upload is 168.871 KB in size.

Installing

This commercial version comes with an installer.
The installation is exhaustively described in the Installation Guide and should complete without any trouble.
However, there seems to be an incompatibility between Windows 7 (32-bit) and the included PHP-related DLL, which I solved by commenting out the last line in the virtuoso.ini file:
;Load10=Hosting, hosting_php.dll
After installation, you need to run the Virtuoso Service Manager and start the Virtuoso server.
Your server can then be accessed at http://localhost:8890/

Populating

The manual describes 13 (!) different methods to insert RDF into Virtuoso.
I didn't test them all, but not all of them seem to work with the 168.871 KB file I used.
So I describe only what worked for me.

What worked for me

for files on the filesystem

  1. using the following API function from the ISQL application
    SQL> DB.DBA.RDF_LOAD_RDFXML_MT 
    (file_to_string_output ('./erfgoedexport.rdf'),
    '', 'http://www.proxml.be/erfgoedexport');

    where 'http://www.proxml.be/erfgoedexport' is our graph identifier.

    If you leave the file to be imported in a directory on your filesystem which is not included in the DirsAllowed parameter of your virtuoso.ini file, you get an access denied error due to access control.
    Since '.' is included in the allowed dirs, we placed our RDF file in the same directory as virtuoso.db, hence './erfgoedexport.rdf'.

  2. using the RDF Store Upload Tab from Virtuoso Conductor

    works, but a few times we got 'You have attempted to upload invalid data', probably due to the size of the file.

for files on a http server

  1. using SPARQL

    Following the instructions coming from the Simple Virtuoso Installation & Utilization Guide for SPARQL Users


    DEFINE GET:SOFT "replace"
    SELECT DISTINCT *
    FROM
    WHERE {?s ?p ?o.}

    initially returned
    42000 Error SR186: 
    No permission to execute procedure DB.DBA.RDF_SPONGE_UP.

    To make this work one needs to make sure that the SPARQL user has sufficient rights including SPARQL_SPONGE.
  2. using the following API function from the ISQL application
    SQL> DB.DBA.RDF_LOAD_RDFXML_MT 
    (http_get ('http://www.proxml.be/test/erfgoedexport.rdf'),
    '', 'http://www.proxml.be/erfgoedexport');
    where 'http://www.proxml.be/erfgoedexport' is our graph identifier.

Screenshots

After installation:
  • Virtuoso Service Manager
  • Virtuoso Server starting
  • Virtuoso Web Server

Loading RDF:
  • Loading from the filesystem via API
  • Loading from the filesystem via the Conductor web interface
  • Setting the SPARQL user rights for being able to load RDF from an HTTP address
  • Loading from an HTTP server via API


2 comments

Europeana and Linked Data

This is a write-up of the short presentation I gave during Datasalon 6 at the BOZAR in Brussels, in which I expressed my concerns about the Linked Data-ness of the upcoming release of Europeana.

Europeana is a multi-lingual online collection of millions of digitized items from European museums, libraries, archives and multi-media collections.

Europeana ESE

Its first iteration was built using the ESE (Europeana Semantic Elements) data model.
This is a list of metadata elements, the lowest common denominator of the different data standards used in the cultural heritage sector, formalized in an XML schema.
A content supplier was asked to convert its data to this schema and put them on an OAI-PMH server where Europeana could harvest them.


ESE architecture

A clean and powerful aggregation architecture.
Nothing wrong with that, except that the data model used is 'too' simple.

Now enter the Linked Data world

In a 'perfect' linked data world every museum, library, archive and cultural heritage organisation would publish its data according to the Linked Data principles:
  • being free to use whatever vocabularies best suit its intended semantics and use case
  • trying to link as much as possible to the datasets of its colleagues and to available authority lists such as VIAF, LCSH, ...

Everyone can then crawl the datasets of interest,

  • build an ontology to enable:
    • matching and merging
    • inferring new properties and relationships,
  • and subsequently develop interesting applications.

Linked Data
I'm not saying this is simple and doesn't come with issues, but it offers the most open approach and possibilities for integration.

Europeana EDM

Europeana is working on a second generation, positioning itself within this linked data and semantic web movement.
The XML ESE model will be replaced by a much, much richer data model, EDM (Europeana Data Model), to be formalized with semantic web standards (RDFS/OWL).

But I wondered what this will mean in practice for the content provider.

Replacing ESE by EDM?

Will the content provider need to replace ESE data by EDM data, and will OAI-PMH perhaps be replaced by another submission mechanism?

Even this will not be necessary, since ESE will continue to be accepted.

EDM as the integrating ontology for data on the Linked Data Web?

Or will EDM be one of the merging, matching ontologies for all the different cultural heritage data flowers blossoming on the Linked Data Web?

EDM - Linked Data
I'm afraid it will be closer to the former.
I know this isn't on purpose.
According to the Europeana Data Model Primer, suppliers can use more specialized vocabularies as long as they provide the mapping to EDM.
Two remarks:
  • This approach doesn't expect the data to be available on the Linked Data Web; only to be submitted to Europeana.
  • It contradicts the requests we get: give me the cheapest, easiest way to get on Europeana.
I have nothing against Europeana; it deserves all our support.
The problem I have is that it takes interest, resources and budget away from the real Linked Data Web.

What Europeana should do to promote the Linked Data Web

To embrace fully the Linked Data Web Europeana should:
  • stimulate the cultural heritage organisations to publish their data in their richest form on the Linked Data web
  • facilitate the linking between the datasets
  • propose and develop authority lists for agents, events, places, concepts, ...
  • work together with LOD2 to apply the most performing integration methodologies and technologies.
I do find it a little bit awkward that Europe is at the same time sponsoring the development of the best data integration technology for the highly decentralized web (LOD2) and a concrete data integration project (Europeana) which is very centralized.

DISCLAIMER: Of course we earn our money by doing Linked Data implementations, but also by doing all types of data conversions and putting OAI-PMH into place. Only, the first is much more future-oriented and a lot more fun.
Getting so many ESE-related RFPs during the last months is what made me start worrying in the first place.

The full presentation:
Download file "datasalon.pdf"


1 comment

Installing, creating and populating triple stores Part 2: Joseki

Joseki/TDB


Installing

Installing Joseki is easy; just follow the instructions at http://www.joseki.org/start.html.

Creating and populating the TDB triple store

I took a shortcut here using TopBraid Composer ME, which has an export facility to TDB and saves the database to the file system as a folder with the extension '.tdb.data' (see screenshot).

Connecting Joseki with TDB

This is the hard part.
The trick lies in adapting the Joseki configuration file ('joseki-config.ttl').
I added an additional service:
# Service 3 - SPARQL processor handling a TDB dataset
<#service3>
rdf:type joseki:Service ;
rdfs:label "SPARQL on TDB" ;
joseki:serviceRef "TDB" ; # web.xml must route this name to Joseki
# dataset part
joseki:dataset <#newdata> ;
# Service part.
joseki:processor joseki:ProcessorSPARQL_FixedDS .

and an additional corresponding dataset:
## Initialize TDB.
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
<#newdata> rdf:type tdb:DatasetTDB ;
rdfs:label "A TDB dataset" ;
# location where the tdb.data folder exists
tdb:location "C:/Users/Paul/MyWorkSpaces/TBCMEWorkspace/Test/erfgoedplustdb.tdb.data" .

One also needs to adapt the 'web.xml' file in the webapps/joseki/WEB-INF folder by adding:
 <servlet-mapping>
<servlet-name>SPARQL service processor</servlet-name>
<url-pattern>/TDB</url-pattern>
</servlet-mapping>

Then I made a specific HTML page containing the following form, whose action points to the TDB service:
<form action="TDB" method="get">
<p>SELECT - get variables (apply XSLT stylesheet)</p>
<p><textarea name="query" cols="70" rows="5">
PREFIX pcce: <http://www.pcce.be/egb/>
SELECT ?a
WHERE
{ ?a a pcce:Germeente.}</textarea>
<br/>

Output XML: <input type="radio" name="output" value="xml" checked/>
with XSLT style sheet (leave blank for none):
<input name="stylesheet" size="25" value="/xml-to-html.xsl" /> <br/>
or JSON output: <br/>
or text output: <br/>
or CSV output: <br/>
or TSV output: <br/>
Force the accept header to <tt>text/plain</tt> regardless
<input type="checkbox" name="force-accept" value="text/plain"/>
<br/>

<input type="submit" value="Get Results" />
</p>
</form>
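Since the form submits with method "get", the same request can be built outside the browser. The following Python sketch only assembles the URL the form would produce; the base URL `http://localhost:2020` is an assumption about where Joseki is listening, and `build_query_url` is a made-up helper name. The parameter names (`query`, `output`, `stylesheet`) are the ones used in the form.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url, service, query, output="xml", stylesheet="/xml-to-html.xsl"):
    """Build the GET URL that the HTML form would send to a Joseki service."""
    params = {"query": query, "output": output}
    if stylesheet:  # leave blank for no stylesheet, as in the form
        params["stylesheet"] = stylesheet
    return f"{base_url.rstrip('/')}/{service}?{urlencode(params)}"


query = """PREFIX pcce: <http://www.pcce.be/egb/>
SELECT ?a
WHERE
{ ?a a pcce:Germeente.}"""

# Assumed host and port; adjust to wherever your Joseki instance runs.
url = build_query_url("http://localhost:2020", "TDB", query)
print(url)

# Round trip: the encoded query parameter decodes back to the original text.
print(parse_qs(urlsplit(url).query)["query"][0] == query)
```

The resulting URL can be opened in a browser or fetched with any HTTP client once the service mapping for /TDB is in place.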
Overall

This took considerable time to figure out.

Screenshots:
  • Starting Joseki
  • Testing Joseki (Joseki running)
  • Exporting from TBC ME (TBC ME export to TDB)
  • Specific query page for the TDB service

Joseki/BigOWLIM


Installing

Same as above; just follow the instructions at http://www.joseki.org/start.html.

Creating and populating a BigOWLIM triple store

I used the Sesame console application with the BigOWLIM repository template file 'bigowlim.ttl', which has to be copied into the templates directory of the OpenRDF Sesame console folder.
On Windows 7 this is C:\Users\xxx\Appdata\Roaming\Aduna\OpenRDF Sesame console\templates.
See the BigOWLIM Quick Start Guide (PDF).
Populating the repository can then be done using the Sesame web app.

Connecting Joseki with the BigOWLIM triple store

Similar actions as above.
First edit the Joseki configuration file.
a) adding a service
# Service 4 - SPARQL processor handling a BigOWLIM store 
<#service4>
rdf:type joseki:Service ;
rdfs:label "SPARQL on BigOWLIM" ;
joseki:dataset otjena:bridge ;
joseki:serviceRef "BigOWLIM" ;
joseki:processor joseki:ProcessorSPARQL_FixedDS .
b) adding a dataset
## Initialize BigOWLIM
[] ja:loadClass "com.ontotext.jena.SesameVocab" .
otjena:DatasetSesame rdfs:subClassOf ja:RDFDataset .
otjena:bridge rdf:type otjena:DatasetSesame ;
rdfs:label "BigOWLIM repository" ;
## BEWARE path to storage-folder not mentioned in manual
<http://www.ontotext.com/trree/owlim#storage-folder> "owlimTest-storage";
otjena:datasetParam "C:/Windows/System32/config/systemprofile/AppData/Roaming/Aduna/OpenRDF Sesame/repositories/bigowlimTest" .
Then edit the web.xml file by adding:
<servlet-mapping>
<servlet-name>SPARQL service processor</servlet-name>
<url-pattern>/BigOWLIM</url-pattern>
</servlet-mapping>
And brewing a HTML file that posts a SPARQL query to the BigOWLIM service (same as above).



0 comments

Installing, creating and populating triple stores Part 1: Sesame

Sesame


Installing

Installation is straightforward: copying two war files into a Java servlet container, as described in the manual at
http://www.openrdf.org/doc/sesame2/2.3.2/users/userguide.html#chapter-server-install
gets you up and running in no time.

Creating a store

I created the test triple store using the openrdf-workbench web app ('New repository' under 'Repositories').
BEWARE: with my test file of 168 MB, using the 'In Memory Store' led to a Java heap space issue; I chose a 'Native Java Store' instead.

Populating the store

I used the 'Add' directive under 'Modify' of the openrdf-workbench.

Overall

Very easy to set up, with a very good web management interface.

Creating a new repository with the workbench (screenshot: Sesame create store)

Loading the RDF file (screenshot: Sesame populate store)

Sesame with BigOWLIM


Installing

The Sesame part is identical to the above.
For the BigOWLIM part, the BigOWLIM Quick Start Guide (PDF) has a very good step-by-step procedure.
BEWARE: if you want additional functionality such as geo queries, you had best copy all the jars from the bigowlim-3.4\ext folder to the openrdf-sesame\WEB-INF\lib folder instead of only the 4 jars mentioned in the guide.

Creating a triple store

This cannot be done from the Sesame openrdf-workbench.
You need to use the Sesame console application with the BigOWLIM repository template file 'bigowlim.ttl', which has to be copied into the templates directory of the OpenRDF Sesame console folder.
On Windows 7 this is C:\Users\xxx\Appdata\Roaming\Aduna\OpenRDF Sesame console\templates.
This is again well described in the same Quick Start Guide.

Populating the triple store

Once the repository is created, one can populate it using the openrdf-workbench (same as above).

Overall

A little bit more tricky, but once the store is created you have the same user-friendly environment as with plain Sesame.

Using the Sesame console app to create a triple store


Sesame console
BEWARE: do not forget to close your console instructions with a '.'.

0 comments

SPARQL 1.1 aggregates support UPDATE

Context


We have been looking for a web framework that, by talking to a triple store, offers faceted and set-based navigation in addition to full-text and fielded search, similar to the features of Siderean Seamark Navigator, which is no longer being developed.

After a market investigation we came to the conclusion that semsol's Paggr Prospect came closest.

Paggr Prospect though needs to be able to talk to a SPARQL endpoint that implements SPARQL 1.1 aggregation functions.

An example of such a query:
SELECT DISTINCT ?name (count( ?person) as ?total )
WHERE {
?person kennedys:gender ?gender.
?gender rdfs:label ?name.
}
GROUP BY ?name
ORDER BY desc(?total)

which gives following result:

SPARQL count result
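To make explicit what the aggregate query computes, here is a rough Python equivalent over a handful of in-memory triples. The triples and the `ex:` resource names are invented for the illustration and are not taken from the Kennedys dataset; the function name is hypothetical as well.

```python
from collections import Counter

# Invented sample data, using the same two properties as the query.
triples = [
    ("ex:p1", "kennedys:gender", "ex:male"),
    ("ex:p2", "kennedys:gender", "ex:female"),
    ("ex:p3", "kennedys:gender", "ex:male"),
    ("ex:male", "rdfs:label", "male"),
    ("ex:female", "rdfs:label", "female"),
]


def count_by_gender_label(triples):
    """Mimic: SELECT ?name (count(?person) as ?total) ... GROUP BY ?name ORDER BY desc(?total)."""
    # ?gender rdfs:label ?name
    labels = {}
    for s, p, o in triples:
        if p == "rdfs:label":
            labels.setdefault(s, []).append(o)
    # ?person kennedys:gender ?gender, joined with the labels, counted per ?name
    totals = Counter()
    for s, p, o in triples:
        if p == "kennedys:gender":
            for name in labels.get(o, []):
                totals[name] += 1
    # ORDER BY desc(?total)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(count_by_gender_label(triples))  # [('male', 2), ('female', 1)]
```

A store without SPARQL 1.1 aggregates forces the client to do this grouping itself, which is exactly what a faceted browser wants to avoid on a dataset of this size.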

semsol's own ARC offers those, but we wanted to evaluate other options since we are also interested in being able to do geo queries.

Testing


Our test file is a 168.871 kB RDF/XML file containing 1.292.253 triples.

We tested the following triple stores/SPARQL endpoints:

Product               RDF file loaded   Support documented                SPARQL aggregate working
Sesame/native store   yes               yes, since version Sesame 2.4.0   yes
Sesame/BigOWLIM       yes               yes, since version 4.0            yes
Joseki/ARQ/TDB        yes               yes                               yes
Joseki/ARQ/bigOWLIM   yes               yes                               yes
Virtuoso              yes               yes                               yes
4store                yes               not found                         gives wrong result
Allegrograph          yes               not found                         gives syntax error
Talis Platform        yes               yes                               yes

Sesame does support SPARQL 1.1 aggregates.

Sesame SPARQL 1.1 Query
With following result:

Sesame result query

Joseki/ARQ does too, with both a TDB and a BigOWLIM backend:

Joseki Query

Joseki Result
ARQ with its 1.1 capabilities is also used within TopBraid Live, which also offers a SPARQL endpoint.

Virtuoso does too:

Virtuoso
4store is working on support but doesn't return the expected result yet:

4store query with result

4store result

Allegrograph doesn't have support for aggregates yet, but is working on it; it should appear in an upcoming release.

The Talis Platform (SaaS) could not handle the large upload, but I succeeded in populating the store by uploading smaller files.
I used following XProc script for doing this:

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step
xmlns:p="http://www.w3.org/ns/xproc"
xmlns:c="http://www.w3.org/ns/xproc-step"
xmlns:cx="http://xmlcalabash.com/ns/extensions"
name="myPipeline"
version="1.0">
<p:output
port="result"
sequence="true" />
    <!-- path to the folder with rdf files -->
<p:variable
name="path"
select="'/Users/paul/Desktop/RandD/owlimimport/erfgoedplusImport/'">
<p:empty />
</p:variable>
<p:directory-list
include-filter=".*\.rdf">
<p:with-option
name="path"
select="$path">
<p:empty />
</p:with-option>
</p:directory-list>
<p:for-each
name="directoryloop">
<p:output
port="result"
sequence="true" />
<p:iteration-source
select="/c:directory/c:file" />
<p:variable
name="file"
select="concat($path,/c:file/@name)" />
<p:load
name="file">
<p:with-option
name="href"
select="$file" />
</p:load>
<p:insert
match="/c:request/c:body"
position="first-child">
<p:input
port="source">
<p:inline>
<c:request
href="http://api.talis.com/stores/zzzzzzzzz/meta"
method="POST"
detailed="true" auth-method="digest" username="xxxxxxx" password="yyyyyyy">
<c:body
content-type="application/rdf+xml" />
</c:request>
</p:inline>
</p:input>
<p:input
port="insertion">
<p:pipe
port="result"
step="file" />
</p:input>
</p:insert>
<p:http-request
name="request"/>
<p:identity />
</p:for-each>
</p:declare-step>
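For readers who don't use XProc, the same loop can be sketched in Python with only the standard library. This is an approximation of the pipeline above, not a tested replacement: the function names are made up, the store URL and credentials are the placeholders from the script, and the file body is sent as raw bytes rather than as a parsed XML document.

```python
from pathlib import Path
from urllib.request import (HTTPDigestAuthHandler, HTTPPasswordMgrWithDefaultRealm,
                            Request, build_opener)


def rdf_files(folder):
    """Return the *.rdf files in `folder`, like the p:directory-list step."""
    return sorted(p for p in Path(folder).iterdir() if p.is_file() and p.suffix == ".rdf")


def build_upload_request(store_url, rdf_bytes):
    """Build a POST request carrying one RDF/XML document as its body."""
    return Request(store_url, data=rdf_bytes, method="POST",
                   headers={"Content-Type": "application/rdf+xml"})


def upload_folder(folder, store_url, username, password):
    """POST every RDF file in `folder` to the store, using HTTP digest auth."""
    passwords = HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, store_url, username, password)
    opener = build_opener(HTTPDigestAuthHandler(passwords))
    for path in rdf_files(folder):
        with opener.open(build_upload_request(store_url, path.read_bytes())) as response:
            print(path.name, response.status)


# Building a request does not touch the network; upload_folder() does.
request = build_upload_request("http://api.talis.com/stores/zzzzzzzzz/meta", b"<rdf:RDF/>")
print(request.get_method(), request.get_header("Content-type"))
```

Calling `upload_folder(folder, store_url, "xxxxxxx", "yyyyyyy")` with real values would perform the uploads one file at a time, which is the point of splitting the large file in the first place.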
The query in the Talis platform:

Talis SPARQL aggregate query
with this result:

Talis query result

Conclusion

Since my previous posting, support for SPARQL 1.1 aggregates has become more prominent in the marketplace, which is good to see.


6 comments

SPARQL to remember

Filtering untyped literals

After a job integrating triples from different sources I ended up in the situation where multiple subjects had the same value twice for the same property, but one value was a plain literal while the other was of datatype xsd:string.

<x> pcce:city "Brussels"
<x> pcce:city "Brussels"^^xsd:string

My aim was to get rid of one of the two.

My initial idea was to make use of the SPARQL "datatype" operator, which returns the datatype IRI of the value.
My first SPARQL UPDATE query looked as follows:

DELETE {?a pcce:city ?city.}
WHERE {
?a a pcce:Building.
?a pcce:city ?city.
FILTER (datatype(?city) != xsd:string)
}

To my big surprise this didn't work at all.

I was kindly referred to the SPARQL spec by Scott Henninger of TopQuadrant, where it is clearly stated that if the argument of the datatype operator is a simple literal, xsd:string is returned.

The built-in datatype operator in SPARQL treats untyped literals as xsd:string, so in both cases xsd:string is returned, making it impossible to detect untyped literals this way.

Work-around

There is a SPARQL operator, "sameTerm", that allows you to test whether two RDF terms are the same. In our case this means testing two RDF literal values for equality.

How is Literal Equality defined?


Two literals are equal if and only if all of the following hold:
  • The strings of the two lexical forms compare equal, character by character.
  • Either both or neither have language tags.
  • The language tags, if any, compare equal.
  • Either both or neither have datatype URIs.
  • The two datatype URIs, if any, compare equal, character by character.
According to this definition "Brussels" and "Brussels"^^xsd:string are different terms.

What to compare?

The second piece of the solution is the ability to cast primitive datatypes to other simple datatypes according to the rules of XPath.

If we take "Brussels", cast it with the xsd:string constructor function to "Brussels"^^xsd:string, and then compare both terms with sameTerm, we get FALSE.
If we take "Brussels"^^xsd:string and cast it with xsd:string to "Brussels"^^xsd:string, the comparison gives TRUE.
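The cast-and-compare trick can be modelled in a few lines of Python. This is a simplified model of RDF literals as described in this post (a plain literal has no datatype), not an RDF library; the `Literal` class and the helper names are made up for the illustration.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    """Simplified RDF literal: lexical form plus optional datatype and language tag."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def same_term(a, b):
    """Literal equality as listed above: lexical form, language tag and datatype must all match."""
    return a == b  # NamedTuple equality compares every field


def cast_to_xsd_string(literal):
    """Model of the xsd:string constructor function: keep the text, set the datatype."""
    return Literal(literal.lexical, XSD_STRING)


def is_typed_string(literal):
    """True only when the literal already carries datatype xsd:string."""
    return same_term(literal, cast_to_xsd_string(literal))


plain = Literal("Brussels")
typed = Literal("Brussels", XSD_STRING)

print(is_typed_string(plain))  # False
print(is_typed_string(typed))  # True
```

The plain literal changes when it is cast, so the comparison fails; the typed literal is unchanged by the cast, so the comparison succeeds.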

Solution

This leads to the following solution, which deletes the values typed as xsd:string and keeps the plain literals:

DELETE {?a pcce:city ?city.}
WHERE {
?a a pcce:Building.
?a pcce:city ?city.
FILTER (sameTerm(?city,xsd:string(?city)))
}

Users of products from the TopBraid Suite family have access to a shortcut function for testing untyped literals: "spl:isUntypedLiteral".

Our query becomes then:

DELETE {?a pcce:city ?city.}
WHERE {
?a a pcce:Building.
?a pcce:city ?city.
FILTER (!spl:isUntypedLiteral(?city))
}

0 comments

RDFa - slides

These are the slides used during the SAI evening event on RDFa, held on the 23rd of November in Antwerp:
Download file "RDFa.pdf"

0 comments

SPARQL for SKOS integrity constraints

I blogged before on how to test SKOS integrity constraints, i.e. constraints which cannot be expressed with OWL 2 as defined by W3C.
I'm aware of two solutions that cover those constraints: SPIN and Pellet Integrity Constraints (Pellet IC).

Both use SPARQL to check the constraints; the difference is that within the SPIN environment one needs to write the SPARQL queries oneself,
while in the Pellet IC environment OWL axioms (interpreted under the closed world assumption) are translated to SPARQL in the background.

Anyhow, since I'm still eager to learn more and better SPARQL, I was curious how they compared.
This table summarizes what I found out.

For each SKOS constraint, the SPIN SPARQL is listed first, followed by the OWL2 IC SPARQL (1).

S13: skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties

SPIN SPARQL:

# Constraint S13a: skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties.
ASK WHERE {
  ?this skos:prefLabel ?label .
  ?this skos:altLabel ?label .
}
# Constraint S13b: skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties.
ASK WHERE {
  ?this skos:prefLabel ?label .
  ?this skos:hiddenLabel ?label .
}
# Constraint S13c: skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties.
ASK WHERE {
  ?this skos:hiddenLabel ?label .
  ?this skos:altLabel ?label .
}

OWL2 IC SPARQL (1):

Validating constraint: disjointProperties prefLabel altLabel hiddenLabel

SELECT ?x0
WHERE
{ ?x0 skos:altLabel ?x1 ;
      skos:prefLabel ?x1 .
}

SELECT ?x0
WHERE
{ ?x0 skos:prefLabel ?x1 ;
      skos:hiddenLabel ?x1 .
}

GENERATED FROM OWL axiom

[] a owl:AllDisjointProperties ;
   owl:members (skos:prefLabel skos:altLabel skos:hiddenLabel) .

S14: A resource has no more than one value of skos:prefLabel per language tag

SPIN SPARQL:

# Constraint S14: a resource has no more than one value of skos:prefLabel per language tag.
ASK WHERE {
  ?this skos:prefLabel ?label1 .
  ?this skos:prefLabel ?label2 .
  LET (?label1lang := lang(?label1)) .
  LET (?label2lang := lang(?label2)) .
  FILTER ((?label1lang = ?label2lang) && (?label1 != ?label2)) .
}

OWL2 IC SPARQL (1):

# This condition cannot be encoded as an OWL integrity constraint directly.

S27: skos:related is disjoint with the property skos:broaderTransitive

SPIN SPARQL:

# Constraint S27: skos:related is disjoint with the property skos:broaderTransitive.
ASK WHERE {
  ?this skos:related ?object1 .
  ?this skos:broaderTransitive ?object2 .
  FILTER (?object1 = ?object2) .
}

OWL2 IC SPARQL (1):

Validating constraint: related disjointPropertyWith broaderTransitive

SELECT ?x0
WHERE
{ ?x1 owl:bottomObjectProperty ?x0 ;
      skos:related ?x0 .
}

GENERATED FROM OWL axiom

skos:related owl:propertyDisjointWith skos:broaderTransitive .

S46: skos:exactMatch owl:propertyDisjointWith skos:broadMatch , skos:relatedMatch

SPIN SPARQL:

# Constraint S46: skos:exactMatch is disjoint with each of the properties skos:broadMatch and skos:relatedMatch.
ASK WHERE {
  ?this skos:exactMatch ?exactMatch .
  OPTIONAL {
    ?this skos:broadMatch ?broadMatch .
  } .
  OPTIONAL {
    ?this skos:relatedMatch ?relatedMatch .
  } .
  FILTER ((?exactMatch = ?broadMatch) || (?exactMatch = ?relatedMatch)) .
}

OWL2 IC SPARQL (1):

Validating constraint: exactMatch disjointPropertyWith relatedMatch

SELECT ?x0
WHERE
{ ?x1 skos:relatedMatch ?x0 ;
      skos:exactMatch ?x0 .
}

SELECT ?x0
WHERE
{ ?x1 skos:exactMatch ?x0 ;
      skos:broadMatch ?x0 .
}

GENERATED FROM OWL axiom

skos:exactMatch owl:propertyDisjointWith skos:broadMatch , skos:relatedMatch .

(1) In the case of Pellet ICV I've taken the SPARQL queries as they appeared on stdout when using the --verbose argument on the command line.
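Constraint S14 is the one without an OWL integrity constraint counterpart, so it is worth spelling out what the SPIN query checks. The Python sketch below applies the same rule to a list of (resource, label, language tag) tuples; the data and the function name are invented for the illustration and it is not part of SPIN or Pellet.

```python
from collections import defaultdict


def violates_s14(pref_labels):
    """Return the resources with more than one distinct skos:prefLabel for the same language tag.

    `pref_labels` is an iterable of (resource, label_text, language_tag) tuples;
    use "" as the language tag for labels without one, as lang() does in SPARQL.
    """
    seen = defaultdict(set)
    for resource, text, lang in pref_labels:
        seen[(resource, lang)].add(text)
    return sorted({resource for (resource, _lang), texts in seen.items() if len(texts) > 1})


# Invented sample data.
pref_labels = [
    ("ex:c1", "Dog", "en"),
    ("ex:c1", "Hound", "en"),   # second English prefLabel: violation
    ("ex:c2", "Dog", "en"),
    ("ex:c2", "Hond", "nl"),    # different language: fine
]

print(violates_s14(pref_labels))  # ['ex:c1']
```

As in the SPIN query, two labels only clash when the language tags are equal and the labels themselves differ.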

If you know of other approaches, please let me know.

2 comments

Some Gridworks tips

Much of the data that’s lying around is a mess, so we badly need tools that help us clean it up.
Freebase Gridworks from Metaweb, recently acquired by Google, is such a tool. It allows us to:
  • Merge similar names using multiple methods:
    • Automatic title-casing
    • using an expression language (GEL)
    • using several clustering algorithms to detect similarities
  • Split multi-valued cells over columns and rows
  • Create new columns based on the content of other columns
  • ...

Make sure you see the videos at http://code.google.com/p/freebase-gridworks/

Some tricks I want to remember for myself:

  • how to use regular expressions to correct cell values
  • how to fill in a 'null' column with a value.

Correcting cell values using regular expressions

I have some phone numbers in an existing dataset which should be formatted according to following structure:
+ followed by the country code: e.g. +32
followed by the area code with the 0 between parentheses: e.g. (0)15
followed by the local code following this pattern X?XX XX XX: e.g. 23 45 67
Full example: +32 (0)15 23 45 67

The existing dataset contains slightly different phone numbers; shown as loaded into Gridworks.

phone numbers
Using regular expressions we can split the existing numbers into 2 groups, using parentheses to indicate the groups:

  • the characters before the area code
    start-of-line, followed by the '+' character, followed by 2 digits, followed by a space
    (^\+\d{2}\s)
  • everything starting from the area code
    1 or 2 digits followed by a space, followed by one or more digits followed by a space, followed by 2 digits followed by a space, followed by 2 digits, followed by end-of-line
    (\d{1,2}\s\d+\s\d{2}\s\d{2}$)

The regular expression as shown in the RX Toolkit of Komodo IDE.

regex in Komodo IDE

Now we use these groups to replace the existing values with a value conforming to the desired structure, using the replacement expression
group 1 followed by '(0)' followed by group 2.

\1(0)\2


Replacement using groups

Now that we have our regex working, let's move on to Gridworks.

On the column containing the telephone numbers, choose Edit cells, Transform ...

Gridworks Transform cell
Now we can use the Gridworks expression language (GEL) to do our transform.

GEL offers a whole list of functions; we will be using 'replace'. 'replace' takes 3 arguments:

  • the input string
  • the pattern to search for, which can be a regex
  • the replacement string, which can contain the groups captured by the regex.
It took us a few minutes to discover the precise syntax when using regexes. This is the template to be used:

replace(value,//,'')

where value refers to the value in the cell
where // delimits the regex
and '' contains the replacement string using the captured groups being indicated with '$', e.g. $1, $2.

In our case the expression became:

replace(value,/(^\+\d{2}\s)(\d{1,2}\s\d+\s\d{2}\s\d{2}$)/,'$1(0)$2')


Result of regex replacement
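
For readers who want to try the same pattern outside Gridworks, here is a minimal sketch in Python. It reuses the two capture groups from the expression above; the sample values are my own assumptions, since the real dataset is only visible in the screenshot.

```python
import re

# Same two capture groups as in the GEL expression above.
PHONE_PATTERN = re.compile(r"(^\+\d{2}\s)(\d{1,2}\s\d+\s\d{2}\s\d{2}$)")


def add_trunk_zero(phone: str) -> str:
    """Insert '(0)' between the country code and the area code.

    Values that do not match the pattern are returned unchanged,
    so already correct numbers are left alone.
    """
    return PHONE_PATTERN.sub(r"\1(0)\2", phone)


# Hypothetical sample values (not taken from the original dataset).
samples = ["+32 15 23 45 67", "+32 (0)15 23 45 67"]
for sample in samples:
    print(sample, "->", add_trunk_zero(sample))
```

Note that Python writes the captured groups as \1 and \2 in the replacement string, where GEL uses $1 and $2.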

Filling in a column with a fixed value

From a spreadsheet containing addresses of museums, I have a column with 'null' values which I want to use to indicate the type of the entity, e.g. 'museum'.

Empty column
I've done this using the GEL 'forNonBlank' function. 'forNonBlank' takes 4 parameters:
forNonBlank(e, v, eNonBlank, eBlank)
  • e: an expression to be evaluated
  • v: the variable in which the result of e is captured
  • eNonBlank: the expression evaluated when e is not null or an empty string
  • eBlank: the expression evaluated when e is null or an empty string

In our case

forNonBlank(value, v, 'not relevant', 'museum')

setting column
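
The branching logic of 'forNonBlank' can be sketched in Python as follows. This is only an illustration of the behaviour described above, not Gridworks code; the function name for_non_blank and the sample column are assumptions of mine.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def for_non_blank(value: Optional[str],
                  when_non_blank: Callable[[str], T],
                  when_blank: Callable[[], T]) -> T:
    """Evaluate one of two branches depending on whether value is blank.

    'Blank' means None or the empty string, as in the description above.
    """
    if value is None or value == "":
        return when_blank()
    return when_non_blank(value)


# Hypothetical column content: every cell is empty, as in the museum spreadsheet.
column = [None, None, ""]
filled = [for_non_blank(cell, lambda v: "not relevant", lambda: "museum")
          for cell in column]
print(filled)
```

Because every cell in the column is blank, the non-blank branch is never taken, which is why its value does not matter in the expression above.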

Conclusion

For everyone pursuing data quality, Gridworks should become a central component of their toolset.
I'll try to investigate how NeedleBase compares.




Moving to another triple store supporting SPARQL 1.1

Update: See similar post of Ric Roberts: Installing Jena, Joseki and TDB on OS X or Linux

Problem statement


We have been using Sesame in a project as RDF triple store and SPARQL endpoint.
The reasons we chose Sesame were:
  • it is open source and free
  • it is very easy to set up
  • it has a very friendly user interface
  • there is a built-in connector in TopBraid Composer ME, the chosen SW IDE
  • and it is a national ('Dutch') product.
We now want to expose the dataset using the faceted navigation aids offered by Paggr Prospect, the faceted browser builder for Linked Data.

Prospect in action on Crunchbase
To be able to do so, however, Prospect must be connected to a SPARQL endpoint that offers aggregate functions as defined in SPARQL 1.1.

An example using the aggregate function COUNT:

SELECT (COUNT(?person) AS ?alices)
WHERE {
?person :name "Alice" .
}
SPARQL 1.1 however is not (yet?) supported in Sesame.

Joseki seemed a viable replacement:
  • it is open source and free
  • it is built upon the very robust Jena/ARQ RDF framework
  • there is a built-in connector in TopBraid Composer ME, which uses the same Jena/ARQ RDF framework
  • and it offers support for SPARQL 1.1
As persistence layer we chose TDB.

Migration


The migration went much more smoothly than expected. This is what we did.
  1. Create a TDB store from the Sesame store
  2. Install Joseki
    1. set the environment variable JOSEKIROOT and the Java classpath
  3. Configure Joseki
    1. adapt joseki-config.ttl to add the TDB store
    2. adapt the web.xml file
    3. make an HTML form for querying the SPARQL endpoint

Create a TDB store


For this we used one of the built-in facilities of TopBraid Composer. You can directly export your existing dataset to a TDB database.

Export to TDB

Install Joseki

  1. Unzip the distribution.
  2. Set the JOSEKIROOT environment variable to the location of the installation.

    JOSEKIROOT
  3. Make sure your Java classpath points to %JOSEKIROOT%/lib

    setting the classpath
  4. Test the installation by running from the command-line.

    starting Joseki
    Important: do not try to run the rdfserver bat file from the bin directory; run it directly from %JOSEKIROOT%.

    Now point your web browser to http://localhost:2020/ and you should get this screen:

    The JOSEKI startup screen

Configure Joseki

  1. Add a service and a dataset to the joseki-config.ttl file (a service with name 'TDB' using dataset 'newdata').
    The joseki-config.ttl is found in %JOSEKIROOT%.
    # Service 3 - SPARQL processor only handling a given dataset
    <#service3>
        rdf:type joseki:Service ;
        rdfs:label "SPARQL on TDB" ;
        joseki:serviceRef "TDB" ;
        # web.xml must route this name to Joseki
        # dataset part
        joseki:dataset <#newdata> ;
        # Service part.
        # This processor will not allow either the protocol,
        # nor the query, to specify the dataset.
        joseki:processor joseki:ProcessorSPARQL_FixedDS ;
        .

    and the settings for the 'newdata' dataset.

     ## Initialize TDB.
    [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
    tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .

    <#newdata> rdf:type tdb:DatasetTDB ;
    rdfs:label "A new TDB dataset" ;
    tdb:location "C:/Users/Paul/MyWorkSpaces/TBCMEWorkspace/Test/kennedys.tdb.data" .
  2. Add 'TDB' to the web.xml file, which resides in %JOSEKIROOT%\webapps\joseki\WEB-INF

    <servlet-mapping>
    <servlet-name>SPARQL service processor</servlet-name>
    <url-pattern>/TDB</url-pattern>
    </servlet-mapping>
  3. Make an HTML file (in my case myQuery.html) containing a form to submit queries to the TDB SPARQL service, and copy it next to the other HTML files in %JOSEKIROOT%\webapps\joseki.

    <form action="TDB" method="get">
    <p>SELECT - get variables (apply XSLT stylesheet)</p>
    <p><textarea name="query" cols="70" rows="5">
    PREFIX kennedys: &lt;http://topbraid.org/examples/kennedys#>
    SELECT ?a ?c
    WHERE
    { ?a kennedys:name ?c}</textarea>
    <br/>
    Output XML: <input type="radio" name="output" value="xml" checked/>
    with XSLT style sheet (leave blank for none):
    <input name="stylesheet" size="25" value="/xml-to-html.xsl" /> <br/>
    or JSON output: <input type="radio" name="output" value="json"/> <br/>
    or text output: <input type="radio" name="output" value="text"/> <br/>
    or CSV output: <input type="radio" name="output" value="csv"/> <br/>
    or TSV output: <input type="radio" name="output" value="tsv"/> <br/>
    Force the accept header to <tt>text/plain</tt> regardless
    <input type="checkbox" name="force-accept" value="text/plain"/>
    <br/>

    <input type="submit" value="Get Results" />
    </p>
    </form>

Result

Going to http://localhost:2020/myQuery.html gives

Querying the TDB store
The query shown uses SPARQL 1.1 aggregates and returns as result:

SPARQL 1.1 query results
Our SPARQL 1.1 endpoint was up and running in 2 hours, migration included.
The next step is building the faceted browser interface, so more is to come.
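
The endpoint can also be queried without the HTML form. Below is a minimal sketch in Python that builds the same GET URL the form would submit, using the 'query' and 'output' parameters from the form. The endpoint URL follows from the setup above (port 2020, service 'TDB'); the aggregate query itself is a hypothetical example of mine against the kennedys data.

```python
from urllib.parse import urlencode

# Endpoint derived from the setup above: Joseki on port 2020, service 'TDB'.
ENDPOINT = "http://localhost:2020/TDB"

# Hypothetical aggregate query against the kennedys example data.
QUERY = """PREFIX kennedys: <http://topbraid.org/examples/kennedys#>
SELECT (COUNT(?a) AS ?people)
WHERE { ?a kennedys:name ?c }"""


def build_query_url(endpoint: str, query: str, output: str = "json") -> str:
    """Build the GET URL that the HTML form would submit."""
    return endpoint + "?" + urlencode({"query": query, "output": output})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# To actually run the query (requires the Joseki server to be running):
#   from urllib.request import urlopen
#   with urlopen(url) as response:
#       print(response.read().decode("utf-8"))
```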




Where OWL fails, another OWL arises

What happened previously


In the SKOS spec you find a list of Class & Property Definitions and Integrity Conditions numbered from S1 to S62.
Most of these definitions and constraints are already covered by the existing SKOS OWL1 ontologies (available in Full and DL versions).
However some definitions and conditions could not be expressed in OWL1.

This is the list:
  • S13: skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties .
  • S14: A resource has no more than one value of skos:prefLabel per language tag .
  • S27: skos:related is disjoint with the property skos:broaderTransitive.
  • S46 : skos:exactMatch is disjoint with each of the properties skos:broadMatch and skos:relatedMatch .
  • S55: The property chain (skosxl:prefLabel, skosxl:literalForm) is a sub-property of skos:prefLabel.
  • S56: The property chain (skosxl:altLabel, skosxl:literalForm) is a sub-property of skos:altLabel.
  • S57: The property chain (skosxl:hiddenLabel, skosxl:literalForm) is a sub-property of skos:hiddenLabel.
  • S58: skosxl:prefLabel, skosxl:altLabel and skosxl:hiddenLabel are pairwise disjoint properties .
Hence the question whether these could be expressed in OWL2 as defined by W3C.

I tried to use OWL2 for cases S14, S13, S27 and S55.
In these 4 cases my conclusion was very harsh: OWL2 as defined by W3C was pretty useless for the case.
I then tried to implement the same SKOS conditions using SPIN/SPARQL, and that went fairly easily and painlessly.

This conclusion attracted some interest and motivated Holger Knublauch of TopQuadrant to write a blog entry with the title "WHERE OWL fails".

His main points were:
  • OWL is hard-coded against specific design patterns, and anything that goes beyond those patterns cannot be expressed.
  • The choice of supported design patterns is heavily influenced by the theory of Description Logics, which is theoretically sound but of questionable 'usefulness in practice'.
This triggered a response from Kendall Clark of C&P, who heavily disagreed.

Since I'm a happy user of both TopBraid Composer of TopQuadrant and Pellet of C&P, I would like to add my point(s) of view.

Before we continue I must make clear that in this discussion we are talking about 3 different things:
  • SPIN, the constraint and construct language developed by Holger Knublauch and implemented in the TBC suite of tools, but also available in open source
  • OWL2 with an open world assumption as described by W3C and criticized by Holger
  • OWL2 with a closed world assumption as implemented in Pellet ICV made by Clark & Parsia, from now on referred to as OWL2IC.

Do they disagree?

Let's start with constraint checking.

From a distance, I think they agree more than Kendall is willing to admit.

They both agree that:
  • in a lot of use cases the Open World Assumption of OWL2 is counter-intuitive and counter-productive (Why integrity constraints?)
  • checking closed world constraints is best done using SPARQL
They only differ in opinion on how the constraint-checking SPARQL queries need to be generated.
  • In the case of SPIN, you need to write the SPARQL yourself.
  • In the case of OWL2IC, you write OWL axioms, which are then transformed in the background into SPARQL queries.
I do not have a firm opinion on this one.
Jeni Tennison reacted on Twitter that you have a similar situation in the XML world:
XML Schema/RELAX NG vs Schematron: declarative versus rule-based constraints and both are considered useful.
There is a difference with the SPIN - OWL2IC divide, though: XML Schema and RELAX NG do not generate Schematron in the background.
For people who want to do that, there is the XSD2Schematron converter from Rick Jelliffe, but I digress.


Some additional points to take into consideration:
  • Writing SPARQL queries is not that obvious. This is also one of the rationales behind the development of the Linked Data API. cf. slide
    So indeed, if well done, a declarative approach can make writing constraints easier for certain types of users.
    On the other hand, my experience on both the XML and the RDF side is that some constraints can only be expressed using rules; so I would say that the rule-based approach is more powerful, as also shown in the context of S14.
    Easier versus more powerful; or maybe, when we grow up, we get a combined solution such as in the upcoming XSD 1.1. (See also the suggestion of Evren Sirin in the context of S14.)
  • SPIN is more than a constraint checking language, as shown by the solution for the property chain inference, which could not be expressed in OWL2.
But I would say: let the market decide which approach is preferred. There is surely a need to have something to validate RDF. Time for the W3C to step in?

Do they disagree?

Of course, they disagree on the importance of OWL2.

For TopQuadrant, OWL2 is just one optional piece within SW applications.
"But this makes OWL just one out of a catalog of vocabularies, on the same level as SKOS or FOAF or SIOC or GoodRelations."
The flagship product of Clark & Parsia is Pellet, the leading OWL2 DL reasoner; so it's obvious where they stand.
And I can imagine one gets nervous when OWL2 is not accompanied by drumrolls.
However, Holger's opinion is, in my view, well balanced.
I agree on Holger's point 1: OWL 2 (DL) does indeed impose some constraints on what you can model, which don't always fit what you need and want. This was the whole point of my series.
Concerning point 2: I have also experienced more than once that OWL DL reasoning can take some time, but I hope and expect that clever engineering will make things better in the future.

But if you need and want OWL2, you need and want Pellet.
One main and sufficient reason: the explain feature which helps me solve the bugs in my head.

Let the software speak

The good thing about this whole discussion is that it triggered Clark & Parsia to improve their OWL2IC implementation in Pellet ICV, of which a new version 0.4 was released yesterday.
An overview of my test results is given in the table below.
Details can be found in updated versions of the respective blog entries S14, S13, S27, S55.
SKOS   Constraint   SPIN   OWL2   OWL2IC
S14    Y            +      -      -
S13    Y            +      -      +
S27    Y            +      -      +
S55    N            +      -      -

OWL-API and SKOS

I take the opportunity to discuss some SKOS-related issues I encounter with all software that's built upon the OWL API, such as Protégé 4 and the NeOn Toolkit.

The first issue occurs when importing the skos-xl ontology at http://www.w3.org/2008/05/skos-xl into your own ontology.
This skos-xl ontology itself contains another owl:imports statement, for the core skos ontology:
<owl:imports rdf:resource="http://www.w3.org/2004/02/skos/core"/>
This indirect import gets lost during import.
The result of this loss is that, due to some OWL2 magic, the property skos:definition becomes an Object Property while it was originally defined as an Annotation Property. ???
You can try to correct this loss by doing a direct import of the skos core ontology. This direct import works, but doesn't correct the issue mentioned above. Furthermore this import cannot be saved. ???

Also all annotation properties get doubled as datatype properties. The explanation given today on the Protégé list:
"Actually this happens in part because it appears that the skos ontology is inconsistent with the OWL 2 specifications. There it says that
If an ontology has an ontology IRI but no version IRI, then a different ontology with the same ontology IRI but no version IRI should not exist.
However skos has two distinct ontologies with the same name."

Conclusion
I'm afraid that SKOS, one of the more popular SW vocabularies, and OWL2 live in separate worlds, and that hence you had better not use tools that are too hardwired to OWL2.







SKOS (part 4): property chains

We consider the following SKOS property definitions, which were not expressible in OWL1:
  • S55: The property chain (skosxl:prefLabel, skosxl:literalForm) is a sub-property of skos:prefLabel.
  • S56: The property chain (skosxl:altLabel, skosxl:literalForm) is a sub-property of skos:altLabel.
  • S57: The property chain (skosxl:hiddenLabel, skosxl:literalForm) is a sub-property of skos:hiddenLabel.

OWL2


One of the new and exciting features of OWL2 is that a property can be defined as the composition of several properties, called a property chain.
The traditional example here is that the property :hasUncle is a composition of the properties :hasParent and :hasBrother.
You will find an elaborate explanation of this example at the Semantic Web Programming site.

More formally we have the following axiom:
SubObjectPropertyOf( ObjectPropertyChain( OPE1 ... OPEn ) OPE ).
This axiom states that, if an individual x is connected with an individual y by a sequence of object property expressions OPE1, ..., OPEn ,
then x is also connected with y by the object property expression OPE.
Such axioms are also known as complex role inclusions [SROIQ].

Now let's move to SKOS where:
  • skosxl:prefLabel is an Object property.
  • skosxl:literalForm is a Data property.
  • skos:prefLabel is an Annotation property.
This mixture of property types is, as far as I understand OWL2, not allowed in property chain axioms.

But let's try anyhow.

skos:prefLabel rdf:type owl:AnnotationProperty.
xl:prefLabel rdf:type owl:ObjectProperty.
xl:literalForm rdf:type owl:DatatypeProperty.
skos:prefLabel owl:propertyChainAxiom (
xl:prefLabel
xl:literalForm
).
As expected, Pellet 2.1.0 throws a warning:
WARNING: Unsupported axiom: Bnode in owl:propertyChainAxiom axiom is not a valid property expression.
Conclusion: this type of property chaining cannot be done in OWL2.

SPIN


In SPIN, once again, implementing this is fairly easy.

Adding the following SPARQL CONSTRUCT as a spin:rule to e.g. the skos:Concept class does the work.

CONSTRUCT {
?this skos:prefLabel ?label .
}
WHERE {
?this xl:prefLabel ?prefLabel .
?prefLabel xl:literalForm ?label .
}
The result as shown in TopBraid Composer:

Property Chain
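
To make clear what the CONSTRUCT rule computes, here is a minimal sketch in Python that applies a two-step property chain to a set of triples represented as plain tuples. It only illustrates the inference; it is neither SPIN nor a reasoner, and the function name and the sample data are assumptions of mine.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def apply_property_chain(triples: Iterable[Triple],
                         first: str, second: str, result: str) -> Set[Triple]:
    """Infer (x, result, z) for every x -first-> y and y -second-> z."""
    triples = set(triples)
    inferred = set()
    for s1, p1, o1 in triples:
        if p1 != first:
            continue
        for s2, p2, o2 in triples:
            # The object of the first step must be the subject of the second.
            if p2 == second and s2 == o1:
                inferred.add((s1, result, o2))
    return inferred


# Hypothetical data: one concept with one SKOS-XL label resource.
data = {
    ("ex:Concept_1", "xl:prefLabel", "ex:Label_1"),
    ("ex:Label_1", "xl:literalForm", '"Animals"@en'),
}
print(apply_property_chain(data, "xl:prefLabel", "xl:literalForm", "skos:prefLabel"))
```

The same function covers S56 and S57 by passing the altLabel or hiddenLabel properties instead.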

Conclusion


SPIN wins again.


Integrity Constraints in SKOS (part 3)

Today's constraint is

S27 skos:related is disjoint with the property skos:broaderTransitive.

OWL2

While OWL 1 provided means to state the disjointness of classes, it was impossible to state that properties are disjoint.
OWL2 changes this game.
OWL2 allows you to assert that several object properties are pairwise incompatible (exclusive); that is, two individuals cannot be connected by two different properties of the set. The same holds for data properties.

For SKOS constraint S27 this would translate into the following code:

<rdf:Property rdf:about="http://www.w3.org/2004/02/skos/core#related">
<rdfs:comment xml:lang="en">skos:related is disjoint with skos:broaderTransitive</rdfs:comment>
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#SymmetricProperty"/>
<rdfs:subPropertyOf rdf:resource="http://www.w3.org/2004/02/skos/core#semanticRelation"/>
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#ObjectProperty"/>
<skos:definition xml:lang="en">Relates a concept to a concept with which there is an associative semantic relationship.</skos:definition>
<rdfs:isDefinedBy rdf:resource="http://www.w3.org/2004/02/skos/core"/>
<rdfs:label xml:lang="en">has related</rdfs:label>
<owl:propertyDisjointWith rdf:resource="http://www.w3.org/2004/02/skos/core#broaderTransitive"/>
</rdf:Property>

OWL2 DL

For reasons of decidability, however, OWL2 DL puts some restrictions on the use of the DisjointObjectProperties axiom. The properties used need to be simple, meaning very roughly that axioms of the following form cannot be involved (directly or indirectly):

  • SubObjectPropertyOf( ObjectPropertyChain( OPE1 ... OPEn ) OPE ) with n > 1, or
  • SubObjectPropertyOf( ObjectPropertyChain( OPE1 ... OPEn ) INV(OPE) ) with n > 1, or
  • TransitiveObjectProperty( OPE ), or
  • TransitiveObjectProperty( INV(OPE) )

Pellet 2.0

Since skos:broaderTransitive is declared transitive, it is not a simple property. Indeed, when using Pellet 2.0.2, we get the following warning:

WARNING: Unsupported axiom: 
Ignoring transitivity and/or complex subproperty axioms for broaderTransitive
31-mrt-2010 14:09:54 org.mindswap.pellet.RBox ignoreTransitivity

SPIN

Formulating the constraint in SPIN is straightforward.

ASK WHERE {
?this skos:related ?object1 .
?this skos:broaderTransitive ?object2 .
FILTER (?object1 = ?object2) .
}

TopBraid Composer flags the violation, as shown with the following example:

error thrown in Topbraid Composer
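
The closed world check behind this constraint can be sketched in a few lines of Python over triples represented as tuples. This is only an illustration under my own assumptions (function name, sample data); it looks at the triples that are present and does not compute the transitive closure of skos:broaderTransitive.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def disjointness_violations(triples: Iterable[Triple],
                            prop_a: str, prop_b: str) -> Set[Tuple[str, str]]:
    """Return the (subject, object) pairs connected by both properties."""
    triples = list(triples)
    pairs_a = {(s, o) for s, p, o in triples if p == prop_a}
    pairs_b = {(s, o) for s, p, o in triples if p == prop_b}
    return pairs_a & pairs_b


# Hypothetical data: Concept_1 is both related to and narrower than Concept_2.
data = [
    ("ex:Concept_1", "skos:related", "ex:Concept_2"),
    ("ex:Concept_1", "skos:broaderTransitive", "ex:Concept_2"),
    ("ex:Concept_3", "skos:related", "ex:Concept_1"),
]
print(disjointness_violations(data, "skos:related", "skos:broaderTransitive"))
```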

Pellet ICV 0.4

Using Pellet ICV 0.4 with the following constraint:
<rdf:Description rdf:about="http://www.w3.org/2004/02/skos/core#related">
<owl:propertyDisjointWith rdf:resource="http://www.w3.org/2004/02/skos/core#broaderTransitive"/>
</rdf:Description>
<rdf:Description rdf:about="http://www.w3.org/2004/02/skos/core#broaderTransitive">
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#TransitiveProperty"/>
</rdf:Description>

reports correctly the constraint violation:

c:\Program Files\pellet-2.1.0>pellet-ic --constraints C:\Users\Paul\TBCMEWorkspace\test\s27c.rdf C:\Users\Paul\TBCMEWorkspace\Test\s27.rdf
15-apr-2010 12:43:41 org.mindswap.pellet.jena.graph.loader.DefaultGraphLoader addUnsupportedFeature
WARNING: Unsupported axiom: Ignoring transitivity axiom due to an existing disjointness axioms for property broaderTransitive
15-apr-2010 12:43:41 org.mindswap.pellet.RBox ignoreTransitivity
WARNING: Unsupported axiom: Ignoring transitivity and/or complex subproperty axioms for broaderTransitive
Validating 2 integrity constraints
Will stop after 1 constraint violation(s) are found

Validating constraint: related disjointPropertyWith broaderTransitive
Constraint violated : Yes
Violating individuals (1): Concept_1,

Number of constraint(s) violated: 1

FYI: the SPARQL Query generated by Pellet ICV from the OWL Axiom above is:

SELECT  ?x0
WHERE
{ ?x1 skos:related ?x0 ;
skos:broaderTransitive ?x0 .
}

Conclusion

Once again fairly easy to do with SPIN;
a long study of the particularities of OWL2 DL restrictions to find out that this constraint cannot be expressed in OWL2 DL,
but OWL IC using the closed world assumption does the job also with 1 line of code.


HTTP in XProc (updated again)

UPDATE: Norm Walsh, the editor of the XProc spec and the developer of Calabash commented that PUT and DELETE are supported by XProc. See at the end of the article.
Apple Leopard server's Wiki and Blog software throws away links to anchors in the same page, so you need to scroll down by yourself.

Normally when I need to figure out how some RESTlike webservices are working, I take a lazy approach and just fire up some Firefox extension such as RestClient or Poster.

A request in Poster

Poster Request
A response in Poster

Poster Response
And a view on RestClient

RestClient

Prompted by a recent blog post by Norman Walsh on Wiki editing using XProc, I decided to experiment with the HTTP facilities of XProc.

POSTING raw XML to a service checking XML character encoding


At http://www.proxml.be:8080/check/encoding/ I have a service running
that allows any binary XML representation to be POSTed to it and makes a best effort to determine its encoding.
The client of the service must issue an HTTP POST request with a body containing the XML resource to be tested.
The posted XML needs to have an XML declaration with the encoding specified.
A report on the determined encoding will be returned as a UTF-8 encoded XML response.
The documentation of this service can be found at http://www.proxml.be:1060/book/view/book:urn:be:proxml:apps:xml:encoding:validator/

This service runs on the latest version of NetKernel, is open source, and can be downloaded from http://resources.1060research.com/packages/2010/1/proxml-check-encoding-1.1.0.nkp.jar.

A simple XProc doing this posting with the XML inline:

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
xmlns:c="http://www.w3.org/ns/xproc-step"
version="1.0">
<p:output port="result"/>
<p:http-request omit-xml-declaration="false" encoding="UTF-8">
<p:input port="source">
<p:inline>
<c:request href="http://www.proxml.be:8080/check/encoding/"
method="POST">
<c:body content-type="application/xml" >
<a>Ăąƥʥϖℬ≝</a>
</c:body>
</c:request>
</p:inline>
</p:input>
</p:http-request>
</p:declare-step>

I think the code is rather straightforward.
In XProc an HTTP request is represented by a c:request element.

c.request
And the body needs to look like:

In our case:
<c:request href="http://www.proxml.be:8080/check/encoding/" method="POST">
<c:body content-type="application/xml" >
<a>Ăąƥʥϖℬ≝</a>
</c:body>
</c:request>
Remark: the request only works for this webservice if the omit-xml-declaration attribute has been explicitly set to 'false', since the posted XML needs to contain the xml declaration to be checked.
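
For comparison, the same POST can be sketched outside XProc. The Python fragment below only builds the request, with the XML declaration prepended to the body as the service requires; it does not send anything. The function name is an assumption of mine, and I have not run this sketch against the service.

```python
from urllib.request import Request

SERVICE_URL = "http://www.proxml.be:8080/check/encoding/"


def build_encoding_check_request(xml_body: str, encoding: str = "UTF-8") -> Request:
    """Build a POST request whose body starts with an XML declaration."""
    document = f'<?xml version="1.0" encoding="{encoding}"?>\n{xml_body}'
    return Request(
        SERVICE_URL,
        data=document.encode(encoding),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


request = build_encoding_check_request("<a>Ăąƥʥϖℬ≝</a>")
print(request.get_method(), request.full_url)
print(request.data.decode("UTF-8"))

# Sending it requires the service to be reachable:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```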

An alternative where the XML has been put in a separate file:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" 
xmlns:c="http://www.w3.org/ns/xproc-step"
version="1.0">
<p:output port="result"/>
<p:identity name="id">
<p:input port="source">
<p:document href="input.xml"/>
</p:input>
</p:identity>
<p:insert name="ins" match="/c:request/c:body" position="first-child">
<p:input port="source">
<p:inline>
<c:request href="http://www.proxml.be:8080/check/encoding/"
method="POST">
<c:body content-type="application/xml"/>
</c:request>
</p:inline>
</p:input>
<p:input port="insertion">
<p:pipe port="result" step="id"/>
</p:input>
</p:insert>
<p:http-request name="request" omit-xml-declaration="false" encoding="UTF-8"/>
</p:declare-step>

First an external XML file is read; then the insert step inserts this XML as first child into the c:body element (see the XSLTMatch pattern in the match attribute).

The result of this insert step is:

<c:request
xmlns:c="http://www.w3.org/ns/xproc-step"
href="http://www.proxml.be:8080/check/encoding/"
method="POST">
<c:body
xmlns:c="http://www.w3.org/ns/xproc-step"
content-type="application/xml">
<a>Ăąƥʥϖℬ≝</a>
</c:body>
</c:request>

POSTING XML as a x-www-form-urlencoded parameter


Important: The url used in this example is a fake one.

We need to construct a request that looks like:
<c:request method="POST" href="http://www.example.com/form-action" 
xmlns:c="http://www.w3.org/ns/xproc-step">
<c:body content-type="application/x-www-form-urlencoded">
name=W3C&amp;spec=XProc
</c:body>
</c:request>
where the body contains name=value pairs separated by an &amp;

The p:www-form-urlencode step encodes a set of parameter values as an x-www-form-urlencoded string and injects it into the source document at the XSLTMatch pattern of the match attribute; in our case at /c:request/c:body/text(). This pattern selects a text node, so the c:body must contain some placeholder text (here the string '@@HERE@@'); without it there is nothing to match and the code will not work.

Since our XML needs to be placed in an attribute value on the c:param element, it needs to be escaped; hence our preliminary escape-markup step. The p:escape-markup step applies XML serialization to the children of the document element and replaces those children with their serialization. This is the reason we use a wrapper element (wrap) around the XML to be escaped.

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" name="main"
xmlns:cx="http://xmlcalabash.com/ns/extensions"
xmlns:c="http://www.w3.org/ns/xproc-step"
xmlns:html="http://www.w3.org/1999/xhtml" version="1.0">
<p:output port="result"/>
<p:escape-markup name="escape">
<p:input port="source">
<p:inline>
<wrap><a>Hello: Ăąƥʥϖℬ≝</a>
</wrap>
</p:inline>
</p:input>
</p:escape-markup>
<p:www-form-urlencode match="/c:request/c:body/text()">
<p:input port="source">
<p:inline>
<c:request method="POST"
href="http://www.testservice.org/">
<c:body content-type="application/x-www-form-urlencoded">@@HERE@@</c:body>
</c:request>
</p:inline>
</p:input>
<p:with-param name="uid" select="'test'"/>
<p:with-param name="pwd" select="'test'"/>
<p:with-param name="xml" select=".">
<p:pipe port="result" step="escape"/>
</p:with-param>
</p:www-form-urlencode>
<p:http-request/>
</p:declare-step>

The resulting request:
<?xml version="1.0" encoding="UTF-8"?>
<c:request
xmlns:c="http://www.w3.org/ns/xproc-step"
xmlns:cx="http://xmlcalabash.com/ns/extensions"
xmlns:html="http://www.w3.org/1999/xhtml"
method="POST"
href="http://www.testservice.org">
    <c:body
xmlns:c="http://www.w3.org/ns/xproc-step"
xmlns:cx="http://xmlcalabash.com/ns/extensions"
xmlns:html="http://www.w3.org/1999/xhtml"
content-type="application/x-www-form-urlencoded">uid=test&amp;pwd=test&amp;xml=%3Ca%3EHello:%20%C3%84%E2%80%9A%C3%84%E2%80%A6%C3%86%C2%A5%C3%8A%C2%A5%C3%8F%E2%80%93%C3%A2%E2%80%9E%C2%AC%C3%A2%E2%80%B0%EF%BF%BD%3C/a%3E%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20</c:body>
</c:request>
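
As a point of comparison, here is a minimal sketch in Python of how the same three parameters are encoded as an x-www-form-urlencoded body. The function name is an assumption of mine, and the exact set of characters that gets percent-encoded differs slightly between libraries, so the output is not expected to be byte-for-byte identical to the XProc result above.

```python
from urllib.parse import parse_qs, quote, urlencode


def build_form_body(uid: str, pwd: str, xml: str) -> str:
    """Encode the parameters as name=value pairs separated by '&'.

    quote_via=quote encodes spaces as %20 rather than '+'.
    """
    return urlencode({"uid": uid, "pwd": pwd, "xml": xml},
                     quote_via=quote, encoding="utf-8")


xml_fragment = "<a>Hello: Ăąƥʥϖℬ≝</a>"
body = build_form_body("test", "test", xml_fragment)
print(body)

# Decoding the body gives back the original markup.
print(parse_qs(body)["xml"][0])
```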

Alternative 2

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" name="main"
xmlns:cx="http://xmlcalabash.com/ns/extensions"
xmlns:c="http://www.w3.org/ns/xproc-step"
xmlns:html="http://www.w3.org/1999/xhtml" version="1.0">
<p:output port="result"/>
<p:escape-markup name="escape">
<p:input port="source">
<p:inline>
<wrap><a>Hello: Ăąƥʥϖℬ≝</a>
</wrap>
</p:inline>
</p:input>
</p:escape-markup>
<p:add-attribute match="c:param" name="xml">
<p:input port="source">
<p:inline>
<c:param name="xml"/>
</p:inline>
</p:input>
<p:with-option name="attribute-name" select="'value'"/>
<p:with-option name="attribute-value" select="wrap/text()">
<p:pipe port="result" step="escape"/>
</p:with-option>
</p:add-attribute>
<p:wrap-sequence name="wrap" wrapper="c:param-set">
<p:input port="source">
<p:inline>
<wrap>
<c:param name="uid" value="test"/>
<c:param name="pwd" value="test"/>
</wrap>
</p:inline>
<p:pipe port="result" step="xml"/>
</p:input>
</p:wrap-sequence>
<p:unwrap match="wrap" name="unwrap"/>
<p:www-form-urlencode match="/c:request/c:body/text()">
<p:input port="source">
<p:inline>
<c:request method="POST"
href="http://www.testservice.org/">
<c:body content-type="application/x-www-form-urlencoded">@@HERE@@</c:body>
</c:request>
</p:inline>
</p:input>
<p:input port="parameters">
<p:pipe port="result" step="unwrap"/>
</p:input>
</p:www-form-urlencode>
<p:http-request/>
</p:declare-step>
All of the pipelines have been tested with the latest versions of Calumet and Calabash, both run from inside OxygenXML.

XProc in OxygenXML

Conclusion


Is this a good replacement for my usual approach?
One is tempted to say yes, but I'm afraid not completely,
since two HTTP methods which are used heavily in REST environments, PUT and DELETE, are not supported by XProc (yet?).

UPDATE

Norm Walsh, the editor of the XProc spec and the developer of Calabash commented that PUT and DELETE are supported by XProc.

This is good news. I was misled by the following sentence from the spec.

The method attribute specifies the method to be used against the IRI specified by the href attribute, e.g. GET or POST (the value is not case-sensitive).

Using the following XProc:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
xmlns:c="http://www.w3.org/ns/xproc-step"
version="1.0">
<p:output port="result"/>
<p:http-request omit-xml-declaration="false" encoding="UTF-8">
<p:input port="source">
<p:inline>
<c:request href="http://localhost:8888/exist/rest/db/fruits/Almonds.xml" method="PUT">
<c:body content-type="application/xml" >
<product>
<category>fruits</category>
<item>Almonds</item>
<inventory>
<sku>AlmofruiIV75Lm</sku>
<price>2</price>
<inventory>915</inventory>
</inventory>
<vendor>TriCounty Produce</vendor>
</product>
</c:body>
</c:request>
</p:inline>
</p:input>
</p:http-request>
</p:declare-step>
I get the following response from EMC's Calumet.
SystemID: C:\Users\Paul\OxygenWorkspace\XProc\simplePUT.xpl
Engine name: Calumet XProc
Severity: error
Description: XPROC_ERROR: Unsupported request method: PUT
Original message: XPROC_ERROR: Unsupported request method: PUT

UPDATE: Good news coming our way: HTTP PUT will be supported in the upcoming 1.0.11 release.

And the following response from Norm Walsh's Calabash, version 0.9.20:
<c:body content-type="application/octet-stream" encoding="base64">
</c:body>
And the PUT has been carried out in eXist-db.
<product xmlns:c="http://www.w3.org/ns/xproc-step">
<category>fruits</category>
<item>Almonds</item>
<inventory>
<sku>AlmofruiIV75Lm</sku>
<price>2</price>
<inventory>915</inventory>
</inventory>
<vendor>TriCounty Produce</vendor>
</product>
Does anyone have a hint on how to get rid of the namespace declaration 'http://www.w3.org/ns/xproc-step' in the PUT file?

Yes, credits go to Vojtech Toman of EMC.

Changing

<p:inline>

to

<p:inline exclude-inline-prefixes="c">

does the trick.
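
For those who want to cross-check the PUT outside XProc, here is a minimal sketch in Python that builds PUT and DELETE requests for the same eXist-db resource. Nothing is sent; the function names are assumptions of mine. Since the body is built from a plain string, no extra namespace declaration can creep in.

```python
from urllib.request import Request

RESOURCE_URL = "http://localhost:8888/exist/rest/db/fruits/Almonds.xml"

PRODUCT_XML = """<product>
<category>fruits</category>
<item>Almonds</item>
<inventory>
<sku>AlmofruiIV75Lm</sku>
<price>2</price>
<inventory>915</inventory>
</inventory>
<vendor>TriCounty Produce</vendor>
</product>"""


def build_put_request(url: str, xml: str) -> Request:
    """Build a PUT request that stores the XML at the given URL."""
    return Request(url, data=xml.encode("utf-8"),
                   headers={"Content-Type": "application/xml"}, method="PUT")


def build_delete_request(url: str) -> Request:
    """Build a DELETE request for the given URL."""
    return Request(url, method="DELETE")


put_request = build_put_request(RESOURCE_URL, PRODUCT_XML)
delete_request = build_delete_request(RESOURCE_URL)
print(put_request.get_method(), delete_request.get_method())

# Sending requires a running eXist-db instance:
#   from urllib.request import urlopen
#   urlopen(put_request)
```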
