SPARQL for SKOS integrity constraints

I blogged before on how to test SKOS integrity constraints, constraints which couldn't be expressed with OWL2 as defined by W3C.
I'm aware of two solutions that cover those constraints:
  • SPIN
  • Pellet ICV

Both use SPARQL to check the constraints. The difference is that within the SPIN environment you need to write the SPARQL queries yourself,
while in the Pellet IC environment OWL axioms (considered under the closed world assumption) are translated to SPARQL in the background.

Anyhow, since I'm still eager to learn more and better SPARQL, I was curious how they compared.
This table summarizes what I found out.

For each SKOS constraint below: the SPIN SPARQL, followed by the OWL2IC SPARQL (1).

S13
skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties

SPIN SPARQL:

# Constraint S13a: skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties.
ASK WHERE {
  ?this skos:prefLabel ?label .
  ?this skos:altLabel ?label .
}
# Constraint S13b: skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties.
ASK WHERE {
  ?this skos:prefLabel ?label .
  ?this skos:hiddenLabel ?label .
}
# Constraint S13c: skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties.
ASK WHERE {
  ?this skos:hiddenLabel ?label .
  ?this skos:altLabel ?label .
}

OWL2IC SPARQL (1):

Validating constraint: disjointProperties prefLabel altLabel hiddenLabel

SELECT ?x0
WHERE
{ ?x0 skos:altLabel ?x1 ;
      skos:prefLabel ?x1 .
}

SELECT ?x0
WHERE
{ ?x0 skos:prefLabel ?x1 ;
      skos:hiddenLabel ?x1 .
}

GENERATED FROM OWL axiom

[] a owl:AllDisjointProperties ;
   owl:members (skos:prefLabel skos:altLabel skos:hiddenLabel) .

S14
A resource has no more than one value of skos:prefLabel per language tag

SPIN SPARQL:

# Constraint S14: a resource has no more than one value of skos:prefLabel per language tag.
ASK WHERE {
  ?this skos:prefLabel ?label1 .
  ?this skos:prefLabel ?label2 .
  LET (?label1lang := lang(?label1)) .
  LET (?label2lang := lang(?label2)) .
  FILTER ((?label1lang = ?label2lang) && (?label1 != ?label2)) .
}

OWL2IC SPARQL (1):

# This condition cannot be encoded as a OWL integrity constraint directly.

S27
skos:related is disjoint with the property skos:broaderTransitive

SPIN SPARQL:

# Constraint S27: skos:related is disjoint with the property skos:broaderTransitive.
ASK WHERE {
  ?this skos:related ?object1 .
  ?this skos:broaderTransitive ?object2 .
  FILTER (?object1 = ?object2) .
}

OWL2IC SPARQL (1):

Validating constraint: related disjointPropertyWith broaderTransitive

SELECT ?x0
WHERE
{ ?x1 owl:bottomObjectProperty ?x0 ;
      skos:related ?x0 .
}

GENERATED FROM OWL axiom

skos:related owl:propertyDisjointWith skos:broaderTransitive .

S46
skos:exactMatch owl:propertyDisjointWith skos:broadMatch , skos:relatedMatch

SPIN SPARQL:

# Constraint S46: skos:exactMatch is disjoint with each of the properties skos:broadMatch and skos:relatedMatch.
ASK WHERE {
  ?this skos:exactMatch ?exactMatch .
  OPTIONAL {
    ?this skos:broadMatch ?broadMatch .
  } .
  OPTIONAL {
    ?this skos:relatedMatch ?relatedMatch .
  } .
  FILTER ((?exactMatch = ?broadMatch) || (?exactMatch = ?relatedMatch)) .
}

OWL2IC SPARQL (1):

Validating constraint: exactMatch disjointPropertyWith relatedMatch

SELECT ?x0
WHERE
{ ?x1 skos:relatedMatch ?x0 ;
      skos:exactMatch ?x0 .
}

SELECT ?x0
WHERE
{ ?x1 skos:exactMatch ?x0 ;
      skos:broadMatch ?x0 .
}

GENERATED FROM OWL axiom

skos:exactMatch owl:propertyDisjointWith skos:broadMatch , skos:relatedMatch .

(1) In the case of Pellet ICV I've taken the SPARQL queries as they appeared on stdout while using the --verbose argument on the command line.
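
To make the S14 check a bit more tangible outside a triple store, here is a minimal Python sketch of the same logic. It is not what SPIN or Pellet ICV run internally; the tuple representation of triples and the function name s14_violations are assumptions I made up for illustration.

```python
from collections import defaultdict

PREF_LABEL = "skos:prefLabel"


def s14_violations(triples):
    """Return subjects having more than one skos:prefLabel in one language.

    `triples` is an iterable of (subject, predicate, (text, lang)) tuples.
    This in-memory representation is an assumption made for the sketch.
    """
    labels = defaultdict(set)  # (subject, lang) -> distinct label texts
    for subject, predicate, obj in triples:
        if predicate == PREF_LABEL:
            text, lang = obj
            labels[(subject, lang)].add(text)
    return sorted({s for (s, _lang), texts in labels.items() if len(texts) > 1})


triples = [
    ("ex:c1", PREF_LABEL, ("cat", "en")),
    ("ex:c1", PREF_LABEL, ("feline", "en")),  # second English prefLabel
    ("ex:c2", PREF_LABEL, ("dog", "en")),
    ("ex:c2", PREF_LABEL, ("hond", "nl")),    # different language, allowed
]
print(s14_violations(triples))
```

Grouping by (subject, language) plays the role of the FILTER in the SPIN query: a violation needs two different labels that share a language tag.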

If you know of other approaches, please let me know.


SKOS and the OWL API (again)

I'm having trouble understanding what the OWL API is doing with SKOS.

This is my case.
I start a new ontology and I import the SKOS Simple Knowledge Organization System eXtension for Labels (SKOS-XL) schema to be found at <http://www.w3.org/2008/05/skos-xl>

What I get in a Jena based IDE


After having imported SKOS-XL, the system also imports SKOS itself, since this is indicated by an owl:imports statement in SKOS-XL.
<owl:imports rdf:resource="http://www.w3.org/2004/02/skos/core"/>

Furthermore the IDE asks to add missing imports for resources from other namespaces it encounters and which are untyped.

missing imports

As a result of all this, I end up with 3 imports:

3 imports

If I now look at the available classes and properties, they are all compliant with what I read in the respective specs.

Classes and properties

And when I add an individual skos:Concept, I do get access to all the properties I expect.

Jena individual

What I get in an OWL API based IDE

I get only one import, SKOS-XL. SKOS is not imported, since it is a built-in feature of the OWL API (dixit Matthew Horridge).

So in these environments I get the following classes and properties.

OWL2 API properties
And adding an individual Concept leads to the following input form

OWL API instance

which doesn't offer the same list as the Jena based IDE and surely isn't, as is, an environment I want to use for SKOS work.

Question

I do not see why this is a feature. Can anyone explain this?


Some Gridworks tips

Much of the data that’s lying around is a mess. So we badly need tools that help us to clean up this mess.
Freebase Gridworks of Metaweb, recently acquired by Google, is such a tool. It allows us to:
  • Merge similar names using multiple methods:
    • Automatic title-casing
    • using an expression language (GEL)
    • using several clustering algorithms to detect similarities
  • Split multi-valued cells over columns and rows
  • Create new columns based on content of other columns
  • ...

Make sure you see the videos at http://code.google.com/p/freebase-gridworks/

Some tricks I want to remember for myself:

  • how to use regular expressions to correct cell values
  • how to fill in a 'null' column with a value.

Correcting cell values using regular expressions

I have some phone numbers in an existing dataset which should be formatted according to the following structure:
+ followed by the country code: e.g. +32
followed by the area code with the 0 between parentheses: e.g. (0)15
followed by the local code following this pattern X?XX XX XX: e.g. 23 45 67
Full example: +32 (0)15 23 45 67

The existing dataset contains slightly different phone numbers; shown as loaded into Gridworks.

phone numbers
Using regular expressions we can split the existing numbers into 2 groups, using parentheses to indicate the groups:

  • the characters before the area code
    start-of-line, followed by the '+' character, followed by 2 digits, followed by a space
    (^\+\d{2}\s)
  • everything starting from the area code
    1 or 2 digits, followed by a space, followed by one or more digits, followed by a space followed by 2 digits (2 times), followed by end-of-line
    (\d{1,2}\s\d+\s\d{2}\s\d{2}$)

The regular expression as shown in the RX Toolkit of Komodo IDE.

regex in Komodo IDE

Now we use these groups to replace the existing values with a value conforming to the desired structure, using the replacement expression
group 1 followed by '(0)' followed by group 2.

\1(0)\2


Replacement using groups

Now that we have our regex working, let's move on to Gridworks.

On the column containing the telephone numbers, choose Edit cells, Transform ...

Gridworks Transform cell
Now we can use the Gridworks expression language (GEL) to do our transform.

GEL offers a whole list of functions; we will be using 'replace'. 'replace' takes 3 arguments:

  • the input string
  • the pattern to search for, which can be a regex
  • the replacement string, which can contain the captured groups defined in the regex.
It took us a few minutes to discover the precise syntax when using regexes. This is the template to be used:

replace(value,//,'')

where value refers to the value in the cell
where // delimits the regex
and '' contains the replacement string using the captured groups being indicated with '$', e.g. $1, $2.

In our case the expression became:

replace(value,/(^\+\d{2}\s)(\d{1,2}\s\d+\s\d{2}\s\d{2}$)/,'$1(0)$2')


Result of regex replacement
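
For readers who want to try the regex outside Gridworks, here is a minimal Python sketch of the same replacement. The pattern is the one used above; the function name add_trunk_zero is hypothetical, and Python writes the groups as \1 and \2 where GEL uses $1 and $2.

```python
import re

# Group 1: '+', two digits and a space at the start of the value.
# Group 2: area code and local number up to the end of the value.
PATTERN = re.compile(r"(^\+\d{2}\s)(\d{1,2}\s\d+\s\d{2}\s\d{2}$)")


def add_trunk_zero(number):
    """Insert '(0)' between the country code and the area code."""
    return PATTERN.sub(r"\1(0)\2", number)


print(add_trunk_zero("+32 15 23 45 67"))
```

Values that do not match the pattern, such as numbers that already contain '(0)', are returned unchanged.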

Filling in a column with a fixed value

From a spreadsheet containing addresses of museums, I have a column with 'null' values which I want to use to indicate the type of the entity, e.g. 'museum'.

Empty column
I've done this using the GEL 'forNonBlank' function. 'forNonBlank' takes 4 parameters:
forNonBlank(e, v, eNonBlank, eBlank)
  • e: an expression to be evaluated
  • v: the variable in which the result of e is captured
  • eNonBlank: the expression evaluated when the result is not null or an empty string
  • eBlank: the expression evaluated when the result is null or an empty string

In our case

forNonBlank(value, v, 'not relevant', 'museum')

setting column
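
The branching that 'forNonBlank' performs can be sketched in a few lines of Python. This is only an illustration of the logic as described above, not Gridworks code: the name for_non_blank and the use of callables for the two branches are my own assumptions, and "blank" is taken to mean null or empty string only.

```python
def for_non_blank(value, if_non_blank, if_blank):
    """Evaluate one of two branches depending on whether value is blank.

    Blank means None or the empty string, following the description above.
    Both branches are callables that receive the value (the 'v' variable).
    """
    is_blank = value is None or value == ""
    return if_blank(value) if is_blank else if_non_blank(value)


cells = [None, "", "gallery"]
filled = [
    for_non_blank(cell, lambda v: "not relevant", lambda v: "museum")
    for cell in cells
]
print(filled)
```

Passing lambda v: v as the non-blank branch would keep existing values instead of overwriting them with 'not relevant'.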

Conclusion

For everyone pursuing data quality Gridworks should become a central component of his/her toolset.
I'll try to investigate how NeedleBase compares.




Moving to another triple store supporting SPARQL 1.1

Problem statement


In a project we have been using Sesame as RDF triple store and SPARQL endpoint.
The reasons we chose Sesame were:
  • it is open source and free
  • it is very easy to set up
  • it has a very friendly user interface
  • there is a built-in connector in Topbraid Composer ME, the chosen SW IDE
  • and it is a national ('Dutch') product.
We now want to expose the dataset using the faceted navigation aids offered by Paggr Prospect, the faceted browser builder for Linked Data.

Prospect in action on Crunchbase
To be able to do so, however, Prospect must be connected to a SPARQL endpoint that offers the aggregate functions defined in SPARQL 1.1.

An example using the aggregate function COUNT:

SELECT (COUNT(?person) AS ?alices)
WHERE {
?person :name "Alice" .
}
SPARQL 1.1 however is not (yet?) supported in Sesame.

Joseki seemed a viable replacement:
  • it is open source and free
  • it is built upon the very robust Jena/ARQ RDF framework
  • there is a built-in connector in Topbraid Composer ME, which uses the same Jena/ARQ RDF framework
  • and offers support for SPARQL 1.1
As persistence layer we chose TDB.

Migration


The migration went much more smoothly than expected. This is what we did.
  1. Create a TDB store from the Sesame store
  2. Install Joseki
    1. set the environment variable JOSEKIROOT and the Java CLASSPATH
  3. Configure Joseki
    1. adapt config.ttl to add the TDB store
    2. adapt the web.xml file
    3. make a HTML form for querying the SPARQL endpoint

Create a TDB store


For this we used one of the built-in facilities of Topbraid Composer: you can directly export your existing dataset to a TDB database.

Export to TDB

Install Joseki

  1. Unzip the distribution.
  2. Set the JOSEKIROOT environment variable to the location of the installation.

    JOSEKIROOT
  3. Make sure your Java classpath points to %JOSEKIROOT%/lib

    setting the classpath
  4. Test the installation by running from the command-line.

    starting Joseki
    Important: do not try to run the rdfserver bat file from the bin directory; run it directly from %JOSEKIROOT%.

    Now point your web browser to http://localhost:2020/ and you should get this screen:

    The JOSEKI startup screen

Configure Joseki

  1. Add a service and a dataset to the joseki-config.ttl file (a service with name 'TDB' using dataset 'newdata').
    The joseki-config.ttl is found in %JOSEKIROOT%.
     # Service 3 - SPARQL processor only handling a given dataset
    <#service3>
    rdf:type joseki:Service ;
    rdfs:label "SPARQL on TDB" ;
    joseki:serviceRef "TDB" ;
    # web.xml must route this name to Joseki
    # dataset part
    joseki:dataset <#newdata> ;
    # Service part.
    # This processor will not allow either the protocol,
    # nor the query, to specify the dataset.
    joseki:processor joseki:ProcessorSPARQL_FixedDS ;
    .

    and the settings for the 'newdata' dataset.

     ## Initialize TDB.
    [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
    tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .

    <#newdata> rdf:type tdb:DatasetTDB ;
    rdfs:label "A new TDB dataset" ;
    tdb:location "C:/Users/Paul/MyWorkSpaces/TBCMEWorkspace/Test/kennedys.tdb.data" .
  2. Add 'TDB' to the web.xml file, which resides in %JOSEKIROOT%\webapps\joseki\WEB-INF

    <servlet-mapping>
    <servlet-name>SPARQL service processor</servlet-name>
    <url-pattern>/TDB</url-pattern>
    </servlet-mapping>
  3. Make an HTML file (in my case myQuery.html) containing a form to post queries to the TDB SPARQL service, and copy it next to the other HTML files in %JOSEKIROOT%\webapps\joseki.

    <form action="TDB" method="get">
    <p>SELECT - get variables (apply XSLT stylesheet)</p>
    <p><textarea name="query" cols="70" rows="5">
    PREFIX kennedys: &lt;http://topbraid.org/examples/kennedys#>
    SELECT ?a ?c
    WHERE
    { ?a kennedys:name ?c}</textarea>
    <br/>
    Output XML: <input type="radio" name="output" value="xml" checked/>
    with XSLT style sheet (leave blank for none):
    <input name="stylesheet" size="25" value="/xml-to-html.xsl" /> <br/>
    or JSON output: <input type="radio" name="output" value="json"/> <br/>
    or text output: <input type="radio" name="output" value="text"/> <br/>
    or CSV output: <input type="radio" name="output" value="csv"/> <br/>
    or TSV output: <input type="radio" name="output" value="tsv"/> <br/>
    Force the accept header to <tt>text/plain</tt> regardless
    <input type="checkbox" name="force-accept" value="text/plain"/>
    <br/>

    <input type="submit" value="Get Results" />
    </p>
    </form>
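
The form above does nothing more than send a GET request with a 'query' and an 'output' parameter to the 'TDB' service. As a minimal sketch, the same URL can be built from Python. The endpoint URL below assumes the default port 2020 and the /TDB mapping configured above; the function name build_query_url and the example query are my own, and nothing is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(endpoint, sparql, output="json"):
    """Build the GET URL that the HTML form would submit."""
    return endpoint + "?" + urlencode({"query": sparql, "output": output})


# Example query with a SPARQL 1.1 aggregate (illustrative only).
QUERY = """PREFIX kennedys: <http://topbraid.org/examples/kennedys#>
SELECT (COUNT(?a) AS ?people)
WHERE { ?a kennedys:name ?c }"""

url = build_query_url("http://localhost:2020/TDB", QUERY)
params = parse_qs(urlsplit(url).query)  # decode again to check the round trip
print(url)
```

Passing the resulting URL to urllib.request.urlopen would run the query against a running Joseki instance.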

Result

Going to http://localhost:2020/myQuery.html gives

Querying the TDB store
The query shown uses SPARQL 1.1 aggregates and returns as result:

SPARQL 1.1 query results
Our SPARQL 1.1 endpoint was up and running in 2 hours, migration included.
Next step is building the faceted browser interface. So more is to come.




SKOS and the OWL API

This is a follow-up to the preceding entry on SKOS and OWL2, which described some difficulties in using the SKOS ontologies (DL and Full) in tools based on the OWL API.

Timothy Redmond <tredmond@stanford.edu> on the Protégé mailing list

I will have to go to the OWL api mailing list to learn more about this.  
The OWL api does not appear to view skos as a proper OWL ontology -
it seems to view skos as a vocabulary (e.g. like http://www.w3.org/1999/02/22-rdf-syntax-ns,
http://www.w3.org/2000/01/rdf-schema and http://www.w3.org/2002/07/owl).
In particular the OWL api has explicit code that prevents importing skos core.
In addition it treats rdf statements involving skos predicates in a special way.
This is why some classes ended up being individuals and thus certain properties ended up
being data type properties when you did the OWL 1 dl import.
I will see what I can find out.
and

Matthew Horridge on the OWL API bug tracker

This is a feature of the OWL API. The SKOS vocabulary is built into the
OWL API (like the OWL or RDF vocabulary). Imports of the RDF, RDFS, OWL
and SKOS "schema files" are ignored during parsing.

Conclusion: For the time being, use for your SKOS work a tool based on the Jena framework.




Where OWL fails, another OWL arises

What happened previously


In the SKOS spec you find a list of Class & Property Definitions and Integrity Conditions numbered from S1 to S62.
Most of these definitions and constraints are already covered by the existing SKOS OWL1 ontologies (available in Full and DL versions).
However some definitions and conditions could not be expressed in OWL1.

This is the list:
  • S13: skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties .
  • S14: A resource has no more than one value of skos:prefLabel per language tag .
  • S27: skos:related is disjoint with the property skos:broaderTransitive.
  • S46 : skos:exactMatch is disjoint with each of the properties skos:broadMatch and skos:relatedMatch .
  • S55: The property chain (skosxl:prefLabel, skosxl:literalForm) is a sub-property of skos:prefLabel.
  • S56: The property chain (skosxl:altLabel, skosxl:literalForm) is a sub-property of skos:altLabel.
  • S57: The property chain (skosxl:hiddenLabel, skosxl:literalForm) is a sub-property of skos:hiddenLabel.
  • S58: skosxl:prefLabel, skosxl:altLabel and skosxl:hiddenLabel are pairwise disjoint properties .
Hence the question whether these could be expressed in OWL2 as defined by W3C.

I tried to use OWL2 for cases S14, S13, S27 and S55.
In these 4 cases my conclusion was harsh: OWL2 as defined by W3C was pretty useless for the case.
I tried to implement the same SKOS conditions using SPIN/SPARQL, and that was fairly easy and painless.

This conclusion attracted some interest and motivated Holger Knublauch of TopQuadrant to write a blog entry with the title "WHERE OWL fails".

His main points were:
  • OWL is hard-coded against specific design patterns, and anything that goes beyond those patterns cannot be expressed.
  • The choice of supported design patterns is heavily influenced by the theory of Description Logics, which is theoretically sound but of questionable 'usefulness in practice'.
This triggered a response from Kendall Clark of C&P, who strongly disagreed.

Since I'm a happy user of both TopBraid Composer of TopQuadrant and Pellet of C&P, I would like to add my point(s) of view.

Before we continue I must make clear that in this discussion we are talking about 3 different things:
  • SPIN, the constraint and construct language developed by Holger Knublauch and implemented in the TBC suite of tools, but also available in open source
  • OWL2 with an open world assumption as described by W3C and criticized by Holger
  • OWL2 with a closed world assumption as implemented in Pellet ICV made by Clark & Parsia, from now on referred to as OWL2IC.

Do they disagree?

Let's start with constraint checking.

From a distance, I think they agree more than Kendall is willing to admit.

They both agree that:
  • in a lot of use cases the Open World Assumption of OWL2 is counter-intuitive and counter-productive (Why integrity constraints?)
  • checking closed world constraints is best done using SPARQL
They only differ in opinion on how the constraint checking SPARQL queries need to be generated.
  • In the case of SPIN, you need to write the SPARQL yourself.
  • In the case of OWL2IC, you write OWL axioms which are then transformed in the background to SPARQL queries.
I do not have a firm opinion on this one.
Jeni Tennison reacted on Twitter that you have a similar situation in the XML world:
XML Schema/RELAX NG vs Schematron: declarative versus rule-based constraints and both are considered useful.
There is a difference with the SPIN - OWL2IC divide, though: XML Schema and RELAX NG do not generate Schematron in the background.
People who want to do that can use the XSD2Schematron converter from Rick Jelliffe, but I digress.


Some additional points to take into consideration:
  • The writing of SPARQL queries is not that obvious. This is also one of the rationales behind the development of the Linked Data API. cf. slide
    So indeed, if well done, a declarative approach can make writing constraints easier for certain types of users.
    On the other hand, my experience on both the XML and the RDF side is that some constraints can only be expressed using rules; so I would say that this approach is more powerful, as also shown in the context of S14.
    Easier versus more powerful; or maybe, when we grow up, we get a combined solution such as in the upcoming XSD 1.1 (see also the suggestion of Evren Sirin in the context of S14).
  • SPIN is more than a constraint checking language, as shown by the solution for the property chain inference, which could not be expressed in OWL2.
But I would say: let the market decide which approach is preferred. There is surely a need to have something to validate RDF. Time for the W3C to step in?

Do they disagree?

Of course, they disagree on the importance of OWL2.

For TopQuadrant OWL2 is just one optional piece within SW applications.
"But this makes OWL just one out of a catalog of vocabularies, on the same level as SKOS or FOAF or SIOC or GoodRelations. "
The flagship product of Clark&Parsia is Pellet, the leading OWL2 DL reasoner; so it's obvious where they stand.
And I can imagine one gets nervous when OWL2 is not accompanied by drumrolls.
However, Holger's opinion is in my view well balanced.
I agree on Holger's point 1: OWL 2 (DL) implies indeed some constraints on what you can model, which doesn't always fit what you need and want. This was the whole point of my series.
Concerning point 2: I also experienced myself more than once that OWL DL reasoning can take some time, but I hope and expect that clever engineering will make things better in the future.

But if you need and want OWL2, you need and want Pellet.
One main and sufficient reason: the explain feature which helps me solve the bugs in my head.

Let the software speak

The good point of this whole discussion is that it triggered Clark & Parsia to improve their OWLIC implementation in Pellet ICV, of which a new version 0.4 was released yesterday.
An overview of my test results is given in the table below.
Details can be found in updated versions of the respective blog entries S14, S13, S27, S55.
SKOS   Constraint   SPIN   OWL2   OWL2IC
S14    Y            +      -      -
S13    Y            +      -      +
S27    Y            +      -      +
S55    N            +      -      -

OWL-API and SKOS

I take the opportunity to discuss some SKOS-related issues I encounter with all software built upon the OWL API, such as Protégé 4 and the NeOn Toolkit.

The case: importing the skos-xl ontology at http://www.w3.org/2008/05/skos-xl into your own ontology.
This skos-xl ontology contains itself another owl:imports statement of the core skos ontology:
<owl:imports rdf:resource="http://www.w3.org/2004/02/skos/core"/>
This indirect import gets lost during import.
The result of this loss is that due to some OWL2 magic the property skos:definition becomes an Object Property while it has been defined originally as an Annotation Property. ???
You can correct this loss by doing a direct import of the skos core ontology . This direct import works, but doesn't correct the issue mentioned above. Furthermore this import cannot be saved. ???

Also all annotation properties get doubled as datatype properties. The explanation given today on the Protégé list:
"Actually this happens in part because it appears that the skos ontology is inconsistent with the OWL 2 specifications. There it says that
If an ontology has an ontology IRI but no version IRI, then a different ontology with the same ontology IRI but no version IRI should not exist.
However skos has two distinct ontologies with the same name."

Conclusion
I'm afraid that SKOS, one of the more popular SW vocabularies, and OWL2 live in separate worlds, and that you had therefore better not use tools that are too hardwired to OWL2.







SKOS (part 4): property chains

We consider the following SKOS property definitions, which were not expressible in OWL1:
  • S55: The property chain (skosxl:prefLabel, skosxl:literalForm) is a sub-property of skos:prefLabel.
  • S56: The property chain (skosxl:altLabel, skosxl:literalForm) is a sub-property of skos:altLabel.
  • S57: The property chain (skosxl:hiddenLabel, skosxl:literalForm) is a sub-property of skos:hiddenLabel.

OWL2


One of the new and exciting features of OWL2 is the facility that a property can be defined as the composition of several properties; called a property chain.
The traditional example here is that the property :hasUncle is a composition of the properties :hasParent and :hasBrother.
You will find an elaborate explanation of this example at the Semantic Web Programming site.

More formally we have the following axiom:
SubObjectPropertyOf( ObjectPropertyChain( OPE1 ... OPEn ) OPE ).
This axiom states that, if an individual x is connected with an individual y by a sequence of object property expressions OPE1, ..., OPEn ,
then x is also connected with y by the object property expression OPE.
Such axioms are also known as complex role inclusions [SROIQ].

Now let's move to SKOS where:
  • skosxl:prefLabel is an Object property.
  • skosxl:literalForm is a Data property.
  • skos:prefLabel is an Annotation property.
This mixture of property types is, as far as I understand OWL2, not allowed in property chain axioms.

But let's try anyhow.

skos:prefLabel rdf:type owl:AnnotationProperty.
xl:prefLabel rdf:type owl:ObjectProperty.
xl:literalForm rdf:type owl:DatatypeProperty.
skos:prefLabel owl:propertyChainAxiom (
xl:prefLabel
xl:literalForm
).
As expected, Pellet 2.1.0 throws a warning:
WARNING: Unsupported axiom: Bnode in owl:propertyChainAxiom axiom is not a valid
property expression.
Conclusion: this type of property chaining cannot be done in OWL2.

SPIN


In SPIN, once again, implementing this constraint is fairly easy.

Adding the following SPARQL CONSTRUCT as a spin:rule to e.g. the skos:Concept class does the work.

CONSTRUCT {
?this skos:prefLabel ?label .
}
WHERE {
?this xl:prefLabel ?prefLabel .
?prefLabel xl:literalForm ?label .
}
The result as shown in TopBraid Composer:

Property Chain
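
What the CONSTRUCT rule does can also be sketched in plain Python: follow xl:prefLabel to the label resource, then xl:literalForm to the literal, and emit a skos:prefLabel triple. This is only an illustration of the chain logic on an in-memory set of tuples. The function name infer_pref_labels and the ex: resources are hypothetical, and this is not how SPIN itself is implemented.

```python
def infer_pref_labels(triples):
    """Infer skos:prefLabel from the chain xl:prefLabel / xl:literalForm."""
    literal_forms = {}  # label resource -> set of literal forms
    for s, p, o in triples:
        if p == "xl:literalForm":
            literal_forms.setdefault(s, set()).add(o)

    inferred = set()
    for s, p, o in triples:
        if p == "xl:prefLabel":
            for literal in literal_forms.get(o, ()):
                inferred.add((s, "skos:prefLabel", literal))
    return inferred


triples = {
    ("ex:concept1", "xl:prefLabel", "ex:label1"),
    ("ex:label1", "xl:literalForm", "cat"),
    ("ex:concept2", "xl:altLabel", "ex:label2"),  # not part of the S55 chain
    ("ex:label2", "xl:literalForm", "dog"),
}
print(infer_pref_labels(triples))
```

The same function with xl:altLabel or xl:hiddenLabel swapped in would cover S56 and S57.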

Conclusion


SPIN wins again.


Integrity Constraints in SKOS (part 3)

Today's constraint is

S27 skos:related is disjoint with the property skos:broaderTransitive.

OWL2

While OWL 1 provided means to state the disjointness of classes, it was impossible to state that properties are disjoint.
OWL2 changes this game.
OWL2 allows one to assert that several object properties are pairwise incompatible (exclusive); that is, two individuals cannot be connected by two different properties of the set. The same holds for data properties.

For SKOS constraint S27 this would translate into the following code:

<rdf:Property rdf:about="http://www.w3.org/2004/02/skos/core#related">
<rdfs:comment xml:lang="en">skos:related is disjoint with skos:broaderTransitive</rdfs:comment>
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#SymmetricProperty"/>
<rdfs:subPropertyOf rdf:resource="http://www.w3.org/2004/02/skos/core#semanticRelation"/>
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#ObjectProperty"/>
<skos:definition xml:lang="en">Relates a concept to a concept with which there is an associative semantic relationship.</skos:definition>
<rdfs:isDefinedBy rdf:resource="http://www.w3.org/2004/02/skos/core"/>
<rdfs:label xml:lang="en">has related</rdfs:label>
<owl:propertyDisjointWith rdf:resource="http://www.w3.org/2004/02/skos/core#broaderTransitive"/>
</rdf:Property>

OWL2 DL

For reasons of decidability, however, OWL2 DL puts some restrictions on the use of the DisjointObjectProperties axiom. The properties used need to be simple, meaning very roughly that axioms of the following forms cannot be involved (directly or indirectly):

  • SubObjectPropertyOf( ObjectPropertyChain( OPE1 ... OPEn ) OPE ) with n > 1, or
  • SubObjectPropertyOf( ObjectPropertyChain( OPE1 ... OPEn ) INV(OPE) ) with n > 1, or
  • TransitiveObjectProperty( OPE ), or
  • TransitiveObjectProperty( INV(OPE) )

Pellet 2.0

Indeed, when using Pellet 2.0.2, we get the following warning:

WARNING: Unsupported axiom: 
Ignoring transitivity and/or complex subproperty axioms for broaderTransitive
31-mrt-2010 14:09:54 org.mindswap.pellet.RBox ignoreTransitivity

SPIN

Formulating the constraint in SPIN is straightforward.

ASK WHERE {
?this skos:related ?object1 .
?this skos:broaderTransitive ?object2 .
FILTER (?object1 = ?object2) .
}

TopBraid Composer flags the violation, as shown with the following example:

error thrown in Topbraid Composer

Pellet ICV 0.4

Using Pellet ICV 0.4 with the following constraint:
<rdf:Description rdf:about="http://www.w3.org/2004/02/skos/core#related">
<owl:propertyDisjointWith rdf:resource="http://www.w3.org/2004/02/skos/core#broaderTransitive"/>
</rdf:Description>
<rdf:Description rdf:about="http://www.w3.org/2004/02/skos/core#broaderTransitive">
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#TransitiveProperty"/>
</rdf:Description>

reports correctly the constraint violation:

c:\Program Files\pellet-2.1.0>pellet-ic --constraints C:\Users\Paul\TBCMEWorkspace\test\s27c.rdf C:\Users\Paul\TBCMEWorkspace\Test\s27.rdf
15-apr-2010 12:43:41 org.mindswap.pellet.jena.graph.loader.DefaultGraphLoader addUnsupportedFeature
WARNING: Unsupported axiom: Ignoring transitivity axiom due to an existing disjointness axioms for property broaderTransitive
15-apr-2010 12:43:41 org.mindswap.pellet.RBox ignoreTransitivity
WARNING: Unsupported axiom: Ignoring transitivity and/or complex subproperty axioms for broaderTransitive
Validating 2 integrity constraints
Will stop after 1 constraint violation(s) are found

Validating constraint: related disjointPropertyWith broaderTransitive
Constraint violated : Yes
Violating individuals (1): Concept_1,

Number of constraint(s) violated: 1

FYI: the SPARQL Query generated by Pellet ICV from the OWL Axiom above is:

SELECT  ?x0
WHERE
{ ?x1 skos:related ?x0 ;
skos:broaderTransitive ?x0 .
}
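
The generated query boils down to "find pairs that are linked by both properties". As a minimal sketch of that check in Python, on explicitly asserted triples only (no transitive closure is computed), with the tuple representation, the function name disjoint_violations and the ex: resources being my own assumptions:

```python
def disjoint_violations(triples, prop_a, prop_b):
    """Return (subject, object) pairs connected by both properties."""
    pairs_a = {(s, o) for s, p, o in triples if p == prop_a}
    pairs_b = {(s, o) for s, p, o in triples if p == prop_b}
    return sorted(pairs_a & pairs_b)


triples = [
    ("ex:Concept_1", "skos:related", "ex:Concept_2"),
    ("ex:Concept_1", "skos:broaderTransitive", "ex:Concept_2"),  # violates S27
    ("ex:Concept_3", "skos:related", "ex:Concept_4"),
]
print(disjoint_violations(triples, "skos:related", "skos:broaderTransitive"))
```

The same helper, called once per pair of properties, would also cover S13 and S46.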

Conclusion

Once again this is fairly easy to do with SPIN.
It takes a long study of the particularities of the OWL2 DL restrictions to find out that this constraint cannot be expressed in OWL2 DL,
but OWL IC, using the closed world assumption, also does the job with 1 line of code.


HTTP in XProc (updated again)

UPDATE: Norm Walsh, the editor of the XProc spec and the developer of Calabash commented that PUT and DELETE are supported by XProc. See at the end of the article.
Apple Leopard server's Wiki and Blog software throws away links to anchors in the same page, so you need to scroll down by yourself.

Normally when I need to figure out how some REST-like web services work, I take a lazy approach and just fire up some Firefox extension such as RestClient or Poster.

A request in Poster

Poster Request
A response in Poster

Poster Response
And a view on RestClient

RestClient

Based on a recent blog post of Norman Walsh on Wiki editing using XProc, I decided to do some experimenting with the http facilities of XProc.

POSTING raw XML to a service checking XML character encoding


I have a service running at http://www.proxml.be:8080/check/encoding/,
which allows any XML binary representation to be POSTed to it and makes a best effort to determine its encoding.
The client to the service must issue an HTTP POST request with a body containing the XML resource to be tested.
The posted XML needs to have an XML declaration with encoding specified.
A report on the determined encoding will be returned as a UTF-8 encoded XML response.
The documentation of this service can be found at http://www.proxml.be:1060/book/view/book:urn:be:proxml:apps:xml:encoding:validator/

This service is running on the latest version of Netkernel, is open-sourced and can be downloaded from http://resources.1060research.com/packages/2010/1/proxml-check-encoding-1.1.0.nkp.jar.

A simple XProc doing this posting with the XML inline:

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
xmlns:c="http://www.w3.org/ns/xproc-step"
version="1.0">
<p:output port="result"/>
<p:http-request omit-xml-declaration="false" encoding="UTF-8">
<p:input port="source">
<p:inline>
<c:request href="http://www.proxml.be:8080/check/encoding/"
method="POST">
<c:body content-type="application/xml" >
<a>Ăąƥʥϖℬ≝</a>
</c:body>
</c:request>
</p:inline>
</p:input>
</p:http-request>
</p:declare-step>

I think the code is rather straightforward.
In XProc an HTTP request is represented by a c:request element.

c.request
And the body needs to look like:

In our case:
<c:request href="http://www.proxml.be:8080/check/encoding/" method="POST">
<c:body content-type="application/xml" >
<a>Ăąƥʥϖℬ≝</a>
</c:body>
</c:request>
Remark: the request only works for this webservice if the omit-xml-declaration attribute has been explicitly set to 'false', since the posted XML needs to contain the xml declaration to be checked.
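
For comparison, this is roughly what the same POST looks like when built with the Python standard library. It is a sketch under assumptions: the function name build_post is hypothetical, the request is only constructed and not sent, and the XML declaration is added by hand, which is what omit-xml-declaration="false" takes care of in XProc.

```python
from urllib.request import Request


def build_post(url, xml_text, encoding="UTF-8"):
    """Build (but do not send) a POST request carrying an XML document."""
    declaration = '<?xml version="1.0" encoding="%s"?>\n' % encoding
    body = (declaration + xml_text).encode(encoding)
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/xml"},
    )


req = build_post("http://www.proxml.be:8080/check/encoding/", "<a>Ăąƥʥϖℬ≝</a>")
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would actually send it, provided the service is still reachable.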

An alternative where the XML has been put in a separate file:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" 
xmlns:c="http://www.w3.org/ns/xproc-step"
version="1.0">
<p:output port="result"/>
<p:identity name="id">
<p:input port="source">
<p:document href="input.xml"/>
</p:input>
</p:identity>
<p:insert name="ins" match="/c:request/c:body" position="first-child">
<p:input port="source">
<p:inline>
<c:request href="http://www.proxml.be:8080/check/encoding/"
method="POST">
<c:body content-type="application/xml"/>
</c:request>
</p:inline>
</p:input>
<p:input port="insertion">
<p:pipe port="result" step="id"/>
</p:input>
</p:insert>
<p:http-request name="request" omit-xml-declaration="false" encoding="UTF-8"/>
</p:declare-step>

First an external XML file is read, and then the insert step inserts this XML as first child of the c:body element; see the XSLTMatchPattern in the match attribute.

The result of this insert step is:

<c:request
xmlns:c="http://www.w3.org/ns/xproc-step"
href="http://www.proxml.be:8080/check/encoding/"
method="POST">
<c:body
xmlns:c="http://www.w3.org/ns/xproc-step"
content-type="application/xml">
<a>Ăąƥʥϖℬ≝</a>
</c:body>
</c:request>
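
The insert step can be mimicked in a few lines of Python to make clear what "first-child" means here. This is only a sketch with the standard library's ElementTree; the function name insert_first_child is hypothetical, and the payload is assumed to be the same <a> element as in the result above.

```python
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"
ET.register_namespace("c", C_NS)


def insert_first_child(request_xml, payload_xml):
    """Parse both documents and put the payload first inside c:body."""
    request = ET.fromstring(request_xml)
    body = request.find(f"{{{C_NS}}}body")
    body.insert(0, ET.fromstring(payload_xml))
    return request


template = (
    f'<c:request xmlns:c="{C_NS}" '
    'href="http://www.proxml.be:8080/check/encoding/" method="POST">'
    '<c:body content-type="application/xml"/></c:request>'
)
filled = insert_first_child(template, "<a>Ăąƥʥϖℬ≝</a>")
print(ET.tostring(filled, encoding="unicode"))
```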

POSTING XML as an x-www-form-urlencoded parameter


Important: The url used in this example is a fake one.

We need to construct a request that looks like:
<c:request method="POST" href="http://www.example.com/form-action" 
xmlns:c="http://www.w3.org/ns/xproc-step">
<c:body content-type="application/x-www-form-urlencoded">
name=W3C&amp;spec=XProc
</c:body>
</c:request>
where the body contains name=value pairs separated by an &amp;.

The p:www-form-urlencode step encodes a set of parameter values as an x-www-form-urlencoded string and injects it into the source document at the nodes selected by the XSLTMatchPattern in the match attribute; in our case /c:request/c:body/text(). The c:body must therefore contain some text, here the placeholder string '@@HERE@@'; without it the pattern has no text node to match and the code will not work.

Since our XML needs to be placed in an attribute value on the c:param element, it needs to be escaped; hence our preliminary p:escape-markup step. The p:escape-markup step applies XML serialization to the children of the document element and replaces those children with their serialization. This is the reason we use a wrapper element (wrap) around the XML to be escaped.
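
To see what the encoded body should look like, here is a rough Python sketch of the same encoding with urllib.parse. It is not what an XProc processor runs internally; details such as which punctuation gets escaped may differ between implementations. The parameter names and values are the ones used in the pipeline below.

```python
from urllib.parse import parse_qs, quote, urlencode

# Same three parameters as in the pipeline; the XML is passed as plain text.
params = {
    "uid": "test",
    "pwd": "test",
    "xml": "<a>Hello: Ăąƥʥϖℬ≝</a>",
}

# quote_via=quote gives %20 for spaces instead of '+'.
body = urlencode(params, quote_via=quote)
print(body)

# Decoding the body again gives back the original values.
round_trip = {name: values[0] for name, values in parse_qs(body).items()}
```

Inside the c:request document the separating & still has to be written as &amp;, as in the example request above.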

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" name="main"
xmlns:cx="http://xmlcalabash.com/ns/extensions"
xmlns:c="http://www.w3.org/ns/xproc-step"
xmlns:html="http://www.w3.org/1999/xhtml" version="1.0">
<p:output port="result"/>
<p:escape-markup name="escape">
<p:input port="source">
<p:inline>
<wrap><a>Hello: Ăąƥʥϖℬ≝</a>
</wrap>
</p:inline>
</p:input>
</p:escape-markup>
<p:www-form-urlencode match="/c:request/c:body/text()">
<p:input port="source">
<p:inline>
<c:request method="POST"
href="http://www.testservice.org/">
<c:body content-type="application/x-www-form-urlencoded">@@HERE@@</c:body>
</c:request>
</p:inline>
</p:input>
<p:with-param name="uid" select="'test'"/>
<p:with-param name="pwd" select="'test'"/>
<p:with-param name="xml" select=".">
<p:pipe port="result" step="escape"/>
</p:with-param>
</p:www-form-urlencode>
<p:http-request/>
</p:declare-step>

The resulting request:
<?xml version="1.0" encoding="UTF-8"?>
<c:request
xmlns:c="http://www.w3.org/ns/xproc-step"
xmlns:cx="http://xmlcalabash.com/ns/extensions"
xmlns:html="http://www.w3.org/1999/xhtml"
method="POST"
href="http://www.testservice.org">
    <c:body
xmlns:c="http://www.w3.org/ns/xproc-step"
xmlns:cx="http://xmlcalabash.com/ns/extensions"
xmlns:html="http://www.w3.org/1999/xhtml"
content-type="application/x-www-form-urlencoded">uid=test&amp;pwd=test&amp;xml=%3Ca%3EHello:%20%C3%84%E2%80%9A%C3%84%E2%80%A6%C3%86%C2%A5%C3%8A%C2%A5%C3%8F%E2%80%93%C3%A2%E2%80%9E%C2%AC%C3%A2%E2%80%B0%EF%BF%BD%3C/a%3E%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20</c:body>
</c:request>
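
The percent-escapes in this body are longer than seven non-ASCII characters in UTF-8 would need. The sketch below decodes the xml parameter and compares it with the inline text. The explanation it tests, that the UTF-8 bytes were re-read as Windows-1252 somewhere between the pipeline and the serialized request, is an assumption offered as one possible reading, not something reported by the tools.

```python
from urllib.parse import unquote

# The xml parameter copied from the request body above (trailing whitespace dropped).
encoded = (
    "%3Ca%3EHello:%20%C3%84%E2%80%9A%C3%84%E2%80%A6"
    "%C3%86%C2%A5%C3%8A%C2%A5%C3%8F%E2%80%93"
    "%C3%A2%E2%80%9E%C2%AC%C3%A2%E2%80%B0%EF%BF%BD%3C/a%3E"
)
original = "<a>Hello: Ăąƥʥϖℬ≝</a>"

# Percent-escapes interpreted as UTF-8, which is unquote's default.
decoded = unquote(encoded)

# Assumption under test: UTF-8 bytes misread as Windows-1252 before encoding.
misread = original.encode("utf-8").decode("cp1252", errors="replace")

print("matches the inline text:", decoded == original)
print("matches the misread text:", decoded == misread)
```

If the second comparison holds, the receiving service would see mojibake instead of the original characters, so it is worth checking the encoding settings of the editor or processor before relying on this request.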

Alternative 2

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" name="main"
xmlns:cx="http://xmlcalabash.com/ns/extensions"
xmlns:c="http://www.w3.org/ns/xproc-step"
xmlns:html="http://www.w3.org/1999/xhtml" version="1.0">
<p:output port="result"/>
<p:escape-markup name="escape">
<p:input port="source">
<p:inline>
<wrap><a>Hello: Ăąƥʥϖℬ≝</a>
</wrap>
</p:inline>
</p:input>
</p:escape-markup>
<p:add-attribute match="c:param" name="xml">
<p:input port="source">
<p:inline>
<c:param name="xml"/>
</p:inline>
</p:input>
<p:with-option name="attribute-name" select="'value'"/>
<p:with-option name="attribute-value" select="wrap/text()">
<p:pipe port="result" step="escape"/>
</p:with-option>
</p:add-attribute>
<p:wrap-sequence name="wrap" wrapper="c:param-set">
<p:input port="source">
<p:inline>
<wrap>
<c:param name="uid" value="test"/>
<c:param name="pwd" value="test"/>
</wrap>
</p:inline>
<p:pipe port="result" step="xml"/>
</p:input>
</p:wrap-sequence>
<p:unwrap match="wrap" name="unwrap"/>
<p:www-form-urlencode match="/c:request/c:body/text()">
<p:input port="source">
<p:inline>
<c:request method="POST"
href="http://www.testservice.org/">
<c:body content-type="application/x-www-form-urlencoded">@@HERE@@</c:body>
</c:request>
</p:inline>
</p:input>
<p:input port="parameters">
<p:pipe port="result" step="unwrap"/>
</p:input>
</p:www-form-urlencode>
<p:http-request/>
</p:declare-step>
All of the pipelines have been tested with the latest versions of Calumet and Calabash, both run from inside OxygenXML.

XProc in OxygenXML

Conclusion


Is this a good replacement for my usual approach?
One is tempted to say yes, but I'm afraid not completely,
since two HTTP methods that are heavily used in REST environments, PUT and DELETE, are not supported by XProc (yet?).

UPDATE

Norm Walsh, the editor of the XProc spec and the developer of Calabash, commented that PUT and DELETE are supported by XProc.

This is good news. I was wrong-footed by the following sentence from the spec.

The method attribute specifies the method to be used against the IRI specified by the href attribute, e.g. GET or POST (the value is not case-sensitive).

Using the following XProc:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
xmlns:c="http://www.w3.org/ns/xproc-step"
version="1.0">
<p:output port="result"/>
<p:http-request omit-xml-declaration="false" encoding="UTF-8">
<p:input port="source">
<p:inline>
<c:request href="http://localhost:8888/exist/rest/db/fruits/Almonds.xml" method="PUT">
<c:body content-type="application/xml" >
<product>
<category>fruits</category>
<item>Almonds</item>
<inventory>
<sku>AlmofruiIV75Lm</sku>
<price>2</price>
<inventory>915</inventory>
</inventory>
<vendor>TriCounty Produce</vendor>
</product>
</c:body>
</c:request>
</p:inline>
</p:input>
</p:http-request>
</p:declare-step>
I get the following response from EMC's Calumet:
SystemID: C:\Users\Paul\OxygenWorkspace\XProc\simplePUT.xpl
Engine name: Calumet XProc
Severity: error
Description: XPROC_ERROR: Unsupported request method: PUT
Original message: XPROC_ERROR: Unsupported request method: PUT

UPDATE: Good news coming our way: HTTP PUT will be supported in the upcoming 1.0.11 release.

And the following response from Norm Walsh's Calabash, version 0.9.20:
<c:body content-type="application/octet-stream" encoding="base64">
</c:body>
And the PUT has been carried out in eXist-db.
<product xmlns:c="http://www.w3.org/ns/xproc-step">
<category>fruits</category>
<item>Almonds</item>
<inventory>
<sku>AlmofruiIV75Lm</sku>
<price>2</price>
<inventory>915</inventory>
</inventory>
<vendor>TriCounty Produce</vendor>
</product>
Does anyone have a hint on how to get rid of the namespace declaration 'http://www.w3.org/ns/xproc-step' in the PUT file?

Yes, credits go to Vojtech Toman of EMC.

Changing

<p:inline>

to

<p:inline exclude-inline-prefixes="c">

does the trick.


Integrity constraints in SKOS (part 2)

UPDATE: tested with the new Pellet 2.1.0 and Pellet ICV 0.4

We focus on the SKOS integrity constraint with number S13.

S13 skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint properties.

Meaning that according to the SKOS spec the following data are not allowed:
<skos:Concept rdf:ID="Concept_1">
<skos:altLabel xml:lang="en">test</skos:altLabel>
<skos:prefLabel xml:lang="en">test</skos:prefLabel>
<rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>Concept_1</rdfs:label>
</skos:Concept>

We investigate whether and how this constraint can be implemented using the following technologies:

  • OWL 1
  • SPIN
  • OWL 2 DL
    • OWL 2 DL reasoners
  • Pellet ICV

OWL 1


OWL 1 didn't provide the means to assert that if the same pair of individuals is related by more than one property among a given set of disjoint properties, the ontology is inconsistent.

SPIN


Trying to enforce the constraint with SPIN.
The ASK query to define the constraint is:

ASK
WHERE {
OPTIONAL {?subject skos:prefLabel ?pref.}
OPTIONAL {?subject skos:altLabel ?alt.}
OPTIONAL {?subject skos:hiddenLabel ?hidden.}
FILTER (?pref = ?alt || ?alt = ?hidden || ?pref = ?hidden)}
and translated to a spin:constraint on e.g. the Class skos:Concept

ASK WHERE {
OPTIONAL {
?this skos:prefLabel ?pref .
} .
OPTIONAL {
?this skos:altLabel ?alt .
} .
OPTIONAL {
?this skos:hiddenLabel ?hidden .
} .
FILTER (((?pref = ?alt) || (?alt = ?hidden)) || (?pref = ?hidden)) .
}
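
To make explicit what the query tests, here is a small Python sketch of the same pairwise check over plain tuples. It is an illustration only, not how SPIN evaluates the constraint. The function name s13_violations and the representation of a literal as a (text, language) pair are assumptions made for the example; the sample data mirror the Concept_1 data used further on in this post.

```python
from itertools import combinations

SKOS = "http://www.w3.org/2004/02/skos/core#"
LABEL_PROPERTIES = (SKOS + "prefLabel", SKOS + "altLabel", SKOS + "hiddenLabel")


def s13_violations(triples):
    """Return (subject, label, property_a, property_b) for each shared label."""
    labels = {}
    for s, p, o in triples:
        if p in LABEL_PROPERTIES:
            labels.setdefault((s, p), set()).add(o)
    found = []
    for s in sorted({subject for subject, _ in labels}):
        # Check every pair of the three label properties.
        for a, b in combinations(LABEL_PROPERTIES, 2):
            shared = labels.get((s, a), set()) & labels.get((s, b), set())
            for label in sorted(shared):
                found.append((s, label, a, b))
    return found


concept = "http://ec.europa.eu/esco/S13#Concept_1"
data = [
    (concept, SKOS + "prefLabel", ("aaa", "en")),
    (concept, SKOS + "altLabel", ("aaa", "en")),
    (concept, SKOS + "hiddenLabel", ("aaaaa", "en")),
]
violations = s13_violations(data)
print(violations)
```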
The result of the SPIN constraint checking in Topbraid Composer:

S13 in TBC


OWL 2


OWL 2 provides the new construct DisjointProperties to state that properties are mutually exclusive.
A disjoint properties axiom takes a set of ObjectProperties or a set of DataProperties and states that those are pair-wise disjoint.

DisjointObjectProperties := 'DisjointObjectProperties' '(' axiomAnnotations ObjectPropertyExpression ObjectPropertyExpression { ObjectPropertyExpression } ')'
DisjointDataProperties := 'DisjointDataProperties' '(' axiomAnnotations DataPropertyExpression DataPropertyExpression { DataPropertyExpression } ')'
I do not find anything in the OWL 2 spec about being able to say that annotation properties are disjoint.
And our SKOS properties skos:prefLabel, skos:altLabel and skos:hiddenLabel are defined as annotation properties.

Can I declare skos:prefLabel et al. to be both an annotation property and a data property?

Not in OWL 2 DL, since the sets of IRIs used as object, data, and annotation properties in a DL ontology must be pairwise disjoint.

Anyhow, I wondered what OWL 2 DL reasoners would do if I used the following axioms:
<owl:AnnotationProperty rdf:about="http://www.w3.org/2004/02/skos/core#altLabel">
<owl:propertyDisjointWith rdf:resource="http://www.w3.org/2004/02/skos/core#prefLabel"/>
<rdfs:comment xml:lang="en">skos:prefLabel, skos:altLabel and
skos:hiddenLabel are pairwise disjoint properties.</rdfs:comment>
<rdfs:label xml:lang="en">alternative label</rdfs:label>
<skos:definition xml:lang="en">An alternative lexical label for a resource.</skos:definition>
<rdfs:isDefinedBy rdf:resource="http://www.w3.org/2004/02/skos/core"/>
<rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
<skos:example xml:lang="en">Acronyms, abbreviations, spelling
variants, and irregular plural/singular forms may be included among the
alternative labels for a concept. Mis-spelled terms are normally
included as hidden labels (see skos:hiddenLabel).</skos:example>
</owl:AnnotationProperty>
together with these data
<skos:Concept rdf:about="http://ec.europa.eu/esco/S13#Concept_1">
<skos:prefLabel xml:lang="en">aaa</skos:prefLabel>
<skos:altLabel xml:lang="en">aaa</skos:altLabel>
<skos:hiddenLabel xml:lang="en">aaaaa</skos:hiddenLabel>
<rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>Concept_1</rdfs:label>
</skos:Concept>

OWL2 DL reasoners


Pellet 2

In Pellet 2.0.2, I get:

Consistent: No
Reason: null

Changing the skos:altLabel to 'aaaa' gives:

Consistent: Yes

UPDATE: The same results with version Pellet 2.1.0.

HermiT

Doing the same with HermiT, whatever content I use, I get:

It all went pear-shaped: c

Conclusion

I'm not sure what to think about all this. My expectation was that the OWL 2 DL reasoners would come up with a warning that one cannot use a DisjointProperties axiom on annotation properties, but they don't. I must be missing something.

Pellet ICV


Pellet ICV is an extension to Pellet that interprets OWL ontologies with the Closed World Assumption in order to detect constraint violations in RDF data, comparable to what is done with XSD schemas for XML data.

I moved the property disjointness statements into a separate constraints file.

skos:altLabel a owl:AnnotationProperty .
skos:hiddenLabel a owl:AnnotationProperty .
skos:prefLabel a owl:AnnotationProperty .

[] a owl:AllDisjointProperties ;
owl:members (skos:prefLabel skos:altLabel skos:hiddenLabel) .
Running the following data through Pellet ICV 0.4 with the constraints file above:

[] a owl:Ontology ;
owl:imports <http://www.w3.org/TR/skos-reference/skos.rdf> .
<Test_1> a test:Thing.
<Test_1> skos:prefLabel "test"@en; skos:altLabel "test"@en; skos:hiddenLabel "test"@en .
gives indeed a constraint violation:

Validating 3 integrity constraints
Will stop after 1 constraint violation(s) are found
Validating constraint: disjointProperties prefLabel altLabel hiddenLabel

Constraint violated : Yes
Violating individuals (1): Test_1,
Number of constraint(s) violated: 1

However, if I leave out from my data the statements

[] a owl:Ontology ;
owl:imports <http://www.w3.org/TR/skos-reference/skos.rdf> .

and/or

<Test_1> a test:Thing.

I get:

Validating constraint: disjointProperties prefLabel altLabel hiddenLabel
Constraint violated : No
Validating constraint: disjointProperties prefLabel altLabel hiddenLabel
Constraint violated : No
Validating constraint: disjointProperties prefLabel altLabel hiddenLabel
Constraint violated : No

Number of constraint(s) violated: 0

Which leaves me completely puzzled.

Conclusion

And the winner for constraint S13 is SPIN.
OWLIC is able to do the same but still has some rough edges; surely to be revisited after feedback from C&P.


Integrity constraints in SKOS (part 1)

UPDATED: due to the release of Pellet ICV 0.4

If you want to use SKOS, you need to be aware of some integrity constraints formulated in the spec.

Constraint S14

I'll focus in this post on constraint S14:

A resource has no more than one value of skos:prefLabel per language tag.

Meaning that following snippet is not valid according to this constraint, because two different preferred lexical labels have been given with the same language tag.
  :Concept1 skos:prefLabel "love"@en ; 
skos:prefLabel "adoration"@en .

Validating the constraint

The question is: how can you validate this constraint?

With OWL1? With OWL2? I don't see it. This doesn't come as a surprise since the different flavors of the SW modeling languages are focusing on inferences, not on validation.

So, let's try SPIN.
SPIN is a collection of RDF vocabularies that use SPARQL to define constraints and inference rules on Semantic Web data.
SPIN is available within TopBraid Composer, a SW IDE, from which the screenshots below are taken. The SPIN API is also available as an open source Java API.

Within SPIN one uses a SPARQL ASK query to formulate a constraint.
A SPARQL ASK query evaluates to a boolean. In a SPIN context, a result of false means no violation; a result of true means the constraint is violated.
This constraint is applied then to a class and its subclasses using the spin:constraint property.

An introduction to SPIN can be found at Holger Knublauch's blog.

The SPARQL Query

The query we came up with is:

ASK
{
{SELECT ?lang (count(?lang) as ?nr )
WHERE
{?subject skos:prefLabel ?label .
LET (?lang := lang(?label))}
GROUP BY ?lang}
FILTER (?nr > 1)}

This query is in fact using facilities that are not yet in SPARQL 1.0, but are on the drawing board for SPARQL 1.1, being project expressions and subqueries. Luckily these facilities have been already implemented in Jena's ARQ.

Let's have a closer look and start with the inner query

SELECT ?lang (count(?lang) as ?nr ) 
WHERE
{?subject skos:prefLabel ?label .
LET (?lang := lang(?label))}
GROUP BY ?lang

We start from triples using skos:prefLabel.
From the object we take the language with the lang() function and we use these language values to base a grouping upon.
We then return both the language and the number of times that language has been used (via the project expression count(?lang) as ?nr, which counts within each group).

Using e.g. this example input:

input of query

We get the following output.

result of select query

Of course we are only interested in those languages that appear more than once. Hence the FILTER at the end and this all wrapped within an ASK to get a boolean result.
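
The group, count and filter logic can be sketched in a few lines of Python. This is an illustration under assumptions, not the SPIN implementation: the function name s14_violations and the (subject, text, language) tuples are made up for the example, and the sketch groups per subject as well as per language, which is the role ?this plays once the query is attached to a class.

```python
from collections import Counter


def s14_violations(pref_labels):
    """Return (subject, language) pairs that have more than one distinct prefLabel."""
    distinct = set(pref_labels)  # a repeated identical label is not a violation
    counts = Counter((subject, lang) for subject, _text, lang in distinct)
    return sorted(key for key, n in counts.items() if n > 1)


invalid = [
    (":Concept1", "love", "en"),
    (":Concept1", "adoration", "en"),
]
valid = [
    (":Concept1", "love", "en"),
    (":Concept1", "amour", "fr"),
]
print(s14_violations(invalid))
print(s14_violations(valid))
```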

Use the Query for the SPIN Constraint


We associate now this ASK query with the Class Concept (and inherently with its subclasses). The only change made is that we are using the dedicated variable ?this to access the current instance.


Validating the constraint

Using this specific concept as input, no error is thrown.

No error

Adding a second prefLabel in French raises an error; hence our constraint is violated. Point proven.

error

If you see/have a better ASK query for addressing the same problem, please add it as comment. For all to learn.

The C&P approach for checking integrity constraints

Clark & Parsia take a comparable approach where OWL axioms are transformed into SPARQL queries to do closed world validation.
However this particular constraint cannot be expressed with OWL axioms as explained by Evren Sirin.

"There is one particularly pesky SKOS constraint (S14) that cannot be expressed as an OWL IC:

S14: A resource has no more than one value of skos:prefLabel per language tag"

Conclusion

If you speak SPARQL fluently, it is fairly easy to define constraints on your RDF data using SPIN.
In the following posts I'll try to figure out whether I can implement other SKOS constraints with OWL 2 and, if not, with SPIN and/or OWL IC.



slides XRX

My slides used during the well attended seminar on XRX (45 attendees) for SAI.

Download file "XRX.pdf"


Strange reasoning with Open Calais

I have been testing Open Calais.
Open Calais is a Web Service for text mining that can extract entities (persons, companies, countries, ... ) in RDF/OWL from arbitrary text and HTML documents.

My favorite SW IDE TopBraid Composer supports Open Calais as one of its import features. So I gave it a try with following HTML page: Wikipedia's entry on John Zorn.

John Zorn on Wikipedia
This is the 'quite impressive' result of Open Calais' mining, detecting amongst others 38 instances of 'Music Albums' and 25 of 'Music Groups'.

Detected classes and instances
Is this perfect? Of course not. One of his most famous bands Masada is not detected as a 'Music Band', but as a 'Facility' and 'Product'.

The semantics of all those classes can be found at the url of the namespace(s) used.
The 'cale' prefix stands for the following URI: 'http://s.opencalais.com/1/type/em/e/'.
This uri is dereferenceable and offers, depending on the content-type negotiated by the client, a human oriented HTML representation or a machine readable RDF version.

Below the human targeted explanation of class 'Facility'.

Facility explained
Looking at the RDF descriptions, we discover a lot of rdfs:domain and rdfs:range statements.
So I decided to use these statements to infer new triples with a reasoner, and got some surprising results.

An example result.

The resource with the following identifier
id
being initially of type 'Company', now becomes an instance of every class in the list below:

instanceOf
which sounds like complete nonsense to me.

Because we use Pellet 1.5.2 as reasoner, we can ask where those inferred triples come from (for me, the feature that makes Pellet indispensable):

Pellet explanation
And indeed by assigning the property 'c:name' to a resource, this resource becomes automatically (by the rdfs:domain semantics) an instance of all the classes being the object of all those "c:name rdfs:domain ?object" statements.

IF
P rdfs:domain D
AND
x P y
THEN
x rdf:type D.
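
The rule is easy to try out on a handful of triples. The following Python sketch applies it once to a toy graph. The function name apply_domain_rule, the resource ex:r1 and the exact set of classes are illustrative assumptions, not the actual Open Calais data.

```python
def apply_domain_rule(triples):
    """Apply 'P rdfs:domain D and x P y, so x rdf:type D' and return the new triples."""
    domains = {}
    for s, p, o in triples:
        if p == "rdfs:domain":
            domains.setdefault(s, set()).add(o)
    inferred = set()
    for s, p, o in triples:
        for domain_class in domains.get(p, ()):
            inferred.add((s, "rdf:type", domain_class))
    return inferred - set(triples)


# Toy graph: one property with several rdfs:domain statements (illustrative classes).
graph = [
    ("c:name", "rdfs:domain", "cale:Company"),
    ("c:name", "rdfs:domain", "cale:Facility"),
    ("c:name", "rdfs:domain", "cale:Person"),
    ("ex:r1", "rdf:type", "cale:Company"),
    ("ex:r1", "c:name", "Acme"),
]
new_triples = apply_domain_rule(graph)
print(sorted(new_triples))
```

Every domain statement on the property adds one more type to the resource, which is exactly the effect seen in the explanation above.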
I have the impression that a classic SW modeling error has been made here: misusing 'rdfs:domain' to attach a property to a class, as you normally do in object-oriented modeling.
In the SW, however, a property can be used anywhere and is independent of any class, and rdfs:domain is used solely for inferencing. In the Open Calais case, I cannot imagine that these are the inferences you want.

Something to report at the Pedantic Web Group?




Some SPARQL extension function tricks

Identifiers with qualifiers in round brackets

DBpedia uses qualifiers within round brackets in its identifiers to distinguish between entities that have the same name.
An example of such an identifier is: http://dbpedia.org/resource/Belle_de_jour_(film).

Belle de jour in Tabulator

Using ARQ (Jena) functions to grab the namespace and the local name of this identifier doesn't seem to work as hoped for.
SELECT ?ln ?ns
WHERE {
LET (?ln := afn:localname(<http://dbpedia.org/resource/Belle_de_jour_(film)>)) .
LET (?ns := afn:namespace(<http://dbpedia.org/resource/Belle_de_jour_(film)>)) .
}

results in

Result of Jena functions
This is how we solved this issue.

SELECT ?ln ?ns ?fn ?li ?name ?namespace
WHERE {
LET (?ns := afn:namespace(<http://dbpedia.org/resource/Belle_de_jour_(film)>)) .
LET (?ln := afn:localname(<http://dbpedia.org/resource/Belle_de_jour_(film)>)) .
LET (?fn := fn:concat(?ns, ?ln)) .
LET (?li := (smf:lastIndexOf(?fn, "/") + 1)) .
LET (?name := afn:substr(?fn, ?li)) .
LET (?namespace := afn:substr(?fn, 0, ?li)).
}
We work with the complete string to find the position of the last '/'.
Having this position we take the substring before and after.
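
The same idea in Python, as a quick way to check the expected values; the function name split_uri is hypothetical.

```python
def split_uri(uri):
    """Split a URI into (namespace, name) at the last '/'."""
    cut = uri.rfind("/") + 1  # position just after the last slash
    return uri[:cut], uri[cut:]


namespace, name = split_uri("http://dbpedia.org/resource/Belle_de_jour_(film)")
print(namespace)
print(name)
```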

The functions starting with prefix 'fn' are the XQuery/XPath functions and operators supported by ARQ.
The functions starting with 'afn' are ARQ extension functions.
The function 'lastIndexOf' however is an extension only available in SparqlMotion of TopQuadrant.

Constructing dataset descriptions using voiD

voiD is a vocabulary for describing datasets. A dataset is a collection of data published and maintained by a single provider as RDF, and accessible through dereferenceable URIs and/or SPARQL endpoints and/or a data dump, ...

Next to all types of metadata related to the dataset such as dcterms:title, dcterms:creator ... a voiD description also offers the possibility to include some statistical data about the dataset.
Examples are:
  • the number of triples
  • the number of resources
  • the number of DistinctSubjects
  • the number of DistinctObjects
These are some example SPARQL queries which can be used for this purpose.

CONSTRUCT  {:X a void:Dataset.
:X void:statItem _:b0.
_:b0 scovo:dimension void:numberOfTriples.
_:b0 rdf:value ?triples.
:X void:statItem _:b1.
_:b1 scovo:dimension void:numberOfResources.
_:b1 scovo:dimension skos:Concept.
_:b1 rdf:value ?concepts.
:X void:statItem _:b2.
_:b2 scovo:dimension void:numberOfResources.
_:b2 scovo:dimension dcterms:Agent.
_:b2 rdf:value ?agents.
:X void:statItem _:b3.
_:b3 scovo:dimension void:numberOfDistinctSubjects.
_:b3 rdf:value ?nrsubj.
:X void:statItem _:b4.
_:b4 scovo:dimension void:numberOfDistinctObjects.
_:b4 rdf:value ?nrobj.
}
WHERE {
LET (?triples := smf:countMatches(?s, ?p, ?o)) .
LET (?concepts := smf:countMatches(?concept, rdf:type, skos:Concept)) .
LET (?agents := smf:countMatches(?agent, rdf:type, dcterms:Agent)) .
LET (?nrsubj := smf:countResults("SELECT DISTINCT ?s WHERE {?s ?p ?o.}")) .
LET (?nrobj := smf:countResults("SELECT DISTINCT ?o WHERE {?s ?p ?o.}")) .
}
Once again this relies heavily on TopQuadrant's SparqlMotion-specific SPARQL extension functions.
For an overview of SPARQL extension functions available in different products, see the SPARQL Extension Function Survey of Leigh Dodds.
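
For readers without SparqlMotion, the numbers themselves are simple to compute. Below is a tool-independent sketch in Python over an in-memory list of triples. The function name void_statistics and the sample data are assumptions for illustration; it produces only the counts, not the voiD/SCOVO description.

```python
from collections import Counter


def void_statistics(triples, rdf_type="rdf:type"):
    """Compute the basic counts used in a voiD description."""
    unique = set(triples)
    return {
        "numberOfTriples": len(unique),
        "numberOfDistinctSubjects": len({s for s, _p, _o in unique}),
        "numberOfDistinctObjects": len({o for _s, _p, o in unique}),
        # Number of typed resources per class, e.g. skos:Concept or dcterms:Agent.
        "numberOfResources": Counter(o for _s, p, o in unique if p == rdf_type),
    }


sample = [
    ("ex:c1", "rdf:type", "skos:Concept"),
    ("ex:c2", "rdf:type", "skos:Concept"),
    ("ex:a1", "rdf:type", "dcterms:Agent"),
    ("ex:c1", "skos:prefLabel", "love"),
]
stats = void_statistics(sample)
print(stats)
```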

Hope

I hope that the upcoming SPARQL 1.1 spec comes up with a long list of standardized functions.
For now we still need to write too many solutions using tool-specific SPARQL extensions.


Seminar The Digital Future of Cultural Heritage

The slides of my SW talk at the very successful seminar "Access to Enriched Cultural Heritage Information Using Semantic Web Technology
Leuven, Belgium, 21 January 2010" are available here.

Download file "ErfgoedPlus.pdf"


Semantic Web entailment regimes

W3C has published a set of URIs to uniquely identify Semantic Web entailment regimes.

I added some links to the related specs on the web and explanations in the book Foundations of Semantic Web Technologies, by Pascal Hitzler, Markus Krötzsch, Sebastian Rudolph, Chapman & Hall; 1 edition (13 Aug 2009), ISBN-10: 142009050X, ISBN-13: 978-1420090505.


Foundations of Semantic Web Technologies



The various entailment regimes are:

Entailment regime | Namespace | Links | Book: Foundations of Semantic Web Technologies
Simple Entailment | http://www.w3.org/ns/entailment/Simple | RDF Semantics - Simple Entailment; RDF Semantics - Simple Entailment Rules | page 75 + 92
RDF Entailment | http://www.w3.org/ns/entailment/RDF | RDF Semantics - RDF Entailment; RDF Semantics - RDF Entailment Rules; SPARQL 1.1 Entailment regimes - RDF Entailment | page 78 + 92
RDFS Entailment | http://www.w3.org/ns/entailment/RDFS | RDF Semantics - RDFS Entailment; RDF Semantics - RDFS Entailment Rules; SPARQL 1.1 Entailment regimes - RDFS Entailment | page 81 + 96
D Entailment | http://www.w3.org/ns/entailment/D | RDF Semantics - D Entailment; RDF Semantics - D Entailment Rules; SPARQL 1.1 Entailment regimes - D Entailment | page 85 + 102
OWL Entailment with Direct Semantics | http://www.w3.org/ns/entailment/OWL-Direct | OWL 2 - Direct Semantics |
OWL Entailment with RDF Based Semantics | http://www.w3.org/ns/entailment/OWL-RDF-Based | OWL 2 - RDF-based Semantics; OWL 2 RL |
RIF Entailment | http://www.w3.org/ns/entailment/RIF | RIF Entailment; SPARQL 1.1 Entailment regimes - RIF Entailment |


Article on Semantic Web

My article on the Semantic Web (in Dutch) that appeared in DB/M Database Magazine, Issue 8, December 2009 is
available online.
Note: the link is finally working now (since 2009/12/14).


Slides Semantic Technologies talk

The slides used during my talk on Semantic Web Technologies for SAI in Brussels on November 24th 2009 are available here.

Download file "SW.pdf"


Berkeley DB XML and OxygenXML on Mac OSX (part 3)

Prerequisites described in Berkeley DB XML and OxygenXML on Mac OSX (part 1).

Important

OxygenXML always connects to a Berkeley DBXML environment in transactional mode.
If one is using the dbxml shell application, it defaults to non-transactional mode.

So to get the two applications to work in a compatible way, we need to make sure we use the shell application in transactional mode.

Procedure with the Berkeley DBXML Shell

  1. We create a folder to create an environment in.
    E.g. /dbxml2
  2. we create the environment by running from the shell
    dbxml -c -h /dbxml2 -t
    argument explanation
    -h to define the directory to be used as database environment
    -c to create a new environment in the folder defined with -h
    -t to specify the transactional mode

    result in Finder
  3. we create a container

    create container

    result in Finder
  4. to do some CRUD operations now, work in transactional mode

    transactional mode
  5. E.g. Put an XML doc in the container

    PutDocument
    'Apples.xml' becoming the name of the document within Berkeley DBXML and 'f' as indicator that we load a file as opposed to 's' for indicating 'load as string'.
  6. Commit the transaction

    commit

Connecting to this environment from OxygenXML

The only thing to do is to add a new connection.

Connection
This time we indicated to 'join an existing environment' taking over all settings set externally.

Now we have access to the externally created environment and containers from within OxygenIDE.

view on externally created environment



Berkeley DB XML and OxygenXML on Mac OSX (part 2)

Necessary preparations explained in Berkeley DB XML and OxygenXML on Mac OSX (part 1).

My initial settings in OxygenXML


The creation of a Berkeley DBXML Data Source in the OxygenXML Preferences:

Data Source Settings
The concrete connection to the DBXML datasource with '/dbxml' as the environment.

Connection
Leading us to following 'Database Explorer' view in the editor.

Database view

Starting to work from inside OxygenXML

XML documents are stored within Berkeley DBXML in files called containers. A container is the equivalent of a database holding the xml files as well as the metadata and indexes.

environment, containers, docs

So let's start by creating a container.

Berkeley DBXML supports two types of containers: 'Wholedoc' or 'Node':
  • Wholedoc containers store XML documents exactly as they are, retaining all document white space.
  • Node containers decompose the documents into individual nodes.
Node containers are faster to query, but wholedoc containers can retrieve entire documents more quickly. In most cases I use the 'node' variant.

container creation
The 'Index nodes' option causes indices for the container to return nodes rather than documents. This is what I want most of the time.

container creation

Now you can add to this container the XML files you want to index and query.

file added
Next time I cover 'Starting from Berkeley DBXML using the DBXML Shell application'.
