
Integrity constraints in SKOS (part 1)

UPDATED: due to the release of Pellet ICV 0.4

If you want to use SKOS, you need to be aware of some integrity constraints formulated in the spec.

Constraint S14

I'll focus in this post on constraint S14:

A resource has no more than one value of skos:prefLabel per language tag.

This means the following snippet is not valid under this constraint, because two different preferred lexical labels have been given with the same language tag:

  :Concept1 skos:prefLabel "love"@en ;
            skos:prefLabel "adoration"@en .
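
To make the constraint concrete outside of any RDF tooling, here is a minimal sketch in plain Python. It assumes the prefLabel triples have already been pulled out into (resource, label, language tag) tuples; the function name `find_s14_violations`, the sample data and the French label are my own illustration, not something taken from the SKOS spec.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for skos:prefLabel triples:
# (resource, label text, language tag)
PREF_LABELS = [
    ("Concept1", "love", "en"),
    ("Concept1", "adoration", "en"),  # second English prefLabel: violates S14
    ("Concept1", "amour", "fr"),      # different language tag: fine
    ("Concept2", "hate", "en"),       # different resource: fine
]


def find_s14_violations(pref_labels):
    """Return {(resource, lang): sorted labels} for every S14 violation."""
    grouped = defaultdict(set)
    for resource, label, lang in pref_labels:
        grouped[(resource, lang)].add(label)
    # S14 is violated when one resource has more than one distinct
    # prefLabel for the same language tag.
    return {key: sorted(labels) for key, labels in grouped.items() if len(labels) > 1}


violations = find_s14_violations(PREF_LABELS)
print(violations)
```

The point to note is that the grouping key is the pair of resource and language tag: labels in different languages, or on different resources, never clash.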

Validating the constraint

The question is: how can you validate this constraint?

With OWL1? With OWL2? I don't see how. This doesn't come as a surprise, since the different flavors of the Semantic Web modeling languages focus on inference, not on validation.

So, let's try SPIN.
SPIN is a collection of RDF vocabularies that use SPARQL to define constraints and inference rules on Semantic Web data.
SPIN is available within TopBraid Composer, a Semantic Web IDE, from which the screenshots below were taken. The SPIN API is also available as an open source Java API.

Within SPIN, a constraint is formulated as a SPARQL ASK query.
A SPARQL ASK query evaluates to a boolean. In a SPIN context, true means the constraint is violated; false means no violation was found.
The constraint is then attached to a class with the spin:constraint property, and applies to that class and its subclasses.

An introduction to SPIN can be found at Holger Knublauch's blog.

The SPARQL Query

The query we came up with is:

  ASK
  {
    { SELECT ?lang (count(?lang) as ?nr)
      WHERE
      { ?subject skos:prefLabel ?label .
        LET (?lang := lang(?label)) }
      GROUP BY ?lang }
    FILTER (?nr > 1)
  }

This query uses facilities that are not in SPARQL 1.0 but are on the drawing board for SPARQL 1.1, namely project expressions and subqueries. The grouping with count() and the LET assignment also go beyond SPARQL 1.0. Luckily, all of these have already been implemented in Jena's ARQ.

Let's have a closer look and start with the inner query:

  SELECT ?lang (count(?lang) as ?nr)
  WHERE
  { ?subject skos:prefLabel ?label .
    LET (?lang := lang(?label)) }
  GROUP BY ?lang

We start from the triples that use skos:prefLabel.
From each object we take the language tag with the lang() function, and we group on these language values.
Then we return both the language and the number of times it has been used within its group, via the project expression (count(?lang) as ?nr).

Using e.g. this example input:

[screenshot: input of query]

We get the following output.

[screenshot: result of select query]

Of course we are only interested in those languages that appear more than once. Hence the FILTER at the end, with everything wrapped in an ASK to get a boolean result.
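
For readers who think more easily in code than in SPARQL, the three stages of the query (group and count, filter, ask) can be mimicked in a few lines of Python. This is only a rough analogy under my own assumptions: the labels are hypothetical, they all belong to a single resource (which matches the SPIN version, where the subject is fixed), and none of this is how ARQ evaluates the query internally.

```python
from collections import Counter

# Hypothetical prefLabels of ONE resource, as (label text, language tag) pairs.
labels = [("love", "en"), ("adoration", "en"), ("amour", "fr")]

# Inner SELECT: GROUP BY ?lang and count the rows in each group.
rows = Counter(lang for _text, lang in labels)

# FILTER (?nr > 1): keep only the languages used more than once.
duplicated = {lang: nr for lang, nr in rows.items() if nr > 1}

# ASK: true as soon as at least one row survives the filter.
ask_result = bool(duplicated)

print(dict(rows), duplicated, ask_result)
```

Here the English tag is counted twice, so it survives the filter and the final boolean is true, which in SPIN terms means a violation.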

Use the Query for the SPIN Constraint


We now associate this ASK query with the class Concept (and thereby with its subclasses). The only change is that we use the dedicated variable ?this to refer to the current instance.
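
The role of ?this can be pictured as a loop: the constraint is evaluated once per instance of the class, with ?this standing for that instance. The Python below is a loose sketch of that idea only. The names `has_duplicate_language` and `violating_instances` and the sample concepts are hypothetical, and the real SPIN engine of course works on RDF, not on dictionaries.

```python
def has_duplicate_language(labels):
    """ASK-style predicate: True when a language tag occurs more than once."""
    langs = [lang for _text, lang in labels]
    return len(langs) != len(set(langs))


def violating_instances(pref_labels_by_instance):
    """Evaluate the predicate once per instance, as if binding ?this to it."""
    return [
        this
        for this, labels in pref_labels_by_instance.items()
        if has_duplicate_language(labels)  # True means: constraint violated
    ]


# Hypothetical instances of the Concept class with their prefLabels.
concepts = {
    "Concept1": [("love", "en"), ("amour", "fr")],
    "Concept2": [("hate", "en"), ("haine", "fr"), ("aversion", "fr")],
}

print(violating_instances(concepts))
```

Because each instance is checked on its own, two concepts that both have an English label do not interfere with each other.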


Validating the constraint

Using this specific concept as input, no error is thrown.

[screenshot: no error]

Adding a second prefLabel in French raises an error; hence our constraint is violated. Point proven.

[screenshot: error]

If you see or have a better ASK query for addressing the same problem, please add it as a comment, for all to learn.

The C&P approach for checking integrity constraints

Clark & Parsia take a comparable approach, in which OWL axioms are transformed into SPARQL queries to do closed world validation.
However, this particular constraint cannot be expressed with OWL axioms, as explained by Evren Sirin.

"There is one particularly pesky SKOS constraint (S14) that cannot be expressed as an OWL IC:

S14: A resource has no more than one value of skos:prefLabel per language tag"

Conclusion

If you speak SPARQL fluently, it is fairly easy to define constraints on your RDF data using SPIN.
In following posts I'll try to figure out whether other SKOS constraints can be implemented with OWL2 and, if not, with SPIN and/or OWL IC.

