I have been testing Open Calais.
Open Calais is a Web Service for text mining that can extract entities (persons, companies, countries, ...) in RDF/OWL from arbitrary text and HTML documents.
My favorite SW IDE, TopBraid Composer, supports Open Calais as one of its import features. So I gave it a try with the following HTML page: Wikipedia's entry on John Zorn.

The result of Open Calais' mining is quite impressive: it detects, among others, 38 instances of 'Music Albums' and 25 of 'Music Groups'.

Is this perfect? Of course not. One of his most famous bands, Masada, is not detected as a 'Music Band', but as a 'Facility' and a 'Product'.
The semantics of all those classes can be found at the URL of the namespace(s) used.
The 'cale' prefix stands for the following URI: 'http://s.opencalais.com/1/type/em/e/'.
This URI is dereferenceable and offers, depending on the content type negotiated by the client, a human-oriented HTML representation or a machine-readable RDF version.
Below is the human-targeted explanation of the class 'Facility'.
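
To make the content negotiation concrete, here is a minimal Python sketch of how a client could ask for one representation or the other. The helper names (build_request, fetch), the class URI built by appending 'Facility' to the namespace, and the 'application/rdf+xml' media type are my assumptions, not something taken from the Open Calais documentation, and the service may no longer answer at this address.

```python
from urllib.request import Request, urlopen

# Namespace bound to the 'cale' prefix, as quoted in the post.
CALE_NS = "http://s.opencalais.com/1/type/em/e/"


def build_request(uri: str, want_rdf: bool) -> Request:
    """Build a GET request whose Accept header selects the representation.

    Assumption: the server serves RDF/XML for 'application/rdf+xml'
    and the human-oriented page for 'text/html'.
    """
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept})


def fetch(uri: str, want_rdf: bool) -> str:
    """Dereference the URI and return the response body as text."""
    with urlopen(build_request(uri, want_rdf), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical class URI: namespace + local name 'Facility'.
FACILITY_URI = CALE_NS + "Facility"
```

Calling fetch(FACILITY_URI, want_rdf=True) would then request the machine-readable description, and want_rdf=False the HTML page, provided the server honours the Accept header.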

Looking at the RDF descriptions, we discover a lot of rdfs:domain and rdfs:range statements.
So I decided to feed these statements to a reasoner to infer new triples, and got surprising results.
An example result: the resource with the following identifier

initially of type 'Company', now becomes an instance of every class in the list below:

which sounds like complete nonsense to me.
Because I used Pellet 1.5.2 as the reasoner, we are able to ask where those inferred triples come from (for me, the feature that makes it impossible to live without Pellet):

And indeed, by assigning the property 'c:name' to a resource, this resource automatically becomes (by the rdfs:domain semantics) an instance of every class that appears as the object of a "c:name rdfs:domain ?object" statement.
IF
P rdfs:domain D
AND
x P y
THEN
x rdf:type D.
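
The rule above is easy to try out in a few lines of plain Python, with no RDF library or reasoner involved. This is only a sketch of the rdfs:domain entailment rule, not of what Pellet does internally. The triples are made-up sample data: the 'ex:acme' resource and the three domain declarations on 'c:name' are illustrative assumptions, since the actual Open Calais statements are not reproduced here.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_domain_types(triples):
    """Apply: IF P rdfs:domain D AND x P y THEN x rdf:type D.

    Returns only the triples that are new, i.e. not already asserted.
    """
    # Collect every declared domain per property.
    domains = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)

    # Every subject using a property gets all of that property's domains.
    inferred = set()
    for s, p, o in triples:
        for d in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, d))
    return inferred - set(triples)


# Hypothetical data: one property declared with several domains.
triples = {
    ("c:name", RDFS_DOMAIN, "c:Company"),
    ("c:name", RDFS_DOMAIN, "c:Facility"),
    ("c:name", RDFS_DOMAIN, "c:Product"),
    ("ex:acme", RDF_TYPE, "c:Company"),
    ("ex:acme", "c:name", "Acme"),
}

new_triples = infer_domain_types(triples)
```

With this sample data, new_triples makes the company 'ex:acme' a 'c:Facility' and a 'c:Product' as well, simply because it has a name. Multiple rdfs:domain statements on one property are read as a conjunction, not as a choice.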
I have the impression that a classic SW modeling error has been made here: misusing 'rdfs:domain' to attach a property to a class, as you normally do in object-oriented modeling.
In the SW, however, a property can be used anywhere and is independent of any class; rdfs:domain is used solely for inferencing. In the Open Calais case, I cannot imagine that these are the inferences you want.
Something to report to the Pedantic Web Group?
Comments
Rafi (unauthenticated)
May 11, 2010
A few months ago we released an OWL ontology fixing the problem described in your blog. The corrected schema can be downloaded from http://www.opencalais.com/files/owl.opencalais-4.3a.xml. Note that dereferencing OpenCalais class and predicate URIs still retrieves the erroneous old schema, so make sure that you use the ontology from the URL above.
Hope this helps
Rafi Shachar
Open Calais Team
Living in the XML and RDF world
May 19, 2010
Rafi,
Thanks for the update.
Paul