Moving to another triple store supporting SPARQL 1.1

Update: See also the similar post by Ric Roberts: Installing Jena, Joseki and TDB on OS X or Linux

Problem statement


In one of our projects we have been using Sesame as RDF triple store and SPARQL endpoint.
We chose Sesame for the following reasons:
  • it is open source and free
  • it is very easy to set up
  • it has a very friendly user interface
  • there is a built-in connector in Topbraid Composer ME, our Semantic Web IDE of choice
  • and it is a national (Dutch) product.
We now want to expose the dataset with the faceted navigation aids offered by Paggr Prospect, the faceted browser builder for Linked Data.

Prospect in action on Crunchbase
To do so, however, Prospect must be connected to a SPARQL endpoint that offers the aggregate functions defined in SPARQL 1.1.

An example using the aggregate function COUNT:

SELECT (COUNT(?person) AS ?alices)
WHERE {
?person :name "Alice" .
}
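To make concrete what the COUNT aggregate computes, here is a minimal Python sketch that counts matching triples in a small in-memory list. The triples and the helper name count_matches are made up for illustration; a real triple store evaluates the aggregate itself.

```python
# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:p1", ":name", "Alice"),
    ("ex:p2", ":name", "Bob"),
    ("ex:p3", ":name", "Alice"),
]


def count_matches(triples, predicate, obj):
    """Count the triples matching the pattern `?person <predicate> <obj>`.

    This mirrors COUNT(?person): every matching solution is counted once,
    without removing duplicates.
    """
    return sum(1 for _, p, o in triples if p == predicate and o == obj)


print(count_matches(TRIPLES, ":name", "Alice"))  # prints 2
```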
SPARQL 1.1, however, is not (yet?) supported in Sesame.

Joseki seemed a viable replacement:
  • it is open source and free
  • it is built upon the very robust Jena/ARQ RDF framework
  • there is a built-in connector in Topbraid Composer ME, which uses the same Jena/ARQ RDF framework
  • and it offers support for SPARQL 1.1
As persistence layer we chose TDB.

Migration


The migration went much more smoothly than expected. This is what we did.
  1. Create a TDB store from the Sesame store
  2. Install Joseki
    1. Set the environment variables JOSEKIROOT and the Java CLASSPATH
  3. Configure Joseki
    1. Adapt joseki-config.ttl to add the TDB store
    2. Adapt the web.xml file
    3. Make an HTML form for querying the SPARQL endpoint

Create a TDB store


For this we used one of the built-in facilities of Topbraid Composer: you can directly export your existing dataset to a TDB database.

Export to TDB

Install Joseki

  1. Unzip the distribution.
  2. Set the JOSEKIROOT environment variable to the location of the installation.

    JOSEKIROOT
  3. Make sure your Java classpath points to %JOSEKIROOT%/lib

    setting the classpath
  4. Test the installation by starting the server from the command line.

    starting Joseki
    Important: do not try to run the rdfserver bat file from the bin directory; run it directly from %JOSEKIROOT%.

    Now point your web browser to http://localhost:2020/ and you should get this screen:

    The JOSEKI startup screen
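If the server fails to start, the classpath is a likely suspect. The following Python sketch lists the jar files under the lib directory of a Joseki installation and joins them into a classpath string. The helper name joseki_classpath is hypothetical, and the sketch assumes that the jars sit directly in the lib directory.

```python
import os
from pathlib import Path


def joseki_classpath(joseki_root):
    """Return a classpath string listing every jar in <joseki_root>/lib."""
    lib_dir = Path(joseki_root) / "lib"
    if not lib_dir.is_dir():
        raise FileNotFoundError(f"no lib directory under {joseki_root}")
    jars = sorted(str(jar) for jar in lib_dir.glob("*.jar"))
    if not jars:
        raise FileNotFoundError(f"no jar files found in {lib_dir}")
    # os.pathsep is ';' on Windows and ':' on Linux and OS X.
    return os.pathsep.join(jars)
```

Calling joseki_classpath(os.environ["JOSEKIROOT"]) raises a KeyError when the variable is not set and a FileNotFoundError when it points to the wrong directory, which covers the two most common setup mistakes.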

Configure Joseki

  1. Add a service and a dataset to the joseki-config.ttl file (a service with name 'TDB' using dataset 'newdata').
    The joseki-config.ttl is found in %JOSEKIROOT%.
     # Service 3 - SPARQL processor only handling a given dataset
    <#service3>
    rdf:type joseki:Service ;
    rdfs:label "SPARQL on TDB" ;
    joseki:serviceRef "TDB" ;
    # web.xml must route this name to Joseki
    # dataset part
    joseki:dataset <#newdata> ;
    # Service part.
    # This processor will not allow either the protocol,
    # nor the query, to specify the dataset.
    joseki:processor joseki:ProcessorSPARQL_FixedDS ;
    .

    Then add the settings for the 'newdata' dataset:

     ## Initialize TDB.
    [] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
    tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .

    <#newdata> rdf:type tdb:DatasetTDB ;
    rdfs:label "A new TDB dataset" ;
    tdb:location "C:/Users/Paul/MyWorkSpaces/TBCMEWorkspace/Test/kennedys.tdb.data" .
  2. Add a servlet mapping for 'TDB' to the web.xml file, which resides in %JOSEKIROOT%\webapps\joseki\WEB-INF.

    <servlet-mapping>
    <servlet-name>SPARQL service processor</servlet-name>
    <url-pattern>/TDB</url-pattern>
    </servlet-mapping>
  3. Make an HTML file (in my case myQuery.html) containing a form to post queries to the TDB SPARQL service, and copy it next to the other HTML files in %JOSEKIROOT%\webapps\joseki.

    <form action="TDB" method="get">
    <p>SELECT - get variables (apply XSLT stylesheet)</p>
    <p><textarea name="query" cols="70" rows="5">
    PREFIX kennedys: &lt;http://topbraid.org/examples/kennedys#>
    SELECT ?a ?c
    WHERE
    { ?a kennedys:name ?c}</textarea>
    <br/>
    Output XML: <input type="radio" name="output" value="xml" checked/>
    with XSLT style sheet (leave blank for none):
    <input name="stylesheet" size="25" value="/xml-to-html.xsl" /> <br/>
    or JSON output: <input type="radio" name="output" value="json"/> <br/>
    or text output: <input type="radio" name="output" value="text"/> <br/>
    or CSV output: <input type="radio" name="output" value="csv"/> <br/>
    or TSV output: <input type="radio" name="output" value="tsv"/> <br/>
    Force the accept header to <tt>text/plain</tt> regardless
    <input type="checkbox" name="force-accept" value="text/plain"/>
    <br/>

    <input type="submit" value="Get Results" />
    </p>
    </form>
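The form above just sends a GET request with the query and output parameters to the TDB service, so the same request can be built in a script. Below is a minimal Python sketch using only the standard library. The function names build_query_url and run_query are hypothetical, and the endpoint URL http://localhost:2020/TDB assumes the default port and the service name configured above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(endpoint, query, output="json"):
    """Build the GET URL that the HTML form would submit."""
    params = {"query": query, "output": output}
    return f"{endpoint}?{urlencode(params)}"


def run_query(endpoint, query, output="json"):
    """Send the query to the endpoint and return the raw response text."""
    with urlopen(build_query_url(endpoint, query, output)) as response:
        return response.read().decode("utf-8")
```

With the server running, run_query("http://localhost:2020/TDB", query_text) should return the same results as the form, in the requested output format.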

Result

Going to http://localhost:2020/myQuery.html gives

Querying the TDB store
The query shown uses SPARQL 1.1 aggregates and returns as result:

SPARQL 1.1 query results
Our SPARQL 1.1 endpoint was up and running in 2 hours, migration included.
The next step is building the faceted browser interface, so more is to come.


