SPARQL 1.1 aggregates support UPDATE

Context


We have been looking for a web framework that talks to a triple store and offers
faceted and set-based navigation in addition to full-text and fielded search, similar to the features of Siderean Seamark Navigator, which is no longer being developed.

After a market investigation we came to the conclusion that semsol's Paggr Prospect came closest.

Paggr Prospect, though, needs to talk to a SPARQL endpoint that implements the SPARQL 1.1 aggregate functions.

An example of such a query:
SELECT DISTINCT ?name (count( ?person) as ?total )
WHERE {
?person kennedys:gender ?gender.
?gender rdfs:label ?name.
}
GROUP BY ?name
ORDER BY desc(?total)

which gives the following result:

SPARQL count result
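
To make concrete what the aggregate computes, here is a small sketch in plain Python that mimics GROUP BY ?name, count(?person) and ORDER BY desc(?total). The triples and the function name `count_by_label` are made up for illustration; they are not taken from the test file or from any store's implementation, and the sketch assumes each gender has exactly one label.

```python
from collections import Counter

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("person:1", "kennedys:gender", "gender:male"),
    ("person:2", "kennedys:gender", "gender:female"),
    ("person:3", "kennedys:gender", "gender:male"),
    ("gender:male", "rdfs:label", "male"),
    ("gender:female", "rdfs:label", "female"),
]


def count_by_label(triples):
    """Join person -> gender -> label, group by label and count persons."""
    # ?gender rdfs:label ?name (assumes one label per gender)
    labels = {s: o for s, p, o in triples if p == "rdfs:label"}
    # ?person kennedys:gender ?gender, grouped by ?name
    totals = Counter(
        labels[o]
        for s, p, o in triples
        if p == "kennedys:gender" and o in labels
    )
    # ORDER BY desc(?total)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

A store without aggregate support leaves this grouping and counting to the client, which is what makes server-side aggregates attractive for faceted navigation.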

semsol's own ARC offers these aggregates, but we wanted to evaluate other options, since we are also interested in being able to do geo queries.

Testing


Our test file is a 168,871 kB RDF/XML file containing 1,292,253 triples.

We tested the following triple stores/SPARQL endpoints:

Product             | RDF file loaded | Support documented             | SPARQL aggregate working
--------------------+-----------------+--------------------------------+-------------------------
Sesame/native store | yes             | yes, since version Sesame 2.4.0 | yes
Sesame/BigOWLIM     | yes             | yes, since version 4.0         | yes
Joseki/ARQ/TDB      | yes             | yes                            | yes
Joseki/ARQ/BigOWLIM | yes             | yes                            | yes
Virtuoso            | yes             | yes                            | yes
4store              | yes             | not found                      | gives wrong result
AllegroGraph        | yes             | not found                      | gives syntax error
Talis Platform      | yes             | yes                            | yes
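
A check like the one summarised above could be scripted against any endpoint that speaks the SPARQL Protocol over HTTP. The following is only a sketch under assumptions: the endpoint URL is whatever your store exposes, the store is assumed to return the standard SPARQL JSON results format, and the function names are invented for this example. It is not the procedure used for the tests in this post.

```python
import json
import urllib.parse
import urllib.request


def build_query_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_counts(document, name_var="name", total_var="total"):
    """Extract (name, total) rows from a SPARQL JSON results document."""
    rows = []
    for binding in document["results"]["bindings"]:
        rows.append(
            (binding[name_var]["value"], int(binding[total_var]["value"]))
        )
    return rows


def run_aggregate_query(endpoint, query):
    """Send the query and return the parsed rows (needs a live endpoint)."""
    request = build_query_request(endpoint, query)
    with urllib.request.urlopen(request) as response:
        return parse_counts(json.load(response))
```

Calling `run_aggregate_query` with the endpoint URL and the aggregate query as a string would return the rows for comparison. A store that rejects the aggregate syntax would typically surface as an `urllib.error.HTTPError`, while a wrong result only shows up when the rows are compared with the expected counts.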

Sesame does support SPARQL 1.1 aggregates.

Sesame SPARQL 1.1 Query
With the following result:

Sesame result query

Joseki/ARQ does too, with both a TDB and a BigOWLIM backend:

Joseki Query

Joseki Result
ARQ, with its 1.1 capabilities, is also used within TopBraid Live, which also offers a SPARQL endpoint.

Virtuoso does as well:

Virtuoso
4store is working on support, but it doesn't return the expected result.


4store query
with this result:

4store result

AllegroGraph doesn't support aggregates yet, but support is being worked on and should appear in an upcoming release.





4store
The Talis Platform (SaaS) could not handle the large upload, but I succeeded in populating the store by uploading smaller files.
I used the following XProc script to do this:

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step
    xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    xmlns:cx="http://xmlcalabash.com/ns/extensions"
    name="myPipeline"
    version="1.0">
  <p:output port="result" sequence="true" />
  <!-- path to the folder with rdf files -->
  <p:variable name="path"
      select="'/Users/paul/Desktop/RandD/owlimimport/erfgoedplusImport/'">
    <p:empty />
  </p:variable>
  <!-- list all .rdf files in that folder -->
  <p:directory-list include-filter=".*\.rdf">
    <p:with-option name="path" select="$path">
      <p:empty />
    </p:with-option>
  </p:directory-list>
  <!-- POST each file to the store in its own request -->
  <p:for-each name="directoryloop">
    <p:output port="result" sequence="true" />
    <p:iteration-source select="/c:directory/c:file" />
    <p:variable name="file" select="concat($path,/c:file/@name)" />
    <p:load name="file">
      <p:with-option name="href" select="$file" />
    </p:load>
    <!-- wrap the loaded RDF/XML in the body of a c:request -->
    <p:insert match="/c:request/c:body" position="first-child">
      <p:input port="source">
        <p:inline>
          <c:request
              href="http://api.talis.com/stores/zzzzzzzzz/meta"
              method="POST"
              detailed="true" auth-method="digest" username="xxxxxxx" password="yyyyyyy">
            <c:body content-type="application/rdf+xml" />
          </c:request>
        </p:inline>
      </p:input>
      <p:input port="insertion">
        <p:pipe port="result" step="file" />
      </p:input>
    </p:insert>
    <p:http-request name="request"/>
    <p:identity />
  </p:for-each>
</p:declare-step>
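
For readers without an XProc processor at hand, the same loop can be sketched in Python with only the standard library. This is a rough equivalent under assumptions, not a translation that has been run against the Talis Platform: the function names are invented, the store URL and credentials are passed in by the caller, and the file is sent as raw bytes instead of being parsed first as p:load does.

```python
import pathlib
import urllib.request


def list_rdf_files(directory):
    """Return the .rdf files in a directory, sorted by name."""
    return sorted(pathlib.Path(directory).glob("*.rdf"))


def build_upload_request(rdf_file, store_url):
    """Build a POST request carrying one RDF/XML file as its body."""
    data = pathlib.Path(rdf_file).read_bytes()
    return urllib.request.Request(
        store_url,
        data=data,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


def upload_all(directory, store_url, username, password):
    """POST every .rdf file with HTTP digest auth; return (name, status)."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, store_url, username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(manager)
    )
    statuses = []
    for rdf_file in list_rdf_files(directory):
        request = build_upload_request(rdf_file, store_url)
        with opener.open(request) as response:
            statuses.append((rdf_file.name, response.status))
    return statuses
```

Only `upload_all` touches the network; the two helpers can be checked locally before pointing the script at a real store.
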
The query in the Talis platform:

Talis SPARQL aggregate query
with this result:

Talis query result

Conclusion

Since my previous posting, support for SPARQL 1.1 aggregates has become more prominent in the marketplace, which is good to see.


Comments

Kingsley Idehen (unauthenticated)
Jan 17, 2011

A few clarifications re. Virtuoso.

The most basic way of loading data into Virtuoso is via a SPARQL query that takes the form:

#pragma for HTTP GET'ing data from a URL. Note, these could be RDF or non RDF resources

define get:soft "replace"
select distinct * from <ResourceURL> where {?s ?p ?o}

In addition re. the above:

1. It has always offered Full Text Search as an integral part of its SPARQL engine
2. It has offered GeoSpatial Querying (SPARQL-GEO) since v6.1
3. It offers Faceted Browsing that works at Web Scale supporting faceted browsing over billions of triples

Links:

1. http://virtuoso.openlinksw.com/presentations/SPARQL_Tutorials/SPARQL_Tutorials_Part_2/SPARQL_Tutorials_Part_2.html -- covers SPARUL, Full Text, GeoSpatial, Transitivity, Entity Ranking and many other features.

2. http://lod.openlinksw.com/demo_queries/ -- collection of demo queries hosted by the lod cloud cache (massive live Virtuoso instance)

3. http://lod.openlinksw.com/ -- faceted browsing at Web Scale, enter a Text Pattern then use Navigation section to filter results by Type or Attribute values, once happy click on the "EntityX" link or "Show matching values" link

4. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtFacetBrowserInstallConfig -- faceted browser engine guide.

Kingsley

Keith Alexander (unauthenticated)
Jan 17, 2011

Hi there,

I'm a consultant for the Talis Platform. It's hard to reliably POST large files (e.g. 168,871 kB) in a single push over HTTP, which is very likely the reason for your 502. The best way to do it is to 'chunk' the data into files of about, say, 1 MB in size, and POST each of these files. One way to do this is to convert your RDF/XML to N-Triples (which has one triple per line) and use the UNIX command-line tool 'split' to create the smaller files.
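
The chunking step described above can be sketched in Python for environments where 'split' is not available. This is a minimal sketch under assumptions: the input is already N-Triples with one triple per line, chunks are cut by line count, not by byte size, and the function and file names are invented for the example.

```python
import pathlib


def split_ntriples(source, out_dir, lines_per_chunk=10000):
    """Split an N-Triples file into chunk files of at most N lines each."""
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    buffer = []
    with open(source, encoding="utf-8") as handle:
        for line in handle:
            buffer.append(line)
            if len(buffer) == lines_per_chunk:
                chunks.append(_write_chunk(out_dir, len(chunks), buffer))
                buffer = []
    if buffer:
        # Write the final, possibly shorter, chunk.
        chunks.append(_write_chunk(out_dir, len(chunks), buffer))
    return chunks


def _write_chunk(out_dir, index, lines):
    """Write one chunk and return its path."""
    path = out_dir / f"chunk_{index:05d}.nt"
    path.write_text("".join(lines), encoding="utf-8")
    return path
```

Because every N-Triples line is a complete statement, each chunk is a valid file on its own and can be POSTed independently. The line count that yields files of roughly 1 MB depends on the data.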

Another option is to use a streaming parser (ARC's RDF/XML parser can stream), and have it POST off batches of, say, 1000 triples as it is parsing. See http://blogs.talis.com/n2/archives/71/ for an example of taking advantage of ARC's streaming parser.

Living in the XML and RDF world
Jan 18, 2011

Kingsley and Keith,

I will take your advice into account and let you know what I find out.

Barry Bishop (unauthenticated)
Jan 18, 2011

This is a very nice, practical comparison that will likely be of use to anyone looking for a SPARQL 1.1 implementation. Providing aggregates (and other 1.1 Query features) was the main motivation for adding support for Jena to BigOWLIM. Interestingly, the performance of Joseki/ARQ/bigOWLIM seems to be far better than Joseki/ARQ/TDB.
See the LUBM section of http://www.ontotext.com/owlim/owlim-jena-performance.html

Scott Henninger (unauthenticated)
Jan 18, 2011

TopBraid Live also supports SPARQL Endpoints. See http://topquadrantblog.blogspot.com/search/label/SPARQL%20endpoint for a description. One feature not found (I believe) in the other endpoints is that multiple graphs can be supported by the TBL server. Use the SPARQL GRAPH keyword to declare which graph(s) to apply a query to. The SPARQL 1.1 aggregates are all supported, as ARQ is used as the SPARQL query engine.

Keith Alexander (unauthenticated)
May 13, 2011

Hi Paul,

If you want to try again with the Talis Platform and would like any help, feel free to drop me a line at keith dot alexander at talis dot com, or ask in #talis on irc.freenode.net

Keith

Eddy Vanderlinden (unauthenticated)
Dec 18, 2011

Thanks Paul for sharing your results.
This helps indeed, as do the comments on the article.