SELECT DISTINCT ?name (count( ?person) as ?total )
WHERE {
?person kennedys:gender ?gender.
?gender rdfs:label ?name.
}
GROUP BY ?name
ORDER BY desc(?total)
which gives the following result:
| Product | RDF file loaded | support documented | SPARQL aggregate working |
|---|---|---|---|
| Sesame/native store | yes | yes since version Sesame 2.4.0 | yes |
| Sesame/BigOWLIM | yes | yes since version 4.0 | yes |
| Joseki/ARQ/TDB | yes | yes | yes |
| Joseki/ARQ/BigOWLIM | yes | yes | yes |
| Virtuoso | yes | yes | yes |
| 4store | yes | not found | gives wrong result |
| AllegroGraph | yes | not found | gives syntax error |
| Talis Platform | yes | yes | yes |

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step
xmlns:p="http://www.w3.org/ns/xproc"
xmlns:c="http://www.w3.org/ns/xproc-step"
xmlns:cx="http://xmlcalabash.com/ns/extensions"
name="myPipeline"
version="1.0">
<p:output
port="result"
sequence="true" />
<!-- path to the folder with rdf files -->
<p:variable
name="path"
select="'/Users/paul/Desktop/RandD/owlimimport/erfgoedplusImport/'">
<p:empty />
</p:variable>
<p:directory-list
include-filter=".*\.rdf">
<p:with-option
name="path"
select="$path">
<p:empty />
</p:with-option>
</p:directory-list>
<p:for-each
name="directoryloop">
<p:output
port="result"
sequence="true" />
<p:iteration-source
select="/c:directory/c:file" />
<p:variable
name="file"
select="concat($path,/c:file/@name)" />
<p:load
name="file">
<p:with-option
name="href"
select="$file" />
</p:load>
<p:insert
match="/c:request/c:body"
position="first-child">
<p:input
port="source">
<p:inline>
<c:request
href="http://api.talis.com/stores/zzzzzzzzz/meta"
method="POST"
detailed="true" auth-method="digest" username="xxxxxxx" password="yyyyyyy">
<c:body
content-type="application/rdf+xml" />
</c:request>
</p:inline>
</p:input>
<p:input
port="insertion">
<p:pipe
port="result"
step="file" />
</p:input>
</p:insert>
<p:http-request
name="request"/>
<p:identity />
</p:for-each>
</p:declare-step>
Comments
Kingsley Idehen (unauthenticated)
Jan 17, 2011
A few clarifications re. Virtuoso.
The most basic way of loading data into Virtuoso is via a SPARQL query that takes the form:
#pragma for HTTP GET'ing data from a URL. Note, these could be RDF or non RDF resources
define get:soft "replace"
select distinct * from <ResourceURL> where {?s ?p ?o}
In addition re. the above:
1. It has always offered Full Text Search as an integral part of its SPARQL engine
2. It offers GeoSpatial Querying (SPARQL-GEO) since v6.1
3. It offers Faceted Browsing that works at Web Scale supporting faceted browsing over billions of triples
Links:
1. http://virtuoso.openlinksw.com/presentations/SPARQL_Tutorials/SPARQL_Tutorials_Part_2/SPARQL_Tutorials_Part_2.html -- covers SPARUL, Full Text, GeoSpatial, Transitivity, Entity Ranking and many other features.
2. http://lod.openlinksw.com/demo_queries/ -- collection of demo queries hosted by the lod cloud cache (massive live Virtuoso instance)
3. http://lod.openlinksw.com/ -- faceted browsing at Web Scale, enter a Text Pattern then use Navigation section to filter results by Type or Attribute values, once happy click on the "EntityX" link or "Show matching values" link
4. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtFacetBrowserInstallConfig -- faceted browser engine guide.
Kingsley
Keith Alexander (unauthenticated)
Jan 17, 2011
Hi there,
I'm a consultant for the Talis Platform. It's hard to reliably POST large files (e.g. 168,871 KB) in a single push over HTTP, which is very likely the reason for your 502. The best way to do it is to 'chunk' the data into files of about 1 MB each and POST each of these files in turn. One way to do this is to convert your RDF/XML to N-Triples (which has one triple per line) and use the UNIX command-line tool 'split' to create the smaller files.
Another option is to use a streaming parser (ARC's RDF/XML parser can stream), and have it POST off batches of, say, 1000 triples as it is parsing. See http://blogs.talis.com/n2/archives/71/ for an example of taking advantage of ARC's streaming parser.
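As a rough illustration of the chunking idea, here is a minimal Python sketch. It assumes the data has already been converted to N-Triples, so that every line is one complete triple and the file can be cut at any line boundary. The function name `chunk_ntriples` and the 1,000,000-byte default are illustrative choices, not part of any Talis or ARC API.

```python
from typing import Iterable, Iterator, List


def chunk_ntriples(lines: Iterable[str], max_bytes: int = 1_000_000) -> Iterator[List[str]]:
    """Yield lists of N-Triples lines, each list roughly max_bytes in size.

    Assumption: one triple per line, so splitting between lines never
    breaks a triple. A single line larger than max_bytes is emitted as
    its own chunk rather than being cut.
    """
    batch: List[str] = []
    size = 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        # Close the current chunk if this line would push it over the limit.
        if batch and size + line_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += line_size
    # Emit whatever is left after the last line.
    if batch:
        yield batch
```

Each yielded chunk could then be written to its own file or sent as the body of a separate POST. To batch by triple count instead of size, the same loop works with a line counter in place of the byte counter.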
Living in the XML and RDF world
Jan 18, 2011
Kingsley and Keith,
I will take your advice into account and let you know what I find out.
Barry Bishop (unauthenticated)
Jan 18, 2011
This is a very nice, practical comparison that will likely be of use to anyone looking for a SPARQL 1.1 implementation. Providing aggregates (and other 1.1 Query features) was the main motivation for adding support for Jena to BigOWLIM. Interestingly, the performance of Joseki/ARQ/bigOWLIM seems to be far better than Joseki/ARQ/TDB.
See the LUBM section of http://www.ontotext.com/owlim/owlim-jena-performance.html
Scott Henninger (unauthenticated)
Jan 18, 2011
TopBraid Live also supports SPARQL Endpoints. See http://topquadrantblog.blogspot.com/search/label/SPARQL%20endpoint for a description. One feature not found (I believe) in the other endpoints is that multiple graphs can be supported by the TBL server. Use the SPARQL GRAPH keyword to declare which graph(s) to apply a query to. The SPARQL 1.1 aggregates are all supported, as ARQ is used as the SPARQL query engine.
Keith Alexander (unauthenticated)
May 13, 2011
Hi Paul,
If you want to try again with the Talis Platform and would like any help, feel free to drop me a line at keith dot alexander at talis dot com, or ask in #talis on irc.freenode.net
Keith
Eddy Vanderlinden (unauthenticated)
Dec 18, 2011
Thanks Paul for sharing your results.
This helps indeed, as do the comments on the article.