Tools to create RDF data cubes
Tools to create a DSD (Data Structure Definition)
Unfortunately, as far as we know, there are no real-world 'end user' friendly tools for creating RDF Data Cube data structure definitions.
- There is an effort to build a user-friendly editor at https://github.com/LOS-ESSnet/DSD-Editor, but it is a proof of concept rather than something ready to use.
- Table2qb (see below) offers a conversion starting from predefined table structures.
- The most flexible way is to use a dedicated RDF editor such as TopBraid Composer (also available in a free version) with the RDF Data Cube Vocabulary loaded together with all supporting vocabularies and code lists (existing or to be created). This can hardly be called user friendly, however.
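For orientation, the Python sketch below prints a minimal Data Structure Definition in Turtle for the Olympics example used further down this page. The qb: terms come from the RDF Data Cube Vocabulary and the property URIs from the example query; the DSD URI https://example.org/id/dsd/olympics, the ordering of the components, and the helper name build_dsd are assumptions made for illustration only.

```python
# Sketch: emit a minimal qb:DataStructureDefinition as Turtle text.
# The DSD URI and the component order are illustrative assumptions.

PREFIXES = """\
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix ex: <https://example.org/ns/olympics#> .
"""

DSD_URI = "https://example.org/id/dsd/olympics"  # hypothetical

DIMENSIONS = [
    "sdmx-dimension:refPeriod",
    "sdmx-dimension:refArea",
    "sdmx-dimension:sex",
    "ex:competition",
    "ex:medaltype",
    "qb:measureType",
]
MEASURES = ["ex:numberofmedals"]


def build_dsd(dsd_uri, dimensions, measures):
    """Return Turtle for a DSD with ordered dimensions and its measures."""
    components = [
        f"[ qb:dimension {dim} ; qb:order {i} ]"
        for i, dim in enumerate(dimensions, start=1)
    ]
    components += [f"[ qb:measure {m} ]" for m in measures]
    body = " ,\n        ".join(components)
    return (
        f"{PREFIXES}\n"
        f"<{dsd_uri}> a qb:DataStructureDefinition ;\n"
        f"    qb:component\n        {body} .\n"
    )


dsd_turtle = build_dsd(DSD_URI, DIMENSIONS, MEASURES)
print(dsd_turtle)
```

A data set would then point at this structure with qb:structure. Writing the Turtle by hand in an RDF editor gives the same result; the script only makes the list of components explicit.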
Tools to create and manage code lists
There are several collaborative tools available for creating, managing and publishing code lists, thesauri and authority resources.
We specifically refer to VocBench 3, which has been funded by the European Commission's ISA² programme.
In addition, Table2qb (see below) includes facilities to generate SKOS code lists starting from dedicated CSV tables.
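As a rough illustration of what such a code list contains, the Python sketch below prints a small SKOS concept scheme for the three medal types. The concept URIs follow the pattern used in the example query further down this page; the scheme URI https://example.org/id/conceptscheme/medals, the English labels, the notations, and the helper name build_code_list are assumptions.

```python
# Sketch: emit a SKOS concept scheme for the medal types as Turtle text.
# Scheme URI, labels and notations are illustrative assumptions.

SCHEME_URI = "https://example.org/id/conceptscheme/medals"  # hypothetical
CONCEPT_BASE = "https://example.org/id/concept/"

MEDALS = ["Gold", "Silver", "Bronze"]


def build_code_list(scheme_uri, concept_base, medals):
    """Return Turtle describing one skos:ConceptScheme and its concepts."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{scheme_uri}> a skos:ConceptScheme ;",
        '    skos:prefLabel "Medal types"@en .',
    ]
    for medal in medals:
        concept_uri = f"{concept_base}{medal.lower()}medal"
        lines += [
            "",
            f"<{concept_uri}> a skos:Concept ;",
            f'    skos:prefLabel "{medal} medal"@en ;',
            f'    skos:notation "{medal.lower()}" ;',
            f"    skos:inScheme <{scheme_uri}> ;",
            f"    skos:topConceptOf <{scheme_uri}> .",
        ]
    return "\n".join(lines) + "\n"


code_list_turtle = build_code_list(SCHEME_URI, CONCEPT_BASE, MEDALS)
print(code_list_turtle)
```

A collaborative tool such as VocBench manages the same kind of resources through a user interface and adds workflow and publication features that a script like this does not attempt.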
Tools to generate observations
The input
Consider the following structured CSV file for Olympic medals. This file, which we’ve named input.csv, is an extract from a much larger CSV data set, which we reduced and aggregated (using Excel and some pattern replacements in a text editor) so that it contains only the data required for our example. It covers the 2004, 2008, and 2012 Olympics and gives the number of medals of each type won by athletes from China, Great Britain, and the USA, broken down by gender.
A subset is shown below:
Competition,Edition,NOC,Gender,Medal,Value
Olympics,2004,CHN,Male,Bronze,5
Olympics,2004,CHN,Male,Gold,16
Olympics,2004,CHN,Male,Silver,9
Olympics,2004,CHN,Female,Bronze,10
Olympics,2004,CHN,Female,Gold,36
Olympics,2004,CHN,Female,Silver,18
Olympics,2004,GBR,Male,Bronze,8
Olympics,2004,GBR,Male,Gold,12
Olympics,2004,GBR,Male,Silver,15
Olympics,2004,GBR,Female,Bronze,7
Olympics,2004,GBR,Female,Gold,5
Olympics,2004,GBR,Female,Silver,10
Olympics,2004,USA,Male,Bronze,33
Olympics,2004,USA,Male,Gold,51
Olympics,2004,USA,Male,Silver,33
Olympics,2004,USA,Female,Bronze,40
Olympics,2004,USA,Female,Gold,65
Olympics,2004,USA,Female,Silver,42
Olympics,2008,CHN,Male,Bronze,11
Olympics,2008,CHN,Male,Gold,34
Olympics,2008,CHN,Male,Silver,11
Olympics,2008,CHN,Female,Bronze,46
Olympics,2008,CHN,Female,Gold,40
....
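The reduction and aggregation step was done by hand for this example, but it could also be scripted. The Python sketch below counts medals per edition, country, gender and medal type and writes them in the column layout of input.csv. The raw layout (one row per medal, with Edition, NOC, Gender and Medal columns) and the sample rows are assumptions, since the larger source file is not reproduced here; the resulting counts are therefore not real figures.

```python
import csv
import io
from collections import Counter

# Hypothetical raw extract: one row per medal awarded (invented sample rows).
RAW_CSV = """\
Edition,NOC,Gender,Medal
2004,CHN,Male,Gold
2004,CHN,Male,Gold
2004,CHN,Female,Silver
2004,GBR,Male,Bronze
"""


def aggregate(raw_text):
    """Count medals per (Edition, NOC, Gender, Medal) combination."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(raw_text)):
        counts[(row["Edition"], row["NOC"], row["Gender"], row["Medal"])] += 1
    return counts


def to_input_csv(counts):
    """Render the counts in the column layout of input.csv."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Competition", "Edition", "NOC", "Gender", "Medal", "Value"])
    for (edition, noc, gender, medal), value in sorted(counts.items()):
        writer.writerow(["Olympics", edition, noc, gender, medal, value])
    return out.getvalue()


aggregated_csv = to_input_csv(aggregate(RAW_CSV))
print(aggregated_csv)
```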
Generating the observations using TARQL
Tarql is a command-line tool for converting CSV files such as the one above to RDF using SPARQL 1.1 syntax. More information can be found at http://tarql.github.io/.
The SPARQL query (olympics.sparql) to generate the triples according to our example is as follows:
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix void: <http://rdfs.org/ns/void#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix dcat: <http://www.w3.org/ns/dcat#>
prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#>
prefix qb: <http://purl.org/linked-data/cube#>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix sdmx-concept: <http://purl.org/linked-data/sdmx/2009/concept#>

CONSTRUCT {
  ?URI rdf:type qb:Observation ;
    qb:dataSet <https://example.org/id/datacube/olympics> ;
    qb:measureType <https://example.org/ns/olympics#numberofmedals> ;
    sdmx-dimension:refArea ?refArea ;
    sdmx-dimension:refPeriod ?refPeriod ;
    sdmx-dimension:sex ?sex ;
    <https://example.org/ns/olympics#competition> ?competition ;
    <https://example.org/ns/olympics#medaltype> ?medal ;
    <https://example.org/ns/olympics#numberofmedals> ?number .
}
FROM <file:input.csv>
WHERE {
  # Map the Gender labels in the CSV to the SDMX sex code names (sex-M, sex-F).
  BIND (IF(?Gender = "Male", "sex-M", "sex-F") AS ?sexCode)
  # The CSV has no measure type column, so the last path segment is the constant "count".
  BIND (URI(CONCAT("https://example.org/id/observation/", LCASE(?Edition), "/", LCASE(?NOC), "/", LCASE(?sexCode), "/", LCASE(?Medal), "/count")) AS ?URI)
  BIND (URI(CONCAT("http://publications.europa.eu/resource/authority/country/", ?NOC)) AS ?refArea)
  # Alternative: represent the year as a typed literal instead of a URI.
  #BIND (STRDT(STR(?Edition), xsd:gYear) AS ?refPeriod)
  BIND (URI(CONCAT("http://reference.data.gov.uk/id/year/", STR(?Edition))) AS ?refPeriod)
  BIND (URI(CONCAT("http://purl.org/linked-data/sdmx/2009/code#", ?sexCode)) AS ?sex)
  BIND (URI(CONCAT("https://example.org/id/concept/", LCASE(?Competition))) AS ?competition)
  BIND (URI(CONCAT("https://example.org/id/concept/", LCASE(?Medal), "medal")) AS ?medal)
  BIND (STRDT(STR(?Value), xsd:integer) AS ?number)
}
To convert our CSV data, we run the following command line instruction:
tarql olympics.sparql > olympics.ttl
Sample output for one observation looks as follows:
<https://example.org/id/observation/2004/chn/sex-m/bronze/count>
    rdf:type qb:Observation ;
    qb:dataSet <https://example.org/id/datacube/olympics> ;
    qb:measureType <https://example.org/ns/olympics#numberofmedals> ;
    sdmx-dimension:refArea <http://publications.europa.eu/resource/authority/country/CHN> ;
    sdmx-dimension:refPeriod <http://reference.data.gov.uk/id/year/2004> ;
    sdmx-dimension:sex <http://purl.org/linked-data/sdmx/2009/code#sex-M> ;
    <https://example.org/ns/olympics#competition> <https://example.org/id/concept/olympics> ;
    <https://example.org/ns/olympics#medaltype> <https://example.org/id/concept/bronzemedal> ;
    <https://example.org/ns/olympics#numberofmedals> 5 .
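To make the mapping from a CSV row to an observation explicit, the Python sketch below rebuilds the observation shown above from the first data row of the CSV, using only the standard library. It is not part of Tarql; the function name row_to_turtle, the Gender-to-code lookup, and the fixed "count" path segment are assumptions chosen to reproduce the sample output.

```python
import csv
import io

# First data row of input.csv, copied from the subset above.
SAMPLE_CSV = """\
Competition,Edition,NOC,Gender,Medal,Value
Olympics,2004,CHN,Male,Bronze,5
"""

# Assumed lookup from the CSV Gender labels to SDMX sex code names.
SEX_CODES = {"Male": "sex-M", "Female": "sex-F"}

EX_NS = "https://example.org/ns/olympics#"
CONCEPT = "https://example.org/id/concept/"


def row_to_turtle(row):
    """Render one CSV row as a qb:Observation in Turtle (prefixes omitted)."""
    sex = SEX_CODES[row["Gender"]]
    # "count" is a fixed path segment, assumed from the sample output.
    uri = "https://example.org/id/observation/{}/{}/{}/{}/count".format(
        row["Edition"], row["NOC"].lower(), sex.lower(), row["Medal"].lower()
    )
    pairs = [
        ("rdf:type", "qb:Observation"),
        ("qb:dataSet", "<https://example.org/id/datacube/olympics>"),
        ("qb:measureType", f"<{EX_NS}numberofmedals>"),
        ("sdmx-dimension:refArea",
         "<http://publications.europa.eu/resource/authority/country/"
         f"{row['NOC']}>"),
        ("sdmx-dimension:refPeriod",
         f"<http://reference.data.gov.uk/id/year/{row['Edition']}>"),
        ("sdmx-dimension:sex",
         f"<http://purl.org/linked-data/sdmx/2009/code#{sex}>"),
        (f"<{EX_NS}competition>", f"<{CONCEPT}{row['Competition'].lower()}>"),
        (f"<{EX_NS}medaltype>", f"<{CONCEPT}{row['Medal'].lower()}medal>"),
        (f"<{EX_NS}numberofmedals>", str(int(row["Value"]))),
    ]
    body = " ;\n".join(f"    {pred} {obj}" for pred, obj in pairs)
    return f"<{uri}>\n{body} ."


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
observation = row_to_turtle(rows[0])
print(observation)
```

Each dimension value becomes a URI built from a fixed base plus a column value, and only the measure is emitted as a literal. This is the same pattern the BIND clauses in the query follow.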
Generating the observations with table2qb
Table2qb (pronounced “table to cube”) is a tool that can be used to convert structured CSV data into RDF data cubes. It is aimed at users who understand statistical data and are comfortable with common data processing tools, but it does not require programming skills or detailed knowledge of RDF.
We have a fully worked-out example using the Olympics dataset.