Loading Linked Data

Getting geonames data


GeoNames is a geographical database that is accessible through various web services under a Creative Commons attribution license. It contains over 8,000,000 geographical names corresponding to over 6,500,000 unique features.
Each GeoNames feature is represented as a Web resource identified by a stable URI, from which you can retrieve an RDF description of the feature.

Here is an example of the RDF description of such a geonames "Feature" document,
as obtained through the RDF web service at the URI http://sws.geonames.org/6544303/about.rdf:

@prefix geo: <http://www.geonames.org/ontology#> .
@prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://sws.geonames.org/6544303/>
    a wgs84_pos:SpatialThing , geo:Feature ;
    rdfs:label "Gemeente Skarsterlân" ;
    geo:childrenFeatures <http://sws.geonames.org/6544303/contains.rdf> ;
    geo:featureClass geo:A ;
    geo:featureCode <http://www.geonames.org/ontology#A.ADM2> ;
    geo:inCountry <http://www.geonames.org/countries/#NL> ;
    geo:locationMap <http://www.geonames.org/6544303/gemeente-skarsterlan.html> ;
    geo:name "Gemeente Skarsterlân" ;
    wgs84_pos:lat "52.9564979457275" ;
    wgs84_pos:long "5.79185485839844" .

As you can see, some properties refer to other RDF files, e.g. childrenFeatures refers to "http://sws.geonames.org/6544303/contains.rdf".
Where applicable, other properties link to RDF files describing neighbouring or nearby features.

I have an RDF file with all geonames Features in Belgium.
For every Feature in this file I want to retrieve these additional RDF files to populate my triple store further.

Using TopBraid's SPARQLMotion


The flow is:
  1. Start from the existing RDF file.
  2. Select all Features that have the property childrenFeatures and extract their numeric identifier (subjectnr):
    SELECT ?subjectnr
    WHERE {
      ?subject a :Feature .
      ?subject :childrenFeatures ?children .
      LET (?subjectnr := func:subString(func:qname(?subject), 25, 32)) .
    }
  3. For each identifier, retrieve the corresponding contains.rdf,
    using a template URL into which the subjectnr parameter is passed:
    http://sws.geonames.org/{?subjectnr}/contains.rdf
The same steps are repeated for Features that have the property nearbyFeatures, retrieving the corresponding 'nearby.rdf',
and once more for Features that have the property neighbouringFeatures, retrieving the corresponding 'neighbours.rdf'.
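
Outside SPARQLMotion, steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration, not what the pipeline does internally: the function names are hypothetical, and it assumes that every Feature URI has the form http://sws.geonames.org/<number>/ as in the example above.

```python
import re

# Assumed shape of a GeoNames feature URI, e.g. http://sws.geonames.org/6544303/
FEATURE_URI = re.compile(r"^https?://sws\.geonames\.org/(\d+)/?$")


def subject_number(feature_uri: str) -> str:
    """Extract the numeric identifier (the 'subjectnr') from a feature URI."""
    match = FEATURE_URI.match(feature_uri)
    if match is None:
        raise ValueError(f"not a GeoNames feature URI: {feature_uri!r}")
    return match.group(1)


def contains_url(feature_uri: str) -> str:
    """Fill the template http://sws.geonames.org/{subjectnr}/contains.rdf."""
    return f"http://sws.geonames.org/{subject_number(feature_uri)}/contains.rdf"
```

Matching on the URI pattern rather than on fixed character positions means the identifier can have any number of digits.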

Figure: Geonames pipeline
I wonder whether this pipeline could be shortened, since I am in fact doing the same thing three times.
The only things that vary are the property that is tested and the corresponding RDF file.
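
One possible shape for such a refactoring, sketched in Python rather than in SPARQLMotion, is to put the varying parts in a small table and loop over it. The table and function names are hypothetical, and the file names are assumptions based on the text and on the URLs GeoNames appears to use (for example neighbours.rdf for neighbouring features).

```python
# Property name -> name of the linked RDF file (assumed file names).
LINKED_DOCUMENTS = {
    "childrenFeatures": "contains.rdf",
    "nearbyFeatures": "nearby.rdf",
    "neighbouringFeatures": "neighbours.rdf",
}


def linked_document_urls(feature_id: str, properties: list[str]) -> list[str]:
    """Return one URL per linking property that the feature actually has.

    `feature_id` is the numeric identifier of the feature and `properties`
    lists the property names found on it; unknown properties are ignored.
    """
    return [
        f"http://sws.geonames.org/{feature_id}/{LINKED_DOCUMENTS[name]}"
        for name in properties
        if name in LINKED_DOCUMENTS
    ]
```

With this layout, supporting a fourth kind of linked file would only mean adding one entry to the table. Since the property values in the sample description are already the URLs of the linked files, using those values directly might be an even shorter route.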

Need to find some time for refactoring.

Loading dbpedia entries


TopBraid Composer can query dbpedia to find the dbpedia resources that correspond to resources you already have in your model.


Figure: dbpedia mapping
Once TopBraid Composer finds a link, an owl:sameAs relationship is established between your resource and the corresponding dbpedia resource.

But we want more.
Based on this relationship we want to retrieve all the dbpedia RDF statements themselves.
This can be done using the same logic as in the Geonames case above.

Figure: dbpedia loading
The only difference is that we needed to 'convert' the identifier of a resource into an HTTP address, which is done in the following module.

Figure: convert the resource identifier into a dereferenceable URL
Pipelines at work again.
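
The conversion module itself is only shown as an image, so the following Python sketch is a guess at the kind of conversion involved, not a copy of it. It assumes that the owl:sameAs targets are full dbpedia resource URIs and that dbpedia serves the RDF for http://dbpedia.org/resource/<name> at http://dbpedia.org/data/<name>.rdf; the function names are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
DBPEDIA_DATA = "http://dbpedia.org/data/"  # assumed location of the RDF documents


def dbpedia_data_url(resource_uri: str) -> str:
    """Turn a dbpedia resource URI into the URL of its RDF document."""
    if not resource_uri.startswith(DBPEDIA_RESOURCE):
        raise ValueError(f"not a dbpedia resource URI: {resource_uri!r}")
    name = resource_uri[len(DBPEDIA_RESOURCE):]
    return f"{DBPEDIA_DATA}{name}.rdf"


def dbpedia_urls_from_same_as(triples):
    """Collect the RDF document URLs for every owl:sameAs link to dbpedia.

    `triples` is an iterable of (subject, predicate, object) strings.
    """
    return [
        dbpedia_data_url(obj)
        for _subject, predicate, obj in triples
        if predicate == OWL_SAME_AS and obj.startswith(DBPEDIA_RESOURCE)
    ]
```

Each URL returned this way could then be fetched and loaded into the triple store, in the same way as the geonames files.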
