Getting geonames data
GeoNames is a geographical database, available and accessible through various Web services, under a Creative Commons attribution license. This database contains over 8,000,000 geographical names corresponding to over 6,500,000 unique features.
Each GeoNames feature is represented as a Web resource identified by a stable URI, from which you can get an RDF description of the feature.
An example of an RDF description of such a geonames "Feature" document, as obtained through the RDF Web service at http://sws.geonames.org/6544303/about.rdf (shown here in Turtle):
@prefix rdfs:      <http://www.w3.org/2000/01/rdf-schema#> .
@prefix geo:       <http://www.geonames.org/ontology#> .
@prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#> .

<http://sws.geonames.org/6544303/>
    a wgs84_pos:SpatialThing , geo:Feature ;
    rdfs:label "Gemeente Skarsterlân" ;
    geo:childrenFeatures <http://sws.geonames.org/6544303/contains.rdf> ;
    geo:featureClass geo:A ;
    geo:featureCode <http://www.geonames.org/ontology#A.ADM2> ;
    geo:inCountry <http://www.geonames.org/countries/#NL> ;
    geo:locationMap <http://www.geonames.org/6544303/gemeente-skarsterlan.html> ;
    geo:name "Gemeente Skarsterlân" ;
    wgs84_pos:lat "52.9564979457275" ;
    wgs84_pos:long "5.79185485839844" .
As you can see, some properties refer to other RDF files; e.g. childrenFeatures refers to http://sws.geonames.org/6544303/contains.rdf. Depending on applicability, other properties link to neighbouring or nearby features.
I have an RDF file with all geonames Features in Belgium.
For every Feature in this document, I want to retrieve these additional RDF files to populate my triple store further.
The flow is:
- start from the existing RDF file
- find all Features that have the property childrenFeatures and extract their identifier (subjectnr):
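PREFIX geo: <http://www.geonames.org/ontology#>
SELECT ?subjectnr
WHERE {
    ?subject a geo:Feature ;
             geo:childrenFeatures ?children .
    # the identifier is the path segment of e.g. <http://sws.geonames.org/6544303/>
    BIND (STRBEFORE(STRAFTER(STR(?subject), "http://sws.geonames.org/"), "/") AS ?subjectnr)
}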
- for each, retrieve the corresponding contains.rdf using the template URL into which the subjectnr parameter is passed:
  http://sws.geonames.org/{?subjectnr}/contains.rdf
- do the same for the Features having the property nearbyFeatures, retrieving the corresponding 'nearby.rdf'
- and once again for the Features having the property neighbouringFeatures, retrieving the corresponding 'neighbour.rdf'.
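For readers outside TopBraid, the first two steps of this flow can be sketched in plain Python. This is a minimal sketch, not the SPARQLMotion pipeline itself: it assumes the Belgian Features are already available as (subject, predicate, object) string triples, and the helper names `subject_nr`, `expand_template` and `contains_urls` are hypothetical. The identifier is taken as the path segment after `http://sws.geonames.org/`, so it does not depend on the identifier having a fixed number of digits.

```python
# Minimal sketch, not the SPARQLMotion pipeline: select geo:Feature subjects
# that have geo:childrenFeatures, derive their id and fill the URL template.
# Assumption: triples are (subject, predicate, object) strings.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GEO = "http://www.geonames.org/ontology#"
BASE = "http://sws.geonames.org/"
TEMPLATE = "http://sws.geonames.org/{?subjectnr}/contains.rdf"


def subject_nr(subject_uri: str) -> str:
    """Return the id segment of e.g. 'http://sws.geonames.org/6544303/'."""
    if not subject_uri.startswith(BASE):
        raise ValueError(f"not a GeoNames feature URI: {subject_uri}")
    return subject_uri[len(BASE):].split("/", 1)[0]


def expand_template(template: str, subjectnr: str) -> str:
    """Substitute the {?subjectnr} placeholder in the template URL."""
    return template.replace("{?subjectnr}", subjectnr)


def contains_urls(triples):
    """Return one contains.rdf URL per Feature that has geo:childrenFeatures."""
    features = {s for s, p, o in triples if p == RDF_TYPE and o == GEO + "Feature"}
    with_children = {s for s, p, _ in triples if p == GEO + "childrenFeatures"}
    return [expand_template(TEMPLATE, subject_nr(s))
            for s in sorted(features & with_children)]


feature = "http://sws.geonames.org/6544303/"
triples = [
    (feature, RDF_TYPE, GEO + "Feature"),
    (feature, GEO + "childrenFeatures", feature + "contains.rdf"),
    (feature, GEO + "name", "Gemeente Skarsterlân"),
]
print(contains_urls(triples))
# ['http://sws.geonames.org/6544303/contains.rdf']
```

Fetching each URL and loading the result into the triple store is deliberately left out of the sketch.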

I'm wondering whether this pipeline can be shortened, since I'm in fact doing the same thing three times.
The only variations are the property tested and the corresponding RDF file.
I need to find some time for refactoring.
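One way to remove the triplication is to drive a single step from a table of (property, document) pairs, since those are the only two things that vary. The sketch below expresses that idea in Python rather than in SPARQLMotion; it assumes the data is available as (subject, predicate, object) string triples, and the names `LINKS` and `linked_document_urls` are hypothetical. The document names are the ones listed in the flow above.

```python
# Sketch: one parameterised step instead of three near-identical copies.
# Assumption: triples are (subject, predicate, object) strings; LINKS and
# linked_document_urls are hypothetical names.
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GEO = "http://www.geonames.org/ontology#"
BASE = "http://sws.geonames.org/"

# The only two things that differ between the three copies of the flow.
LINKS = [
    (GEO + "childrenFeatures", "contains.rdf"),
    (GEO + "nearbyFeatures", "nearby.rdf"),
    (GEO + "neighbouringFeatures", "neighbour.rdf"),
]


def linked_document_urls(triples: Iterable[Triple]) -> List[str]:
    """Build every document URL to fetch, driven by the LINKS table."""
    triples = list(triples)  # iterated once per (property, document) pair
    features = {s for s, p, o in triples
                if p == RDF_TYPE and o == GEO + "Feature" and s.startswith(BASE)}
    urls = []
    for prop, document in LINKS:
        for s in sorted(features & {s for s, p, _ in triples if p == prop}):
            subjectnr = s[len(BASE):].split("/", 1)[0]
            urls.append(f"{BASE}{subjectnr}/{document}")
    return urls


feature = "http://sws.geonames.org/6544303/"
triples = [
    (feature, RDF_TYPE, GEO + "Feature"),
    (feature, GEO + "childrenFeatures", feature + "contains.rdf"),
    (feature, GEO + "nearbyFeatures", feature + "nearby.rdf"),
]
for url in linked_document_urls(triples):
    print(url)
```

With this shape, supporting a fourth kind of link means adding one row to `LINKS` instead of copying the whole branch again. Each resulting URL could then be fetched (for instance with `urllib.request.urlopen`) and loaded into the store; that part is not shown.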
Loading dbpedia entries
TopBraid Composer lets you query dbpedia to find the dbpedia resources that correspond to resources you already have in your model.

Once TopBraid Composer finds a link, an owl:sameAs relationship is established between your resource and the corresponding dbpedia resource.
But we want more.
Based on this relationship we want to retrieve all the dbpedia RDF statements themselves.
This can be done using the same logic as in the Geonames case above.

The only difference is that we needed to 'convert' the identifier of a resource into an HTTP address, which is done in the following module.
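As an illustration of that conversion step outside TopBraid, here is a minimal Python sketch with hypothetical function and constant names. It assumes the owl:sameAs targets are dbpedia resource URIs of the form http://dbpedia.org/resource/Name and turns them into plain URL strings that a fetch step can use. The optional rewrite to http://dbpedia.org/data/Name.rdf is an assumption about how dbpedia publishes its RDF documents, not something taken from the module above.

```python
# Sketch with hypothetical names: turn owl:sameAs targets in the dbpedia
# namespace into plain URL strings that a fetch step can use.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RESOURCE = "http://dbpedia.org/resource/"
DATA = "http://dbpedia.org/data/"


def dbpedia_urls(triples, as_rdf_document=False):
    """Return the dbpedia URLs linked via owl:sameAs, in first-seen order."""
    urls = []
    for _s, p, o in triples:
        if p != OWL_SAME_AS or not o.startswith(RESOURCE):
            continue
        url = o
        if as_rdf_document:
            # Assumption: dbpedia serves RDF/XML for /resource/X at /data/X.rdf
            url = DATA + o[len(RESOURCE):] + ".rdf"
        if url not in urls:
            urls.append(url)
    return urls


triples = [
    ("http://example.org/Brussels", OWL_SAME_AS, "http://dbpedia.org/resource/Brussels"),
]
print(dbpedia_urls(triples))                        # ['http://dbpedia.org/resource/Brussels']
print(dbpedia_urls(triples, as_rdf_document=True))  # ['http://dbpedia.org/data/Brussels.rdf']
```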

Pipelines at work again.