nkaralis edited this page Aug 3, 2017 · 3 revisions

GeoTriples

Publishing geospatial data as Linked Open Geospatial Data

## Quickstart ##

Use GeoTriples binaries (Unix)

Assuming Java >=7 installed:

Download GeoTriples binaries at: http://geotriples.di.uoa.gr/downloads/geotriples-1.1.6-bin.zip

  • Unzip the downloaded file geotriples-<version>-bin.zip

  • Change directory to geotriples-<version>-bin

  • Under the bin directory you can find the available starter script for GeoTriples

    bin/geotriples-cmd

Building from sources

Assuming git, [Maven](http://maven.apache.org/download.cgi) and Java >=7 installed:

$ git clone https://github.com/LinkedEOData/GeoTriples.git
$ cd GeoTriples
$ mvn clean package
$ cd geotriples-core/target
$ java -cp "./dependency-jars/*:geotriples-core-<version>-SNAPSHOT.jar" eu.linkedeodata.geotriples.GeoTriplesCMD [Options] [Argument]
# [In order to use the OBDA mode, download the latest version of ontop-spatial (https://github.com/ConstantB/ontop-spatial) and add its dependencies to the classpath. ]

# [Optional: Add an alias for executing the jar file with command `geotriples-cmd`]
$ echo "alias geotriples-cmd='java -cp \"<GEOTRIPLES_HOME>/geotriples-core/target/dependency-jars/*:<GEOTRIPLES_HOME>/geotriples-core/target/geotriples-core-<version>-SNAPSHOT.jar\" eu.linkedeodata.geotriples.GeoTriplesCMD'" >> ~/.bashrc

Goal

GeoTriples is a tool for transforming geospatial data from their original formats (e.g., shapefiles or spatially-enabled relational databases) into RDF. The following input formats are supported: spatially-enabled relational databases (PostGIS and MonetDB), ESRI shapefiles and XML, GML, KML, JSON, GeoJSON and CSV documents.

Support for R2RML and RML mappings

GeoTriples supports the mapping languages R2RML and RML and extends them for modeling the transformation of geospatial data into RDF graphs.

GeoTriples Architecture

GeoTriples comprises two main components: the mapping generator and the R2RML/RML mapping processor. The mapping generator takes as input a geospatial data source (e.g., a shapefile) and automatically creates an R2RML or RML mapping that can transform the input into an RDF graph which uses the GeoSPARQL vocabulary. The user may edit the generated R2RML/RML mapping document to comply with her requirements (e.g., to use a vocabulary other than GeoSPARQL). Then, the mapping processor executes the R2RML/RML mappings to produce the output geospatial RDF graph. The mapping processor of GeoTriples comes in two forms: a single-node implementation and an implementation that uses Apache Hadoop for dealing with big geospatial data. The second implementation can be found here.

It is often the case in applications that relevant geospatial data is stored in spatially-enabled relational databases (e.g., PostGIS) or files (e.g., shapefiles), and its owners do not want to explicitly transform it into linked data. For example, this might be because these data sources get frequently updated and/or are very large. If this is the case, GeoTriples is still very useful. GeoTriples users can use the generated mappings in the system Ontop-spatial to view their data sources virtually as linked data. Ontop-spatial is a geospatial Ontology-Based Data Access system which performs on-the-fly GeoSPARQL-to-SQL translation over spatially-enabled relational databases using ontologies and mappings.


Automatic generation of R2RML/RML mappings

  • Relational Database
$ geotriples-cmd generate_mapping -b baseURI [-u user] [-p password] [-d driver] [-o mappingFile] [-rml] jdbcURL
  • Shapefile
$ geotriples-cmd generate_mapping -b baseURI [-o mappingFile] [-rml] fileURL
  • XML files (Only RML mappings)
$ geotriples-cmd generate_mapping -b baseURI [-o RMLmappingFile] [-rp rootpath] [-r rootelement] [-onlyns namespace] [-ns namespaces] [-x XSDfile] fileURL

Tip: When generating a mapping, mind the extension of the input file; GeoTriples consults the extension to decide how to interpret and process the data. For example, input.csv should be a comma-separated values file. If your file input.csv is actually tab-separated, consider renaming it to input.tsv.


Transformation into RDF

  • Relational Database
$ geotriples-cmd dump_rdf [-rml] [-f format] [-b baseURI] [-o rdfoutfile]  -u user -p password -d driver -j jdbcURL inputmappingfile
  • Shapefile
$ geotriples-cmd dump_rdf [-rml] [-f format] [-b baseURI] [-o rdfoutfile] [-s epsgcode] [-sh fileURL] inputmappingfile
  • XML/JSON (using RML processor)
$ geotriples-cmd dump_rdf  -rml [-f format] [-b baseURI] [-o rdfoutfile] [-s epsgcode] inputRMLmappingfile
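
The two steps above (generate a mapping, then dump RDF) can be chained from a script. The following is a minimal sketch for the shapefile case, assuming `geotriples-cmd` is available on the PATH. The helper names and the file names are hypothetical, and only flags that appear in the synopses above are used.

```python
def generate_mapping_cmd(base_uri, input_file, mapping_file, rml=False):
    """Build the argument list for `geotriples-cmd generate_mapping` (shapefile form)."""
    cmd = ["geotriples-cmd", "generate_mapping", "-b", base_uri, "-o", mapping_file]
    if rml:
        cmd.append("-rml")  # ask for an RML mapping instead of R2RML
    cmd.append(input_file)
    return cmd


def dump_rdf_cmd(mapping_file, input_file, rdf_file, rml=False):
    """Build the argument list for `geotriples-cmd dump_rdf` (shapefile form)."""
    cmd = ["geotriples-cmd", "dump_rdf"]
    if rml:
        cmd.append("-rml")
    cmd += ["-o", rdf_file, "-sh", input_file, mapping_file]
    return cmd


if __name__ == "__main__":
    # Hypothetical inputs: fields.shp, mapping.ttl and fields.nt are placeholders.
    steps = [
        generate_mapping_cmd("http://example.com", "fields.shp", "mapping.ttl"),
        dump_rdf_cmd("mapping.ttl", "fields.shp", "fields.nt"),
    ]
    for step in steps:
        print(" ".join(step))
```

Each list can be handed to `subprocess.run(step, check=True)` to execute the command; building the arguments as a list avoids shell quoting problems with URIs and paths.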

RML Processor

GeoTriples also supports an extended version of the RML mapping language: it extends the RML processor to handle spatial information. RML is defined as a superset of R2RML. Its main strength is that it is designed to process data that is not necessarily tabular and therefore has no explicit iteration pattern.

For example, the file fields.xml (see below) cannot be iterated row by row, because it has nested elements.

<Farm>
   <Field id="1">
      <Vigor>4</Vigor>
      <Farmer>John Vl</Farmer>
      <Geometry>
       <gml:Polygon>
         <gml:outerBoundaryIs>
           <gml:LinearRing> 
             <gml:posList>0,0 100,0 100,100 0,100 0,0</gml:posList> 
           </gml:LinearRing>
         </gml:outerBoundaryIs>
       </gml:Polygon>
      </Geometry>
   </Field>
   <Field id="2">
      <Vigor>1</Vigor>
      <Farmer>Harper Lee</Farmer>
      <Geometry id="1">
       <gml:Polygon>
         <gml:outerBoundaryIs>
           <gml:LinearRing> 
             <gml:posList>100,100 200,100 200,200 100,200 100,100</gml:posList>
           </gml:LinearRing>
         </gml:outerBoundaryIs>
       </gml:Polygon>
      </Geometry>
   </Field>
   <Field id="3">
      <Vigor>3</Vigor>
      <Farmer>Bruce Pom</Farmer>
   </Field>
</Farm>

R2RML uses the property rr:tableName to define which table of the input file or relational database is going to be used as the source table for the mappings. RML has the equivalent rml:source to define the source for the mappings. The source can be a JDBC URL for a relational database, a shapefile, or an XML, JSON or CSV file. The property rml:iterator defines the iteration pattern used to process files that are structured but not relational. For the above example the iterator should be an XPath query.
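
To make the role of the iterator concrete, here is a minimal sketch in Python of what iterating with `/Farm/Field` and resolving references such as `Vigor` amounts to. It is not the GeoTriples implementation: the `iterate` helper is hypothetical, the XML is a trimmed copy of the example above without the geometries, and the standard library's limited XPath support is assumed to be sufficient.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (geometries omitted).
FIELDS_XML = """
<Farm>
  <Field id="1"><Vigor>4</Vigor><Farmer>John Vl</Farmer></Field>
  <Field id="2"><Vigor>1</Vigor><Farmer>Harper Lee</Farmer></Field>
  <Field id="3"><Vigor>3</Vigor><Farmer>Bruce Pom</Farmer></Field>
</Farm>
"""


def iterate(xml_text, iterator, references):
    """Return one dict per node selected by the iterator.

    Each reference is a child element name resolved relative to the
    current node, which is how a reference extends the iterator.
    """
    root = ET.fromstring(xml_text)
    steps = iterator.strip("/").split("/")
    if steps[0] != root.tag:
        return []
    # ElementTree paths are relative to the root element, so the
    # absolute iterator "/Farm/Field" becomes "./Field".
    nodes = root.findall("./" + "/".join(steps[1:])) if steps[1:] else [root]
    return [{ref: node.findtext(ref) for ref in references} for node in nodes]


if __name__ == "__main__":
    for row in iterate(FIELDS_XML, "/Farm/Field", ["Vigor", "Farmer"]):
        print(row)
```

Each selected `Field` node plays the role that a table row plays in R2RML.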

An example RML mapping is the following:

```
<#Field> rml:logicalSource [ rml:source "/fields.xml"; rml:referenceFormulation ql:XPath; rml:iterator "/Farm/Field"];

rr:subjectMap [
 rr:class ont:Farm;
 rr:class ogc:Feature;
 rr:template "http://data.linkedeodata.eu/Field/id/{@id}"];

rr:predicateObjectMap [
 rr:predicate ont:hasVigor;
 rr:objectMap [
   rml:reference "Vigor"]];

rr:predicateObjectMap [
 rr:predicate ont:hasFarmer;
 rr:objectMap [
   rml:reference "Farmer"]];

# Link each field to its geometry resource
rr:predicateObjectMap [
 rr:predicate ogc:hasGeometry;
 rr:objectMap [
   rr:template "http://data.linkedeodata.eu/FieldGeometry/id/{Geometry/@id}"]].

<#FieldGeometry> rml:logicalSource [ rml:source "/fields.xml"; rml:referenceFormulation ql:XPath; rml:iterator "/Farm/Field/Geometry"];

rr:subjectMap [
 rr:class ont:FieldGeometry;
 rr:class ogc:Geometry;
 rr:template "http://data.linkedeodata.eu/FieldGeometry/id/{@id}"];

# The object is the result of a function applied to the current node
rr:predicateObjectMap [
 rr:predicate ogc:dimension;
 rr:objectMap [
   rrx:function rrxf:dimension;
   rrx:argumentMap ([rml:reference "*"]) ]];

rr:predicateObjectMap [
 rr:predicate ogc:asWKT;
 rr:objectMap [
   rrx:function rrxf:asWKT;
   rrx:argumentMap ([rml:reference "*"]) ]].
```

This mapping contains two triples maps: <#Field> and <#FieldGeometry>. Both triples maps use an XPath iterator, with the query language declared by `rml:referenceFormulation`, as the base iteration pattern that the mapping processor uses to generate the graph. `rml:reference` is used in place of R2RML's `rr:column` property. Its value is evaluated relative to the current iterator node, so that it points at a specific element.
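
The way templates and references combine into triples can be sketched as follows. This is an illustration only, not the GeoTriples processor: `resolve`, `expand_template` and `field_triples` are hypothetical helpers, the XML is a reduced version of the example, and predicates are kept as the prefixed names used in the mapping. The sketch assumes that a template whose reference has no value produces no term, as in R2RML.

```python
import re
import xml.etree.ElementTree as ET

XML = """
<Farm>
  <Field id="2"><Vigor>1</Vigor><Geometry id="1"/></Field>
  <Field id="3"><Vigor>3</Vigor></Field>
</Farm>
"""

FIELD_TEMPLATE = "http://data.linkedeodata.eu/Field/id/{@id}"
GEOMETRY_TEMPLATE = "http://data.linkedeodata.eu/FieldGeometry/id/{Geometry/@id}"


def resolve(node, ref):
    """Resolve 'Child', '@attr' or 'Child/@attr' relative to node."""
    if "@" in ref:
        path, attr = ref.split("@", 1)
        target = node.find(path.rstrip("/")) if path else node
        return None if target is None else target.get(attr)
    return node.findtext(ref)


def expand_template(template, node):
    """Fill every {reference}; return None if any reference has no value."""
    values = {}
    for ref in re.findall(r"\{([^}]+)\}", template):
        value = resolve(node, ref)
        if value is None:
            return None
        values[ref] = value
    return re.sub(r"\{([^}]+)\}", lambda m: values[m.group(1)], template)


def field_triples(xml_text):
    triples = []
    for field in ET.fromstring(xml_text).findall("./Field"):
        subject = expand_template(FIELD_TEMPLATE, field)
        if subject is None:
            continue
        vigor = resolve(field, "Vigor")
        if vigor is not None:
            triples.append((subject, "ont:hasVigor", vigor))
        geometry = expand_template(GEOMETRY_TEMPLATE, field)
        if geometry is not None:
            triples.append((subject, "ogc:hasGeometry", geometry))
    return triples


if __name__ == "__main__":
    for triple in field_triples(XML):
        print(triple)
```

Field 3 has no `Geometry` element, so in this sketch it gets no `ogc:hasGeometry` triple.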

## Combine heterogeneous data, extract topological relations ##
RML can be used to combine heterogeneous data by generating links between resources that share a common attribute.

For example, if you have a shapefile that contains a field named A, and this field is used as a reference key to a field B in a JSON file, then you can use RML join conditions to generate links between these two datasets.
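
Conceptually, such an equality join pairs every record of one source with the records of the other source that hold the same key value. The sketch below illustrates this in Python; the records, IRIs and the `equality_join` helper are hypothetical and only stand in for the rows of a shapefile and the objects of a JSON file.

```python
def equality_join(children, parents, child_key, parent_key):
    """Return (child_iri, parent_iri) links where child[child_key] == parent[parent_key]."""
    # Index the parent records by key so each child is matched with a lookup.
    index = {}
    for parent in parents:
        index.setdefault(parent[parent_key], []).append(parent["iri"])
    links = []
    for child in children:
        for parent_iri in index.get(child[child_key], []):
            links.append((child["iri"], parent_iri))
    return links


# Hypothetical rows read from a shapefile (field A) and a JSON file (field B).
shapefile_rows = [
    {"iri": "http://example.com/Field/1", "A": "owner-7"},
    {"iri": "http://example.com/Field/2", "A": "owner-9"},
]
json_records = [
    {"iri": "http://example.com/Owner/7", "B": "owner-7"},
]

if __name__ == "__main__":
    print(equality_join(shapefile_rows, json_records, "A", "B"))
```

In an RML mapping the two keys correspond to the `rr:child` and `rr:parent` values of a join condition.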

GeoTriples implements an extended version of the Join Condition class that allows links to be generated based not on the equality of two values but on the result of a function. Currently, GeoTriples implements the following GeoSPARQL functions:
<ol>
  <li>sfIntersects</li>
  <li>sfContains</li>
  <li>sfTouches</li>
</ol>
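
The idea of a function-based join can be sketched as follows: instead of comparing two values for equality, every pair of resources is passed to a predicate and a link is produced when the predicate holds. The Python below is a simplified illustration, not the GeoTriples code. It assumes that all geometries are axis-aligned rectangles with positive area given as `(minx, miny, maxx, maxy)`, for which the three relations reduce to interval comparisons; the `function_join` helper and the `Region/A` rectangle are hypothetical.

```python
def intersects(a, b):
    """True if the rectangles share at least one point (boundaries included)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def interiors_overlap(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def touches(a, b):
    """True if the rectangles meet only on their boundaries."""
    return intersects(a, b) and not interiors_overlap(a, b)


def contains(a, b):
    """True if rectangle b lies entirely within rectangle a."""
    return a[0] <= b[0] and a[1] <= b[1] and b[2] <= a[2] and b[3] <= a[3]


def function_join(left, right, predicate):
    """Link every pair of distinct resources whose geometries satisfy the predicate."""
    return [
        (left_iri, right_iri)
        for left_iri, left_geom in left.items()
        for right_iri, right_geom in right.items()
        if left_iri != right_iri and predicate(left_geom, right_geom)
    ]


# Bounding rectangles of the two polygons in the XML example above.
fields = {"Field/1": (0, 0, 100, 100), "Field/2": (100, 100, 200, 200)}
# Hypothetical rectangle covering both fields.
regions = {"Region/A": (0, 0, 200, 200)}

if __name__ == "__main__":
    print(function_join(fields, fields, touches))
    print(function_join(regions, fields, contains))
```

The two example fields share only the corner point (100, 100), so under these assumptions they are linked by the touches and intersects predicates but not by contains.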