Merge remote-tracking branch 'origin/development'
ghsnd committed Jun 2, 2023
2 parents 9d52ee7 + 4674b0a commit d83525c
Showing 221 changed files with 4,048 additions and 433 deletions.
13 changes: 12 additions & 1 deletion .gitlab-ci.yml
@@ -15,6 +15,11 @@
# * Uses site:stage to collect the documentation for multi-module projects.
# * Publishes the documentation for `master` branch.


services:
- name: docker:dind
command: ["--tls=false"]

variables:
# This will suppress any download for dependencies and plugins or upload messages which would clutter the console log.
# `showDateTime` will show the passed time in milliseconds. You need to specify `--batch-mode` to make this work.
@@ -23,14 +28,20 @@ variables:
# when running from the command line.
# `installAtEnd` and `deployAtEnd` are only effective with recent version of the corresponding plugins.
MAVEN_CLI_OPTS: "--batch-mode --errors --fail-at-end --show-version -DinstallAtEnd=false -DdeployAtEnd=false"
# Instruct Testcontainers to use the DinD daemon; port 2375 is used for non-TLS connections.
DOCKER_HOST: "tcp://docker:2375"
# Instruct Docker not to start over TLS.
DOCKER_TLS_CERTDIR: ""
# Improve performance with overlayfs.
DOCKER_DRIVER: overlay2

# Cache downloaded dependencies and plugins between builds.
# To keep cache across branches add 'key: "$CI_JOB_NAME"'
cache:
paths:
- .m2/repository

# This will only the project.
# This will only the project.
.build: &build
stage: build
script:
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,21 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [2.5.0] - 2023-06-02

### Added
* Support for relational databases using JDBC.
* Parameter `parallelism` setting the maximum number of parallel operations.
* Script `change-version.sh` to update the version of RMLStreamer in required files.

### Fixed
* Updated Function Agent to v1.1.0
* Updated GREL Functions to v0.9.1
* Updated IDLab Functions to v0.2.0
* Use `<maven.compiler.release>` property in `pom.xml` to set Java version to 11.
* Allow a relative path (to the working dir) as output directory when writing to file.
* Fixed bug in extracting namespaces from XML element (internal [issue #161](https://gitlab.ilabt.imec.be/rml/proc/rml-streamer/-/issues/161))

## [2.4.2] - 2022-10-10

### Fixed
@@ -214,3 +229,4 @@ can be set with the program argument `--baseIRI`.
[2.4.0]: https://github.com/RMLio/RMLStreamer/compare/v2.3.0...v2.4.0
[2.4.1]: https://github.com/RMLio/RMLStreamer/compare/v2.4.0...v2.4.1
[2.4.2]: https://github.com/RMLio/RMLStreamer/compare/v2.4.1...v2.4.2
[2.5.0]: https://github.com/RMLio/RMLStreamer/compare/v2.4.2...v2.5.0
68 changes: 64 additions & 4 deletions README.md
@@ -46,7 +46,24 @@ If you go to the directory where your data and mappings are,
you can run something like (change tag to appropriate version):

```
$ docker run -v $PWD:/data --rm rmlstreamer:2.4.1 toFile -m /data/mapping.ttl -o /data/output.ttl
$ docker run -v $PWD:/data --rm rmlstreamer:v2.5.0 toFile -m /data/mapping.ttl -o /data/output.ttl
```

There are more options for the script, if you want to use specific tags or push to Docker Hub:
```
$ ./buildDocker.sh -h
Build and push Docker images for RMLStreamer
buildDocker.sh [-h]
buildDocker.sh [-a][-n][-p][-u <username>][-v <version>]
options:
-a Build for platforms linux/arm64 and linux/amd64. Default: perform a standard 'docker build'
-h Print this help and exit.
-n Do NOT (re)build RMLStreamer before building the Docker image. This is risky because the Docker build needs a stand-alone version of RMLStreamer.
-u <username> Add a username to the tag name, as on Docker Hub, like <username>/rmlstreamer:<version>.
-p Push to Docker Hub repo. You must be logged in for this to succeed.
-v <version> Override the version in the tag name, like <username>/rmlstreamer:<version>. If not given, use the current version found in pom.xml.
```

### Moderately quick start (Docker - the recommended way)
@@ -98,6 +115,9 @@ The resulting `RMLStreamer-<version>.jar`, found in the `target` folder, can be
$ mvn clean package -DskipTests -P 'stand-alone'
```

**Note**: If you want to update the version of RMLStreamer (e.g. when developing or releasing), run the script
`change-version.sh <your-new-version>`. It updates the version in the relevant places in the repository.

### Executing RML Mappings

*This section assumes the use of a CLI. If you want to use Flink's web interface, check out
@@ -134,22 +154,26 @@ $FLINK_BIN run <path to RMLStreamer jar> toKafka --broker-list <host:port> --top
#### Complete RMLStreamer usage:

```
Usage: RMLStreamer [toFile|toKafka|toTCPSocket|noOutput] [options]
Usage: RMLStreamer [toFile|toKafka|toTCPSocket|toMQTT|noOutput] [options]
-f, --function-descriptions <function description location 1>,<function description location 2>...
An optional list of paths to function description files (in RDF using FnO). A path can be a file location or a URL.
-j, --job-name <job name>
The name to assign to the job on the Flink cluster. Put some semantics in here ;)
-i, --base-iri <base IRI>
The base IRI as defined in the R2RML spec.
--disable-local-parallel
By default input records are spread over the available task slots within a task manager to optimise parallel processing, at the cost of losing the order of the records throughout the process. This option disables this behaviour to guarantee that the output order is the same as the input order.
-p, --parallelism <task slots>
Sets the maximum operator parallelism (~nr of task slots used)
-m, --mapping-file <RML mapping file>
REQUIRED. The path to an RML mapping file. The path must be accessible on the Flink cluster.
--json-ld Write the output as JSON-LD instead of N-Quads. An object contains all RDF generated from one input record. Note: this is slower than using the default N-Quads format.
--bulk Write all triples generated from one input record at once, instead of writing triples the moment they are generated.
--checkpoint-interval <time (ms)>
If given, Flink's checkpointing is enabled with the given interval. If not given, checkpointing is enabled when writing to a file (this is required to use the flink StreamingFileSink). Otherwise, checkpointing is disabled.
--auto-watermark-interval <time (ms)>
If given, Flink's watermarking will be generated periodically with the given interval. If not given, a default value of 50ms will be used. This option is only valid for DataStreams.
-f, --function-descriptions <function description location 1>,<function description location 2>...
An optional list of paths to function description files (in RDF using FnO). A path can be a file location or a URL.
Command: toFile [options]
Write output to file
Note: when the mapping consists only of stream triple maps, a StreamingFileSink is used. This sink will write the output to a part file at every checkpoint.
@@ -166,6 +190,15 @@ Command: toTCPSocket [options]
Write output to a TCP socket
-s, --output-socket <host:port>
The TCP socket to write to.
Command: toMQTT [options]
Write output to an MQTT topic
-b, --broker <host:port>
The MQTT broker.
-t, --topic <topic name>
The name of the MQTT topic to write output to.
Command: noOutput
Do everything, but discard output
```
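
As an illustration of the `toMQTT` command and the `--parallelism` option listed above, here is a minimal sketch that only assembles and prints a command line. The jar path, mapping path, broker address and topic name are all assumptions made for this example, and the argument order simply follows the pattern of the other examples in this README (command first, then options).

```shell
# All values below are illustrative assumptions, not taken from the repository.
FLINK_BIN="${FLINK_BIN:-flink}"
jar="path/to/RMLStreamer.jar"
broker="localhost:1883"
topic="rml-output"

# Assemble the command as positional parameters so each argument stays one word.
set -- "$FLINK_BIN" run "$jar" toMQTT \
  --mapping-file /data/mapping.ttl \
  --parallelism 4 \
  --broker "$broker" \
  --topic "$topic"

# Print rather than execute; replace `echo "$@"` with `"$@"` on a machine with Flink.
echo "$@"
```

Printing first makes it easy to check the arguments before submitting a job to a cluster.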

#### Examples
@@ -282,6 +315,33 @@ The only option for spreading load is to use multiple topics, and assign one RML
] .
```

##### Generating a stream from a relational database
RMLStreamer supports relational databases as a logical source. JDBC is used to establish a connection and run a query against the database. See the example mapping below.
```ttl
<TriplesMap1>
a rr:TriplesMap;
rml:logicalSource [
rml:source <#DB_source>;
rr:sqlVersion rr:SQL2008;
rr:tableName "country_info";
];
rr:subjectMap [ rr:template "http://example.com/{Country Code}/{Name}" ];
rr:predicateObjectMap [
rr:predicate ex:name ;
rr:objectMap [ rml:reference "Name" ]
] .
<#DB_source> a d2rq:Database;
d2rq:jdbcDSN "CONNECTIONDSN";
d2rq:jdbcDriver "org.postgresql.Driver";
d2rq:username "postgres";
d2rq:password "" .
```
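
The mapping above uses `CONNECTIONDSN` where a JDBC URL is expected. Assuming it is a placeholder to be replaced by a real connection string, the following sketch shows one way to fill it in for PostgreSQL. The host, port and database name are hypothetical, and the line is printed rather than written back to a mapping file.

```shell
# Hypothetical connection details; none of these values come from the repository.
db_host="localhost"
db_port="5432"
db_name="mydb"

# PostgreSQL JDBC URLs have the form jdbc:postgresql://host:port/database.
jdbc_dsn="jdbc:postgresql://${db_host}:${db_port}/${db_name}"

# Substitute the placeholder in one mapping line (printed, not written to a file).
mapping_line='d2rq:jdbcDSN "CONNECTIONDSN";'
resolved_line=$(printf '%s\n' "$mapping_line" | sed "s|CONNECTIONDSN|${jdbc_dsn}|")
echo "$resolved_line"
```

The `|` delimiter in the `sed` expression avoids having to escape the slashes in the URL.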

#### RML Stream Vocabulary (non-normative)

88 changes: 83 additions & 5 deletions buildDocker.sh
@@ -2,10 +2,88 @@

version=$(grep -F '<version>' pom.xml | head -n 1 | cut -d '>' -f 2 | cut -d '<' -f 1)

### 1. Build RMLStreamer stand-alone
echo "Building stand-alone RMLStreamer version $version"
mvn clean package -DskipTests -P 'stand-alone'
help() {
echo "Build and push Docker images for RMLStreamer"
echo
echo "buildDocker.sh [-h]"
echo "buildDocker.sh [-a][-n][-p][-u <username>][-v <version>]"
echo "options:"
echo "-a Build for platforms linux/arm64 and linux/amd64. Default: perform a standard 'docker build'"
echo "-h Print this help and exit."
echo "-n Do NOT (re)build RMLStreamer before building the Docker image. This is risky because the Docker build needs a stand-alone version of RMLStreamer."
echo "-u <username> Add a username to the tag name, as on Docker Hub, like <username>/rmlstreamer:<version>."
echo "-p Push to Docker Hub repo. You must be logged in for this to succeed."
echo "-v <version> Override the version in the tag name, like <username>/rmlstreamer:<version>. If not given, use the current version found in pom.xml."
}

do_not_build=false
build_for_all=false
push=false


while getopts ahnu:pv: option
do
case "${option}" in
a) # Build for all available OS/architectures of the base image
build_for_all=true;;
h) # Display help
help
exit;;
n) # Do NOT (re)build RMLStreamer
do_not_build=true;;
u) # Override username
username=${OPTARG};;
p) # Push to Docker Hub
push=true;;
v) # Override version
version=${OPTARG};;
\?) # Invalid option: report it and exit with a non-zero status
echo "Error: invalid option"
exit 1;;
esac
done

if [[ -z "$version" ]]
then
version=$(grep -F '<version>' pom.xml | head -n 1 | cut -d '>' -f 2 | cut -d '<' -f 1)
fi

if [[ -n "$username" ]]
then
username="${username}/"
fi

tag="${username}rmlstreamer:${version}"
tag_latest="${username}rmlstreamer:latest"
echo "tag: $tag"

if ! $do_not_build
then
### 1. Build RMLStreamer stand-alone
echo "Building stand-alone RMLStreamer"
mvn clean package -DskipTests -P 'stand-alone'
fi


### 2. Build the docker container
docker build --tag "rmlstreamer:$version" . && \
echo "Successfully built rmlstreamer:$version !"
echo "Building Docker image(s)"
if $build_for_all
then
platforms="linux/arm64,linux/amd64"
if $push
then
DOCKER_BUILDX_BUILD_ARGS=(--platform "$platforms" --tag "$tag" --tag "$tag_latest" --push .)
else
DOCKER_BUILDX_BUILD_ARGS=(--platform "$platforms" --tag "$tag" --tag "$tag_latest" .)
fi
docker buildx create --name rmlstreamerallplatforms --use
docker buildx build "${DOCKER_BUILDX_BUILD_ARGS[@]}"
docker buildx rm rmlstreamerallplatforms
else
docker build --tag "$tag" --tag "$tag_latest" .
if $push
then
docker push "$tag"
docker push "$tag_latest"
fi
fi
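
The option handling and tag composition in this script can be exercised without Docker or Maven. The following is a minimal sketch, not part of the commit: `compose_tag` is a hypothetical helper that mirrors only the `-u` and `-v` handling, and `myuser` is an illustrative account name. Unlike the script, it does not fall back to the version in `pom.xml`.

```shell
# Hypothetical helper (not in buildDocker.sh): mirrors how the script builds
# the image tag from the -u and -v options.
# The body runs in a subshell, so its variables and OPTIND do not leak out.
compose_tag() (
  username=""
  version=""
  OPTIND=1
  while getopts "u:v:" option "$@"; do
    case "${option}" in
      u) username="${OPTARG}" ;;
      v) version="${OPTARG}" ;;
      *) exit 1 ;;   # unknown option: leave the subshell with a failure status
    esac
  done
  # Only add the "<username>/" prefix when a username was given.
  if [ -n "$username" ]; then
    username="${username}/"
  fi
  echo "${username}rmlstreamer:${version}"
)

compose_tag -v 2.5.0
compose_tag -u myuser -v 2.5.0
```

Keeping the tag logic in a small function like this makes it possible to check the naming scheme before any image is built or pushed.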
19 changes: 19 additions & 0 deletions change-version.sh
@@ -0,0 +1,19 @@
#!/usr/bin/env bash

VERSION=$1

# exit if no version argument given
if [ ! "$VERSION" ]
then
echo 'No version given. Exiting.'
exit 1
fi

echo 'Updating pom.xml'
sed -i "s:^ <version>\(.*\)</version>: <version>$VERSION</version>:" pom.xml

echo 'Updating ParameterUtil.scala'
sed -i "s:head(\"RMLStreamer\", \"\(.*\)\"):head(\"RMLStreamer\", \"$VERSION\"):" src/main/scala/io/rml/framework/core/util/ParameterUtil.scala

echo 'Updating README.md'
sed -i "s/rmlstreamer:[^ ]* toFile/rmlstreamer:$VERSION toFile/" README.md
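
The README substitution can be tried on a scratch file before touching the repository. This is a minimal sketch under stated assumptions: it writes the result to a variable instead of using `sed -i`, the version value is illustrative, and the pattern matches any tag after `rmlstreamer:` rather than one fixed version, which is a variation chosen for this example.

```shell
# Scratch copy of the README line; the real script edits README.md in place.
workdir=$(mktemp -d)
printf '%s\n' '$ docker run -v $PWD:/data --rm rmlstreamer:2.4.1 toFile -m /data/mapping.ttl -o /data/output.ttl' > "$workdir/README.md"

new_version="2.5.0"   # illustrative value

# Match whatever tag currently follows "rmlstreamer:" and replace it.
updated=$(sed "s/rmlstreamer:[^ ]* toFile/rmlstreamer:$new_version toFile/" "$workdir/README.md")
echo "$updated"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Matching `[^ ]*` instead of a literal version keeps the substitution working on later runs, when the README no longer contains the old version string.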
65 changes: 59 additions & 6 deletions pom.xml
@@ -28,7 +28,7 @@ SOFTWARE.

<groupId>io.rml</groupId>
<artifactId>RMLStreamer</artifactId>
<version>2.4.2</version>
<version>v2.5.0</version>
<packaging>jar</packaging>

<name>RMLStreamer</name>
@@ -41,9 +41,10 @@ SOFTWARE.
<log4j.version>2.17.0</log4j.version>
<jena.version>4.3.1</jena.version>
<kafka.version>2.4.1</kafka.version>

<junit.version>5.9.1</junit.version>
<testcontainers.version>1.17.6</testcontainers.version>
<scala.binary.version>2.11</scala.binary.version>
<java.version>11</java.version>
<maven.compiler.release>11</maven.compiler.release>

<!-- license properties -->
<project.inceptionYear>2018</project.inceptionYear>
@@ -387,7 +388,7 @@ SOFTWARE.
<dependency>
<groupId>com.github.FnOio</groupId>
<artifactId>function-agent-java</artifactId>
<version>v0.2.1</version>
<version>v1.1.0</version>
<exclusions>
<exclusion>
<groupId>org.apache.jena</groupId>
@@ -398,14 +399,66 @@ SOFTWARE.
<dependency>
<groupId>com.github.fnoio</groupId>
<artifactId>grel-functions-java</artifactId>
<version>v0.8.2</version>
<version>v0.9.1</version>
</dependency>
<dependency>
<groupId>com.github.fnoio</groupId>
<artifactId>idlab-functions-java</artifactId>
<version>v0.1.2</version>
<version>v0.2.0</version>
</dependency>
<dependency>
<groupId>com.github.RMLio</groupId>
<artifactId>dataio</artifactId>
<version>v0.1.1</version>
</dependency>

<!-- Testcontainers dependencies -->
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>testcontainers</artifactId>
<version>${testcontainers.version}</version>
<scope>test</scope>
</dependency>

<!-- containers -->
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>postgresql</artifactId>
<version>${testcontainers.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>jdbc</artifactId>
<version>${testcontainers.version}</version>
<scope>test</scope>
</dependency>
<!-- JUnit extension -->
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>junit-jupiter</artifactId>
<version>${testcontainers.version}</version>
<scope>test</scope>
</dependency>

<!-- End of Testcontainers dependencies -->
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>42.5.2</version>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-engine</artifactId>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
</dependencies>

<!-- This profile helps to make things run stand-alone or in IntelliJ -->