Study GKMConsistencyDesign

This page aims to define the GeoKretyMap (GKM) consistency check job (issue #328).

A new stateless microservice

The GKM consistency check is an entity in its own right, a building block of the service, in other words a microservice with its own lifecycle:

a new dedicated subproject must be created in the geokrety organization (to avoid mixing it with the website code)
it has read-only access to the GeoKrety database
it can be written in any language (not necessarily PHP like the website)
it is configured by environment variables
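Since the service is configured only through environment variables, a minimal Python sketch of loading that configuration at startup could look like this. Only `gkm_consistency_batch_size` and `gkm_consistency_roll_min_days` come from this page; the environment variable names, the defaults, and the redis and export settings are hypothetical.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class JobConfig:
    batch_size: int      # gkm_consistency_batch_size: geokrety read per batch
    roll_min_days: int   # gkm_consistency_roll_min_days: minimum days between two rolls
    redis_url: str       # hypothetical: where the GKM cache and roll state are kept
    gkm_export_url: str  # hypothetical: location of the GeoKretyMap basic XML export


def _int_setting(env: Mapping, name: str, default: int, minimum: int) -> int:
    """Read an integer variable, fall back to a default, and enforce a lower bound."""
    value = int(env.get(name, default))
    if value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value


def load_config(env: Mapping = os.environ) -> JobConfig:
    """Build the job configuration from (hypothetically named) environment variables."""
    return JobConfig(
        batch_size=_int_setting(env, "GKM_CONSISTENCY_BATCH_SIZE", 500, minimum=1),
        roll_min_days=_int_setting(env, "GKM_CONSISTENCY_ROLL_MIN_DAYS", 7, minimum=0),
        redis_url=env.get("GKM_CONSISTENCY_REDIS_URL", "redis://localhost:6379/0"),
        gkm_export_url=env["GKM_CONSISTENCY_EXPORT_URL"],  # required: no sensible default
    )
```

Passing the environment as a mapping keeps the loader testable without touching the real process environment.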

The job configuration is the set of all parameters used as input to the consistency check business logic:

config:

Given the job configuration, the goal of a GKM consistency job is to compare a GKM export with every entry of the geokrety table.

Job definition:

a rollId (unique identifier) is defined (see below)
a cache of GKM data is produced and stored in redis:
- read a basic XML export from GeoKretyMap
- store each GeoKretyMap entry (geokrety type) in redis
the geokrety table is read in one or more batches (depending on the data and on gkm_consistency_batch_size, noted X):
- the first batch starts from the current datetime and selects X geokrety ordered by creation date, descending
- each following batch uses the oldest timestamp of the previous result as its max datetime
- the roll ends when a new batch returns no result
each batch compares its X geokrety with the related GKM state in redis:
- a new log entry is created each time an unsynchronized geokret is detected
at the end of the roll, a new log entry is added with the overall result: total number of geokrety analyzed and total number of unsynchronized geokrety
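The roll described above can be sketched as follows. This is a hypothetical, in-memory illustration: a Python list stands in for the geokrety table, a dict keyed by geokret id stands in for the redis GKM cache, and `GeoKret`, `read_batch`, `run_roll` and `log` are illustrative names. It departs from the page in one point: the batch boundary is a (creation date, id) pair rather than the creation date alone, so geokrety sharing the same creation timestamp across a batch boundary are neither skipped nor read twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoKret:
    id: int
    created_on: int  # creation timestamp, epoch seconds (assumption)
    state: tuple     # hypothetical: the fields compared with the GKM entry


def read_batch(geokrety, bound, batch_size):
    """Stand-in for the SQL query: the batch_size newest rows strictly older than bound."""
    older = [gk for gk in geokrety if (gk.created_on, gk.id) < bound]
    older.sort(key=lambda gk: (gk.created_on, gk.id), reverse=True)
    return older[:batch_size]


def run_roll(geokrety, gkm_cache, batch_size, now, log):
    """Compare every geokret with its cached GKM state, batch by batch."""
    bound = (now, float("inf"))  # first batch starts from the current datetime
    analyzed = unsync = 0
    while True:
        batch = read_batch(geokrety, bound, batch_size)
        if not batch:  # an empty batch ends the roll
            break
        for gk in batch:
            analyzed += 1
            # A geokret missing from the GKM cache (get() returns None) also counts as unsync.
            if gkm_cache.get(gk.id) != gk.state:
                unsync += 1
                log(f"unsync geokret id={gk.id}")
        oldest = batch[-1]  # batch is sorted newest first
        bound = (oldest.created_on, oldest.id)
    log(f"roll done: analyzed={analyzed} unsync={unsync}")
    return analyzed, unsync
```

In SQL, `read_batch` would roughly correspond to filtering rows older than the previous boundary, ordering by creation date then id (both descending) and limiting the result to X rows.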

Job throttling

rollId and rollEndDate are stored in redis
no rollId and no rollEndDate means that no consistency job has ever run
rollId starts at 1 the first time and is incremented by one for each new roll (redis atomic counter)
rollEndDate is -1 while an analysis is in progress
rollEndDate is the (positive) timestamp at which the last analysis ended
a new job can start if and only if rollId is null, or rollId is set, rollEndDate is positive and rollEndDate + gkm_consistency_roll_min_days days < now()
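The throttling rules above can be sketched as a predicate plus two state transitions. This is a hypothetical illustration: a plain dict stands in for redis (the `rollId` update mimics `INCR`, which returns 1 on a missing key), timestamps are assumed to be epoch seconds, and `can_start_roll`, `start_roll` and `end_roll` are illustrative names.

```python
DAY_SECONDS = 86_400


def can_start_roll(roll_id, roll_end_date, roll_min_days, now):
    """Apply the start rule; None means the redis key is absent."""
    if roll_id is None:  # no consistency job has ever run
        return True
    if roll_end_date is None or roll_end_date <= 0:  # -1: a roll is still in progress
        return False
    return roll_end_date + roll_min_days * DAY_SECONDS < now


def start_roll(store):
    """Begin a roll: increment rollId (like redis INCR) and mark it in progress."""
    store["rollId"] = store.get("rollId", 0) + 1
    store["rollEndDate"] = -1
    return store["rollId"]


def end_roll(store, now):
    """Finish a roll: record its end timestamp."""
    store["rollEndDate"] = now
```

Checking and then starting is not atomic: if several job instances could run at the same time, the check and the two writes would need to be grouped, for example with redis `WATCH`/`MULTI`/`EXEC` or a Lua script.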

Grafana should include

The following geokret information will be used to compare a gk-geokrety entry with the related GKM data:

Each produced log must embed a tag corresponding to the current business logic, so when applicable

We could define one redis entry per comparison result:

gkm_sync_ok_(id): the value is the timestamp of the last successful comparison
gkm_sync_ko_(id): the value is a map: first_time => timestamp of the first unsuccessful comparison, last_time => timestamp of the last unsuccessful comparison, count => number of unsuccessful comparisons, reason => result of the last unsuccessful comparison
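Updating these two entries after each comparison could look like the following sketch. A dict of values and nested dicts stands in for redis strings and hashes; the comments name the redis commands each step would map to. `record_compare` is a hypothetical name, and the sketch does not delete an existing `gkm_sync_ko_(id)` entry when a geokret becomes synchronized again, since the page does not say whether it should.

```python
def record_compare(store, gk_id, ok, now, reason=""):
    """Record one comparison result for geokret gk_id at timestamp now."""
    if ok:
        store[f"gkm_sync_ok_{gk_id}"] = now  # SET gkm_sync_ok_<id> <now>
        return
    entry = store.setdefault(f"gkm_sync_ko_{gk_id}", {})  # a redis hash
    entry.setdefault("first_time", now)  # HSETNX: only the first failure sets it
    entry["last_time"] = now  # HSET
    entry["count"] = entry.get("count", 0) + 1  # HINCRBY count 1
    entry["reason"] = reason  # HSET: keep only the last reason
```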

The metrics endpoint provides the following gauges:

We need to design and implement a solution to search over the data and/or logs of all geokrety services (application, database, services, ...).

Possible candidates are:

minio: a high-performance object storage server compatible with the Amazon S3 APIs
ELK stack: the Elasticsearch, Logstash, Kibana stack