Study GKMConsistencyDesign
This page aims to define the GeoKretyMap (GKM) consistency check job: issue#328
The GKM consistency check is an entity in its own right, a service building block, a microservice with its own life cycle:
- it must live in a new dedicated subproject of the geokrety organization (to avoid mixing it with the website code)
- it has read-only access to the GeoKrety database
- it can be written in any language (not necessarily PHP like the website)
- it is configured by environment variables
Job configuration is the set of all config/parameters/attributes used as input to the consistency check business logic:
- (from the geokrety database) the current `gk-geokrety` table
- a job startup trigger (a cron entry config)
- job `config` entries:
  - `gkm_api_endpoint`: GeoKretyMap API endpoint
  - `gkm_export_basic`: GeoKretyMap basic export location (example)
  - `gkm_consistency_batch_size`: batch size, used as the select limit on the geokrety table
  - `gkm_consistency_roll_min_days`: minimum number of days between two rolls
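Since the job is meant to be configured by environment variables, the config entries above could be loaded as in the following minimal sketch. The upper-case variable names (`GKM_API_ENDPOINT`, etc.), the `JobConfig` class and the `load_config` helper are assumptions made for illustration; this page does not define them.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class JobConfig:
    """Hypothetical container for the job config entries listed above."""
    gkm_api_endpoint: str
    gkm_export_basic: str
    gkm_consistency_batch_size: int
    gkm_consistency_roll_min_days: int


def load_config(env: Mapping[str, str] = os.environ) -> JobConfig:
    """Read the config from environment variables (names are assumptions)."""

    def required(name: str) -> str:
        value = env.get(name)
        if not value:
            raise ValueError(f"missing environment variable: {name}")
        return value

    return JobConfig(
        gkm_api_endpoint=required("GKM_API_ENDPOINT"),
        gkm_export_basic=required("GKM_EXPORT_BASIC"),
        # Numeric entries are parsed eagerly so a bad value fails at startup.
        gkm_consistency_batch_size=int(required("GKM_CONSISTENCY_BATCH_SIZE")),
        gkm_consistency_roll_min_days=int(required("GKM_CONSISTENCY_ROLL_MIN_DAYS")),
    )
```

Failing fast on a missing or non-numeric value keeps a misconfigured cron run from starting a roll at all.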
Once started with its job configuration, a GKM consistency job compares a GKM export with all entries of the geokrety table.
Job definition:
- a rollId (unique identifier) is defined (cf. below)
- a cache of GKM data is produced and stored in redis:
  - an XML basic export is read from GeoKretyMap
  - each geokretymap (geokrety type) entry is stored in redis
- the geokrety table is read in one or more batches (depending on the data and on `gkm_consistency_batch_size`, noted X):
  - the first batch uses the current datetime as max datetime and selects X geokrety ordered by creation date, descending
  - each following batch uses the oldest timestamp of the previous result as max datetime
  - the roll ends when a new batch gives no result
- each batch compares X geokrety with the related redis GKM state
- a new log entry is created each time an unsync geokrety is detected
- at the end of the roll, a new log entry is added with the overall result: sum of geokrety analyzed, sum of unsync geokrety
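The batch reading described above can be sketched as follows. The row shape (`id`, `created_at`), the `select_batch` callback and the in-memory stand-in for the database are assumptions made for illustration; a real job would run the equivalent select against the geokrety table.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator, List

# Hypothetical row shape: {"id": int, "created_at": datetime}
Geokret = dict
SelectBatch = Callable[[datetime, int], List[Geokret]]


def iter_roll_batches(
    select_batch: SelectBatch, batch_size: int, now: datetime
) -> Iterator[List[Geokret]]:
    """Yield batches until a select returns no row, which ends the roll."""
    max_datetime = now
    while True:
        batch = select_batch(max_datetime, batch_size)
        if not batch:
            return
        yield batch
        # The next batch only looks at geokrety older than this one.
        max_datetime = min(row["created_at"] for row in batch)


def make_in_memory_select(rows: List[Geokret]) -> SelectBatch:
    """Stand-in for: created before max_datetime, newest first, limit X."""

    def select_batch(max_datetime: datetime, limit: int) -> List[Geokret]:
        older = [r for r in rows if r["created_at"] < max_datetime]
        older.sort(key=lambda r: r["created_at"], reverse=True)
        return older[:limit]

    return select_batch
```

The sketch uses a strict "older than" bound so that a batch never re-reads its own oldest row. One consequence is that geokrety sharing exactly the same creation timestamp across a batch boundary could be skipped; adding the id as a secondary sort key would be one way to handle that.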
Job throttling
- `rollId` and `rollEndDate` are stored in redis
- no `rollId` and no `rollEndDate` means that no consistency job has ever run
- the `rollId` value starts from `1` the first time and is incremented by one (redis atomic counter)
- the `rollEndDate` value is `-1` while an analysis is in progress
- the `rollEndDate` value is the positive timestamp of the last ended analysis
- a new job can start if and only if (rollId is null) or (rollId is set, and rollEndDate is positive and rollEndDate + gkm_consistency_roll_min_days days < now())
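The start condition above can be written as a small pure function. This is a sketch that assumes timestamps are Unix seconds, which this page does not state; the function name and parameters are hypothetical.

```python
from typing import Optional

SECONDS_PER_DAY = 86400


def can_start_roll(
    roll_id: Optional[int],
    roll_end_date: Optional[int],
    roll_min_days: int,
    now: int,
) -> bool:
    """Apply the throttling rule; timestamps are assumed to be Unix seconds."""
    if roll_id is None:
        # No job has ever run.
        return True
    if roll_end_date is None or roll_end_date <= 0:
        # -1 means an analysis is still in progress.
        return False
    return roll_end_date + roll_min_days * SECONDS_PER_DAY < now
```

Keeping the rule free of any redis access makes it easy to unit test; the caller reads `rollId` and `rollEndDate` from redis and passes them in.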
Grafana should include:
- a view of the compared and unsync geokrety counts over time
The following geokrety fields are used to compare a `gk-geokrety` entry with the related GKM data:
- id
- name
- ownerName
- distanceTraveledKm
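A comparison over these fields could look like the sketch below. It assumes both sides have already been normalized to dictionaries keyed by the field names above with comparable value types (for example, the distance as a number on both sides); the function name is hypothetical.

```python
from typing import List

COMPARED_FIELDS = ("id", "name", "ownerName", "distanceTraveledKm")


def unsync_fields(gk_entry: dict, gkm_entry: dict) -> List[str]:
    """Return the names of the compared fields whose values differ."""
    return [
        field
        for field in COMPARED_FIELDS
        if gk_entry.get(field) != gkm_entry.get(field)
    ]
```

An empty list means the geokret is in sync; a non-empty list gives the unsync field(s) to report.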
Each produced log entry must embed tags corresponding to the current business logic, that is, when applicable:
- rollId
- geokretyId
- unsync field(s)
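As an illustration, a log entry carrying these tags could be built as a JSON line. The JSON format, the `message` key and the `unsyncFields` key name are assumptions; only `rollId` and `geokretyId` come from this page.

```python
import json
from typing import Optional, Sequence


def build_log_entry(
    message: str,
    roll_id: Optional[int] = None,
    geokrety_id: Optional[int] = None,
    unsync_fields: Optional[Sequence[str]] = None,
) -> str:
    """Serialize a log entry, embedding only the tags that apply."""
    entry = {"message": message}
    if roll_id is not None:
        entry["rollId"] = roll_id
    if geokrety_id is not None:
        entry["geokretyId"] = geokrety_id
    if unsync_fields:
        entry["unsyncFields"] = list(unsync_fields)  # hypothetical key name
    return json.dumps(entry, sort_keys=True)
```

Structured entries of this kind are easier to filter by tag than free-text messages when searching logs later.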
We could define a redis entry per compare result:
- `gkm_sync_ok_(id)`: value is the timestamp of the last successful compare
- `gkm_sync_ko_(id)`: value is a map: first_time => timestamp of the first unsuccessful compare, last_time => timestamp of the last unsuccessful compare, count => number of unsuccessful compares, reason => result of the last unsuccessful compare
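The update of these entries can be sketched with a plain dictionary standing in for redis. Two points are assumptions of the sketch, not decisions taken on this page: the opposite entry is removed when the state of a geokret changes, and `reason` is stored as the comma-separated list of unsync fields.

```python
from typing import List


def record_compare_result(store: dict, gk_id: int, unsync: List[str], now: int) -> None:
    """Update the gkm_sync_ok_(id) / gkm_sync_ko_(id) entry of one geokret."""
    ok_key = f"gkm_sync_ok_{gk_id}"
    ko_key = f"gkm_sync_ko_{gk_id}"
    if not unsync:
        store[ok_key] = now
        store.pop(ko_key, None)  # assumption: a geokret is either ok or ko
        return
    previous = store.get(ko_key)
    store[ko_key] = {
        "first_time": previous["first_time"] if previous else now,
        "last_time": now,
        "count": (previous["count"] if previous else 0) + 1,
        "reason": ",".join(unsync),  # assumption: reason lists unsync fields
    }
    store.pop(ok_key, None)  # assumption: a geokret is either ok or ko
```

With redis, the `ko` map would fit naturally in a hash, and the read-then-write on `count` would need to be made atomic if several workers can process the same geokret.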
The metrics gauges endpoint provides:
- `gkm_sync_ok_*`: number of sync geokrety
- `gkm_sync_ko_*`: number of unsync geokrety
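The two gauges amount to counting keys by prefix. The sketch below does so over any iterable of key names; the gauge names `gkm_sync_ok` and `gkm_sync_ko` and the function name are hypothetical, and how the values are exposed (for example to Grafana) is left open.

```python
from typing import Dict, Iterable


def sync_gauges(keys: Iterable[str]) -> Dict[str, int]:
    """Count the gkm_sync_ok_* and gkm_sync_ko_* keys (gauge names are hypothetical)."""
    gauges = {"gkm_sync_ok": 0, "gkm_sync_ko": 0}
    for key in keys:
        if key.startswith("gkm_sync_ok_"):
            gauges["gkm_sync_ok"] += 1
        elif key.startswith("gkm_sync_ko_"):
            gauges["gkm_sync_ko"] += 1
    return gauges
```

Against a real redis instance, the keys would typically be iterated incrementally with a pattern match (SCAN) rather than fetched in one blocking call.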
We need to design and implement a solution to search over the data and/or logs of all geokrety services (application, database, services, ...).
Possible candidates are