Study GKMConsistencyDesign
This page aims to define the GeoKretyMap (GKM) consistency check job: issue#328
The GKM consistency check is a standalone entity, a building block of the service: a microservice with its own life cycle.
A new dedicated subproject must be created in the geokrety organization to avoid mixing this with the website code.
This service must have read-only access to the GeoKrety database. It can be written in any language, not necessarily PHP like the website.
A sync state object defines the current consistency check status:
- it can be null (very first run); otherwise it can be stored in Redis,
- this data is not production sensitive: if we lose it, we simply start over from the beginning.
SyncState
attributes:
- rollId: current roll identifier, from 10_000 to 19_999. rollId is incremented by 1 when a roll is finished. If the incremented rollId is out of range, the min value must be used.
- timestamp: defines the current geokrety offset to use (geokrety max creation date). When a batch select gives 0 results with timestamp as max creation date, the roll is finished, timestamp is set to null, and we need to restart from scratch (last created geokrety from now as first batch).
- geokrety_count: number of analyzed geokrety
- unsync_geokrey_count: number of unsync geokrety
- (roll output hashtag is generated by configuration + rollId)
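As an illustration of the attributes above, here is a minimal Python sketch of the state object and of the rollId wrap-around rule. The names `ROLL_ID_MIN`, `ROLL_ID_MAX` and `next_roll_id` are hypothetical, and starting the very first state at the minimum rollId is an assumption, not something this page specifies.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Range taken from the rollId description; the constant names are hypothetical.
ROLL_ID_MIN = 10_000
ROLL_ID_MAX = 19_999


@dataclass
class SyncState:
    # Assumption: the very first state starts at the minimum rollId.
    rollId: int = ROLL_ID_MIN
    # None means no roll is in progress (first run, or the roll is finished).
    timestamp: Optional[datetime] = None
    geokrety_count: int = 0
    # Attribute name kept exactly as spelled in the attribute list.
    unsync_geokrey_count: int = 0


def next_roll_id(roll_id: int) -> int:
    """Increment rollId by 1; use the min value if the result is out of range."""
    candidate = roll_id + 1
    if ROLL_ID_MIN <= candidate <= ROLL_ID_MAX:
        return candidate
    return ROLL_ID_MIN
```

Because losing the state only means starting over, a missing or unreadable stored value can simply be replaced by `SyncState()`.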
Sync parameters are defined as the set of all config/parameters/attributes used as input to the consistency check business logic.
SyncParameters:
- (from geokrety database) the current gk-geokrety table
- a job startup trigger (a cron entry config)
- job config entries:
config:
- gkm_consistency_max_duration_sec: max duration of a job, in seconds
- gkm_consistency_max_batch_size: batch size, i.e. the limit of the geokrety select
- gkm_consistency_roll_min_days: min number of days between rolls
- gkm_api_endpoint: GeoKretyMap API endpoint
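The config entries could be loaded into a typed object along these lines. This is only a sketch: `SyncConfig` and `load_config` are hypothetical names, the source of the values (environment, file, etc.) is not decided on this page, and the validation rules are assumptions.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SyncConfig:
    gkm_consistency_max_duration_sec: int
    gkm_consistency_max_batch_size: int
    gkm_consistency_roll_min_days: int
    gkm_api_endpoint: str


def load_config(source: Mapping[str, str]) -> SyncConfig:
    """Build a SyncConfig from string key/value pairs (for example os.environ)."""
    config = SyncConfig(
        gkm_consistency_max_duration_sec=int(source["gkm_consistency_max_duration_sec"]),
        gkm_consistency_max_batch_size=int(source["gkm_consistency_max_batch_size"]),
        gkm_consistency_roll_min_days=int(source["gkm_consistency_roll_min_days"]),
        gkm_api_endpoint=source["gkm_api_endpoint"],
    )
    # Assumed sanity checks: a job needs a positive time budget and batch size.
    if config.gkm_consistency_max_duration_sec <= 0:
        raise ValueError("gkm_consistency_max_duration_sec must be positive")
    if config.gkm_consistency_max_batch_size <= 0:
        raise ValueError("gkm_consistency_max_batch_size must be positive")
    return config
```

The values used with it (durations, batch size, endpoint URL) remain to be chosen; none are implied here.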
Started by the cron configuration, a job aims to start (or continue) a roll and tries to finish it within the max duration. Generally, more than one job may be needed to complete a roll. This way we can spread the load of a roll over several days.
- If a roll can't be finished within the limit, a SyncState is produced and stored for the next job.
- If a job is triggered with an ended roll as SyncState, a new roll is started if and only if the min interval in days between rolls has been reached. rollId is then incremented.
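The decision a job takes at startup can be sketched as follows. `JobAction`, `decide_job_action` and `last_roll_finished_at` are hypothetical: the state attributes listed on this page do not include the end date of the previous roll, so this sketch assumes it is stored somewhere (for instance in the roll history) and that the min interval is counted from the end of the previous roll.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class JobAction(Enum):
    CONTINUE_ROLL = "continue_roll"
    START_NEW_ROLL = "start_new_roll"
    WAIT = "wait"


def decide_job_action(
    timestamp: Optional[datetime],
    last_roll_finished_at: Optional[datetime],
    now: datetime,
    roll_min_days: int,
) -> JobAction:
    """Decide what a freshly triggered job should do."""
    if timestamp is not None:
        # A roll is in progress: resume from the stored offset.
        return JobAction.CONTINUE_ROLL
    if last_roll_finished_at is None:
        # No roll has ever finished (very first run): start one.
        return JobAction.START_NEW_ROLL
    if now - last_roll_finished_at >= timedelta(days=roll_min_days):
        # The previous roll ended long enough ago.
        return JobAction.START_NEW_ROLL
    return JobAction.WAIT
```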
A roll is defined as the check of all geokrety table entries.
This check is done in one or more batches (depending on the data and gkm_consistency_max_batch_size (as X)):
- We start by using the current datetime and selecting X geokrety ordered by creation date desc.
- X is the max batch size,
- A batch compares these X geokrety with the remote GKM state using the GKM API (cf. dedicated section),
When a batch is finished, timestamp is set to the min creation date of the geokrety in that batch. timestamp is then used as the max timestamp for the next batch iteration.
- A new batch can start if and only if the job max duration in seconds is not reached; otherwise the next batch will be done by the next job.
We consider a roll ended when a new batch gives no result. We then need to store the roll history and set timestamp to null.
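The batch iteration can be sketched as below. This is an illustration under stated assumptions, not the final design: `select_batch` is an in-memory stand-in for the real database select, `run_job`, `check` and `clock` are hypothetical names, and the comparison on the creation date is assumed to be strict.

```python
import time
from datetime import datetime
from typing import Callable, List, Optional, Tuple

# (geokrety id, creation date): stands in for a row of the geokrety table.
Row = Tuple[int, datetime]


def select_batch(rows: List[Row], max_created: datetime, limit: int) -> List[Row]:
    """Stand-in for: select rows created before max_created,
    ordered by creation date desc, limited to `limit` rows."""
    older = [row for row in rows if row[1] < max_created]
    return sorted(older, key=lambda row: row[1], reverse=True)[:limit]


def run_job(
    rows: List[Row],
    timestamp: Optional[datetime],
    now: datetime,
    batch_size: int,
    max_duration_sec: float,
    check: Callable[[Row], None],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[datetime], bool]:
    """Run batches until the roll ends or the time budget is spent.

    Returns (timestamp, roll_finished); timestamp is None when the roll ended.
    """
    started = clock()
    # A new roll starts from the current datetime, a resumed one from timestamp.
    cursor = timestamp if timestamp is not None else now
    while clock() - started < max_duration_sec:
        batch = select_batch(rows, cursor, batch_size)
        if not batch:
            return None, True  # no result: the roll is finished
        for row in batch:
            check(row)  # compare this geokret with the remote GKM state
        # The min creation date of the batch is the max for the next batch.
        cursor = min(created for _, created in batch)
    return cursor, False  # budget spent: the next job resumes from cursor
```

One point to settle in the design: with a strict comparison, geokrety sharing exactly the same creation date as the batch minimum could be skipped at a batch boundary, while a non-strict comparison would re-check them. A secondary sort key such as the id may be needed.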
The admin page should include:
- N links to the last roll outputs (stored in an external service)
Grafana should include:
- a view of the unsync geokrety count over time
TODO /ToBeDefined/ how to measure geokrety that are continuously unsync
The following geokrety information will be used to compare a gk-geokrety
entry with the related GKM API call:
- id
- dateMoved
- ownerName
- ownerId
- distanceTraveledKm
- waypointCode
- state
- typeId
- positionLat
- positionLon
- imageSrc
- name
- lastMoveId
(the analysis needs to progress a little further to complete this section)
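Pending that analysis, the field-by-field comparison could look like the following sketch. It assumes both the database entry and the GKM API answer have already been normalized into dictionaries using the field names listed above; the real shape of the GKM response is not described here, and `COMPARED_FIELDS` and `diff_fields` are hypothetical names.

```python
from typing import Any, List, Mapping

# Field names taken from the list above.
COMPARED_FIELDS = (
    "id", "dateMoved", "ownerName", "ownerId", "distanceTraveledKm",
    "waypointCode", "state", "typeId", "positionLat", "positionLon",
    "imageSrc", "name", "lastMoveId",
)


def diff_fields(local: Mapping[str, Any], remote: Mapping[str, Any]) -> List[str]:
    """Return the names of the compared fields whose values differ.

    A field missing on one side is treated as None. Values are compared
    with strict equality, so any normalization (date formats, coordinate
    rounding, etc.) must happen before calling this function.
    """
    return [
        field for field in COMPARED_FIELDS
        if local.get(field) != remote.get(field)
    ]
```

A geokret would then count as unsync when `diff_fields` returns a non-empty list.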
As this service could run as multiple instances, it must push its logs to MinIO (more details to come).
Each produced log must embed tags corresponding to the current business logic, so, when applicable:
- rollId
- geokretyId
- state
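A tagged log entry could be built as in this sketch, which emits one JSON object per log line and includes only the tags that apply. The JSON format and the name `build_log_line` are assumptions; how the lines are shipped to the log storage is left out.

```python
import json
from typing import Any, Dict, Optional


def build_log_line(
    message: str,
    rollId: Optional[int] = None,
    geokretyId: Optional[int] = None,
    state: Optional[str] = None,
) -> str:
    """Serialize a log entry as JSON, embedding only the applicable tags."""
    record: Dict[str, Any] = {"message": message}
    tags = {"rollId": rollId, "geokretyId": geokretyId, "state": state}
    # Tags left as None do not apply to this entry and are omitted.
    record.update({key: value for key, value in tags.items() if value is not None})
    return json.dumps(record, sort_keys=True)
```

For example, a roll-level message would carry only rollId, while a per-geokret message would also carry geokretyId and state.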