
Study GKMConsistencyDesign

Vandeputte Brice edited this page May 20, 2019 · 14 revisions

This page aims to define the GeoKretyMap consistency check job: issue#328

Definitions

SyncState

A sync state object defines the current consistency check status.

It can be null (on the very first run); otherwise it is stored in a temporary cache file on disk.

SyncState attributes:

  • rollId : current roll identifier, from 10_000 to 19_999. The rollId is incremented by 1 when a roll is finished. If the incremented rollId is out of range, the min value must be used.
  • timestamp : defines the current geokrety offset to use (geokrety max creation date). When a batch select returns 0 results with timestamp as max creation date, the roll is finished, timestamp is set to null, and the next roll restarts from scratch (its first batch selects the most recently created geokrety as of now).
  • geokrety_count : number of analyzed geokrety
  • unsync_geokrey_count : number of unsync geokrety
  • (roll output directory is generated by configuration + rollId)
  • array of finished_roll history

finished_roll attributes:

  • rollId: roll identifier
  • geokrety_count
  • unsync_geokrey_count
  • finished_timestamp: timestamp value of the last finished roll.
  • (roll output directory is generated by configuration + rollId)
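
The two attribute lists above can be sketched as plain data classes. This is a minimal illustration, not the job's actual code: the Python types, the default values, the `finished_rolls` field name and the `next_roll_id` helper are assumptions made for the example; only the attribute names and the 10_000 to 19_999 range come from this page.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

ROLL_ID_MIN = 10_000
ROLL_ID_MAX = 19_999


def next_roll_id(roll_id: int) -> int:
    """Increment a rollId by 1, falling back to the min value when out of range."""
    candidate = roll_id + 1
    return candidate if candidate <= ROLL_ID_MAX else ROLL_ID_MIN


@dataclass
class FinishedRoll:
    rollId: int
    geokrety_count: int
    unsync_geokrey_count: int
    # Assumption: stored as a datetime; the page does not fix the type.
    finished_timestamp: datetime


@dataclass
class SyncState:
    rollId: int = ROLL_ID_MIN
    # None means "start from the most recently created geokrety".
    timestamp: Optional[datetime] = None
    geokrety_count: int = 0
    unsync_geokrey_count: int = 0
    # Hypothetical field name for the "array of finished_roll history".
    finished_rolls: List[FinishedRoll] = field(default_factory=list)
```

With this sketch, `next_roll_id(19_999)` wraps back to `10_000`, and a freshly created `SyncState()` has a null timestamp.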

SyncParameters

Sync parameters are defined as the set of all config/parameters/attributes used as input to the consistency check business logic.

SyncParameters:

  • current gk-geokrety table
  • a job startup trigger (a cron entry config)
  • konfig entries:

konfig:

  • gkm_consistency_max_duration_sec : max duration of a job, in seconds
  • gkm_consistency_max_batch_size: batch size, used as the limit of each geokrety select
  • gkm_consistency_roll_min_days: min number of days between two rolls
  • gkm_consistency_roll_number_to_keep: number of roll history entries to keep
  • gkm_consistency_diretory: output log directory
  • gkm_api_endpoint: GeoKretyMap API endpoint
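
As an illustration, the konfig entries above could be grouped in one immutable object. The class name `SyncConfig`, the Python types, and the way the roll output directory is built (the configured directory joined with the rollId) are assumptions for this sketch; the page only says the directory is generated from configuration + rollId.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SyncConfig:
    gkm_consistency_max_duration_sec: int
    gkm_consistency_max_batch_size: int
    gkm_consistency_roll_min_days: int
    gkm_consistency_roll_number_to_keep: int
    gkm_consistency_diretory: str  # key spelled as on this page
    gkm_api_endpoint: str


def roll_output_dir(config: SyncConfig, roll_id: int) -> PurePosixPath:
    """Hypothetical layout: one sub-directory per rollId under the output directory."""
    return PurePosixPath(config.gkm_consistency_diretory) / str(roll_id)
```

All values passed to `SyncConfig` in a real deployment would come from konfig; none are suggested here.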

Design of a job

Started by the cron configuration, a job tries to start and finish a roll within the max duration. In general, more than one job may be needed to finish a roll. This way the load of a roll can be spread over several days.

  • If a roll cannot be finished within the limit, a SyncState is produced and stored for the next job.
  • If a job is triggered with a finished roll as SyncState, a new roll is started if and only if the min number of days between rolls is reached. The rollId is then incremented.
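
The second rule can be sketched as a small predicate. It assumes that the date at which the last roll finished is available as a datetime and that "reached" means "at least that many days have elapsed"; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_start_new_roll(
    last_finished: Optional[datetime], now: datetime, roll_min_days: int
) -> bool:
    """Return True when a new roll may start.

    last_finished is None when no roll has ever finished (assumption:
    the very first roll is always allowed).
    """
    if last_finished is None:
        return True
    return now - last_finished >= timedelta(days=roll_min_days)
```

For example, with `roll_min_days=7`, a roll finished on 2019-05-01 allows a new roll from 2019-05-08 onward.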

Design of a roll and batches

A roll is defined as the check of all geokrety table entries. This check is done by one or more batches (depending on X):

  • We start from the current datetime and select X geokrety ordered by creation date, descending.
  • X is the max batch size.
  • A batch compares these X geokrety with the remote GKM state using the GKM API (cf. dedicated section).

When a batch is finished, timestamp is set to the min creation date of its geokrety (to be used as max timestamp for the next batch iteration).

  • A new batch can start if and only if the job max duration in seconds is not reached.
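
The batch iteration described above can be sketched with an in-memory list standing in for the geokrety table. Everything here is illustrative: `select_batch`, `run_batches`, the `(id, creation date)` row shape and the injectable `clock` are hypothetical names. The sketch also assumes a strict "created before timestamp" bound, since an inclusive bound would select the same oldest geokret again on every batch; a strict bound in turn would skip geokrety sharing the exact same creation date across a batch boundary, which a real implementation would have to handle.

```python
import time
from datetime import datetime
from typing import Callable, List, Optional, Tuple

Geokret = Tuple[int, datetime]  # simplified row: (id, creation date)


def select_batch(rows: List[Geokret], before: datetime, limit: int) -> List[Geokret]:
    """Stand-in for a select of `limit` rows created before `before`, newest first."""
    older = [row for row in rows if row[1] < before]
    older.sort(key=lambda row: row[1], reverse=True)
    return older[:limit]


def run_batches(
    rows: List[Geokret],
    timestamp: Optional[datetime],
    now: datetime,
    batch_size: int,
    max_duration_sec: float,
    check: Callable[[Geokret], None],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[datetime], bool]:
    """Run batches until the roll ends or the job time budget is spent.

    Returns (timestamp to store, roll_finished).
    """
    deadline = clock() + max_duration_sec
    # A null timestamp means the roll starts from the current datetime.
    before = timestamp if timestamp is not None else now
    while clock() < deadline:
        batch = select_batch(rows, before, batch_size)
        if not batch:
            return None, True  # empty batch: the roll is finished
        for row in batch:
            check(row)  # compare with GKM (not modelled here)
        # Min creation date of this batch becomes the next max timestamp.
        before = min(row[1] for row in batch)
    return before, False
```

The duration check happens only between batches, so a job can overrun the limit by at most one batch.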

End of a roll

We consider a roll finished when a new batch returns no result. We then need to store the roll in the history and set timestamp to null.
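
A minimal sketch of this step, using plain dictionaries to stay self-contained. The `finished_rolls` key, the `finish_roll` name and the string form of `finished_timestamp` are assumptions; trimming the history to the configured number of rolls to keep is also assumed to happen at this point. The rollId and counters are left untouched here, since they are reset when the next roll starts.

```python
from typing import Any, Dict, List


def finish_roll(
    state: Dict[str, Any], finished_timestamp: str, rolls_to_keep: int
) -> Dict[str, Any]:
    """Return a new state with the current roll archived and timestamp set to None."""
    entry = {
        "rollId": state["rollId"],
        "geokrety_count": state["geokrety_count"],
        "unsync_geokrey_count": state["unsync_geokrey_count"],
        "finished_timestamp": finished_timestamp,
    }
    history: List[Dict[str, Any]] = list(state.get("finished_rolls", [])) + [entry]
    # Keep only the most recent entries; keep none if the limit is 0 or less.
    history = history[-rolls_to_keep:] if rolls_to_keep > 0 else []
    return {**state, "timestamp": None, "finished_rolls": history}
```

The input state is not mutated, which keeps the previous cache content intact until the new state is written.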

Design of admin page: new sync section

The admin page should include:

  • an array of the last Y rolls
  • for each roll: state, number of handled geokrety, number of unsync geokrety, with a friendly reminder of the rollId output log directory.

Compare Geokrety with GKM entries

The following geokrety information will be used to compare a gk-geokrety entry with the result of the related GKM API call:

  • id
  • dateMoved
  • ownerName
  • ownerId
  • distanceTraveledKm
  • waypointCode
  • state
  • typeId
  • positionLat
  • positionLon
  • imageSrc
  • name
  • lastMoveId
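
A field-by-field comparison over this list could look like the sketch below. It assumes both the gk-geokrety row and the GKM API answer have already been converted to dictionaries keyed by the names above, with comparable value types; the GKM response format is not described on this page, and `diff_geokret` is a hypothetical helper.

```python
from typing import Any, Dict, List

COMPARED_FIELDS = [
    "id", "dateMoved", "ownerName", "ownerId", "distanceTraveledKm",
    "waypointCode", "state", "typeId", "positionLat", "positionLon",
    "imageSrc", "name", "lastMoveId",
]


def diff_geokret(local: Dict[str, Any], remote: Dict[str, Any]) -> List[str]:
    """Return the names of compared fields whose values differ (missing counts as None)."""
    return [
        name for name in COMPARED_FIELDS
        if local.get(name) != remote.get(name)
    ]
```

A geokret would count as unsync when the returned list is not empty, and the list itself is a natural candidate for the roll output log.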