
WMStats proposal

Valentin Kuznetsov edited this page Jan 9, 2023 · 2 revisions

This wiki page collects a set of actions and thoughts on the WMStats redesign proposal.

WMStats will consist of two independent services:

  • WMStats cache server, which will communicate with CouchDB (the primary source of information)
    • it is responsible for serving data from CouchDB to other clients
  • WMStats UI server, which presents WMStats data to end users. This service should be lightweight and very responsive to user queries, filters, etc.

To implement this model we need the following set of actions:

  • ensure that data comes from the WMStats cache server in an appropriate data format

    • we need to use a flat schema with static keys
    • we need the ability to fetch data in chunks, a.k.a. pagination
    • we need data streaming, so it is desirable to support both the application/json and application/ndjson data formats
    • we need to apply gzip encoding to HTTP traffic between server and client

    For more details please refer to the appropriate section of the WMStats details document

  • re-evaluate data that might not be needed

  • enforce static schema

  • provide benchmarks comparing the different data representations (dynamic vs. static schema; measure CPU and RAM usage and determine how to scale)

  • we need to outline RESTful APIs

    • /fetch?idx=1&limit=10 to provide a stream of data
    • /fetch/<workflow> to provide details of a single workflow
    • /sites/<site> to provide site information
    • /campaign/<campaign> to provide campaign information
    • /agent/<agent> to provide WMAgent information
    • /release/<release> to provide release information
  • to transition between the current and the new implementation we need to provide mock data that uses the static schema

    • we should be able to generate the necessary set of records with dummy content but proper data types
  • we need the WMStats UI server to adapt to the new static schema

  • we need to decide on technology, language, etc.

    • which programming language to use for the WMStats implementation, e.g. the current Python server, a Go implementation, or another one
    • which database layer to use
    • which CSS/JS frameworks to use, e.g. the kube CSS framework (used in ReqMgr2, WMArchive, etc.)
  • we need to identify the producers and consumers of this data

  • the WMStats UI server should provide flexible filters (dynamically generated at run time or statically pre-processed)
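
The data-format requirements listed above (pagination, NDJSON streaming, gzip encoding) can be illustrated with a minimal sketch. It is not a proposed implementation: the in-memory RECORDS list, the function names, and the reading of idx as a 1-based record offset are all assumptions made for illustration, since the proposal does not define them yet.

```python
import gzip
import json
from typing import Dict, Iterable, Iterator, List

# Hypothetical in-memory stand-in for the records held by the cache server.
RECORDS: List[Dict[str, object]] = [
    {"workflow": f"workflow-{i}", "nevents": i * 100} for i in range(1, 26)
]


def fetch_page(records: List[Dict[str, object]], idx: int, limit: int) -> List[Dict[str, object]]:
    """Return up to `limit` records starting at position `idx`.

    Assumption: `idx` is a 1-based record offset, as in /fetch?idx=1&limit=10.
    """
    if idx < 1 or limit < 1:
        raise ValueError("idx and limit must be positive integers")
    start = idx - 1
    return records[start:start + limit]


def to_ndjson(page: Iterable[Dict[str, object]]) -> Iterator[bytes]:
    """Yield one compact JSON document per line (application/ndjson)."""
    for record in page:
        yield (json.dumps(record, separators=(",", ":")) + "\n").encode("utf-8")


def gzip_body(chunks: Iterable[bytes]) -> bytes:
    """Compress a response body, as a server sending Content-Encoding: gzip would."""
    return gzip.compress(b"".join(chunks))


def decode_ndjson(body: bytes) -> List[Dict[str, object]]:
    """Client side: decompress the body and parse it line by line."""
    text = gzip.decompress(body).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]
```

A client would call fetch_page repeatedly with an increasing idx until a page shorter than limit comes back. Because each NDJSON line is a complete JSON document, the UI server can start processing records before the whole response has arrived, which is the main reason to offer it alongside application/json.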

The current WMStats server stores documents using dynamic keys, see example. We suggest converting this record to a static schema (and removing unnecessary keys/values), e.g.

[
   {"workflow": "bla-bla",
    "nevents": 1
   }
]

where all keys will be predefined, using the CamelCase naming convention, and all values will have well-defined data types.
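
As a rough sketch of what enforcing such a static schema and generating mock records could look like, consider the snippet below. The SCHEMA mapping, the validate and mock_records helpers, and the value ranges are hypothetical; the two keys are copied from the example record above and would follow the final naming convention once it is fixed.

```python
import random
from typing import Dict, List

# Hypothetical static schema: every record has exactly these keys and types.
SCHEMA: Dict[str, type] = {"workflow": str, "nevents": int}


def validate(record: Dict[str, object]) -> None:
    """Raise if the record's keys or value types deviate from SCHEMA."""
    if set(record) != set(SCHEMA):
        diff = sorted(set(record) ^ set(SCHEMA))
        raise ValueError(f"missing or unexpected keys: {diff}")
    for key, expected in SCHEMA.items():
        value = record[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{key} must be {expected.__name__}")


def mock_records(count: int, seed: int = 0) -> List[Dict[str, object]]:
    """Generate `count` records with dummy content but proper data types."""
    rng = random.Random(seed)  # fixed seed keeps the mock data reproducible
    records = []
    for i in range(count):
        record = {
            "workflow": f"mock-workflow-{i}",
            "nevents": rng.randint(0, 10_000),
        }
        validate(record)
        records.append(record)
    return records
```

Seeding the generator makes the mock data identical across runs, so the UI server can be developed and benchmarked against a stable data set while the cache server is still being migrated.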
