Add dotfile to root of revision as standard #58

Open
perolavsvendsen opened this issue Nov 15, 2023 · 3 comments

Comments

@perolavsvendsen
Member

perolavsvendsen commented Nov 15, 2023

Based on discussions on 15 Nov 2023.

Currently, some masterdata are stored in global_variables.yml. These are technically metadata about the model, and contain mostly identifiers to entities in SMDA, including country, field, discovery, coordinate_reference_system and stratigraphic_column. These elements do not change frequently, and may not natively belong in global_variables.yml.

When pushing data to Sumo, the model metadata must also contain a reference to the asset, which in turn maps to the term asset in Sumo, where it is used for access control, storage choices, etc. This is also currently included in global_variables.yml.

There is a need for more consistent storage of static configuration. This proposal is to move such elements out of global_variables.yml to a more suitable location.

Proposal:
At the root of (each) FMU revision, we place a dotfile called .fmu. The contents of this file can be expanded as needed, but a first iteration could contain:

masterdata:
  smda:
    country:
      - identifier: Norway
        uuid: ad214d85-8a1d-19da-e053-c918a4889309
    discovery:
      - short_identifier: DROGON
        uuid: ad214d85-8a1d-19da-e053-c918a4889309
    field:
      - identifier: DROGON
        uuid: 00000000-0000-0000-0000-000000000000
    coordinate_system:
      identifier: ST_WGS84_UTM37N_P32637
      uuid: ad214d85-dac7-19da-e053-c918a4889309
    stratigraphic_column:
      identifier: DROGON_HAS_NO_STRATCOLUMN
      uuid: 00000000-0000-0000-0000-000000000000
access:
  asset: MyAsset
  sumo:
    enabled: true
global_variables_path: non_standard_folder/global_variables.yml # for non-standard locations

Current _masterdata.yml: https://github.com/equinor/fmu-drogon/blob/master/fmuconfig/input/_masterdata.yml
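A minimal sketch of how the proposed layout could be checked once the file has been parsed into a dictionary. The function name `find_problems` and the exact rules are assumptions for illustration, not an agreed design; parsing the YAML itself (e.g. with PyYAML's `yaml.safe_load`) is left out so the sketch stays stdlib-only.

```python
import uuid
from typing import List

# Keys taken from the proposed first-iteration content above.
# Whether an entry is a list or a single mapping follows that example.
LIST_ENTRIES = ("country", "discovery", "field")
SINGLE_ENTRIES = ("coordinate_system", "stratigraphic_column")


def _is_uuid(value) -> bool:
    """True if `value` parses as a UUID."""
    try:
        uuid.UUID(str(value))
        return True
    except ValueError:
        return False


def find_problems(config: dict) -> List[str]:
    """Return readable problems; an empty list means the config looks OK."""
    problems = []
    smda = config.get("masterdata", {}).get("smda", {})
    for key in LIST_ENTRIES + SINGLE_ENTRIES:
        if key not in smda:
            problems.append(f"masterdata.smda.{key} is missing")
            continue
        entries = smda[key] if key in LIST_ENTRIES else [smda[key]]
        for entry in entries:
            if not _is_uuid(entry.get("uuid")):
                problems.append(f"masterdata.smda.{key} has an invalid uuid")
    if not config.get("access", {}).get("asset"):
        problems.append("access.asset is missing")
    return problems
```

An empty dictionary yields one problem per required key, which makes the result easy to surface in a setup script.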

Detection of global_variables.yml
A frequent problem is that local installations place global_variables.yml in a non-standard location and/or give it a non-standard name. The .fmu file could be the undisputed place to record where global_variables.yml lives. This would, as one example, be relevant for this issue in fmu-dataio.
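As a rough sketch of what such detection might look like: walk upwards from the current directory until a .fmu file is found, then resolve `global_variables_path` relative to that root. The function names and the fallback path are assumptions made for this example only; the issue does not define a default location.

```python
from pathlib import Path
from typing import Optional

DOTFILE_NAME = ".fmu"
# Assumed fallback for this sketch only; not defined in the proposal.
DEFAULT_GLOBAL_VARIABLES = Path("fmuconfig/output/global_variables.yml")


def find_revision_root(start: Path) -> Optional[Path]:
    """Walk upwards from `start` until a directory holding the dotfile is found."""
    start = start.resolve()
    for directory in (start, *start.parents):
        if (directory / DOTFILE_NAME).is_file():
            return directory
    return None


def global_variables_path(root: Path, config: dict) -> Path:
    """Resolve the configured path relative to the revision root."""
    relative = config.get("global_variables_path") or DEFAULT_GLOBAL_VARIABLES
    return root / relative
```

A client such as fmu-dataio could then call `find_revision_root(Path.cwd())` instead of asking the user for a path argument.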

Detection of Sumo
During development of Sumo, there may be a need to programmatically find out if a particular model revision belongs to an asset onboarded on Sumo. This may replace the use of the --sumo optional argument in FMU data, which is also relevant for this issue in fmu-dataio.
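A small sketch of that check, assuming the `access.sumo.enabled` key from the proposed content and assuming that a missing key means "not onboarded". The function name is hypothetical.

```python
def sumo_enabled(config: dict) -> bool:
    """True only if access.sumo.enabled is explicitly set to true."""
    access = config.get("access") or {}
    sumo = access.get("sumo") or {}
    return sumo.get("enabled") is True
```

Treating anything other than an explicit `true` as disabled keeps the check conservative for revisions that have no Sumo section at all.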

File format
Users should ideally "never" access the dotfile directly (hence it is a dotfile/hidden file). We start with the YAML format for this file, but with no guarantee that it will not change in the future. For this reason, it is probably smart to have some dedicated tooling on top of this configuration item, so that clients can do e.g.:

from fmu.config import StaticConfig # names to be discussed
static = StaticConfig()
print(static.validate())
>>> OK! 👍

print(static.global_variables_path)
>>> non_standard_folder/global_variables.yml
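One way such tooling might hide the file format from clients is to inject the parser, sketched below. The class borrows the `StaticConfig` name from the example above, but the constructor signature and properties are assumptions; `json.loads` is only used as the default to keep the sketch stdlib-only.

```python
import json
from pathlib import Path
from typing import Callable, Optional


class StaticConfig:
    """Hypothetical wrapper around the dotfile; not an existing fmu.config API."""

    def __init__(self, path: Path, loader: Callable[[str], dict] = json.loads):
        # The loader is injectable so the on-disk format can change
        # without touching client code.
        self._data = loader(Path(path).read_text(encoding="utf-8"))

    @property
    def global_variables_path(self) -> Optional[str]:
        """Configured path as written in the file, or None if absent."""
        return self._data.get("global_variables_path")

    @property
    def asset(self) -> Optional[str]:
        """Asset name from the access section, or None if absent."""
        return self._data.get("access", {}).get("asset")
```

With PyYAML installed, passing `loader=yaml.safe_load` would read the YAML form, and clients would keep using the same properties if the format later changed.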

In the bigger picture, this is also an important step towards slimming down global_variables.yml: we take something out and put it into a brutally standardized location. Going forward, there can be more discussion about which elements fit where.

@perolavsvendsen
Member Author

Suggestion for a user story for a minimum viable product here:
"As a consumer of global_variables.yml I would like to know where the file is located, and avoid having to hardcode the path in arguments, so that I can hide implementations from end users."

(There have been, and still are, discussions around e.g. defining a standard ERT config variable for this, but this is a good use case for the .fmu solution, I think.)

(It is also tempting to solve this by simply requiring a standard location and contents for global_variables.yml, but that train probably left the station long ago, and untangling this is most likely not doable.)

A minimum viable product should also ship with some basic tooling, and an idea of how to add contents to .fmu - preferably through issues -> PR in the tooling.

@mferrera
Collaborator

We should consider whether some of this information can be inferred from the project set-up, pulled from the SMDA API (i.e. UUIDs), and pre-populated automatically. This might be terribly error-prone -- but so is relying on user input.

@perolavsvendsen
Member Author

> We should consider whether some of this information can be inferred from the project set-up, pulled from the SMDA API (i.e. UUIDs), and pre-populated automatically. This might be terribly error-prone -- but so is relying on user input.

Handling master data is an increasing problem, and we don't currently have good solutions for it. That said, the amount of master data currently referenced in global_variables.yml is minimal. It does not, so far, cover wells, for which we have plenty of pending user stories. There may also be master data definitions coming for e.g. segments, which would be relevant in an FMU context. So the solution we have will probably be required to scale, but I don't think it does. Moving the contents to .fmu is not likely to solve that problem fully.

Technically, we have reduced it to the bare minimum: we only include the uuid (plus an identifier for readability) and then use that for further queries into e.g. SMDA. Some kind of reference is still needed, and it is difficult to see how that could be anything other than, or anything less than, the UUID.

Some options, specifically on the master data:

  • We ask SMDA to create ready records for each model setup.
    I don't think this is sustainable, and would advise against it. It would require dealing with SMDA every time someone establishes a new model setup. With the urge to simplify, this would take us in the wrong direction in the opinion of many. It would also add more interfaces, increase the risk of failures, etc. Talking to SMDA at the rate at which we currently talk to global_variables.yml is not an option, so we would have to cache the information, and we would need mechanisms for that. I don't see an easy path here.

  • We create our own records, which connect to SMDA but are owned and operated by us.
    It would be possible to put this into a common structure (e.g. one big JSON) and host it e.g. together with the FMU results schemas. This can be done with fmu-dataio, which also builds the schema server today. Infrastructure for that is set up and ready. fmu-dataio could then, at runtime, talk to that server rather than looking things up in global_variables.yml. (This would only work for the super-static stuff, such as country and field. The individual horizon names, which are synced to the stratigraphic column, would still have to go in global_variables.yml:masterdata or similar. That might be OK.)

The upside of doing this would be that we centrally define and maintain this on behalf of everyone. The downside is the same: it creates a single point of failure. We could possibly cache this in an .fmu file and read from there if the endpoint cannot be reached, etc. It would not be possible/wise to do this with global_variables.yml given its non-standardized nature.

One challenge would be that in order to query such a centralized record, one would have to at least have the asset and the model.name attributes. That is the absolute minimum to get uniqueness. Several fields have more than one model setup, with different master data references.
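The lookup-with-fallback idea could be sketched roughly as follows, keyed on asset and model name as described. Everything here is hypothetical: the `fetch` callable stands in for whatever client would talk to the central record, and a JSON cache file stands in for caching inside .fmu.

```python
import json
from pathlib import Path
from typing import Callable


def load_masterdata(
    asset: str,
    model_name: str,
    fetch: Callable[[str, str], dict],
    cache_file: Path,
) -> dict:
    """Fetch the record for (asset, model_name), falling back to a local cache."""
    try:
        record = fetch(asset, model_name)
    except OSError:
        # Endpoint not reachable (urllib's URLError is an OSError subclass):
        # use the last cached copy, but only if it is for the same model setup.
        cached = json.loads(cache_file.read_text(encoding="utf-8"))
        if (cached["asset"], cached["model_name"]) != (asset, model_name):
            raise LookupError("cached record belongs to another model setup")
        return cached["record"]
    # Refresh the cache on every successful lookup.
    payload = {"asset": asset, "model_name": model_name, "record": record}
    cache_file.write_text(json.dumps(payload), encoding="utf-8")
    return record
```

If the endpoint is down and no cache exists yet, the `FileNotFoundError` from reading the cache propagates, which seems preferable to silently running without master data.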

It should also be mentioned that this setup is something one person does once per model setup. It is part of the very basic setup of a brand new FMU setup (or of retrofitting existing setups), so it is not something that will have any major impact on anyone's work day. It takes 10-ish minutes to do. Instead of creating elaborate solutions for it, it might be more rational to just do it for them.
