[Docs update] Examples to be included #135

Open
5 of 24 tasks
matamadio opened this issue Jul 10, 2023 · 52 comments
Labels: Docs (This issue relates to documentation)

@matamadio (Contributor)

matamadio commented Jul 10, 2023

List of examples to be produced and included in Docs

Please add any subject that requires an example (figure, table, other) to be explained properly in the docs.

Aims:

  1. Have examples ready to demonstrate the range of capabilities of the RDLS while promoting uptake
  2. Provide an illustration and a downloadable template / JSON example for more complex cases
  3. Prefer more simple, constrained examples over fewer complex examples that demonstrate multiple concepts at once.

Hazard

  • Deterministic layers examples (maps) to show documentation of index values
  • Empirical scenario footprint to show use of GLIDE number and event dates
  • Set of hazard maps, to show one of the most common use cases
  • Example using Event_set > events > footprint cascade to show a core capability on footprint uncertainty or multiple types of intensity footprint (e.g. EQ event with SA, pga, pgd)
  • Demonstrating specification of trigger events to show how to code this core capability
  • Stochastic event set Oasis hazard files example - see [Docs update] How to describe Oasis LMF hazard files in RDL #44 and OpenQuake example (SFRARR data) to show how tabulated event data, rather than maps, can be stored
  • Set with current and future climate projected hazard data to show how temporal objects are used
  • Describe historical event set - see [Proposal] Resource file for clustering/seasonality #81 (comment)

Exposure - examples to show multiple data types

  • Building aggregated data and footprint data - to show examples for this data type
  • Crop data - to show example for this data type
  • Infrastructure network data - to show example for this data type
  • Population data - to show example for this data type
  • Set with current and future climate projected hazard data to show how temporal objects are used to describe exposure projections

Vulnerability

  • Vulnerability curves - to show the use of this data type
  • Fatality / mortality curves - to show the use of this data type
  • Fragility curves - to show the use of this data type
  • Socioeconomic vulnerability Indexes - to show the use of this data type

Loss

  • Loss dataset linking to E/H/V data used to show how to add 'full linked datasets' - to demo core capability using dataset IDs
  • Probabilistic Monetary Losses - show maps and tables - to show a core type of analytical output data
  • Probabilistic Non-monetary loss maps and tables to show capability on non-monetary damages
  • Probabilistic Event Loss Table / Year Loss Table outputs
  • Results of an exposure analysis to show outputs in terms of 'count' rather than loss
  • Scenario / empirical Monetary Losses - show maps and tables - to show a core type of analytical output data
  • Scenario / empirical Non-monetary loss maps and tables to show capability on non-monetary damages
@matamadio (Contributor, Author)

matamadio commented Jul 12, 2023

I can start producing the example data maps.
I'll propose a layout to maintain throughout the docs. It would be similar to that used for the CCDR docs; please let me know if that's ok or suggest edits.

@matamadio (Contributor, Author)

matamadio commented Aug 1, 2023

The aim is to have:

  • complete metadata JSON/spreadsheet examples for each component using all available fields (these could be based on real datasets, or a mock-up)
  • an extract of specific metadata fields for a particular dataset, displaying real metadata from the data we have

A rough JSON sketch of one of the hazard examples follows the list below.

Hazard examples

  • Deterministic layers examples (maps) to show documentation of index values

    Figure Metadata
    Title: Global landslide susceptibility layer
    Description: Deterministic map of mean landslide hazard occurrence frequency.
    Spatial extent: Global
    Risk Data type: Hazard
    Hazard type: Landslide
    Source model: LHASA
    Analysis type: Deterministic
    Calculation method: Inferred
    Intensity measure: Index
    Index criteria: Combination of climatology and observed empirical events.
    License: Open (CC-BY)
  • Describe historical event set - see [Proposal] Resource file for clustering/seasonality #81 (comment)
    Empirical scenario footprint to show use of GLIDE number and event dates
    (combined)

    Figure Metadata
    Title: Satellite detected water extent
    Description: Satellite-detected surface waters in Shabelle Zone, Somali Region of Ethiopia and Beledweyne District, Hiraan Region of Somalia as observed from a Sentinel-2 image acquired on 14 April 2023 at 07:28 UTC.
    Spatial extent: Somalia; Ethiopia
    Risk Data type: Hazard
    Hazard type: Flood
    Source model: ESA
    Analysis type: Empirical
    Calculation method: Inferred
    Reference period: 2023-4-9 (start); 2023-4-14 (end)
    GLIDE number: FL20230327SOM
    License: Open (CC-BY)
  • Set of hazard maps, to show one of the most common use cases (spreadsheet example)

    Figure Metadata
    Title: Global flood hazard layer
    Description: Probabilistic maps of flood hazard occurrence frequency by return period.
    Spatial extent: Global
    Risk Data type: Hazard
    Hazard type: Flood
    Hazard processes: Fluvial flood; Pluvial flood
    Source model: FATHOM
    Analysis type: Probabilistic
    Occurrence range: once in 10 to 1,000 years
    Calculation method: Simulated
    Intensity measure: Water depth [m]
    License: Commercial
  • Set with current and future climate projected hazard data to show how temporal objects are used

    Figure Metadata
    Title: Aqueduct flood hazard maps
    Description: Probabilistic maps of coastal flood hazard occurrence frequency by return period.
    Spatial extent: Global
    Risk Data type: Hazard
    Hazard type: Coastal flood
    Hazard processes: Storm surge
    Source model: Aqueduct
    Period(s): 2015, 2030, 2050, 2080
    Analysis type: Probabilistic
    Occurrence range: once in 5 to 1,000 years
    Calculation method: Simulated
    Intensity measure: Water depth [m]
    License: Open (CC-BY)
  • Example using Event_set > events > footprint cascade to show a core capability on footprint uncertainty or multiple types of intensity footprint (e.g. EQ event with SA, pga, pgd)

    Figure Metadata
    3 earthquake layers with different IMT
    Risk Data category: Hazard
    Source model: ...
    ...
  • Demonstrating specification of trigger events to show how to code this core capability

    Figure Metadata
    TBD
    Risk Data category: Hazard
    Source model: ...
    ...
  • Stochastic event set Oasis hazard files example - see [Docs update] How to describe Oasis LMF hazard files in RDL #44 and OpenQuake example (SFRARR data) to show how tabulated event data, rather than maps, can be stored

    Figure Metadata
    Table data
    Risk Data category: Hazard
    Source model: Fathom
    ...
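To make the mapping to JSON concrete, here is a very rough sketch of how the Fathom flood example above might look as RDLS JSON. It is illustrative only: the field names are taken from those mentioned in this thread (event_sets, hazards, occurrence_range, the process codes), and the exact paths and codes should be checked against the published schema.

```json
{
  "title": "Global flood hazard layer",
  "description": "Probabilistic maps of flood hazard occurrence frequency by return period.",
  "spatial": { "scale": "global" },
  "license": "commercial",
  "hazard": {
    "event_sets": [
      {
        "id": "1",
        "analysis_type": "probabilistic",
        "calculation_method": "simulated",
        "occurrence_range": "once in 10 to 1,000 years",
        "hazards": [
          {
            "id": "1",
            "type": "flood",
            "processes": ["fluvial_flood", "pluvial_flood"]
          }
        ]
      }
    ]
  }
}
```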

@matamadio (Contributor, Author)

matamadio commented Aug 1, 2023

Exposure examples

Figure example for each data type (random locations):

  • Building aggregated data and footprint data

    Figure Metadata
    WSF/GHS built-up
    Risk Data category: Exposure
    Source model: ...
    ...
    Figure Metadata
    OSM footprint
    Risk Data category: Exposure
    Source model: ...
    ...
  • Land cover data

    Figure Metadata
    Title: WorldCover
    Description: Global land cover map
    Spatial extent: Global
    Spatial resolution: 10 m
    Risk Data type: Exposure
    Exposure category: Buildings; Natural environment
    Source model: ESA
    Reference period: 2020
    License: Open (CC-BY)
  • Infrastructure network data

    Figure Metadata
    Central Asia road network exposure
    Risk Data category: Exposure
    Source model: ...
    ...
  • Population data

    Figure Metadata
    Title: Global Human Settlement Layer
    Description: Global population density map from remote sensing interpretation
    Spatial extent: Global
    Spatial resolution: 100 m
    Risk Data type: Exposure
    Exposure category: Population
    Source model: JRC
    Reference period: 2020
    License: Open (CC-BY)
  • Set with current and future projected data to show how temporal objects are used to describe exposure projections

    Figure Metadata
    Central Asia residential exposure - current and future scenarios
    Title: Central Asia residential building exposure
    Description: Simulated residential exposure distribution and replacement costs for the Central Asia region
    Spatial extent: Central Asia
    Spatial resolution: 500 m
    Risk Data type: Exposure
    Exposure category: Buildings
    Source model: RED/OGS
    Reference period: 2020 and 2080
    License: Open (CC-BY-4.0)

see spreadsheet and json metadata for Central Asia residential exposure - current and future scenarios

@matamadio (Contributor, Author)

matamadio commented Aug 1, 2023

Vulnerability examples

  • Vulnerability curves examples

    Figure Metadata
    TBD
    Risk Data category: Vulnerability
    Source model: ...
    ...
  • Fatality / mortality curves

    Figure Metadata
    Risk Data category: Vulnerability
    Source model: ...
    ...
  • Fragility curves / damage functions

    Figure Metadata

    Title: Global Flood depth-damage functions
    Description: Flood impact functions over land cover categories
    Spatial extent: Global
    Risk Data type: Vulnerability
    Primary hazard: Flood
    Source model: JRC
    Reference period: 2015
    License: Open (CC-BY)
    Details: A globally consistent database of depth-damage curves depicting fractional damage as a function of water depth, as well as maximum damage values, for a variety of assets and land use classes. Based on an extensive literature survey, concave damage curves have been developed for each continent, while differentiation in flood damage between countries is established by determining maximum damage values at the country scale.
  • Socioeconomic vulnerability Indexes

    Figure Metadata
    TBD
    Risk Data category: Vulnerability
    Source model: ...
    ...

@matamadio (Contributor, Author)

matamadio commented Aug 1, 2023

Loss examples

Note: Use Central Asia SFRARR project / Africa R5 as examples

  • Loss dataset linking to E/H/V data used to show how to add 'full linked datasets' - to demo core capability using dataset IDs

    Figure Metadata
    TBD
    Risk Data category: Loss
    Source model: ...
    ...
  • Probabilistic Monetary Losses - show maps and tables - to show a core type of analytical output data

    Figure Metadata
    TBD
    Risk Data category: Loss
    Source model: ...
    ...
  • Probabilistic Non-monetary loss maps and tables to show capability on non-monetary damages

    Figure Metadata
    TBD
    Risk Data category: Loss
    Source model: ...
    ...
  • Probabilistic Event Loss Table / Year Loss Table outputs

    Figure Metadata
    TBD
    Risk Data category: Loss
    Source model: ...
    ...
  • Results of an exposure analysis to show outputs in terms of 'count' rather than loss

    Figure Metadata
    TBD
    Risk Data category: Loss
    Source model: ...
    ...
  • Scenario / empirical Monetary Losses - show maps and tables - to show a core type of analytical output data

    Figure Metadata
    Risk Data category: Loss
    Source model: ...
    ...
  • Scenario / empirical Non-monetary loss maps and tables to show capability on non-monetary damages

    Figure Metadata
    TBD
    Risk Data category: Loss
    Source model: ...
    ...

@matamadio (Contributor, Author)

Should we attach a download link for each of the datasets shown in the examples? E.g. OSM data for the city shown, hazard layer, etc.
Should the files be hosted on GitHub in some /downloads/ folder?

@duncandewhurst (Contributor)

My understanding is that the purpose of the examples is to help readers to understand how RDLS metadata can be used to describe different aspects of risk datasets. I think that we should aim for the text and screenshots for each example to provide sufficient information about the relevant aspects of the datasets. Otherwise, it would be a lot of extra work for readers to download each example and open it in an appropriate software package.

@matamadio (Contributor, Author)

@odscjen is it ok to provide examples like this (markdown-html), or should they be turned into JSON?

@odscjen (Contributor)

odscjen commented Aug 8, 2023

Ultimately we'll want to provide them in both markdown-html AND in JSON. For now markdown is fine, and once the spreadsheet template and CoVE are up and running we can convert them into JSON as well.

@odscjen (Contributor)

odscjen commented Aug 8, 2023

@matamadio an important thing when creating these examples is to ensure you're using the field titles and codelist values (you can use the labels rather than the codes for ease of reading) from the schema, and to include all of the required fields. Looking at the Hazard examples you've got so far, there are a few errors:

Deterministic layers examples (maps) to show documentation of index values

Figure Metadata
Title: Global landslide susceptibility layer
Description: Deterministic map of mean landslide hazard occurrence frequency.
Spatial scale: global
Risk Data type: Hazard
Hazard type: Landslide
Source name: LHASA
Source type: model
Analysis type: Deterministic
Frequency distribution: Susceptibility
Calculation method: Inferred
Deterministic frequency intensity measure: Index
Index criteria: Combination of climatology and observed empirical events.
License: Open (CC-BY)

Frequency distribution is a closed codelist, so it has to be either 'poisson', 'negative binomial' or 'user defined' (I wasn't sure which one Susceptibility would translate to?) Unless this should actually be a different field?

Describe historical event set - see #81 (comment)
Empirical scenario footprint to show use of GLIDE number and event dates

Figure Metadata
Title: Satellite detected water extent
Description: Satellite-detected surface waters in Shabelle Zone, Somali Region of Ethiopia and Beledweyne District, Hiraan Region of Somalia as observed from a Sentinel-2 image acquired on 14 April 2023 at 07:28 UTC.
Countries: Somalia; Ethiopia
Risk Data type: Hazard
Hazard type: Flood
Source name: ESA
Source type: model
Analysis type: Empirical
Calculation method: Inferred
Temporal: 2023-04-09 (start); 2023-04-14 (end)
Disaster identifier: FL20230327SOM
License: Open (CC-BY)

Dates should be in YYYY-MM-DD format.

Set of hazard maps, to show one of the most common use cases

Figure Metadata
Title: Global flood hazard layer
Description: Probabilistic maps of flood hazard occurrence frequency by return period.
Spatial scale: Global
Risk Data type: Hazard
Hazard type: Flood
Hazard processes: Fluvial flood; Pluvial flood
Source name: FATHOM
Source type: model
Analysis type: Probabilistic
Frequency distribution: Return periods
Occurrence range: once in 10 to 1,000 years
Calculation method: Simulated
Intensity measure: Flood water depth [m]
License: Commercial

'River flood' isn't in the process_type codelist; this should be 'fluvial_flood', as this codelist is closed.

Frequency distribution is a closed codelist, so it has to be either 'poisson', 'negative binomial' or 'user defined' (I wasn't sure which one 'Return periods' would translate to). I suspect this should actually be a different field?

Set with current and future climate projected hazard data to show how temporal objects are used

Figure Metadata
Title: Aqueduct flood hazard maps
Description: Probabilistic maps of coastal flood hazard occurrence frequency by return period.
Spatial scale: Global
Risk Data type: Hazard
Hazard type: Coastal flood
Hazard processes: Storm surge
Source name: Aqueduct
Source type: model
Temporal: 2015, 2030, 2050, 2080
Analysis type: Probabilistic
Frequency distribution: Return periods
Occurrence range: once in 5 to 1,000 years
Calculation method: Simulated
Intensity measure: Flood water depth [m]
License: Open (CC-BY)

For all the examples where analysis_type = 'Probabilistic' occurrence.probabilistic.probability.span is a required field if you're including any event level data.
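For context on what "closed codelist" means in practice: in JSON Schema terms it is typically an enum, so the frequency_distribution constraint described above would look something like the simplified, hypothetical fragment below (not the actual RDLS schema text; the real codes may be spelled differently, e.g. with underscores).

```json
{
  "properties": {
    "frequency_distribution": {
      "type": "string",
      "enum": ["poisson", "negative binomial", "user defined"]
    }
  }
}
```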

@matamadio (Contributor, Author)

Thanks Jen, I've fixed the examples, but the last comment is still outstanding: I'm still unsure how I should indicate occurrence probability in the most common case (return-period scenarios 1/n).

This is the case for the flood models where Analysis type: Probabilistic.
E.g. the Fathom dataset example: we have 3 layers in the dataset: 1/n1, 1/n2, 1/n3. The probabilistic range is 1/n1 to 1/n3, and there is no specific period span to specify.

@odscjen (Contributor)

odscjen commented Aug 10, 2023

Sorry, for that final comment I had misread the schema! span is only required if you're using event.occurrence.probabilistic.probability.

I think there are 2 options here:

  1. you just use occurrence_range, which sits in event_set, to list all 3 probabilities. The description of this field makes it clear that it's only for probabilistic values, so it should be clear to users what the given values are.
  2. each of the 3 values relates to a separate event within the event_set and you put the values in return_period, which sits in event.occurrence.probabilistic, and you don't use .probability at all (a rough sketch of this option follows below).
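A rough sketch of option 2, using only the field paths named above and illustrative return-period values for the three layers:

```json
{
  "event_sets": [
    {
      "id": "1",
      "events": [
        { "id": "1", "occurrence": { "probabilistic": { "return_period": 10 } } },
        { "id": "2", "occurrence": { "probabilistic": { "return_period": 100 } } },
        { "id": "3", "occurrence": { "probabilistic": { "return_period": 1000 } } }
      ]
    }
  ]
}
```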

@matamadio (Contributor, Author)

matamadio commented Aug 10, 2023

For the sake of quick example, I would pick option 1.

@duncandewhurst (Contributor)

From today's check-in call with @matamadio and @odscrachel, we agreed that @matamadio will prepare examples using the spreadsheet template using only the relevant fields (i.e. not full RDLS metadata files). We can then convert those into JSON format to store in the repository which should give us the flexibility to present them in the documentation as needed (e.g. using field titles rather than JSON paths).

@matamadio (Contributor, Author)

Spreadsheet example for the Fathom global dataset.

Figure Metadata
Title: Global flood hazard layer
Description: Probabilistic maps of flood hazard occurrence frequency by return period.
Spatial extent: Global
Risk Data type: Hazard
Hazard type: Flood
Hazard processes: Fluvial flood; Pluvial flood
Source model: FATHOM
Analysis type: Probabilistic
Occurrence range: once in 10 to 1,000 years
Calculation method: Simulated
Intensity measure: Water depth [m]
License: Commercial

matamadio pinned this issue Aug 17, 2023
@matamadio (Contributor, Author)

About the example panel:

  • would it be possible to switch between (or show together) metadata list (or table) and the underlying json visualisation?

    Figure Metadata Json schema
    Title: Global flood hazard layer
    Description: Probabilistic maps of flood hazard occurrence frequency by return period.
    Spatial extent: Global
    Risk Data type: Hazard
    Hazard type: Flood
    Hazard processes: Fluvial flood; Pluvial flood
    Source model: FATHOM
    Analysis type: Probabilistic
    Occurrence range: once in 10 to 1,000 years
    Calculation method: Simulated
    Intensity measure: Water depth [m]
    License: Commercial
    Corresponding json

@duncandewhurst (Contributor)

About the example panel:

* would it be possible to switch between (or show together) metadata list (or table) and the underlying json visualisation?

Yep. Given the length of some of the field values, I think it's best to show each in a separate tab. I've tested this out by adding the Fathom hazard example in #196.

Please take a look and let me know what you think: https://rdl-standard.readthedocs.io/en/135-examples/reference/schema/#hazard (below the schema reference table).

In particular, it would be good to get your feedback on:

  1. Whether to present the tabular format as separate tables.
  2. Whether to include identifiers in the tabular example.

The advantages of using separate tables and including identifiers are:

  • the tabular examples better reflect the structure of the schema and spreadsheet
  • the relationship between the tabular example and the JSON example is clearer
  • it reduces ambiguity if fields belonging to different objects have the same title

The downside is that it makes the tabular example longer than presenting all the values in the same table and without identifiers.

If you're happy with the general approach, then I think the best workflow is for you to do the initial preparation of the examples using the spreadsheet template, we can then convert them to JSON to add to the standard repository and the pre-commit script will handle creating the human-friendly CSVs for display in the documentation. For ongoing maintenance, it will be easiest to edit the JSON files directly.

@matamadio (Contributor, Author)

matamadio commented Aug 18, 2023

Please take a look and let me know what you think: https://rdl-standard.readthedocs.io/en/135-examples/reference/schema/#hazard (below the schema reference table).

Yes, I like this. Separate tables are good. Hiding identifiers would give a cleaner view of key attributes, but I agree it is good to have a 1:1 representation of the JSON.

I'll produce additional examples to add to the gdrive folder, with the name tag _docsample

@matamadio (Contributor, Author)

matamadio commented Aug 18, 2023

See example for exposure: built-up surface (GHS): rdls_exp-GHS_docsample.xlsx

Figure:

[image]

Note 1: unlike the real example provided for Thailand, this one describes the whole global dataset, not a derived national subset. The attribution is also different.
Note 2: needs exposure metric specification, see #194.
Note 3: there are 2 references for the same resource

@matamadio (Contributor, Author)

matamadio commented Aug 18, 2023

Example for Vulnerability: rdls_vln-FL_JRC

  • Fragility curves / damage functions

Can be used both as a docs snippet and as a full example.

Figure (one of many possible):

@odscjen (Contributor)

odscjen commented Aug 21, 2023

rdls_vln-FL_JRC.xlsx

  • contact_point.name and creator.name were missing, so I used Mattia's name for contact_point and the publisher.name for creator
  • spatial was missing, used .scale = 'global'
  • missing required fields from vulnerability, .taxonomy and .spatial.scale - used 'global' for the latter. I had a quick skim through the methodology report for the resource and I couldn't work out what, if any, taxonomy they'd used for classifying the assets, so I put it in as 'internal'. @matamadio let me know if you know of the actual taxonomy used.

@matamadio (Contributor, Author)

matamadio commented Aug 21, 2023

Thanks for the feedback, sorry for the missing/wrong input!

  • Ok for using author as creator
  • Ok for using my data as contact point
  • Ok for removing global boundary boxes
  • Ok for full date in referenced_by
  • Ok for other missing details unless explained below
  • Let's hold on the exposure example until finalizing the [Schema] Exposure costs and metrics #194
  • Docsample are not necessarily meant to include resource download (not needed imho for docs schema examples)

rdls_exp-GHS-THA.xlsx: The resource.url links to a page where the default download is 'Download the global GHS_BUILT_S_E2030_GLOBE_R2023A_54009_100_V1_0 dataset in a single file', which seems to be for 2023, not 2020 as given in resources.temporal. I couldn't figure out how to get that to change to 2020. @matamadio can you make it select 2020, or if not we can just change resources.temporal in the example to 2023.

URL for this example to be replaced with specific resource data (zip to be hosted in GH docs/_datasamples or similar).
The full dataset includes a range of years; this specific subset is for year 2020, for the Thailand extent. I could also publish on DDH, but not immediately (need to wait for project completion).

rdls_hzd-AQD.xlsx: event_set id = "2" has no hazards, but this is required in the schema. I think what's happened is some confusion with the identifiers in the spreadsheet. In 'hazard_event_sets_hazards' there are 2 hazard objects, both linked to event_set 1. But the events in event_set 1 only match the first of these event_set.hazards. BUT the hazards in 'hazard_event_sets_events' for 'event_sets/0/id' 2 don't match the second of the event_set.hazards, the difference being in hazard.type: in 'hazard_event_sets_hazards' the second hazard's .type = "flood", but in 'hazard_event_sets_events' the .type = "coastal_flood". @matamadio is the second of the event_set hazards supposed to be linked to the second event_set?

Commenting in the excel file

gazetteerEntries.id should be the actual code from the scheme, so in this case it should be 'TH', as this is the ISO 3166-1 alpha-2 code for Thailand. So I've moved this from .description and replaced .description with 'Thailand'.

Thanks, this needs to be explained in the description. Please note this (and other country examples) uses ISO 3166-1 alpha-2: first-level unit (country), two-letter code.
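For reference, a gazetteer entry along these lines might look roughly like the sketch below; 'scheme' is an assumed name for the field that records which gazetteer the entry is drawn from, so check the schema for the actual field name.

```json
{
  "spatial": {
    "gazetteerEntries": [
      {
        "id": "TH",
        "scheme": "ISO 3166-1 alpha-2",
        "description": "Thailand"
      }
    ]
  }
}
```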

resource.url is missing. This has been discussed previously (GFDRR/rdls-spreadsheet-template#3 (comment)), so to make the validation pass I've added some dummy URLs, as this is a commercial product and it's not going to be possible to provide a proper URL to the actual data.

Else the URL could point to the existing datacatalog page (from where the resource can be requested).

events in event_set 1 are missing hazard.type and hazard_process, so I've just copied them in from the event_set.hazard values. I've done the same for the other 2 event_sets and given them all local ids.
There's no license, so I just put in 'commercial' so that it'll pass validation (and this is essentially correct).

Sorry - they are all hazard type: flood; 1 and 2 process is fluvial flood, while 3 is pluvial flood.

missing required fields from vulnerability, .taxonomy and .spatial.scale - used 'global' for the latter. I had a quick skim through the methodology report for the resource and I couldn't work out what, if any, taxonomy they'd used for classifying the assets, so I put it in as 'internal'. @matamadio let me know if you know of the actual taxonomy used.

I would put taxonomy as optional here. Originally these were based on Corine Land Cover classes (CLC), but in the end they use their own general taxonomy for splitting curve types. So "internal" is ok.

@duncandewhurst (Contributor)

rdls_exp-GHS_docsample.xlsx

* @duncandewhurst I think there must be a mistake in the template as `links.rel` is prepopulating with 'describedby' and not 'describedBy'

'describedby' is correct. It is an IANA link relation type, which are all lowercase.

@odscjen (Contributor)

odscjen commented Aug 22, 2023

'describedby' is correct. It is an IANA link relation type, which are all lowercase.

ah, okay, this is getting reported as an error in every JSON conversion

@odscjen (Contributor)

odscjen commented Aug 22, 2023

Else the URL could point to the existing datacatalog page (from where the resource can be requested).

this link for me just goes to a World Bank login page (which I obviously can't log in to), so I don't think it's an appropriate link to use as it doesn't show anything of the actual data. I think at the moment, as these are just examples, using a dummy URL is the better option.

@duncandewhurst (Contributor)

'describedby' is correct. It is an IANA link relation type, which are all lowercase.

ah, okay, this is getting reported as an error in every JSON conversion

Please can you share the data and command(s) that you're using in a new issue? I converted and tested rdls_hzd-AQD.xlsx using the commands in GFDRR/rdls-spreadsheet-template#4 and there were no validation errors.

@odscjen (Contributor)

odscjen commented Aug 24, 2023

@duncandewhurst I used the flatten-tool command from that issue, but I was using https://www.jsonschemavalidator.net/ for the validation. The schema is definitely the current dev branch schema, but I get the following error message:

Message:
String 'describedby' does not match regex pattern '^(?!(describedby))'.
Schema path:
https://raw.githubusercontent.com/GFDRR/rdl-standard/0__2__0/schema/rdls_schema.json#/properties/links/items/properties/rel/pattern

@duncandewhurst (Contributor)

Ah, so as I mentioned in the issue description:

You can also ignore the error relating to the regex pattern for links.rel. I think that's a false positive due to that validator only supporting JSON Schema draft 2019-09 so it should be resolved in CoVE, which uses draft 2020-12.

As expected, there are no errors when validating against draft 2020-12 using check-jsonschema.
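For reference, a minimal links entry that should pass validation under draft 2020-12 (the href is a placeholder, not the real schema URL):

```json
{
  "links": [
    {
      "href": "https://example.org/rdls_schema.json",
      "rel": "describedby"
    }
  ]
}
```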

@stufraser1 (Member)

stufraser1 commented Aug 28, 2023

Else the URL could point to the existing datacatalog page (from where the resource can be requested).
...
this link for me just goes to a World Bank login page (which I obviously can't log in to), so I don't think it's an appropriate link to use as it doesn't show anything of the actual data. I think at the moment, as these are just examples, using a dummy URL is the better option.

We need to make sure, when linking to the datacatalog, that we are NOT using https://datacatalog.worldbank.org/int/search/..., which is internal only (and the default when Mat, Pierre, or I copy a link); make sure to remove the 'int/' to make it visible externally: https://datacatalog.worldbank.org/search/...

@stufraser1 (Member)

Yes, I like this. Separate tables are good. Hiding identifiers would give a cleaner view of key attributes, but I agree it is good to have a 1:1 representation of the JSON.

I agree; following the example for hazard rather than for exposure looks much better. It's easy to tab between each representation of the example, and very clear where to find the examples.

@odscjen (Contributor)

odscjen commented Aug 29, 2023

@matamadio do you have a _docsample version of the vln-FL_JRC example? Also is there one yet for Loss?

@stufraser1 (Member)

stufraser1 commented Aug 29, 2023

Set of hazard maps, to show one of the most common use cases

I also created, as a test, a sheet containing 6 zipped resources containing flood hazard map geotiffs.
I created a single event set (it's a regional analysis), 6 resources, and one event per country per return period (50 events), with one footprint per event. This differs from the Fathom data example, which has one event per hazard type (3: PLU, FLU Def, FLU Undef) and no footprints. The necessary information gets across to the user either way, but I'm not sure which is better.
I created it this way because that is how we've packaged the dataset on DDH, but this is not necessarily the best way; please feel free to suggest a better way, though we're unlikely to reconfigure the dataset on DDH now.

sheet
json

@duncandewhurst (Contributor)

With the exception of RDLS_full_SFRARR_fluvialhazardmaps.json, I've added all of the examples in the JSON conversions folder to the schema reference documentation in #196. I'm sharing a summary of key changes and design decisions below:

I updated the JSON files to reflect the latest version of the schema, but I haven't updated the spreadsheets that were used to generate them. I also corrected one semantic error in spatial.gazetteerEntries in the Central Asia exposure examples, see the commit for details: 0273914. I also put the two Central Asia exposure dataset examples in separate JSON files for ease of comprehension.

To reduce the length of the schema reference page, I've nested the examples with collapsible drop-downs.

image

Where there is more than one example for a component, only the first example is uncollapsed. If there is no figure for an example, it is collapsed. I couldn't find a suitable figure for the Central Asia exposure examples, but I took a screenshot from the global flood depth-damage functions PDF to use as a figure for that example:

image

The row titles in the tabular examples now include the titles of intermediary objects so that it is possible to distinguish between, for example, publisher name and creator name (previously they were both titled 'name'):

image

To reduce the amount of screen space taken up by the JSON examples, they are now collapsible, with objects and arrays collapsed by default:

image

@matamadio (Contributor, Author)

matamadio commented Aug 30, 2023

Very nice, thanks.
Would it be possible to limit the horizontal scroll of the table view, as in the codelists (#161)?

  • Using word wrap should fix most cases (description, details)
  • URLs used as ids might be truncated?

@matamadio (Contributor, Author)

matamadio commented Aug 30, 2023

@matamadio do you have a _docsample version of the vln-FL_JRC example? Also is there one yet for Loss?

The vln-FL_JRC is ok to use in docs as well, it doesn't include too many attributes anyway.
The one for loss is still to be produced.

@duncandewhurst (Contributor)

Would it be possible to limit the horizontal scroll of the table view, as in the codelists (#161)?

Addressed in #214.

The vln-FL_JRC is ok to use in docs as well, it doesn't include too many attributes anyway.

Added in #196.

I don't think there's anything else to do for this issue until the loss example is ready. Let me know if that's wrong!

@duncandewhurst (Contributor)

@matamadio and @stufraser1 to discuss and prepare loss examples.

@matamadio (Contributor, Author)

matamadio commented Sep 5, 2023

One example of loss data (results of the analysis) from CCDR:

Download THA_RSK.xlsx

This represents one specific country, but the same template applies to any country I've been working on.
The dataset consists of one excel file, made of several tabs:

  • Classes | Legend: how to interpret the data within the file
  • Overview: key results summary with charts
  • ADM(i)_summary: exposure or impact values for all hazards summed up at the ADM level
  • Individual hazard scores (EAE and EAI) and the calculations behind those, for the smallest adm level
  • EM-DAT: disaster list for context

The tabular data for the ADM scores is also provided as geospatial (gpkg). It does not have an explicit loss curve chart, but has all the elements to build it.

@stufraser1 should it fit in the schema in its current state, or do you have any suggestions for better formatting? This is key as I'm just now setting the default for the new year analytics.

@stufraser1 (Member)

I would say there are sheets in there that wouldn't normally go into the loss component:

  • classes may be more appropriate to include in the Vulnerability component
  • EM-DAT and LS_Event_records are historical catalogues and should be separate.

My preference for describing these files in RDL Loss would be to include this as a dataset and give each sheet as its own resource (.csv), rather than an xlsx book, so users can see the list of resource descriptions per dataset rather than navigating many sheets. But I see it could be described in metadata using the existing structure, with the workbook as a single resource.

@matamadio (Contributor, Author)

matamadio commented Sep 6, 2023

I have some questions about the loss schema.
See simplified CCDR output example in the Gdrive folder.

THA_CCDR_RSK_ADM1.xlsx describes loss output for 2 hazards (river floods and coastal floods) over 2 exposure categories. The complete standard output would include 5-6 hazards and 3 exposed categories.

Metadata spreadsheet has loss attributes at the dataset level, so I have to create 4 dataset rows.

[image]

But all these information are actually in just one file.

immagine

Should I use the same dataset ID all along? Or should we rather move all loss attributes into an array?

@stufraser1 (Member)

Good catch.
We want to be able to include multiple loss curves in one dataset, which would mean having a 'loss' array under the dataset level. I think this could also contain the contents of loss_cost, since I don't think a layer of nesting for loss cost is required beyond the loss object.
I don't think anything else would need nesting: one level should suffice.
@odscrachel please could you advise if we can process this quickly / overnight with @duncandewhurst?

@stufraser1 (Member)

stufraser1 commented Sep 6, 2023

I also have a couple of issues testing with a return period dataset:

  • loss/impact/unit does not include impact_unit code for monetary losses, so where I've got a monetary asset_loss, I have to leave loss/impact/unit blank
  • loss/approach is more relevant for vulnerability, and I think duplicates what we include in loss/impact/base_data_type - could be removed?
  • spreadsheet template does not contain a link to the gazetteer location scheme, and the link in documentation 'The gazetteer from which the entry is drawn, from the open location gazetteers codelist.' leads to an error.
  • There is a mismatch in loss/cost/0/dimension and loss/cost/0/unit - dimension includes population but the unit requires a currency code.
  • sources/0/id is tied to the dataset ID, so I can't add more than one unique source ID

Here is the loss metadata file for use in the loss example:
json
xlsx
image: tabulated data, so no image provided

@stufraser1 (Member)

images for exposure examples:
Central Asia residential current
Central Asia residential projected

@duncandewhurst (Contributor)

duncandewhurst commented Sep 7, 2023

Should I use the same dataset ID all along? Or should we rather move all loss attributes into an array?

Each row in the datasets sheet represents a dataset, so if there are rows with the same id, the JSON output will be a single dataset with the values from the final row, i.e. the values from the earlier rows will be overwritten. Therefore, the Thailand CCDR example does point to the need for an array of losses.

I've drafted a PR for the changes proposed in #135 (comment) and #135 (comment):

  • Make loss an array
  • Make loss.cost an object

@stufraser1 @matamadio I will leave it up to you to decide if you want to merge this PR for inclusion in the 0.2 release or leave it for later. My sense is that modelling for loss metadata warrants further exploration (I'll open an issue), but that the changes in the PR are an improvement over the current model so I would merge it.
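Illustratively, with loss as an array, a single dataset entry could then carry one loss object per hazard/exposure combination, along the lines of the sketch below. The field names inside each loss object are placeholders based on fields discussed in this thread, not confirmed schema paths.

```json
{
  "id": "THA_CCDR_RSK_ADM1",
  "loss": [
    {
      "id": "1",
      "hazard_type": "flood",
      "hazard_process": "fluvial_flood",
      "cost": { "unit": "USD" }
    },
    {
      "id": "2",
      "hazard_type": "coastal_flood",
      "hazard_process": "storm_surge",
      "cost": { "unit": "USD" }
    }
  ]
}
```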

I'll hold off preparing a PR to add the loss examples until we have decided what to do about the schema as if the schema changes the examples will need to be updated. I've also left some comments on the SFRARR example spreadsheet where I think some fields may have been populated incorrectly.


@stufraser1 I've shared my feedback on your other questions and suggestions below.

I also have a couple of issues testing with a return period dataset:

  • loss/impact/unit does not include impact_unit code for monetary losses, so where I've got a monetary asset_loss, I have to leave loss/impact/unit blank

This was discussed at some length in #75, but the conversation in that issue took a different direction so I don't think it was fully resolved.

My preferred approach is not to worry about units and instead to model the kind of quantity being measured (currency, in this case) since users can convert between units of the same quantity kind. That is the approach we settled on for exposure metrics and I think it would make sense to have consistent modelling for exposure metrics and impact metrics. However, that is quite a significant change to consider at this stage for 0.2.

The alternative solution that I proposed was to add an Impact.currency field for monetary losses. The reasons for separating unit and currency are twofold:

  1. Completeness: The complete list of currencies is well-defined and all currencies are of more-or-less equal relevance to RDLS so it makes sense to have a comprehensive (closed) currency codelist. Whereas the complete list of non-currency units is less well defined and many non-currency units are totally irrelevant to RDLS so it makes sense to have a representative (open) codelist of the most relevant units.
  2. Usability: There are very many currencies so it is much harder for a publisher to see which non-currency units are available if they are mixed in with the long list of currencies.

The separation of currencies and non-currency units is in keeping with QUDT which is the source we're using for unit codes. It models currencies and non-currency units as separate vocabularies so we should keep them separate too in order to avoid the risk of clashing codes in the event that a currency and non-currency unit share the same code.

So the options are:

  1. Do nothing
  2. Add Impact.currency
  3. Try to align the modelling of impact metrics with the modelling of exposure metrics

If needed, we can do option 2 for the 0.2 release and work on option 3 for the next release. Let me know what you want to do.
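Illustratively, option 2 for a monetary asset_loss might look like the sketch below, with unit left unset and the proposed currency field carrying an ISO 4217 code ('metric' is an assumed sibling field name used only for illustration):

```json
{
  "impact": {
    "metric": "asset_loss",
    "currency": "USD"
  }
}
```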

  • loss/approach is more relevant for vulnerability, and I think duplicates what we include in loss/impact/base_data_type - could be removed?

It seems to me that there is a lot of crossover, but also some differences. For example, the data_calculation_type codelist referenced in loss.impact.base_data_type has a code for 'observed' (Post-event observation data such as post-event damage surveys), which I interpret as indicating "actual" loss data rather than predictions or forecasts. That doesn't fit the semantics of any of the codes in the function_approach codelist referenced in loss.approach. The nearest fit is 'empirical', but its definition mentions regression analysis, which implies predictions or forecasts rather than "actual" data.

I think that this warrants further investigation, but I don't think we'll resolve it in time for 0.2.

  • spreadsheet template does not contain a link to the gazetteer location scheme, and the link in documentation 'The gazetteer from which the entry is drawn, from the open location gazetteers codelist.' leads to an error.

Regarding the spreadsheet template, I can see a link in the template and in the rdls_template_loss_SFRARR_eqrisk.xlsx (see below). Where is it missing from?

image

Good catch on the broken link in the documentation, this was because some codelists links in the schema included .html, which was working in the schema browser, but not in the schema reference tables, for some reason. I've fixed them in #244.

  • There is a mismatch in loss/cost/0/dimension and loss/cost/0/unit - dimension includes population but the unit requires a currency code.

This is because Cost is intended only to be used for monetary costs, but the codelist for Cost.dimension is shared with Metric.dimension.

  • sources/0/id is tied to the dataset ID, so I can't add more than one unique source ID

In rdls_template_loss_SFRARR_eqrisk.xlsx, it looks like you might've copy-pasted the value from the id column into the sources/0/id column, which has also copied the data validation rules. That's how copy-pasting behaves in Google Sheets and Excel unless you paste values only (Ctrl+Shift+V). Looking at the blank template in the spreadsheet template repository, there are no validation rules on the sources/0/id column.

Here is the loss metadata file for use in the loss example: json xlsx image: tabulated data, so no image provided

@matamadio (Contributor, Author)

images for exposure examples: Central Asia residential current Central Asia residential projected

Looks like the symbology represents ADM codes; it would be great to show the exposure value attribute with a legend. If you can point me to the dataset, I can produce those maps quickly.

We want to be able to include multiple loss curves in one dataset, which would mean having a 'loss' array under the dataset level. I think this could also contain the contents of loss_cost, since I don't think a layer of nesting for loss cost is required beyond the loss object.

Agree on avoiding unnecessary nesting; it should be just one loss object/tab.

I've drafted a PR for the changes proposed in #135 (comment) and #135 (comment):

* Make `loss` an array
* Make `loss.cost` an object

I approved the change, as it already improves usability. I would implement it in 0.2 and wait for the next release for other refinements.

The separation of currencies and non-currency units is in keeping with QUDT which is the source we're using for unit codes. It models currencies and non-currency units as separate vocabularies so we should keep them separate too in order to avoid the risk of clashing codes in the event that a currency and non-currency unit share the same code.

So the options are:

1. Do nothing
2. Add Impact.currency
3. Try to align the modelling of impact metrics with the modelling of exposure metrics

I'd say 2; most intuitively, as an optional field if impact.unit = monetary (adding it to the impact_unit codelist if this doesn't break the QUDT standard). Nice to have it in 0.2 already if it's a quick fix.

loss/approach is more relevant for vulnerability, and I think duplicates what we include in loss/impact/base_data_type - could be removed?

It seems to me that there is a lot of crossover, but also some differences. For example, the data_calculation_type codelist referenced in loss.impact.base_data_type has a code for 'observed' (Post-event observation data such as post-event damage surveys), which I interpret as indicating "actual" loss data rather than predictions or forecasts. That doesn't fit the semantics of any of the codes in the function_approach codelist referenced in loss.approach. The nearest fit is 'empirical', but its definition mentions regression analysis, which implies predictions or forecasts rather than "actual" data.

I would:

  • remove base_data_type (which data is it referring to anyway? Hazard? Exposure? There is also loss/hazard_analysis_type for that) and keep loss/approach.
  • remove the word "regression" from the codelist definition. Keep it broad so it applies more generally.

@duncandewhurst (Contributor)

I've merged the loss updates PR so that we can release 0.2. The examples will need updating to reflect the schema changes and adding to the schema reference page. That can be done without needing to make another release because the examples themselves aren't normative.

The separation of currencies and non-currency units is in keeping with QUDT which is the source we're using for unit codes. It models currencies and non-currency units as separate vocabularies so we should keep them separate too in order to avoid the risk of clashing codes in the event that a currency and non-currency unit share the same code.
So the options are:

1. Do nothing
2. Add Impact.currency
3. Try to align the modelling of impact metrics with the modelling of exposure metrics

I'd say 2; most intuitively, as an optional field if impact.unit = monetary (adding it to the impact_unit codelist if this doesn't break the QUDT standard). Nice to have it in 0.2 already if it's a quick fix.

In the interest of not delaying the 0.2 release further, I've left this for now, as it will require further discussion ("monetary" is a quantity kind rather than a unit).
