[Docs update] Examples to be included #135

Open
5 of 24 tasks
matamadio opened this issue Jul 10, 2023 · 52 comments
Labels: Docs (This issue relates to documentation)

@matamadio (Contributor)

matamadio commented Jul 10, 2023

List of examples to be produced and included in Docs

Please add any subject that requires an example (figure, table, other) to be explained properly in the docs.

Aims:

  1. Have examples ready to demonstrate the range of capabilities of the RDLS while promoting uptake
  2. Provide an illustration and a downloadable template / JSON example for more complex cases
  3. Prefer more simple, constrained examples over fewer complex examples that demonstrate multiple concepts at once.

Hazard

  • Deterministic layers examples (maps) to show documentation of index values
  • Empirical scenario footprint to show use of GLIDE number and event dates
  • Set of hazard maps, to show one of the most common use cases
  • Example using Event_set > events > footprint cascade to show a core capability on footprint uncertainty or multiple types of intensity footprint (e.g. EQ event with SA, pga, pgd)
  • Demonstrating specification of trigger events to show how to code this core capability
  • Stochastic event set Oasis hazard files example - see [Docs update] How to describe Oasis LMF hazard files in RDL #44 and OpenQuake example (SFRARR data) to show how tabulated event data, rather than maps, can be stored
  • Set with current and future climate projected hazard data to show how temporal objects are used
  • Describe historical event set - see [Proposal] Resource file for clustering/seasonality #81 (comment)

Exposure - examples to show multiple data types

  • Building aggregated data and footprint data - to show examples for this data type
  • Crop data - to show example for this data type
  • Infrastructure network data - to show example for this data type
  • Population data - to show example for this data type
  • Set with current and future climate projected hazard data to show how temporal objects are used to describe exposure projections

Vulnerability

  • Vulnerability curves - to show the use of this data type
  • Fatality / mortality curves - to show the use of this data type
  • Fragility curves - to show the use of this data type
  • Socioeconomic vulnerability Indexes - to show the use of this data type

Loss

  • Loss dataset linking to E/H/V data used to show how to add 'full linked datasets' - to demo core capability using dataset IDs
  • Probabilistic Monetary Losses - show maps and tables - to show a core type of analytical output data
  • Probabilistic Non-monetary loss maps and tables to show capability on non-monetary damages
  • Probabilistic Event Loss Table / Year Loss Table outputs
  • Results of an exposure analysis to show outputs in terms of 'count' rather than loss
  • Scenario / empirical Monetary Losses - show maps and tables - to show a core type of analytical output data
  • Scenario / empirical Non-monetary loss maps and tables to show capability on non-monetary damages
@matamadio (Contributor, Author)

matamadio commented Jul 12, 2023

I can start producing the example data maps.
I'll propose a layout to maintain throughout the docs. It would be similar to that used for the CCDR docs; please let me know if that's ok or suggest edits.

@matamadio (Contributor, Author)

matamadio commented Aug 1, 2023

The aim is to have:

  • complete metadata JSON/spreadsheet examples for each component using all available fields (these could be based on real datasets, or a mock-up)
  • an extract of specific metadata fields for a particular dataset, displaying real metadata from the data we have

A rough JSON sketch of one of the hazard examples follows the list below.

Hazard examples

  • Deterministic layers examples (maps) to show documentation of index values

    Figure Metadata
    Title: Global landslide susceptibility layer
    Description: Deterministic map of mean landslide hazard occurrence frequency.
    Spatial extent: Global
    Risk Data type: Hazard
    Hazard type: Landslide
    Source model: LHASA
    Analysis type: Deterministic
    Calculation method: Inferred
    Intensity measure: Index
    Index criteria: Combination of climatology and observed empirical events.
    License: Open (CC-BY)
  • Describe historical event set - see [Proposal] Resource file for clustering/seasonality #81 (comment)
    Empirical scenario footprint to show use of GLIDE number and event dates
    (combined)

    Figure Metadata
    Title: Satellite detected water extent
    Description: Satellite-detected surface waters in Shabelle Zone, Somali Region of Ethiopia and Beledweyne District, Hiraan Region of Somalia as observed from a Sentinel-2 image acquired on 14 April 2023 at 07:28 UTC.
    Spatial extent: Somalia; Ethiopia
    Risk Data type: Hazard
    Hazard type: Flood
    Source model: ESA
    Analysis type: Empirical
    Calculation method: Inferred
    Reference period: 2023-4-9 (start); 2023-4-14 (end)
    GLIDE number: FL20230327SOM
    License: Open (CC-BY)
  • Set of hazard maps, to show one of the most common use cases (spreadsheet example)

    Figure Metadata
    Title: Global flood hazard layer
    Description: Probabilistic maps of flood hazard occurrence frequency by return period.
    Spatial extent: Global
    Risk Data type: Hazard
    Hazard type: Flood
    Hazard processes: Fluvial flood; Pluvial flood
    Source model: FATHOM
    Analysis type: Probabilistic
    Occurrence range: once in 10 to 1,000 years
    Calculation method: Simulated
    Intensity measure: Water depth [m]
    License: Commercial
  • Set with current and future climate projected hazard data to show how temporal objects are used

    Figure Metadata
    Title: Aqueduct flood hazard maps
    Description: Probabilistic maps of coastal flood hazard occurrence frequency by return period.
    Spatial extent: Global
    Risk Data type: Hazard
    Hazard type: Coastal flood
    Hazard processes: Storm surge
    Source model: Aqueduct
    Period(s): 2015, 2030, 2050, 2080
    Analysis type: Probabilistic
    Occurrence range: once in 5 to 1,000 years
    Calculation method: Simulated
    Intensity measure: Water depth [m]
    License: Open (CC-BY)
  • Example using Event_set > events > footprint cascade to show a core capability on footprint uncertainty or multiple types of intensity footprint (e.g. EQ event with SA, pga, pgd)

    Figure Metadata
    3 earthquake layers with different IMT
    Risk Data category: Hazard
    Source model: ...
    ...
  • Demonstrating specification of trigger events to show how to code this core capability

    Figure Metadata
    TBD
    Risk Data category: Hazard
    Source model: ...
    ...
  • Stochastic event set Oasis hazard files example - see [Docs update] How to describe Oasis LMF hazard files in RDL #44 and OpenQuake example (SFRARR data) to show how tabulated event data, rather than maps, can be stored

    Figure Metadata
    Table data
    Risk Data category: Hazard
    Source model: Fathom
    ...
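To make the mapping to JSON concrete, here is a very rough sketch of how the Fathom flood example above might look as RDLS JSON. It is illustrative only: the field names are taken from those mentioned in this thread (event_sets, hazards, occurrence_range, the process codes), and the exact paths and codes should be checked against the published schema.

```json
{
  "title": "Global flood hazard layer",
  "description": "Probabilistic maps of flood hazard occurrence frequency by return period.",
  "spatial": { "scale": "global" },
  "license": "commercial",
  "hazard": {
    "event_sets": [
      {
        "id": "1",
        "analysis_type": "probabilistic",
        "calculation_method": "simulated",
        "occurrence_range": "once in 10 to 1,000 years",
        "hazards": [
          {
            "id": "1",
            "type": "flood",
            "processes": ["fluvial_flood", "pluvial_flood"]
          }
        ]
      }
    ]
  }
}
```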

@matamadio (Contributor, Author)

matamadio commented Aug 1, 2023

Exposure examples

Figure example for each data type (random locations):

  • Building aggregated data and footprint data

    Figure Metadata
    WSF/GHS built-up
    Risk Data category: Exposure
    Source model: ...
    ...
    Figure Metadata
    OSM footprint
    Risk Data category: Exposure
    Source model: ...
    ...
  • Land cover data

    Figure Metadata
    Title: WorldCover
    Description: Global land cover map
    Spatial extent: Global
    Spatial resolution: 10 m
    Risk Data type: Exposure
    Exposure category: Buildings; Natural environment
    Source model: ESA
    Reference period: 2020
    License: Open (CC-BY)
  • Infrastructure network data

    Figure Metadata
    Central Asia road network exposure
    Risk Data category: Exposure
    Source model: ...
    ...
  • Population data

    Figure Metadata
    Title: Global Human Settlement Layer
    Description: Global population density map from remote sensing interpretation
    Spatial extent: Global
    Spatial resolution: 100 m
    Risk Data type: Exposure
    Exposure category: Population
    Source model: JRC
    Reference period: 2020
    License: Open (CC-BY)
  • Set with current and future projected data to show how temporal objects are used to describe exposure projections

    Figure Metadata
    Central Asia residential exposure - current and future scenarios
    Title: Central Asia residential building exposure
    Description: Simulated residential exposure distribution and replacement costs for the Central Asia region
    Spatial extent: Central Asia
    Spatial resolution: 500 m
    Risk Data type: Exposure
    Exposure category: Buildings
    Source model: RED/OGS
    Reference period: 2020 and 2080
    License: Open (CC-BY-4.0)

see spreadsheet and json metadata for Central Asia residential exposure - current and future scenarios

@matamadio (Contributor, Author)

matamadio commented Aug 1, 2023

Vulnerability examples

  • Vulnerability curves examples

    Figure Metadata
    TBD
    Risk Data category: Vulnerability
    Source model: ...
    ...
  • Fatality / mortality curves

    Figure Metadata
    Risk Data category: Vulnerability
    Source model: ...
    ...
  • Fragility curves / damage functions

    Figure Metadata

    Title: Global Flood depth-damage functions
    Description: Flood impact functions over land cover categories
    Spatial extent: Global
    Risk Data type: Vulnerability
    Primary hazard: Flood
    Source model: JRC
    Reference period: 2015
    License: Open (CC-BY)
    Details: A globally consistent database of depth-damage curves depicting fractional damage as a function of water depth, as well as maximum damage values, for a variety of assets and land use classes. Based on an extensive literature survey, concave damage curves have been developed for each continent, while differentiation in flood damage between countries is established by determining maximum damage values at the country scale.
  • Socioeconomic vulnerability Indexes

    Figure Metadata
    TBD
    Risk Data category: Vulnerability
    Source model: ...
    ...

@matamadio (Contributor, Author)

matamadio commented Aug 1, 2023

Loss examples

Note: Use Central Asia SFRARR project / Africa R5 as examples

  • Loss dataset linking to E/H/V data used to show how to add 'full linked datasets' - to demo core capability using dataset IDs

    Figure Metadata
    TBD
    Risk Data category: Loss
    Source model: ...
    ...
  • Probabilistic Monetary Losses - show maps and tables - to show a core type of analytical output data

    Figure Metadata
    TBD
    Risk Data category: Loss
    Source model: ...
    ...
  • Probabilistic Non-monetary loss maps and tables to show capability on non-monetary damages

    Figure Metadata
    TBD
    Risk Data category: Loss
    Source model: ...
    ...
  • Probabilistic Event Loss Table / Year Loss Table outputs

    Figure Metadata
    TBD
    Risk Data category: Loss
    Source model: ...
    ...
  • Results of an exposure analysis to show outputs in terms of 'count' rather than loss

    Figure Metadata
    TBD
    Risk Data category: Loss
    Source model: ...
    ...
  • Scenario / empirical Monetary Losses - show maps and tables - to show a core type of analytical output data

    Figure Metadata
    Risk Data category: Loss
    Source model: ...
    ...
  • Scenario / empirical Non-monetary loss maps and tables to show capability on non-monetary damages

    Figure Metadata
    TBD
    Risk Data category: Loss
    Source model: ...
    ...

@matamadio (Contributor, Author)

Should we attach a download link for each of the datasets shown in the examples? E.g. OSM data for the city shown, hazard layer, etc.
Should the files be hosted on GitHub in some /downloads/ folder?

@duncandewhurst (Contributor)

My understanding is that the purpose of the examples is to help readers to understand how RDLS metadata can be used to describe different aspects of risk datasets. I think that we should aim for the text and screenshots for each example to provide sufficient information about the relevant aspects of the datasets. Otherwise, it would be a lot of extra work for readers to download each example and open it in an appropriate software package.

@matamadio (Contributor, Author)

@odscjen is it ok to provide examples like this (markdown-html), or should they be turned into JSON?

@odscjen (Contributor)

odscjen commented Aug 8, 2023

Ultimately we'll want to provide them in both markdown-html AND in JSON. For now markdown is fine, and once the spreadsheet template and CoVE are up and running we can convert them into JSON as well.

@odscjen (Contributor)

odscjen commented Aug 8, 2023

@matamadio an important thing when creating these examples is to ensure you're using the field titles and codelist values (you can use the labels rather than the codes for ease of reading) from the schema, and to include all of the required fields. Looking at the Hazard examples you've got so far, there are a few errors:

Deterministic layers examples (maps) to show documentation of index values

Figure Metadata
Title: Global landslide susceptibility layer
Description: Deterministic map of mean landslide hazard occurrence frequency.
Spatial scale: global
Risk Data type: Hazard
Hazard type: Landslide
Source name: LHASA
Source type: model
Analysis type: Deterministic
Frequency distribution: Susceptibility
Calculation method: Inferred
Deterministic frequency intensity measure: Index
Index criteria: Combination of climatology and observed empirical events.
License: Open (CC-BY)

Frequency distribution is a closed codelist, so it has to be either 'poisson', 'negative binomial' or 'user defined' (I wasn't sure which one Susceptibility would translate to?) Unless this should actually be a different field?

Describe historical event set - see #81 (comment)
Empirical scenario footprint to show use of GLIDE number and event dates

Figure Metadata
Title: Satellite detected water extent
Description: Satellite-detected surface waters in Shabelle Zone, Somali Region of Ethiopia and Beledweyne District, Hiraan Region of Somalia as observed from a Sentinel-2 image acquired on 14 April 2023 at 07:28 UTC.
Countries: Somalia; Ethiopia
Risk Data type: Hazard
Hazard type: Flood
Source name: ESA
Source type: model
Analysis type: Empirical
Calculation method: Inferred
Temporal: 2023-04-09 (start); 2023-04-14 (end)
Disaster identifier: FL20230327SOM
License: Open (CC-BY)

Dates should be in YYYY-MM-DD format.

Set of hazard maps, to show one of the most common use cases

Figure Metadata
Title: Global flood hazard layer
Description: Probabilistic maps of flood hazard occurrence frequency by return period.
Spatial scale: Global
Risk Data type: Hazard
Hazard type: Flood
Hazard processes: Fluvial flood; Pluvial flood
Source name: FATHOM
Source type: model
Analysis type: Probabilistic
Frequency distribution: Return periods
Occurrence range: once in 10 to 1,000 years
Calculation method: Simulated
Intensity measure: Flood water depth [m]
License: Commercial

'River flood' isn't in the process_type codelist; this should be 'fluvial_flood', as this codelist is closed.

Frequency distribution is a closed codelist, so it has to be either 'poisson', 'negative binomial' or 'user defined' (I wasn't sure which one 'Return periods' would translate to). I suspect this should actually be a different field?

Set with current and future climate projected hazard data to show how temporal objects are used

Figure Metadata
Title: Aqueduct flood hazard maps
Description: Probabilistic maps of coastal flood hazard occurrence frequency by return period.
Spatial scale: Global
Risk Data type: Hazard
Hazard type: Coastal flood
Hazard processes: Storm surge
Source name: Aqueduct
Source type: model
Temporal: 2015, 2030, 2050, 2080
Analysis type: Probabilistic
Frequency distribution: Return periods
Occurrence range: once in 5 to 1,000 years
Calculation method: Simulated
Intensity measure: Flood water depth [m]
License: Open (CC-BY)

For all the examples where analysis_type = 'Probabilistic' occurrence.probabilistic.probability.span is a required field if you're including any event level data.
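For context on what "closed codelist" means in practice: in JSON Schema terms it is typically an enum, so the frequency_distribution constraint described above would look something like the simplified, hypothetical fragment below (not the actual RDLS schema text; the real codes may be spelled differently, e.g. with underscores).

```json
{
  "properties": {
    "frequency_distribution": {
      "type": "string",
      "enum": ["poisson", "negative binomial", "user defined"]
    }
  }
}
```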

@matamadio (Contributor, Author)

Thanks Jen, I've fixed the examples, but the last comment is still outstanding: I'm still unsure how I should indicate occurrence probability in the most common case (return-period scenarios 1/n).

This is the case for the flood models where Analysis type: Probabilistic.
E.g. the Fathom dataset example: we have 3 layers in the dataset: 1/n1, 1/n2, 1/n3. The probabilistic range is 1/n1 to 1/n3, and there is no specific period span to specify.

@odscjen (Contributor)

odscjen commented Aug 10, 2023

Sorry, for that final comment I had misread the schema! span is only required if you're using event.occurrence.probabilistic.probability.

I think there are 2 options here:

  1. you just use occurrence_range, which sits in event_set, to list all 3 probabilities. The description of this field makes it clear that it's only for probabilistic values, so it should be clear to users what the given values are.
  2. each of the 3 values relates to a separate event within the event_set and you put the values in return_period, which sits in event.occurrence.probabilistic, and you don't use .probability at all (a rough sketch of this option follows below).
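A rough sketch of option 2, using only the field paths named above and illustrative return-period values for the three layers:

```json
{
  "event_sets": [
    {
      "id": "1",
      "events": [
        { "id": "1", "occurrence": { "probabilistic": { "return_period": 10 } } },
        { "id": "2", "occurrence": { "probabilistic": { "return_period": 100 } } },
        { "id": "3", "occurrence": { "probabilistic": { "return_period": 1000 } } }
      ]
    }
  ]
}
```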

@matamadio (Contributor, Author)

matamadio commented Aug 10, 2023

For the sake of quick example, I would pick option 1.

@duncandewhurst (Contributor)

From today's check-in call with @matamadio and @odscrachel, we agreed that @matamadio will prepare examples using the spreadsheet template using only the relevant fields (i.e. not full RDLS metadata files). We can then convert those into JSON format to store in the repository which should give us the flexibility to present them in the documentation as needed (e.g. using field titles rather than JSON paths).

@matamadio (Contributor, Author)

Spreadsheet example for the Fathom global dataset.

Figure Metadata
Title: Global flood hazard layer
Description: Probabilistic maps of flood hazard occurrence frequency by return period.
Spatial extent: Global
Risk Data type: Hazard
Hazard type: Flood
Hazard processes: Fluvial flood; Pluvial flood
Source model: FATHOM
Analysis type: Probabilistic
Occurrence range: once in 10 to 1,000 years
Calculation method: Simulated
Intensity measure: Water depth [m]
License: Commercial

matamadio pinned this issue Aug 17, 2023
@matamadio (Contributor, Author)

About the example panel:

  • would it be possible to switch between (or show together) metadata list (or table) and the underlying json visualisation?

    Figure Metadata Json schema
    Title: Global flood hazard layer
    Description: Probabilistic maps of flood hazard occurrence frequency by return period.
    Spatial extent: Global
    Risk Data type: Hazard
    Hazard type: Flood
    Hazard processes: Fluvial flood; Pluvial flood
    Source model: FATHOM
    Analysis type: Probabilistic
    Occurrence range: once in 10 to 1,000 years
    Calculation method: Simulated
    Intensity measure: Water depth [m]
    License: Commercial
    Corresponding json

@duncandewhurst (Contributor)

About the example panel:

* would it be possible to switch between (or show together) metadata list (or table) and the underlying json visualisation?

Yep. Given the length of some of the field values, I think it's best to show each in a separate tab. I've tested this out by adding the Fathom hazard example in #196.

Please take a look and let me know what you think: https://rdl-standard.readthedocs.io/en/135-examples/reference/schema/#hazard (below the schema reference table).

In particular, it would be good to get your feedback on:

  1. Whether to present the tabular format as separate tables.
  2. Whether to include identifiers in the tabular example.

The advantages of using separate tables and including identifiers are:

  • the tabular examples better reflect the structure of the schema and spreadsheet
  • the relationship between the tabular example and the JSON example is clearer
  • it reduces ambiguity if fields belonging to different objects have the same title

The downside is that it makes the tabular example longer than presenting all the values in the same table and without identifiers.

If you're happy with the general approach, then I think the best workflow is for you to do the initial preparation of the examples using the spreadsheet template, we can then convert them to JSON to add to the standard repository and the pre-commit script will handle creating the human-friendly CSVs for display in the documentation. For ongoing maintenance, it will be easiest to edit the JSON files directly.

@matamadio (Contributor, Author)

matamadio commented Aug 18, 2023

Please take a look and let me know what you think: https://rdl-standard.readthedocs.io/en/135-examples/reference/schema/#hazard (below the schema reference table).

Yes, I like this. Separate tables are good. Hiding identifiers would give a cleaner view of key attributes, but I agree it is good to have a 1:1 representation of the JSON.

I'll produce additional examples to add to the gdrive folder, with the name tag _docsample

@matamadio (Contributor, Author)

matamadio commented Aug 18, 2023

See example for exposure: built-up surface (GHS): rdls_exp-GHS_docsample.xlsx

Figure:

[image]

Note 1: unlike the real example provided for Thailand, this one describes the whole global dataset, not a derived national subset. The attribution is also different.
Note 2: needs exposure metric specification, see #194.
Note 3: there are 2 references for the same resource

@matamadio (Contributor, Author)

matamadio commented Aug 18, 2023

Example for Vulnerability: rdls_vln-FL_JRC

  • Fragility curves / damage functions

Can be used both as a docs snippet and as a full example.

Figure (one of many possible):

@odscjen (Contributor)

odscjen commented Aug 21, 2023

rdls_vln-FL_JRC.xlsx

  • contact_point.name and creator.name were missing, so I used Mattia's name for contact_point and the publisher.name for creator
  • spatial was missing, used .scale = 'global'
  • missing required fields from vulnerability, .taxonomy and .spatial.scale - used 'global' for the latter. I had a quick skim through the methodology report for the resource and I couldn't work out what, if any, taxonomy they'd used for classifying the assets, so I put it in as 'internal'. @matamadio let me know if you know of the actual taxonomy used.

@matamadio (Contributor, Author)

matamadio commented Aug 21, 2023

Thanks for the feedback, sorry for the missing/wrong input!

  • Ok for using author as creator
  • Ok for using my data as contact point
  • Ok for removing global boundary boxes
  • Ok for full date in referenced_by
  • Ok for other missing details unless explained below
  • Let's hold on the exposure example until finalizing the [Schema] Exposure costs and metrics #194
  • Docsample are not necessarily meant to include resource download (not needed imho for docs schema examples)

rdls_exp-GHS-THA.xlsx: The resource.url links to a page where the default download is 'Download the global GHS_BUILT_S_E2030_GLOBE_R2023A_54009_100_V1_0 dataset in a single file', which seems to be for 2023, not 2020 as given in resources.temporal. I couldn't figure out how to get that to change to 2020. @matamadio can you make it select 2020, or if not we can just change resources.temporal in the example to 2023.

URL for this example to be replaced with specific resource data (zip to be hosted in GH docs/_datasamples or similar).
The full dataset includes a range of years; this specific subset is for year 2020, for the Thailand extent. I could also publish on DDH, but not immediately (need to wait for project completion).

rdls_hzd-AQD.xlsx: event_set id = "2" has no hazards, but this is required in the schema. I think what's happened is some confusion with the identifiers in the spreadsheet. In 'hazard_event_sets_hazards' there are 2 hazard objects, both linked to event_set 1. But the events in event_set 1 only match the first of these event_set.hazards. BUT the hazards in 'hazard_event_sets_events' for 'event_sets/0/id' 2 don't match the second of the event_set.hazards, the difference being in hazard.type: in 'hazard_event_sets_hazards' the second hazard's .type = "flood", but in 'hazard_event_sets_events' the .type = "coastal_flood". @matamadio is the second of the event_set hazards supposed to be linked to the second event_set?

Commenting in the excel file

gazetteerEntries.id should be the actual code from the scheme, so in this case it should be 'TH', as this is the ISO 3166-1 alpha-2 code for Thailand. So I've moved this from .description and replaced .description with 'Thailand'.

Thanks, this needs to be explained in the description. Please note this (and other country examples) uses ISO 3166-1 alpha-2: first-level unit (country), two-letter code.
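For reference, a gazetteer entry along these lines might look roughly like the sketch below; 'scheme' is an assumed name for the field that records which gazetteer the entry is drawn from, so check the schema for the actual field name.

```json
{
  "spatial": {
    "gazetteerEntries": [
      {
        "id": "TH",
        "scheme": "ISO 3166-1 alpha-2",
        "description": "Thailand"
      }
    ]
  }
}
```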

resource.url is missing. This has been discussed previously (GFDRR/rdls-spreadsheet-template#3 (comment)), so to make the validation pass I've added some dummy URLs, as this is a commercial product and it's not going to be possible to provide a proper URL to the actual data.

Else the URL could point to the existing datacatalog page (from where the resource can be requested).

events in event_set 1 are missing hazard.type and hazard_process, so I've just copied them in from the event_set.hazard values. I've done the same for the other 2 event_sets and given them all local ids.
There's no license, so I just put in 'commercial' so that it'll pass validation (and this is essentially correct).

Sorry - they are all hazard type: flood; 1 and 2 process is fluvial flood, while 3 is pluvial flood.

missing required fields from vulnerability, .taxonomy and .spatial.scale - used 'global' for the latter. I had a quick skim through the methodology report for the resource and I couldn't work out what, if any, taxonomy they'd used for classifying the assets, so I put it in as 'internal'. @matamadio let me know if you know of the actual taxonomy used.

I would put taxonomy as optional here. Originally these were based on Corine Land Cover classes (CLC), but in the end they use their own general taxonomy for splitting curve types. So "internal" is ok.

@duncandewhurst (Contributor)

rdls_exp-GHS_docsample.xlsx

* @duncandewhurst I think there must be a mistake in the template as `links.rel` is prepopulating with 'describedby' and not 'describedBy'

'describedby' is correct. It is an IANA link relation type, which are all lowercase.

@odscjen (Contributor)

odscjen commented Aug 22, 2023

'describedby' is correct. It is an IANA link relation type, which are all lowercase.

ah, okay, this is getting reported as an error in every JSON conversion

@odscjen (Contributor)

odscjen commented Aug 22, 2023

Else the URL could point to the existing datacatalog page (from where the resource can be requested).

this link for me just goes to a World Bank login page (which I obviously can't log in to), so I don't think it's an appropriate link to use as it doesn't show anything of the actual data. I think at the moment, as these are just examples, using a dummy URL is the better option.

@duncandewhurst (Contributor)

'describedby' is correct. It is an IANA link relation type, which are all lowercase.

ah, okay, this is getting reported as an error in every JSON conversion

Please can you share the data and command(s) that you're using in a new issue? I converted and tested rdls_hzd-AQD.xlsx using the commands in GFDRR/rdls-spreadsheet-template#4 and there were no validation errors.

@odscjen (Contributor)

odscjen commented Aug 24, 2023

@duncandewhurst I used the flatten-tool command from that issue, but I was using https://www.jsonschemavalidator.net/ for the validation. The schema is definitely the current dev branch schema, but I get the following error message:

Message:
String 'describedby' does not match regex pattern '^(?!(describedby))'.
Schema path:
https://raw.githubusercontent.com/GFDRR/rdl-standard/0__2__0/schema/rdls_schema.json#/properties/links/items/properties/rel/pattern

@duncandewhurst (Contributor)

Ah, so as I mentioned in the issue description:

You can also ignore the error relating to the regex pattern for links.rel. I think that's a false positive due to that validator only supporting JSON Schema draft 2019-09 so it should be resolved in CoVE, which uses draft 2020-12.

As expected, there are no errors when validating against draft 2020-12 using check-jsonschema.
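For reference, a minimal links entry that should pass validation under draft 2020-12 (the href is a placeholder, not the real schema URL):

```json
{
  "links": [
    {
      "href": "https://example.org/rdls_schema.json",
      "rel": "describedby"
    }
  ]
}
```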

@stufraser1 (Member)

stufraser1 commented Aug 28, 2023

Else the URL could point to the existing datacatalog page (from where the resource can be requested).
...
this link for me just goes to a World Bank login page (which I obviously can't log in to), so I don't think it's an appropriate link to use as it doesn't show anything of the actual data. I think at the moment, as these are just examples, using a dummy URL is the better option.

We need to make sure, when linking to the datacatalog, that we are NOT using https://datacatalog.worldbank.org/int/search/..., which is internal only (and the default when Mat, Pierre, or I copy a link); make sure to remove the 'int/' to make it visible externally: https://datacatalog.worldbank.org/search/...

@stufraser1 (Member)

Yes, I like this. Separate tables are good. Hiding identifiers would give a cleaner view of key attributes, but I agree it is good to have a 1:1 representation of the JSON.

I agree; following the example for hazard rather than for exposure looks much better. It's easy to tab between each representation of the example, and very clear where to find the examples.

@odscjen (Contributor)

odscjen commented Aug 29, 2023

@matamadio do you have a _docsample version of the vln-FL_JRC example? Also is there one yet for Loss?

@stufraser1 (Member)

stufraser1 commented Aug 29, 2023

Set of hazard maps, to show one of the most common use cases

I also created, as a test, a sheet containing 6 zipped resources containing flood hazard map geotiffs.
I created a single event set (it's a regional analysis), 6 resources, and one event per country per return period (50 events), with one footprint per event. This differs from the Fathom data example, which has one event per hazard type (3: PLU, FLU Def, FLU Undef) and no footprints. The necessary information gets across to the user either way, but I'm not sure which is better.
I created it this way because that is how we've packaged the dataset on DDH, but this is not necessarily the best way; please feel free to suggest a better way, though we're unlikely to reconfigure the dataset on DDH now.

sheet
json

@duncandewhurst (Contributor)

With the exception of RDLS_full_SFRARR_fluvialhazardmaps.json, I've added all of the examples in the JSON conversions folder to the schema reference documentation in #196. I'm sharing a summary of key changes and design decisions below:

I updated the JSON files to reflect the latest version of the schema, but I haven't updated the spreadsheets that were used to generate them. I also corrected one semantic error in spatial.gazetteerEntries in the Central Asia exposure examples, see the commit for details: 0273914. I also put the two Central Asia exposure dataset examples in separate JSON files for ease of comprehension.

To reduce the length of the schema reference page, I've nested the examples with collapsible drop-downs.

image

Where there is more than one example for a component, only the first example is uncollapsed. If there is no figure for an example, it is collapsed. I couldn't find a suitable figure for the Central Asia exposure examples, but I took a screenshot from the global flood depth-damage functions PDF to use as a figure for that example:

image

The row titles in the tabular examples now include the titles of intermediary objects so that it is possible to distinguish between, for example, publisher name and creator name (previously they were both titled 'name'):

image

To reduce the amount of screen space taken up by the JSON examples, they are now collapsible, with objects and arrays collapsed by default:

image

@matamadio (Contributor, Author)

matamadio commented Aug 30, 2023

Very nice, thanks.
Would it be possible to limit the horizontal scroll of the table view, as in the codelists (#161)?

  • Using word wrap should fix most cases (description, details)
  • URLs used as ids might be truncated?

@matamadio (Contributor, Author)

matamadio commented Aug 30, 2023

@matamadio do you have a _docsample version of the vln-FL_JRC example? Also is there one yet for Loss?

The vln-FL_JRC is ok to use in docs as well, it doesn't include too many attributes anyway.
The one for loss is still to be produced.

@duncandewhurst (Contributor)

Would it be possible to limit the horizontal scroll of the table view, as in the codelists (#161)?

Addressed in #214.

The vln-FL_JRC is ok to use in docs as well, it doesn't include too many attributes anyway.

Added in #196.

I don't think there's anything else to do for this issue until the loss example is ready. Let me know if that's wrong!

@duncandewhurst (Contributor)

@matamadio and @stufraser1 to discuss and prepare loss examples.

@matamadio (Contributor, Author)

matamadio commented Sep 5, 2023

One example of loss data (results of the analysis) from CCDR:

Download THA_RSK.xlsx

This represents one specific country, but the same template applies to any country I've been working on.
The dataset consists of one excel file, made of several tabs:

  • Classes | Legend: how to interpret the data within the file
  • Overview: key results summary with charts
  • ADM(i)_summary: exposure or impact values for all hazards summed up at the ADM level
  • Individual hazard scores (EAE and EAI) and the calculations behind those, for the smallest adm level
  • EM-DAT: disaster list for context

The tabular data for the ADM scores is also provided as geospatial (gpkg). It does not have an explicit loss curve chart, but has all the elements to build it.

@stufraser1 should it fit in the schema in its current state, or do you have any suggestions for better formatting? This is key as I'm just now setting the default for the new year analytics.

@stufraser1 (Member)

I would say there are sheets in there that wouldn't normally go into the loss component:

  • classes may be more appropriate to include in the Vulnerability component
  • EM-DAT and LS_Event_records are historical catalogues and should be separate.

My preference for describing these files in RDL Loss would be to include this as a dataset and give each sheet as its own resource (.csv), rather than an xlsx book, so users can see the list of resource descriptions per dataset rather than navigating many sheets. But I see it could be described in metadata using the existing structure, with the workbook as a single resource.

@matamadio (Contributor, Author)

matamadio commented Sep 6, 2023

I have some questions about the loss schema.
See simplified CCDR output example in the Gdrive folder.

THA_CCDR_RSK_ADM1.xlsx describes loss output for 2 hazards (river floods and coastal floods) over 2 exposure categories. The complete standard output would include 5-6 hazards and 3 exposed categories.

Metadata spreadsheet has loss attributes at the dataset level, so I have to create 4 dataset rows.

[image]

But all these information are actually in just one file.

immagine

Should I use the same dataset ID all along? Or should we rather move all loss attributes into an array?

@stufraser1 (Member)

Good catch.
We want to be able to include multiple loss curves in one dataset, which would mean having a 'loss' array under the dataset level. I think this could also contain the contents of loss_cost, since I don't think a layer of nesting for loss cost is required beyond the loss object.
I don't think anything else would need nesting: one level should suffice.
@odscrachel please could you advise if we can process this quickly / overnight with @duncandewhurst?

@stufraser1 (Member)

stufraser1 commented Sep 6, 2023

I also have a couple of issues testing with a return period dataset:

  • loss/impact/unit does not include impact_unit code for monetary losses, so where I've got a monetary asset_loss, I have to leave loss/impact/unit blank
  • loss/approach is more relevant for vulnerability, and I think duplicates what we include in loss/impact/base_data_type - could be removed?
  • spreadsheet template does not contain a link to the gazetteer location scheme, and the link in documentation 'The gazetteer from which the entry is drawn, from the open location gazetteers codelist.' leads to an error.
  • There is a mismatch in loss/cost/0/dimension and loss/cost/0/unit - dimension includes population but the unit requires a currency code.
  • sources/0/id is tied to the dataset ID, so I can't add more than one unique source ID

Here is the loss metadata file for use in the loss example:
json
xlsx
image: tabulated data, so no image provided

@stufraser1 (Member)

images for exposure examples:
Central Asia residential current
Central Asia residential projected

@duncandewhurst (Contributor)

duncandewhurst commented Sep 7, 2023

Should I use the same dataset ID all along? Or should we rather move all loss attributes into an array?

Each row in the datasets sheet represents a dataset, so if there are rows with the same id, the JSON output will be a single dataset with the values from the final row, i.e. the values from the earlier rows will be overwritten. Therefore, the Thailand CCDR example does point to the need for an array of losses.

I've drafted a PR for the changes proposed in #135 (comment) and #135 (comment):

  • Make loss an array
  • Make loss.cost an object

@stufraser1 @matamadio I will leave it up to you to decide if you want to merge this PR for inclusion in the 0.2 release or leave it for later. My sense is that modelling for loss metadata warrants further exploration (I'll open an issue), but that the changes in the PR are an improvement over the current model so I would merge it.
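Illustratively, with loss as an array, a single dataset entry could then carry one loss object per hazard/exposure combination, along the lines of the sketch below. The field names inside each loss object are placeholders based on fields discussed in this thread, not confirmed schema paths.

```json
{
  "id": "THA_CCDR_RSK_ADM1",
  "loss": [
    {
      "id": "1",
      "hazard_type": "flood",
      "hazard_process": "fluvial_flood",
      "cost": { "unit": "USD" }
    },
    {
      "id": "2",
      "hazard_type": "coastal_flood",
      "hazard_process": "storm_surge",
      "cost": { "unit": "USD" }
    }
  ]
}
```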

I'll hold off preparing a PR to add the loss examples until we have decided what to do about the schema as if the schema changes the examples will need to be updated. I've also left some comments on the SFRARR example spreadsheet where I think some fields may have been populated incorrectly.


@stufraser1 I've shared my feedback on your other questions and suggestions below.

I also have a couple of issues testing with a return period dataset:

  • loss/impact/unit does not include impact_unit code for monetary losses, so where I've got a monetary asset_loss, I have to leave loss/impact/unit blank

This was discussed at some length in #75, but the conversation in that issue took a different direction so I don't think it was fully resolved.

My preferred approach is not to worry about units and instead to model the kind of quantity being measured (currency, in this case) since users can convert between units of the same quantity kind. That is the approach we settled on for exposure metrics and I think it would make sense to have consistent modelling for exposure metrics and impact metrics. However, that is quite a significant change to consider at this stage for 0.2.

The alternative solution that I proposed was to add an Impact.currency field for monetary losses. The reasons for separating unit and currency are twofold:

  1. Completeness: The complete list of currencies is well-defined and all currencies are of more-or-less equal relevance to RDLS so it makes sense to have a comprehensive (closed) currency codelist. Whereas the complete list of non-currency units is less well defined and many non-currency units are totally irrelevant to RDLS so it makes sense to have a representative (open) codelist of the most relevant units.
  2. Usability: There are very many currencies so it is much harder for a publisher to see which non-currency units are available if they are mixed in with the long list of currencies.

The separation of currencies and non-currency units is in keeping with QUDT which is the source we're using for unit codes. It models currencies and non-currency units as separate vocabularies so we should keep them separate too in order to avoid the risk of clashing codes in the event that a currency and non-currency unit share the same code.

So the options are:

  1. Do nothing
  2. Add Impact.currency
  3. Try to align the modelling of impact metrics with the modelling of exposure metrics

If needed, we can do option 2 for the 0.2 release and work on option 3 for the next release. Let me know what you want to do.
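Illustratively, option 2 for a monetary asset_loss might look like the sketch below, with unit left unset and the proposed currency field carrying an ISO 4217 code ('metric' is an assumed sibling field name used only for illustration):

```json
{
  "impact": {
    "metric": "asset_loss",
    "currency": "USD"
  }
}
```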

  • loss/approach is more relevant for vulnerability, and I think duplicates what we include in loss/impact/base_data_type - could be removed?

It seems to me that there is a lot of crossover, but also some differences. For example, the data_calculation_type codelist referenced in loss.impact.base_data_type has a code for 'observed' (Post-event observation data such as post-event damage surveys), which I interpret as indicating "actual" loss data rather than predictions or forecasts. That doesn't fit the semantics of any of the codes in the function_approach codelist referenced in loss.approach. The nearest fit is 'empirical', but its definition mentions regression analysis, which implies predictions or forecasts rather than "actual" data.

I think that this warrants further investigation, but I don't think we'll resolve it in time for 0.2.

  • spreadsheet template does not contain a link to the gazetteer location scheme, and the link in documentation 'The gazetteer from which the entry is drawn, from the open location gazetteers codelist.' leads to an error.

Regarding the spreadsheet template, I can see a link in the template and in the rdls_template_loss_SFRARR_eqrisk.xlsx (see below). Where is it missing from?

image

Good catch on the broken link in the documentation, this was because some codelists links in the schema included .html, which was working in the schema browser, but not in the schema reference tables, for some reason. I've fixed them in #244.

  • There is a mismatch in loss/cost/0/dimension and loss/cost/0/unit - dimension includes population but the unit requires a currency code.

This is because Cost is intended only to be used for monetary costs, but the codelist for Cost.dimension is shared with Metric.dimension.

  • sources/0/id is tied to the dataset ID, so I can't add more than one unique source ID

In rdls_template_loss_SFRARR_eqrisk.xlsx, it looks like you might've copy-pasted the value from the id column into the sources/0/id column, which has also copied the data validation rules. That's how copy-pasting behaves in Google Sheets and Excel unless you paste values only (Ctrl+Shift+V). Looking at the blank template in the spreadsheet template repository, there are no validation rules on the sources/0/id column.

Here is the loss metadata file for use in the loss example: json xlsx image: tabulated data, so no image provided

@matamadio (Contributor, Author)

images for exposure examples: Central Asia residential current Central Asia residential projected

Looks like the symbology represents ADM codes; it would be great to show the exposure value attribute with a legend. If you can point me to the dataset, I can produce those maps quickly.

We want to be able to include multiple loss curves in one dataset, which would mean having a 'loss' array under the dataset level. I think this could also contain the contents of loss_cost, since I don't think a layer of nesting for loss cost is required beyond the loss object.

Agree on avoiding unnecessary nesting; it should be just one loss object/tab.

I've drafted a PR for the changes proposed in #135 (comment) and #135 (comment):

* Make `loss` an array
* Make `loss.cost` an object

I approved the change, as it already improves usability. I would implement it in 0.2 and wait for the next release for other refinements.

The separation of currencies and non-currency units is in keeping with QUDT which is the source we're using for unit codes. It models currencies and non-currency units as separate vocabularies so we should keep them separate too in order to avoid the risk of clashing codes in the event that a currency and non-currency unit share the same code.

So the options are:

1. Do nothing
2. Add Impact.currency
3. Try to align the modelling of impact metrics with the modelling of exposure metrics

I'd say 2; most intuitively, as an optional field if impact.unit = monetary (adding it to the impact_unit codelist if this doesn't break the QUDT standard). Nice to have it in 0.2 already if it's a quick fix.

loss/approach is more relevant for vulnerability, and I think duplicates what we include in loss/impact/base_data_type - could be removed?

It seems to me that there is a lot of crossover, but also some differences. For example, the data_calculation_type codelist referenced in loss.impact.base_data_type has a code for 'observed' (Post-event observation data such as post-event damage surveys), which I interpret as indicating "actual" loss data rather than predictions or forecasts. That doesn't fit the semantics of any of the codes in the function_approach codelist referenced in loss.approach. The nearest fit is 'empirical', but its definition mentions regression analysis, which implies predictions or forecasts rather than "actual" data.

I would:

  • remove base_data_type (which data is it referring to anyway? Hazard? Exposure? There is also loss/hazard_analysis_type for that) and keep loss/approach.
  • remove the word "regression" from the codelist definition. Keep it broad so it applies more generally.

@duncandewhurst (Contributor)

I've merged the loss updates PR so that we can release 0.2. The examples will need updating to reflect the schema changes and adding to the schema reference page. That can be done without needing to make another release because the examples themselves aren't normative.

The separation of currencies and non-currency units is in keeping with QUDT which is the source we're using for unit codes. It models currencies and non-currency units as separate vocabularies so we should keep them separate too in order to avoid the risk of clashing codes in the event that a currency and non-currency unit share the same code.
So the options are:

1. Do nothing
2. Add Impact.currency
3. Try to align the modelling of impact metrics with the modelling of exposure metrics

I'd say 2; most intuitively, as an optional field if impact.unit = monetary (adding it to the impact_unit codelist if this doesn't break the QUDT standard). Nice to have it in 0.2 already if it's a quick fix.

In the interest of not delaying the 0.2 release further, I've left this for now, as it will require further discussion ("monetary" is a quantity kind rather than a unit).
