[Schema] Exposure costs and metrics #194

Closed
duncandewhurst opened this issue Aug 16, 2023 · 12 comments · Fixed by #204

@duncandewhurst
Contributor

From GFDRR/rdls-spreadsheet-template#3 (comment):

  1. Why is the exposure_cost sheet populated? If I understood correctly, the dataset doesn't describe the cost of buildings, it only describes their area.

At the moment, the "cost" of exposure is limited to monetary currencies. But the value represented by an exposure dataset might be intangible, or just a proxy to later calculate the economic value; in my experience it is actually pretty uncommon to use an exposure dataset that already comes in economic terms. In this specific case it is a value of built-up area over total pixel area. In other cases, the value could be building height, volume, population density or something else. Exposure could be represented by a range of different metrics in order to measure the cost.

I see two options:

1. Make the `cost` field optional and use it only when the value is actually a currency amount. Don't specify the exposure metric.

2. Add an exposure `metric` field with an open codelist.

@matamadio do analysts need to know the exposure metric at the point of selecting a dataset?

@matamadio
Contributor

matamadio commented Aug 17, 2023

> @matamadio do analysts need to know the exposure metric at the point of selecting a dataset?

To me it is one of the key pieces of information to provide for exposure, similar to hazard imt. It doesn't need to be within the "cost" array; it can be at the top level as exposure/metric.

@matamadio matamadio changed the title Exposure costs and metrics [Schema] Exposure costs and metrics Aug 17, 2023
@odscjen
Contributor

odscjen commented Aug 21, 2023

This sounds as though we need a new object in addition to exposure.cost, e.g.

"metrics": {
          "title": "Asset metrics",
          "type": "array",
          "description": "The non-monetary exposure metrics associated with specific elements of assets detailed in the dataset. If a metric is measured exclusively in monetary values use `cost`.",
          "items": {
            "$ref": "#/$defs/Metric"
          },
          "minItems": 1,
          "uniqueItems": true
        }

where Metric is

"Metric": {
      "title": "Asset metric",
      "type": "object",
      "description": "The metric associated with specific elements of assets detailed in the dataset.",
      "required": [
        "id",
        "type",
        "unit"
      ],
      "properties": {
        "id": {
          "title": "Identifier",
          "type": "string",
          "description": "A locally unique identifier for this metric.",
          "minLength": 1
        },
        "type": {
          "title": "Metric type",
          "description": "The type of the metric, from the closed [cost type codelist](https://rdl-standard.readthedocs.io/en/{{version}}/reference/codelists/#cost_type).",
          "type": "string",
          "codelist": "cost_type.csv",
          "openCodelist": false,
          "enum": [
            "structure",
            "content",
            "product",
            "disruption"
          ]
        },
        "unit": {
          "title": "Metric unit",
          "type": "string",
          "description": "The unit in which the metric is specified, from the open [impact_unit codelist](https://rdl-standard.readthedocs.io/en/{{version}}/reference/codelists/#impact_unit.",
          "codelist": "impact_unit.csv",
          "openCodelist": true
        }
      },
      "minProperties": 1
    }

Is this object likely to be needed anywhere else? If not, it doesn't need to be in $defs and can go straight into exposure.

I think, though, that if we go with this we'll need to revise some of the codelist names: rename 'cost_type.csv' to 'asset_type.csv' and 'impact_unit.csv' to 'metric_unit.csv'. My logic for the second is that impact is a specific type of metric, but I'm happy to have alternative names suggested. Alternatively, @matamadio @stufraser1, is 'impact_unit.csv' not appropriate for this exposure metric, and do we need an entirely new codelist for this field?
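To make the draft concrete, here is a minimal Python sketch of one metric entry and a hand-rolled check of the constraints in the draft `Metric` definition (required fields, non-empty `id`, closed `type` codelist). The helper name `check_metric` and the example values are assumptions for illustration only; in practice the data would more likely be validated against the JSON Schema itself.

```python
# Codes copied from the closed `type` enum in the draft definition.
COST_TYPES = {"structure", "content", "product", "disruption"}
REQUIRED = ("id", "type", "unit")


def check_metric(metric: dict) -> list[str]:
    """Return the problems found in a draft Metric object (hypothetical helper)."""
    problems = [f"missing required field: {name}" for name in REQUIRED if name not in metric]
    if "id" in metric and (not isinstance(metric["id"], str) or len(metric["id"]) < 1):
        problems.append("id must be a non-empty string")
    if "type" in metric and metric["type"] not in COST_TYPES:
        problems.append(f"type not in closed codelist: {metric['type']}")
    # `unit` uses an open codelist, so any string is accepted here.
    if "unit" in metric and not isinstance(metric["unit"], str):
        problems.append("unit must be a string")
    return problems


# Illustrative values only: built-up area for structures.
example = {"id": "1", "type": "structure", "unit": "area"}
print(check_metric(example))  # []
print(check_metric({"id": "2", "type": "people"}))
```

The second call reports both a missing `unit` and a `type` outside the closed codelist.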

@matamadio
Contributor

matamadio commented Aug 21, 2023

Thanks for the proposal; I made a counterproposal splitting metric into 2 arrays:

Exposure

  • category
  • taxonomy
  • metric
    • monetary (cost)
      • type (as is)
      • unit (as is) - separate from vulnerability/cost
    • non-monetary
      • type (new codelist)
      • unit (new codelist)

If this makes sense:

  • rename cost as monetary (also codelist monetary_type.csv and monetary_unit.csv)
    • full list of currencies as unit is ok, but realistically we would need something like USD (year), PPP (year), and similar comparable units
  • add new array non-monetary and associated type and unit open codelists (nonmonetary_type.csv and nonmonetary_unit.csv)
    • nonmonetary_type.csv same as monetary_type.csv with the inclusion of "population"
    • nonmonetary_unit.csv as open codelist, existing values:
      • Area (extent)
      • Count
      • Density
      • Time (period)
      • ...more to add
    • when cost type = disruption, the user might need to quantify it in terms of production time rather than monetary value

@duncandewhurst
Contributor Author

Thanks, both. I'll have a think about modelling options.

@duncandewhurst duncandewhurst self-assigned this Aug 22, 2023
@duncandewhurst
Contributor Author

> full list of currencies as unit is ok, but realistically we would need something like USD (year), PPP (year), and similar comparable units

  1. Does PPP stand for purchasing power parities in this context? If so, PPPs seem more like conversion rates than units. Can you share a link to a dataset in which the exposure metric is expressed in purchasing power parities?
  2. Are you suggesting that we add a field for the value date of the monetary amounts in a dataset?

> • nonmonetary_unit.csv as open codelist, existing values:
>   • Area (extent)
>   • Count
>   • Density
>   • Time (period)
>   • ...more to add

As discussed in #75 (comment), these are quantity kinds rather than units. Units would be things like square metres (for area quantities) or hours (for time quantities). I agree that it is more useful to model quantity kinds than specific units, since it should be possible to convert between units within a quantity kind (e.g. hours to minutes), but not between units of different quantity kinds (e.g. square metres to hours). I would name this field accordingly (quantityKind) and base it on a subset of the QUDT quantity kinds vocabulary, which already has codes for Area, Count, Density, Time and Currency.
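The distinction can be sketched in a few lines of Python: each unit belongs to one quantity kind, and conversion is only defined within a kind. The unit names and conversion factors below are illustrative assumptions, not part of any proposed codelist.

```python
# Each unit maps to (quantity kind, factor to that kind's base unit).
# Unit names and base units are assumptions chosen for illustration.
UNITS = {
    "square_metre": ("area", 1.0),
    "hectare": ("area", 10_000.0),
    "hour": ("time", 3600.0),
    "minute": ("time", 60.0),
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between two units of the same quantity kind."""
    from_kind, from_factor = UNITS[from_unit]
    to_kind, to_factor = UNITS[to_unit]
    if from_kind != to_kind:
        raise ValueError(f"cannot convert {from_kind} to {to_kind}")
    return value * from_factor / to_factor


print(convert(2, "hour", "minute"))  # 120.0
try:
    convert(1, "square_metre", "hour")
except ValueError as err:
    print(err)  # cannot convert area to time
```

Recording the quantity kind in the metadata therefore tells an analyst which datasets are comparable, without committing the standard to a fixed list of units.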

Can you share a link to a dataset in which the exposure metric is expressed as a quantity of density? I'm assuming you don't mean the QUDT definition of density, which is mass per unit volume so it would be good to work out what the correct quantity kind is.

@stufraser1
Member

stufraser1 commented Aug 22, 2023

> At the moment, the "cost" of exposure is limited to monetary currencies. But the value represented by an exposure dataset might be intangible, or just a proxy to later calculate the economic value

Agree - number of buildings / number of people / km of roads (e.g. per grid cell) are commonly used as well as total value (replacement cost / insured value) per grid cell or per building.

> in my experience it is actually pretty uncommon to use an exposure dataset that already comes in economic terms.

It is common in national level datasets and some global datasets, but maybe not in the ones used in examples so far. See Central Asia datasets, Africa R5, GEM's global exposure model, as just a few examples. It is also the case as stated that the value might be area or length or count.

> To me it is one of the key pieces of information to provide for exposure

Agree -- cost type or (monetary/non-monetary) value of the exposure needs to be readily visible in metadata.

> Can you share a link to a dataset in which the exposure metric is expressed as a quantity of density? I'm assuming you don't mean the QUDT definition of density, which is mass per unit volume

This refers more to population density, i.e. the number of people in a given geographic area. 'Count' would cover this: the number of buildings or people, which would be given in the data as a count per raster grid cell. I haven't yet seen an exposure dataset with the value given as 'no. buildings per km2'.

I think the suggestion from @matamadio works to make it clearer that we can include monetary and non-monetary values and the latter should include Area and Count. I don't think we need Time/Duration here as a metric. In my experience exposure isn't ever given a time value. We might estimate the disruption time as a loss, or (for insurance datasets only) identify an insured value for business interruption for a building, but we wouldn't record a unit of time in the exposure dataset - I can't think of an example where a road or building would be attributed a time value - it wouldn't mean anything practically.

I would request that the data structure allows one or more of count, area AND cost to be included in the same dataset - I can point to examples where the cost is derived from one or both of area and count, and all pieces of data are included in the final data.
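As a rough illustration of that request, the Python sketch below keeps count, area and cost for the same grid cells, with the cost derived from the area. All numbers, the field names, the unit replacement cost and the `quantity_kind` codes are made up for the example.

```python
# Hypothetical per-cell exposure records: building count and built-up area.
cells = [
    {"cell": "A1", "building_count": 12, "built_area_m2": 1800.0},
    {"cell": "A2", "building_count": 3, "built_area_m2": 450.0},
]

# Assumed replacement cost per square metre (illustrative, in USD).
REPLACEMENT_COST_PER_M2 = 500.0

for cell in cells:
    # Derive the monetary value from area and keep all three values side by side.
    cell["replacement_cost_usd"] = cell["built_area_m2"] * REPLACEMENT_COST_PER_M2

# The metadata would then need to declare three metrics for one dataset.
metrics = [
    {"id": "1", "type": "structure", "quantity_kind": "count"},
    {"id": "2", "type": "structure", "quantity_kind": "area"},
    {"id": "3", "type": "structure", "quantity_kind": "currency"},
]

print(cells[0]["replacement_cost_usd"])  # 900000.0
```

Whatever structure is chosen has to allow several metric entries per dataset, as in the `metrics` list here.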

@matamadio
Contributor

> Does PPP stand for purchasing power parities in this context? If so, PPPs seem more like conversion rates than units. Can you share a link to a dataset in which the exposure metric is expressed in purchasing power parities?

Yes, sometimes costs are expressed as PPP of local currency into USD. Anyway, not strictly necessary.

> I would request that the data structure allows one or more of count, area AND cost to be included in the same dataset - I can point to examples where the cost is derived from one or both of area and count, and all pieces of data are included in the final data.

Agree on this solution.

@odscjen
Contributor

odscjen commented Aug 22, 2023

Great, so combining all of this, we could remove exposure.cost and replace it with exposure.metrics, an object holding two arrays: one of monetary metrics and one of non-monetary metrics. This would allow multiple metrics to be included for a single dataset. We could keep using Cost for the monetary items (which is good, as we still use Cost in Loss as well) and add a $defs/Metric for the non-monetary metric items.

{
  "metrics": {
    "title": "Asset metrics",
    "type": "object",
    "description": "The metrics associated with specific elements of assets detailed in the dataset.",
    "properties": {
      "monetary": {
        "title": "Monetary asset metrics",
        "type": "array",
        "description": "The monetary exposure metrics associated with specific elements of assets detailed in the dataset.",
        "items": {
          "$ref": "#/$defs/Cost"
        },
        "minItems": 1,
        "uniqueItems": true
      },
      "non_monetary": {
        "title": "Non-monetary asset metrics",
        "type": "array",
        "description": "The non-monetary exposure metrics associated with specific elements of assets detailed in the dataset.",
        "items": {
          "$ref": "#/$defs/Metric"
        },
        "minItems": 1,
        "uniqueItems": true
      }
    },
    "minProperties": 1
  }
}
{
  "$defs":{
    "Metric": {
      "title": "Asset metric",
      "type": "object",
      "description": "The metric associated with specific elements of assets detailed in the dataset.",
      "required": [
        "id",
        "type",
        "quantity_kind"
      ],
      "properties": {
        "id": {
          "title": "Identifier",
          "type": "string",
          "description": "A locally unique identifier for this metric.",
          "minLength": 1
        },
        "type": {
          "title": "Metric type",
          "description": "The type of the asset, from the closed [cost type codelist](https://rdl-standard.readthedocs.io/en/{{version}}/reference/codelists/#cost_type).",
          "type": "string",
          "codelist": "cost_type.csv",
          "openCodelist": false,
          "enum": [
            "structure",
            "content",
            "product",
            "disruption"
          ]
        },
        "quantity_kind": {
          "title": "Quantity kind",
          "type": "string",
          "description": "The kind of quantity in which the metric is specified, from the open [quantity kind codelist](https://rdl-standard.readthedocs.io/en/{{version}}/reference/codelists/#quantity_kind.",
          "codelist": "quantity_kind.csv",
          "openCodelist": true
        }
      },
      "minProperties": 1
    }
  }
}

with the quantity kind codes taken as the most relevant selection from the QUDT quantity kinds vocabulary:

| Code | Title |
|------|-------|
| area | Area |
| count | Count |
| length | Length |

One issue with this is that, as it stands, there isn't a way of expressing that a metric relates to a population. Does it need to be added to the cost_type codelist? And would it make sense to rename this codelist to metric_type?
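For reference, an instance of the two-array `metrics` object might look like the Python sketch below, together with a hand-rolled check of the `minProperties`, `minItems` and `uniqueItems` constraints from the draft. The values, the assumption that Cost items carry `id`, `type` and `unit`, and the helper name `check_metrics` are all illustrative.

```python
# Illustrative instance of the proposed `metrics` object; all values are made up.
metrics = {
    "monetary": [
        {"id": "1", "type": "structure", "unit": "USD"},
    ],
    "non_monetary": [
        {"id": "2", "type": "structure", "quantity_kind": "area"},
        {"id": "3", "type": "structure", "quantity_kind": "count"},
    ],
}


def check_metrics(obj: dict) -> list[str]:
    """Check minProperties, minItems and uniqueItems as set in the draft (hypothetical helper)."""
    problems = []
    if len(obj) < 1:
        problems.append("metrics needs at least one property")
    for name in ("monetary", "non_monetary"):
        items = obj.get(name)
        if items is None:
            continue
        if len(items) < 1:
            problems.append(f"{name} needs at least one item")
        # Compare items by their sorted key/value pairs to detect duplicates.
        seen = {tuple(sorted(item.items())) for item in items}
        if len(seen) != len(items):
            problems.append(f"{name} contains duplicate items")
    return problems


print(check_metrics(metrics))           # []
print(check_metrics({"monetary": []}))  # ['monetary needs at least one item']
```

Note that a population count has no suitable `type` code in this example, which is the gap described above.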

@matamadio
Contributor

Thanks Jen. Agree on the solution, including renaming as metric_type and including population.

@duncandewhurst
Contributor Author

It seems to me that there are more similarities than differences between monetary and non-monetary metrics so I would lean towards having a single metrics array.

What are the advantages of separating monetary and non-monetary metrics in the data model instead of having a single metrics array and using the quantity_kind field (with 'Currency' as an option) as a discriminator?

From a general usability point of view, there are some advantages to having a single metrics array: fewer sheets in the spreadsheet representation and I would've thought it would be easier for users to see all of the metrics in a dataset in a single list/table/sheet than to have them split into separate lists.

However, happy to hear if there is a risk-specific reason for separating them!
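A minimal Python sketch of the single-array option, assuming a lowercase `currency` code in the quantity kind codelist and a `population` code in the type codelist (both are assumptions, as are the example entries):

```python
# A single `metrics` array; entries and codes are illustrative assumptions.
metrics = [
    {"id": "1", "type": "structure", "quantity_kind": "currency"},
    {"id": "2", "type": "structure", "quantity_kind": "area"},
    {"id": "3", "type": "population", "quantity_kind": "count"},
]

# `quantity_kind` acts as the discriminator between monetary and other metrics.
monetary = [m for m in metrics if m["quantity_kind"] == "currency"]
non_monetary = [m for m in metrics if m["quantity_kind"] != "currency"]

print([m["id"] for m in monetary])      # ['1']
print([m["id"] for m in non_monetary])  # ['2', '3']
```

Anyone who needs the monetary/non-monetary split can recover it with a one-line filter, while the data itself stays in one list.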

@stufraser1
Member

This does seem easier to use and makes it easier to communicate the range of metrics, and I don't think there is a need to have them in two lists/arrays.

@matamadio
Contributor

OK with a single array, grouping based on quantity_kind.
