Building flexible models

The RiskScape Platform makes it easy for other people to run your models and view the results. However, GIS data varies a lot in its content, so it can be a little fiddly to structure your model so that users can easily customize the input data. This page describes some strategies that expert users can use to build models whose input data is easy for other users to customize.

Documentation

To start with, document your model well, including how it works and any assumptions it makes. More detail is better than less, especially for a lay audience who might be unfamiliar with risk modelling concepts.

The simplest approach is to put these details into a PDF, upload the PDF to your RiskScape Platform project’s file storage, and then link to the PDF from your model or project’s description.

The descriptions used for RiskScape projects, models, and parameters can be markdown. So as well as basic text formatting (e.g. _italics_ or **bold**), these descriptions can include HTTP links like this:

[display text](https://link)

Bring your own input data

To create a versatile RiskScape model that can handle a variety of different exposure-layer input files, try following these steps when building your model:

  1. Try to minimize how much your pipeline explicitly accesses exposure-layer attributes by name. There will always be some attributes that your risk function relies on, but try to avoid unnecessarily accessing exposure-layer attributes. For example, don’t rename exposure-layer attributes in a pipeline select() step - do this in a bookmark instead.

  2. Define a [type] that contains the required attributes that your model relies on (i.e. the attributes that the risk function relies on). Make sure you exclude any geometry attributes from this type definition.

  3. Use the parameter properties bookmark, type: YOUR_TYPE for your model parameter. If a user tries to use input data that doesn’t match your type, then this will bring up a UI widget to help the user pick out the correct attributes to use. In your project.ini file, the parameter definition would look something like this:

    [parameter exposure_layer]
    properties = bookmark, type: Building_Type
    ...
    
  4. In your pipeline, use normalize_geometry() to ensure that the input data is geospatial. This also ensures the input data will always end up using a consistent geometry attribute name, which is helpful when segmenting roads or other large geometry.

    select({
             *,
             normalize_geometry(exposure, 'geom', { message:
                    'The given $exposure_layer data is not a geospatial layer - it contains no geometry.'
                }) as exposure
           })
    
  5. Define bookmarks and parameter choices for commonly used input layers. If your project contains input data that users will frequently want to use, then make sure there are bookmarks defined, with the attributes mapped to ones that the model expects. Add these bookmark IDs as choices to the exposure-layer parameter, so it’s easy for users to select them, e.g.

    [parameter exposure_layer]
    properties = bookmark, type: Building_Type
    choices = 'Residential_buildings'
    choices = 'Nonresidential_buildings'
    

Tip

Accepting flexible CSV input data can be a little awkward. If a CSV file contains latitude and longitude columns (i.e. called lat and long or similar), then RiskScape will automatically turn that into a geospatial layer with WGS84 point data. For a CSV file containing WKT, you will probably need to create a bookmark for it. Alternatively, if the model parameter is only accepting CSV files that will always be in the exact same format, you could use the bookmark-template parameter property.

Validating input data values

Note that the bookmark, type: property and the Platform UI widget only ensure that attributes with the correct names are present. The Platform doesn’t check that the attributes in the input data contain the correct values. For example, say your model accepts a Foundation attribute that could be either ‘Slab’ or ‘Piles’. The model may not work as intended if the user supplies data with Foundation values like ‘Concrete’ or ‘Timber’ or ‘NULL’.

The most flexible approach here is to use a Python function to process the exposure-layer data and check the values are correct. If invalid values are present, then you could replace the attribute value with a more suitable default value, and return the updated feature. For example, say you had a Sanitize_Building_Attributes() Python function that returned your required Building_Type with attribute values updated to appropriate values, then you could use it in a model like this:

select({ *, merge(exposure, Sanitize_Building_Attributes(exposure)) as exposure })
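As a rough illustration, the body of such a function might look like the sketch below. The allowed values come from the Foundation example above, but the choice of ‘Slab’ as the default, the whitespace and capitalization handling, and the function name are assumptions made for illustration only. How the function is registered in your project as Sanitize_Building_Attributes is not shown here.

```python
# Hypothetical sketch: the default value and the clean-up rules are assumptions.
VALID_FOUNDATIONS = {"Slab", "Piles"}
DEFAULT_FOUNDATION = "Slab"  # assumed fallback, pick whatever suits your model


def sanitize_building_attributes(building):
    """Return the required attributes, replacing invalid Foundation values."""
    foundation = building.get("Foundation")

    # Tidy up whitespace and capitalization so ' piles ' still matches 'Piles'.
    if isinstance(foundation, str):
        foundation = foundation.strip().capitalize()

    # Anything unrecognised (including missing values) falls back to the default.
    if foundation not in VALID_FOUNDATIONS:
        foundation = DEFAULT_FOUNDATION

    return {"Foundation": foundation}
```

Because the function only returns the attributes it has checked, the merge() call in the pipeline keeps every other exposure-layer attribute untouched.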

Another alternative is to use stricter RiskScape types, such as set, range, or enum. You could then use the cast() function to ensure the input data matches the correct type, e.g.

select({
         *,
         merge(exposure, {
                           Foundation: cast(exposure.Foundation, 'Foundation_Type')
                         }) as exposure
       })

This approach is less forgiving, however, as the model will exit with an error as soon as it encounters a non-conforming Foundation_Type value.

Different asset classes

Your modelling scenario may need to handle a variety of asset types, such as buildings, roads, population, and pipes. There are a couple of approaches you could use here:

  1. Build a separate model for each asset class, e.g. Population-Exposure, Building-Loss, Road-Loss, etc. This is the simplest approach, as you can then tailor the input attribute types to suit the model data. You could potentially reuse the underlying pipeline, or parts of the pipeline (i.e. sub-pipelines), between the models.

  2. Build a single model to handle any asset class. This can be useful if you already have several different hazard scenarios you want to model, such as flood, earthquake, and tsunami. Building a separate model for each permutation would quickly become unmanageable.

Here are some tips for taking the latter approach:

  • Add some sort of Asset_type attribute to keep track of what type of asset the model run is dealing with. This could either be a model parameter or a constant value in the bookmarks. Including this attribute in the exposure-layer bookmark means it will be less likely the model gets run with an inconsistent combination of parameters. For example:

[bookmark Buildings]
location = buildings.shp
set-attribute.Asset_type = const('Building')

  • If you use const() conditions with if() lambdas then you can change what data the model reports on the fly. Normally the then: and else: cases in an if() need to be the same type, but not if the condition is constant, e.g.

if($asset_type = 'Road',
   then: () -> { measure(exposure) / 1000 as Exposed_km },
   else: () -> { exposure.Value as Exposed_Value_NZD })

  • Your loss or risk function may need to “fan out” based on the asset type, so that the road loss Python code is used for roads, the building loss code gets used for buildings, and so on. You can do this using an if/else Python block, or a RiskScape expression function that uses nested if() expressions. RiskScape expression functions allow for flexible return types (i.e. the return-type can change based on the input asset type), whereas the Python return-type would need to be the same across all asset types.

  • If you are using if() to change the shape of your results data on the fly, then you can use map_struct() to manipulate attributes without hard-coding specific attribute names into your pipeline. Refer to the engine documentation for more tips on this.

  • You may need to create a ‘unified’ asset type that is suitable for any exposure-layer input, particularly if users need to supply their own input data. For example, you could use generic attribute names, such as Material or Use_category, that can apply to different input datasets, but would hold different values for each dataset. A Material attribute might represent the construction type for buildings, whereas for roads it represents sealed/unsealed, and for pipes it’s ductile/brittle.
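The “fan out” tip above could be sketched in Python roughly as follows. The Asset_type and Value attribute names follow the examples on this page, while the function names, the depth argument, and the loss formulas are placeholders invented purely for illustration. Note that every branch returns the same type (a float), as a Python function would need to.

```python
# Hypothetical sketch: the loss formulas below are placeholders, not real curves.
def building_loss(asset, depth):
    # Assumed: damage scales linearly with depth, capped at total loss.
    ratio = min(1.0, depth / 3.0)
    return asset["Value"] * ratio


def road_loss(asset, depth):
    # Assumed: roads lose half their value once depth exceeds a threshold.
    ratio = 0.5 if depth > 0.3 else 0.0
    return asset["Value"] * ratio


def asset_loss(asset, depth):
    """Dispatch to the loss code that matches the asset's Asset_type."""
    asset_type = asset["Asset_type"]
    if asset_type == "Building":
        loss = building_loss(asset, depth)
    elif asset_type == "Road":
        loss = road_loss(asset, depth)
    else:
        # Fail loudly rather than silently reporting zero loss.
        raise ValueError("Unsupported Asset_type: %s" % asset_type)
    return float(loss)
```

Raising an error for unknown asset types is one design choice. Returning a default loss instead would let the model keep running, at the cost of hiding inconsistent input data.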

Multiple different asset classes

Instead of running your model against each asset dataset one at a time, you may want to run it over all asset datasets and produce a single set of results.

Some of the previous section will also apply here, although this approach gives you a little more flexibility in that you don’t necessarily have to define a single type for all asset classes.

  • You would use a union() pipeline step to combine the various input layers. Note that attributes that are only present for one layer will be null for other layers.

  • You would need to use normalize_geometry() before the union step, otherwise you could end up with multiple geometry attributes. You could use a subpipeline() step to avoid duplicating pipeline code.

  • Often you will want to ‘bucket’ the results based on asset type, so that you can see the total road loss separately to building losses, and so on. There are a couple of approaches you can take here:

    • Use bucket() in your group() steps, as per normal. One trick is to use sub-pipelines, so you don’t have to repeat the bucket logic all the time. You can make the group by: condition a sub-pipeline parameter, and so you can re-use the bucket code for regional/national/use-category aggregation simply by changing the sub-pipeline parameter.

    • Alternatively, you can use ‘scalar bucketing’ to categorize the model results into the desired format before aggregation. Then in your aggregation step you can simply ‘sum’ the bucketed struct (or use map_struct() to calculate the AAL). There’s an example of this scalar categorization approach here.

  • For performance reasons, you may sometimes want to exclude certain asset types so that the model runs faster. One approach is to use a filter step, e.g.

    filter(switch(exposure.Asset_type, default: false, cases: [{ in: $include_assets, return: true }]))
    

    This is a useful approach when the input data is fixed and pre-defined (i.e. the user can’t supply their own custom input data). This works well in combination with a checkbox or multi-select parameter (i.e. $include_assets).

    Another alternative is to define an ‘empty’ input relation. This works better when the user can provide their own input data - you can add a ‘No Data’ drop-down option that essentially lets the user skip that asset type completely.

    It can be a little awkward to produce a completely empty geospatial dataset. One way to do this is to create a small file with the required attributes, and then just add filter = false to the bookmark.