# Building flexible models

The RiskScape Platform makes it easy for other people to run your models and view the results.
However, GIS data varies a lot in its content, so it can be a little fiddly to
structure your model so that users can easily customize the input data.
This page describes some strategies that expert users can follow to build
flexible models, where other users can come along and swap in their own input data.

## Documentation

To start with, document your model well, including how it works and any assumptions
it makes. More detail is better than less, especially for a lay audience who might
be unfamiliar with risk modelling concepts.

The simplest approach is to put these details into a PDF, upload the PDF to your
RiskScape Platform project's file storage, and then link to the PDF from your model or project's
description.

The descriptions used for RiskScape projects, models, and parameters can be written in Markdown.
So as well as basic text formatting (e.g. \__italics_\_ or **\*\*bold\*\***),
these descriptions can include hyperlinks like this:

```
[display text](https://link)
```

## Bring your own input data

To create a versatile RiskScape model that can handle a variety of different
exposure-layer input files, try following these steps when building your model:

1. Try to minimize how much your pipeline explicitly accesses exposure-layer attributes by name.
There will always be some attributes that your risk function relies on,
but try to avoid unnecessarily accessing exposure-layer attributes.
For example, don't rename exposure-layer attributes in a pipeline `select()` step - do this in a bookmark instead.

2. Define a `[type]` that contains the required attributes that your model relies on
(i.e. the attributes that the risk function relies on). Make sure you _exclude_ any geometry attributes
from this type definition.

3. Use the :ref:`parameter properties <ui-parameters>` `bookmark, type: YOUR_TYPE` for your model parameter.
If a user tries to use input data that doesn't match your type,
then this will bring up a UI widget to help the user pick out the correct attributes to use.
In your `project.ini` file, the parameter definition would look something like this:

    ```ini
    [parameter exposure_layer]
    properties = bookmark, type: Building_Type
    ...
    ```
    
4. In your pipeline, use `normalize_geometry()` to ensure that the input data is geospatial.
This also ensures the input data will always end up using a consistent geometry attribute name,
which is helpful when segmenting roads or other large geometry.

    ```
    select({
             *,
             normalize_geometry(exposure, 'geom', { message:
                    'The given $exposure_layer data is not a geospatial layer - it contains no geometry.'
                }) as exposure
           })
    ```

5. Define bookmarks and parameter `choices` for commonly used input layers.
If your project contains input data that users will frequently want to use, then make sure
there are bookmarks defined, with the attributes mapped to ones that the model expects.
Add these bookmark IDs as `choices` to the exposure-layer parameter, so it's easy for
users to select them, e.g.

    ```ini
    [parameter exposure_layer]
    properties = bookmark, type: Building_Type
    choices = 'Residential_buildings'
    choices = 'Nonresidential_buildings'
    ```

.. tip::
    Accepting flexible CSV input data can be a little awkward.
    If a CSV file contains latitude and longitude columns (e.g. called ``lat`` and ``long``, or similar),
    then RiskScape will automatically turn that into a geospatial layer with WGS84 point data.
    For a CSV file containing WKT, you will probably need to create a bookmark for it.
    Alternatively, if the model parameter is *only* accepting CSV files that will *always* be
    in the exact same format, you could use the ``bookmark-template`` parameter property.

### Validating input data *values*

Note that the `bookmark, type:` parameter properties and the Platform UI widget only ensure that attributes with the correct _name_ are present.
The Platform doesn't check that the attributes in the input data contain the correct *values*.
For example, say your model accepts a `Foundation` attribute that could be either 'Slab'
or 'Piles'. The model may not work as intended if the user supplies data with `Foundation` values like
'Concrete' or 'Timber' or 'NULL'.

The most flexible approach here is to use a Python function to process the exposure-layer data and check
the values are correct. If invalid values are present, then you could replace the attribute value with a
more suitable default value, and return the updated feature. For example, say you had a `Sanitize_Building_Attributes()`
Python function that returned your required `Building_Type` with attribute values updated to appropriate values,
then you could use it in a model like this:

```
select({ *, merge(exposure, Sanitize_Building_Attributes(exposure)) as exposure })
```
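
As a minimal sketch of what such a function might look like, the Python below checks the `Foundation` attribute from the example above and falls back to a default when the value is not recognized. The default value, the case and whitespace handling, and the function layout are all assumptions made for illustration. How the function is registered with RiskScape (such as the entry-point name the engine expects, or its argument and return types) is not shown here, so refer to the engine documentation for that.

```python
# Hypothetical allowed values and fallback: adjust these to suit your own risk function.
VALID_FOUNDATIONS = {'Slab', 'Piles'}
DEFAULT_FOUNDATION = 'Slab'


def sanitize_building_attributes(exposure):
    """Return a copy of the exposure attributes with a valid Foundation value."""
    sanitized = dict(exposure)  # copy, so the caller's data is left untouched
    foundation = sanitized.get('Foundation')

    # Tolerate stray whitespace and differing case, e.g. ' piles ' becomes 'Piles'
    if isinstance(foundation, str):
        foundation = foundation.strip().capitalize()

    # Anything unrecognized (including missing values) falls back to the default
    if foundation not in VALID_FOUNDATIONS:
        foundation = DEFAULT_FOUNDATION

    sanitized['Foundation'] = foundation
    return sanitized
```

Silently substituting a default keeps the model running, but it can also hide data problems, so you may prefer to document which default is applied in the model description.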

An alternative is to use stricter RiskScape types, such as `set`, `range`, or `enum`.
You could then use the `cast()` function to ensure the input data matches the correct type, e.g.

```
select({
         *,
         merge(exposure, {
                           Foundation: cast(exposure.Foundation, 'Foundation_Type')
                         }) as exposure
       })
```

This approach is less forgiving, however, as the model will exit with an error as soon as it
encounters a non-conforming `Foundation_Type` value.

## Different asset classes

Your modelling scenario may need to handle a variety of asset types, such as
buildings, roads, population, and pipes. There are a couple of approaches you could use here:

1. Build a separate model for each asset class, e.g. `Population-Exposure`, `Building-Loss`, `Road-Loss`, etc.
This is the simplest approach, as you can then tailor the input attribute types to suit the model data.
You could potentially reuse the underlying pipeline, or parts of the pipeline (i.e. sub-pipelines),
between the models.

2. Build a single model to handle *any* asset class. This can be useful if you already have several different
*hazard* scenarios you want to model, such as flood, earthquake, and tsunami. Building a separate model
for each permutation would quickly get unmanageable.

Here are some tips for taking the latter approach:

- Add some sort of `Asset_type` attribute to keep track of what type of asset the model run is dealing with.
This could either be a model parameter or a constant value in the bookmarks.
Including this attribute in the exposure-layer bookmark means it will be less likely the model gets run with
an inconsistent combination of parameters. For example:

```ini
[bookmark Buildings]
location = buildings.shp
set-attribute.Asset_type = const('Building')
```

- If you use `const()` conditions with `if()` lambdas then you can change what data the model reports on the fly.
Normally the `then:` and `else:` cases in an `if()` need to be the *same* type, but *not* if the condition is constant, e.g.

```
if($asset_type = 'Road',
   then: () -> { measure(exposure) / 1000 as Exposed_km },
   else: () -> { exposure.Value as Exposed_Value_NZD })
```

- Your loss or risk function may need to "fan out" based on the asset type, so that the road loss Python code is
used for roads, the building loss code gets used for buildings, and so on. You can do this using an `if/else`
Python block, or a RiskScape expression function that uses nested `if()` expressions.
RiskScape expression functions allow for flexible return types (i.e. the return-type can change based on the input asset type),
whereas the Python return-type would need to be the same across all asset types.

- If you are using `if()` to change the shape of your results data on the fly, then you can use `map_struct()`
to manipulate attributes without hard-coding specific attribute names into your pipeline. 
Refer to the [engine documentation](https://engine-docs.sites.riskscape.nz/reference/data-manipulation.html)
for more tips on this.

- You may need to create a 'unified' asset type that is suitable for any exposure-layer input,
particularly if users need to supply their own input data.
For example, you could use generic attribute names, such as `Material` or `Use_category`, that can apply to different
input datasets, but would hold different *values* for each dataset.
For example, a `Material` attribute might represent the construction type for buildings,
whereas for roads it represents sealed/unsealed, and for pipes it's ductile/brittle.
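
The "fan out" tip above, using an `if/else` Python block, could be sketched roughly as follows. The damage curves here are placeholders invented for illustration and are not real vulnerability functions. The `depth` hazard argument, the helper function names, and the thresholds are also assumptions. Note that every branch returns the same type (a floating-point damage ratio), in line with the constraint that the Python return-type is the same across all asset types.

```python
def _clamp(value):
    """Restrict a damage ratio to the range 0.0 to 1.0."""
    return max(0.0, min(1.0, value))


def building_damage_ratio(exposure, depth):
    # Placeholder curve (NOT a real vulnerability function): total loss at 3m depth
    return _clamp(depth / 3.0)


def road_damage_ratio(exposure, depth):
    # Placeholder: unsealed roads are assumed to be damaged at shallower depths
    threshold = 0.5 if exposure.get('Material') == 'Unsealed' else 1.0
    return _clamp(depth / threshold)


def damage_ratio(exposure, depth):
    """Fan out to the asset-specific logic. Always returns a float."""
    if depth is None:
        return 0.0  # asset was not exposed to the hazard

    asset_type = exposure.get('Asset_type')
    if asset_type == 'Building':
        return building_damage_ratio(exposure, depth)
    elif asset_type == 'Road':
        return road_damage_ratio(exposure, depth)
    else:
        # Unknown asset types report no damage rather than failing the model run
        return 0.0
```

Returning zero for unknown asset types is only one possible design choice. Raising an error instead would surface a misconfigured `Asset_type` sooner, at the cost of a less forgiving model.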

## Multiple different asset classes

Instead of running your model against each asset dataset one at a time, you may want to
run it over *all* asset datasets and produce a single set of results.

Some of the previous section will also apply here, although this approach gives you
a little more flexibility in that you don't necessarily have to define a single type for all asset classes.

- You would use a `union()` pipeline step to combine the various input layers.
Note that attributes that are only present for one layer will be null for other layers.

- You would need to use `normalize_geometry()` *before* the union step,
otherwise you could end up with multiple geometry attributes. You could use a
`subpipeline()` step to avoid duplicating pipeline code.

- Often you will want to 'bucket' the results based on asset type, so that you can
see the total road loss separately from building losses, and so on.
There are a couple of approaches you can take here:
  - Use `bucket()` in your `group()` steps, as per normal.
    One trick is to use sub-pipelines, so you don't have to repeat the bucket logic all the time.
    You can make the group `by:` condition a sub-pipeline parameter, and so you can re-use the
    bucket code for regional/national/use-category aggregation simply by changing the
    sub-pipeline parameter. 
  - Alternatively, you can use 'scalar bucketing' to categorize the model results into
    the desired format *before* aggregation. Then in your aggregation step you can simply 'sum'
    the bucketed struct (or use `map_struct()` to calculate the AAL). There's an example
    of this scalar categorization approach
    [here](https://github.com/GNS-Science/riskscape/blob/1.13.0/subpipelines/examples/project.ini#L148).

- For performance reasons, you may sometimes want to exclude certain asset types so that the model
runs faster. One approach is to use a filter step, e.g.
    ```
    filter(switch(exposure.Asset_type, default: false, cases: [{ in: $include_assets, return: true }]))
    ```
  This is a useful approach when the input data is fixed and pre-defined (i.e. the user can't supply their own custom input data).
  This works well in combination with a checkbox or multi-select parameter (i.e. `$include_assets`).

  Another alternative is to define an 'empty' input relation. This works better when the user
  can provide their own input data - you can add a 'No Data' drop-down option that essentially
  lets the user skip that asset type completely.

  It can be a little awkward to produce a completely empty geospatial dataset.
  One way to do this is to create a small file with the required attributes, and then
  just add `filter = false` to the bookmark.
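
To illustrate the 'scalar bucketing' idea mentioned earlier in this section, here is a conceptual sketch in plain Python. It is not RiskScape pipeline code, and the linked example may structure things differently. The `Loss` attribute, the `ASSET_TYPES` list, and the function names are hypothetical. Each result row is first turned into a struct with one attribute per bucket, and the aggregation step then only has to sum those structs attribute by attribute.

```python
# Hypothetical asset types and attribute names, chosen for illustration only.
ASSET_TYPES = ('Building', 'Road', 'Pipe')


def bucket_loss(result):
    """Turn one result row into a struct with one loss attribute per asset type."""
    bucketed = {asset_type: 0.0 for asset_type in ASSET_TYPES}
    if result['Asset_type'] in bucketed:
        bucketed[result['Asset_type']] = result['Loss']
    return bucketed


def sum_structs(structs):
    """Aggregate the bucketed structs by summing each attribute independently."""
    totals = {asset_type: 0.0 for asset_type in ASSET_TYPES}
    for struct in structs:
        for asset_type, loss in struct.items():
            totals[asset_type] += loss
    return totals


results = [
    {'Asset_type': 'Building', 'Loss': 100.0},
    {'Asset_type': 'Road', 'Loss': 50.0},
    {'Asset_type': 'Building', 'Loss': 200.0},
]

# Categorize each row *before* aggregation, then simply sum the structs
totals = sum_structs(bucket_loss(row) for row in results)
```

Because the categorization happens per row, the aggregation itself stays generic: the same summing step works regardless of how many buckets the struct contains.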