In many aspects of life information is available and required in a form of categorical data. For example, in a city neighbourhood or a district each building has assigned a function (housing, office, etc.). Depending on the building function the city council may impose different tax levels.
In this simple example, in a neighbourhood the function of a building is assigned depending on the building percentage of area inside the building that is used for living and number of addresses inside each building. If the residential area is higher than 80% and has at least one address registered then the building is classified as “residential”. If the residential area is less than 80% and at least one address is present the building is classified as “office”. If no address is present the building function is assumed as “other”. Depending of these categories the city council imposes tax levels of €1000, €10000, €10 per year, respectively. The 80% threshold is approximate though and some buildings classified as “others” could have got the address assigned but not put on a system yet, therefore the council wants to calculate the uncertainty in tax revenue depending on errors introduced by erroneous building classification.
The adapted uncertainty propagation analysis approach is based on the Monte Carlo method that computes the output of the model repeatedly, with input values that are randomly sampled from their non-parameteric pdfs. The set of model outputs forms a random sample from the output pdf. The method thus consists of the following steps:
The example data for tax calculation contain the spatial distribution of buildings in a neighbourhood in Rotterdam, NL.
# load packages
library(sp)
library(spup)
library(purrr)
library(png)
set.seed(12345)
# load data
data(woon)
# Netherlands contour and Rtterdam location
NL <- readPNG("RotterdamNL.png")
# collate info about fugure size and type
NL_map <- list("grid.raster", NL, x = 89987, y = 436047, width = 120,
height = 152, default.units = 'native')
# collate info about a scale bar location in the figure, type and colour
scale <- list("SpatialPolygonsRescale", layout.scale.bar(),
offset = c(90000,435600), scale = 100, fill=c("transparent","black"))
# collate info about minimum value on a scale bar
text1 <- list("sp.text", c(89990,435600), "0", cex = 0.8)
# collate info about maximum value on a scale bar
text2 <- list("sp.text", c(90130,435600), "500 m", cex = 0.8)
# collate info about North arrow
arrow <- list("SpatialPolygonsRescale", layout.north.arrow(),
offset = c(89990,435630), scale = 50)
# plot 'woon' object with a location miniature, North arrow and scale bar defined above
spplot(woon, "check", do.log = TRUE, col.regions = "transparent",
colorkey = FALSE, key.space = list(x = 0.1, y = 0.93, corner = c(0,1)),
sp.layout = list(scale, text1, text2 ,arrow, NL_map),
main = "Neighbourhood in Rotterdam, NL")
The ‘woon’ object is a SpatialPolygonDataFrame where each building is represented by one polygon. The attributes contain:
In the next section we explain how we use the information about the probabilities.
In case of categorical variables the uncertainty is described by use of non-parameteric pdfs. In our case we describe it by a vector of probabilities that a building belongs to a certain category. In case of spatially distributed variables we have to do this for each polygon, hence the dataset has maps of these probabilities saved in the same object.
To unite all information of the uncertainty model for the building function we use the defineUM()
function that collates all information into one object. The minimum information required is:
# define uncertainty model for the bulding function
woonUM <- defineUM(TRUE, categories = c(1,2,3), cat_prob = woon[, c(4:6)])
class(woonUM)
[1] "MarginalCategoricalSpatial"
The argument cat_prob
is the probabilities vector. In case of spatial vatiables for each polygon a vector of probabilities is collated into a data frame (as above). Foreach polygon, the probabilities vector should sum to one:
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 1 1 1 1 1
We can have a look at how the probabilities for each polygon look like:
Generating possible realities of the building function can be completed by using the genSample()
function. The required information to pass to the function includes:
Usually the sample must be large to obtain stable results. Let us run the sampling to obtain 10 realizations. Note the artument ‘asList’ has been set up to FALSE. This indicates that the sampling function will return an object of a class of the distribution parameters class. This is useful if you want to visualize the sample or compute summary statistics quickly.
# create possible realizations of the building function
woon_sample <- genSample(woonUM, 10, asList = FALSE)
class(woon_sample)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"
# view several realizations
spplot(woon_sample[c(3,4,1,2)], main = list(label = "Examples of building function realizations", cex = 1))
In order to perform uncertainty propagation analysis using ‘spup’, the model through which uncertainty is propagated needs to be defined as an R function, for example:
tax <- function(building_Function) {
building_Function$tax2pay <- NA
building_Function$tax2pay[building_Function$Function == 1] <- 1000
building_Function$tax2pay[building_Function$Function == 2] <- 10000
building_Function$tax2pay[building_Function$Function == 3] <- 10
total_tax <- sum(building_Function$tax2pay)
total_tax
}
Note that in a given the model has a requirement that the building function is stored in a SpatialPolygonDataFrame and in a column named ‘Function’.
The propagation of uncertainty occurs when the model is run with an uncertain input. Running the model with a sample of realizations of uncertain input variable(s) yields an equally large sample of model outputs that can be further analyzed. To run the tax
model with the building function realizations we use the propagate()
function. The propagate()
function takes as arguments:
In order to run the propagation function the sample of an uncertain input variable must be saved in a list. We can either coerce the existing woon_sample object or get it automatically setting up the ‘asList’ argument of genSample()
to TRUE.
# coerce SpatialPolygonDataFrame to a list of individual SpatialPolygonDataFrames
woon_sample <- map(1:ncol(woon_sample), function(x){woon_sample[x]})
# or sample from uncertain input and save it in a list
woon_sample <- genSample(UMobject = woonUM, n = 10, asList = TRUE)
class(woon_sample)
[1] "list"
Earlier we noted that the construction of the model requires that the information of the building function is stored in a column named ‘Function’. By default the sampling function returns ralizations named ‘sim1’, ‘sim2’, and so on, so we need to adjust that manualy.
Finally, run the propagation:
[[1]]
[1] 2227530
[[2]]
[1] 2355420
[[3]]
[1] 2197560
[[4]]
[1] 2433360
[[5]]
[1] 2334450
[[6]]
[1] 2378370
[[7]]
[1] 2361360
[[8]]
[1] 2365410
[[9]]
[1] 2336610
[[10]]
[1] 2194680
Min. 1st Qu. Median Mean 3rd Qu. Max.
2194680 2254260 2346015 2318475 2364398 2433360
We can see that on average the city council should obtain the tax revenue of €2300000, but depending on the building classification this amount may vary by €120000.
The Rotterdam neighbourhood dataset was obtained from the Dutch cadastral dataset BAG. We would like to thank Filip Biljecki for help with obtaining the data.
This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no 607000.