The ‘gfiExtremes’ package offers two main functions: gfigpd1 and gfigpd2. Each of them generates simulations from the fiducial distribution of a model involving the generalized Pareto distribution. The difference is that the threshold \(\mu\) of the generalized Pareto distribution is assumed to be known by gfigpd1, whereas gfigpd2 estimates this threshold.
The algorithms are implemented in C++. They are translated from the original R code written by the authors of the reference paper.
The package allows one to run several MCMC chains in parallel. In the examples below, as well as in the examples of the package documentation, I set nthreads = 2L because of CRAN restrictions: CRAN does not allow more than two R processes to run in parallel, so it would not accept the package if a higher number of threads were used.
When the threshold \(\mu\) is known, it is assumed that the data are independent realizations of a random variable \(X\) which follows a generalized Pareto distribution \(GP(\mu,\gamma,\sigma)\) conditionally on \(X \geqslant \mu\). On the event \(X < \mu\), no distributional assumption is made.
The algorithm performed by the gfigpd1 function then produces simulations of the fiducial distributions of \(\gamma\), \(\sigma\), and of the \(100\beta\%\)-quantiles of \(X\) at the requested values of \(\beta\). These simulations are MCMC chains.
For example, assume that \(X\) follows the \(GP(\mu,\gamma,\sigma)\) distribution (so \(X < \mu\) never happens).
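As a reminder, for \(\gamma > 0\) the \(GP(\mu,\gamma,\sigma)\) distribution function and quantile function have the closed forms
\[
F(x) = 1 - \left(1 + \gamma\,\frac{x-\mu}{\sigma}\right)^{-1/\gamma}, \quad x \geqslant \mu,
\qquad
q_\beta = F^{-1}(\beta) = \mu + \frac{\sigma}{\gamma}\left((1-\beta)^{-\gamma} - 1\right),
\]
and the quantile formula is what is used below (via qgpareto) to get the true quantiles for comparison.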
library(gfiExtremes)
set.seed(666L)
X <- rgpareto(200L, mu = 10, gamma = 0.3, sigma = 2)
gf1 <- gfigpd1(
  X, beta = c(0.99, 0.995, 0.999), threshold = 10, iter = 10000L,
  nchains = 4L, nthreads = 2L
)
The output of gfigpd1 is an R object ready for analysis with the ‘coda’ package. In particular, it has a summary method.
summary(gf1)
#>
#> Iterations = 2001:61995
#> Thinning interval = 6
#> Number of chains = 4
#> Sample size per chain = 10000
#>
#> 1. Empirical mean and standard deviation for each variable,
#> plus standard error of the mean:
#>
#> Mean SD Naive SE Time-series SE
#> gamma 0.3132 0.1017 0.0005086 0.0009532
#> sigma 1.9692 0.2351 0.0011757 0.0022424
#> beta1 30.9276 5.4264 0.0271322 0.0467451
#> beta2 38.0272 9.0772 0.0453861 0.0799586
#> beta3 63.5831 27.2337 0.1361684 0.2454226
#>
#> 2. Quantiles for each variable:
#>
#> 2.5% 25% 50% 75% 97.5%
#> gamma 0.1347 0.2415 0.3055 0.3769 0.5319
#> sigma 1.5371 1.8051 1.9585 2.1210 2.4602
#> beta1 23.9020 27.2199 29.7925 33.3429 44.5348
#> beta2 27.0268 31.9434 35.9689 41.6726 61.1854
#> beta3 35.3511 46.5221 56.5144 72.0191 134.1137
# compare with the true quantiles:
qgpareto(c(0.99, 0.995, 0.999), mu = 10, gamma = 0.3, sigma = 2)
#> [1] 29.87381 36.00849 56.28855
Every parameter is very well estimated by the median of its fiducial distribution.
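For instance, one way to get these fiducial medians is to pool the four chains and take column-wise medians (a sketch, assuming gf1 behaves like a ‘coda’ mcmc.list; output omitted):

# pool the chains and compute the fiducial median of each parameter
apply(as.matrix(gf1), 2L, median)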
We can get the shortest fiducial confidence intervals with the ‘coda’ function HPDinterval.
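For instance (a sketch; the intervals themselves are not reproduced here):

library(coda) # for HPDinterval, if not already attached
# shortest 95% fiducial intervals; applied to an mcmc.list,
# HPDinterval returns one matrix of intervals per chain
HPDinterval(gf1, prob = 0.95)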
When the threshold \(\mu\) is unknown, it is still assumed that the data are independent realizations of a random variable \(X\) which follows a generalized Pareto distribution \(GP(\mu,\gamma,\sigma)\) conditionally on \(X \geqslant \mu\), but there are additional assumptions.
These additional assumptions have no meaningful interpretation, but this does not matter for estimating the quantiles of \(X\): the parameters \(\gamma\) and \(\sigma\) may not be estimable (they always are when \(X\) strictly follows the unrealistic assumptions of the model), but \(\mu\) can always be estimated and the fiducial distributions of the quantiles are always available.
Let’s assume for example that \(X\) follows a log-normal distribution:
set.seed(666L)
X <- rlnorm(400L, meanlog = 1)
gf2 <- gfigpd2(
  X, beta = c(0.99, 0.995, 0.999), iter = 10000L, burnin = 10000L,
  nchains = 4L, nthreads = 2L
)
summary(gf2)
#>
#> Iterations = 10001:69995
#> Thinning interval = 6
#> Number of chains = 4
#> Sample size per chain = 10000
#>
#> 1. Empirical mean and standard deviation for each variable,
#> plus standard error of the mean:
#>
#> Mean SD Naive SE Time-series SE
#> beta1 25.64 4.025 0.02013 0.1160
#> beta2 32.77 6.471 0.03235 0.1970
#> beta3 55.60 17.167 0.08584 0.5701
#>
#> 2. Quantiles for each variable:
#>
#> 2.5% 25% 50% 75% 97.5%
#> beta1 19.73 22.75 24.97 27.77 35.44
#> beta2 23.69 28.19 31.59 35.96 48.99
#> beta3 34.07 43.71 51.84 62.69 100.44
# compare with the true quantiles:
qlnorm(c(0.99, 0.995, 0.999), meanlog = 1)
#> [1] 27.83649 35.72423 59.75377
As you can see, the inference on the quantiles is good.
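Since several chains were run, you can also check their convergence with the usual ‘coda’ diagnostics, for instance the Gelman-Rubin diagnostic (a sketch, output omitted):

# potential scale reduction factors; values close to 1 indicate convergence
gelman.diag(gf2)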