Introduction to SNSeg and Examples

SNSeg supports change-points estimation for both univariate and multivariate time series (including high-dimensional time series with dimension greater than 10.) using Self-Normalization (SN) based framework. Please read Zhao, Jiang and Shao (2022) <doi.org/10.1111/rssb.12552> for details of the SN-based algorithms.

The package contain three functions for change-points estimation:

All functions contain two important input arguments: grid_size_scale and grid_size.

Users can set their own grid_size or leave it as NULL in the input arguments. If grid_size is NULL, these functions will compute the SN test statistic using the value of grid_size_scale. If grid_size is set by users, the functions will first calculate grid_size_scale by diving the length of time, and then compute the SN test statistic.

For the other input arguments: * ts: Users should enter a time series for this argument. For SNSeg_Uni, the dimension of ts should be exactly one in most cases (or two when paras_to_test = 'bivcor'); for SNSeg_Multi, the dimension must be at least two but no more than ten; for SNSeg_HD, the dimension must be greater than ten. * paras_to_test: This argument in functions SNSeg_Uni and SNSeg_Multi allows users to enter the parameter(s) they would like to test the change in. * confidence: Users should choose a confidence level among 0.9, 0.95, 0.99, 0.995 and 0.995. A smaller confidence level is easier to reject the null hypothesis and detect a change-point.

The package also offers an option to plot the time series (this only works for the univariate time series cases!) Users need to set plot_SN = TRUE to visualize the time series, and est_cp_loc = TRUE to add the estimated change-point locations in the plot.

SN Test Statistic Plot: max_SNsweep

To visualize the computed SN-based test statistic at each time point, users can apply the function max_SNsweep by plugging in the output object from one of the functions SNSeg_Uni, SNSeg_Multi and SNSeg_HD. The options est_cp_loc = TRUE and critical_loc = TRUE are provided to draw the estimated change-point locations and the critical value threshold inside the test statistic plot.

max_SNsweep also returns the SN-based test statistic for each time point. A large number of test statistics in the output can be messy for users who only seek to generate a plot. To hide the test statistics output, users can create an arbitrary variable name and set it to the max_SNsweep function, e.g., SN_stat <- max_SNsweep(...).

Parameter Estimates of Each Segment Separated by the Detected Change-Points: SNSeg_estimate

The function SNSeg_estimate allows users to compute the parameter estimates (e.g., mean, variance, acf, quantile, etc.) of each of the segments separated by the estimated change-points. To use this function, users should use the output of the functions SNSeg_Uni, SNSeg_Multi and SNSeg_HD as the input of SNSeg_estimate.

S3 methods: summary, print and plot

The typical S3 methods summary, print and plot are available to SNSeg_Uni, SNSeg_Multi and SNSeg_HD objects. The summary method displays the parameter to be tested, the estimated change-point amount and locations, the grid_size, confidence level as well as the critical value of the SN-based test. The print method shows the change-point locations. The plot method plots the time series, and similar to the argument plot_SN = TRUE, the plot method allows users to generate time series segmentation plot(s) with the estimated change-point locations. It also provides ts_index option to allow users to plot any individual time series they want. Users can apply their preferred color of the change-point(s) within the plot(s) by setting cpts.col to any color.

We then provide the examples of the functions SNSeg_Uni, SNSeg_Multi and SNSeg_HD.

Examples of SNSeg_Uni:

The function SNSeg_Uni detect change-points for a univariate time series based on the change in a single or parameters.

We provide examples for different cases:

# Please run the following function before running examples:
mix_GauGPD <- function(u,p,trunc_r,gpd_scale,gpd_shape){
  indicator <- u<p
  rv <- rep(0, length(u))
  rv[indicator>0] <- qtruncnorm(u[indicator>0]/p,a=-Inf,b=trunc_r)
  rv[indicator<=0] <- qgpd((u[indicator<=0]-p)/(1-p), loc=trunc_r, scale=gpd_scale,shape=gpd_shape)
  return(rv)
}

Test in a single parameter

Segmentation for Mean

set.seed(7)
n <- 2000
reptime <- 2
cp_sets <- round(n*c(0,cumsum(c(0.5,0.25)),1))
mean_shift <- c(0.4,0,0.4)
rho <- -0.7
ts <- MAR(n, reptime, rho)
no_seg <- length(cp_sets)-1
for(index in 1:no_seg){ # Mean shift
  tau1 <- cp_sets[index]+1   
  tau2 <- cp_sets[index+1]
  ts[tau1:tau2,] <- ts[tau1:tau2,] + mean_shift[index]
}
ts <- ts[,2]
# grid_size undefined
result <- SNSeg_Uni(ts, paras_to_test = "mean", confidence = 0.9,
                    grid_size_scale = 0.05, grid_size = NULL, 
                    plot_SN = FALSE, est_cp_loc = FALSE)
# grid_size defined & generate time series segmentation plot
result <- SNSeg_Uni(ts, paras_to_test = "mean", confidence = 0.9,
                    grid_size_scale = 0.05, grid_size = 116, 
                    plot_SN = TRUE, est_cp_loc = TRUE)

# Estimated change-point locations
result$est_cp
#> [1] 1018 1487
# Parameter estimates (mean) of each segment
SNSeg_estimate(result)
#> $mean
#> [1] 0.39444849 0.02107613 0.38362925
#> 
#> attr(,"class")
#> [1] "SNSeg_estimate"
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)

We then show how to use the S3 methods summary, print and plot.

summary(result)
#> There are 2 change-point(s) detected at 90th confidence level based on the change in the single mean parameter.
#> 
#> The critical value of SN-based test is 135.39283594
#> 
#> The detected change-point location(s) are 1018,1487 with a grid_size of 116
print(result)
#> The detected change-point location(s) are 1018,1487
plot(result, cpts.col = 'red')

We can see both the plot_SN = TRUE option and the plot.SN function generates the same plot. Users can select any choice based on their preferences. The class of the argument result is SNSeg_Uni.

Segmentation for Variance

set.seed(7)
ts <- MAR_Variance(2, "V1")
ts <- ts[,2]
# grid_size defined
result <- SNSeg_Uni(ts, paras_to_test = "variance", confidence = 0.9,
                    grid_size_scale = 0.05, grid_size = NULL, 
                    plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (variance) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')

Segmentation for Autocorrelation

set.seed(7)
ts <- MAR_Variance(2, "A3")
ts <- ts[,2]
# grid_size defined
result <- SNSeg_Uni(ts, paras_to_test = "acf", confidence = 0.9,
          grid_size_scale = 0.05, grid_size = 92, plot_SN = FALSE,
          est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (acf) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')

Segmentation for bivariate correlation

library(mvtnorm)
set.seed(7)
n <- 1000
sigma_cross <- list(4*matrix(c(1,0.8,0.8,1), nrow=2),
                    matrix(c(1,0.2,0.2,1), nrow=2),
                    matrix(c(1,0.8,0.8,1), nrow=2))
cp_sets <- round(c(0,n/3,2*n/3,n))
noCP <- length(cp_sets)-2
rho_sets <- rep(0.5, noCP+1)
ts <- MAR_MTS_Covariance(n, 2, rho_sets, cp_sets, sigma_cross)
ts <- ts[1][[1]]
# grid_size defined
result <- SNSeg_Uni(ts, paras_to_test = "bivcor", confidence = 0.9,
                    grid_size_scale = 0.05, grid_size = 77, 
                    plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (bivariate correlation) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')

Segmentation for quantile

Please download the packages truncnorm and evd before running the following code.

library(truncnorm)
library(evd)
set.seed(7)
n <- 1000
cp_sets <- c(0,n/2,n)
noCP <- length(cp_sets)-2
reptime <- 2
rho <- 0.2
# AR time series with no change-point (mean, var)=(0,1)
ts <- MAR(n, reptime, rho)*sqrt(1-rho^2)
trunc_r <- 0
p <- pnorm(trunc_r)
gpd_scale <- 2
gpd_shape <- 0.125
for(ts_index in 1:reptime){
  ts[(cp_sets[2]+1):n, ts_index] <- mix_GauGPD(pnorm(ts[(cp_sets[2]+1):n, ts_index]),
p,trunc_r,gpd_scale,gpd_shape)
}
ts <- ts[,2]
# grid_size undefined
# test in 90% quantile
result <- SNSeg_Uni(ts, paras_to_test = 0.9, confidence = 0.9,
                    grid_size_scale = 0.066, grid_size = NULL,
                    plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (90th quantile) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')

Test in a general functional

To perform the SN-based test using any general functional, users should define their own function as the input of paras_to_test. The input of this function should consist of the followings:

The user-defined function should return a numeric value as the output.

set.seed(7)
n <- 500
reptime <- 2
cp_sets <- round(n*c(0,cumsum(c(0.5,0.25)),1))
mean_shift <- c(0.4,0,0.4)
rho <- -0.7
ts <- MAR(n, reptime, rho)
no_seg <- length(cp_sets)-1
for(index in 1:no_seg){ # Mean shift
  tau1 <- cp_sets[index]+1
  tau2 <- cp_sets[index+1]
  ts[tau1:tau2,] <- ts[tau1:tau2,] + mean_shift[index]
}
ts <- ts[,2]
# Define a general functional for the input 'paras_to_test'
paras_to_test = function(ts){
  mean(ts)
}
result.SNCP.general <- SNSeg_Uni(ts, paras_to_test = paras_to_test, 
                                 confidence = 0.9, grid_size_scale = 0.05, 
                                 grid_size = NULL, plot_SN = FALSE, 
                                 est_cp_loc = TRUE)
# Estimated change-point locations
result.SNCP.general$est_cp
# Parameter estimates (general functional) of each segment
SNSeg_estimate(result.SNCP.general)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result.SNCP.general, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')

Test in multiple parameters

set.seed(7)
n <- 1000
cp_sets <- c(0,333,667,1000)
no_seg <- length(cp_sets)-1
rho <- 0
# AR time series with no change-point (mean, var)=(0,1)
ts <- MAR(n, 2, rho)*sqrt(1-rho^2)
no_seg <- length(cp_sets)-1
sd_shift <- c(1,1.6,1)
for(index in 1:no_seg){ # Mean shift
  tau1 <- cp_sets[index]+1
  tau2 <- cp_sets[index+1]
  ts[tau1:tau2,] <- ts[tau1:tau2,]*sd_shift[index]
}
d <- 2
ts <- ts[,2]

# Test in 90th and 95th quantile with grid_size undefined
result <- SNSeg_Uni(ts, paras_to_test = c(0.9, 0.95), confidence = 0.9,
                    grid_size_scale = 0.05, grid_size = NULL, 
                    plot_SN = FALSE, est_cp_loc = FALSE)

# Test in 90th quantile and the variance with grid_size undefined
result <- SNSeg_Uni(ts, paras_to_test = c(0.9, 'variance'),
                    confidence = 0.95, grid_size_scale = 0.078,
                    grid_size = NULL, plot_SN = FALSE, 
                    est_cp_loc = FALSE)

# Test in 90th quantile, variance and acf with grid_size undefined
result <- SNSeg_Uni(ts, paras_to_test = c(0.9,'variance', "acf"),
                    confidence = 0.9, grid_size_scale = 0.064,
                    grid_size = NULL, plot_SN = TRUE, 
                    est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Test in 60th quantile, mean, variance and acf with grid_size defined
result.last <- SNSeg_Uni(ts, paras_to_test = c(0.6, 'mean', "variance",   
                    "acf"), confidence = 0.9, grid_size_scale = 0.05,
                    grid_size = 83, plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result.last$est_cp
# Parameter estimates (of the last example) of each segment
SNSeg_estimate(result.last)
# SN-based test statistic segmentation plot
SN_stat <- max_SNsweep(result.last, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')

Examples: SNSeg_Multi

The function SNSeg_Multi detects change-points for multivariate time series based on the change in multivariate means or covariances. Users can set paras_to_test to mean or covariance for change-points estimation.

For examples of multivariate means and covariances, please check the following commands:

# Please run this function before running examples:
exchange_cor_matrix <- function(d, rho){
  tmp <- matrix(rho, d, d)
  diag(tmp) <- 1
  return(tmp)
}

Segmentation for Multivariate Mean

library(mvtnorm)
set.seed(10)
d <- 5
n <- 1000
cp_sets <- round(n*c(0,cumsum(c(0.075,0.3,0.05,0.1,0.05)),1))
mean_shift <- c(-3,0,3,0,-3,0)/sqrt(d)
mean_shift <- sign(mean_shift)*ceiling(abs(mean_shift)*10)/10
rho_sets <- 0.5
sigma_cross <- list(exchange_cor_matrix(d,0))
ts <- MAR_MTS_Covariance(n, 2, rho_sets, cp_sets=c(0,n), sigma_cross)
noCP <- length(cp_sets)-2
no_seg <- length(cp_sets)-1
for(rep_index in 1:2){
  for(index in 1:no_seg){ # Mean shift
    tau1 <- cp_sets[index]+1
    tau2 <- cp_sets[index+1]
    ts[[rep_index]][,tau1:tau2] <- ts[[rep_index]][,tau1:tau2] + mean_shift[index]
    }
}
ts <- ts[1][[1]]

# grid_size undefined
result <- SNSeg_Multi(ts, paras_to_test = "mean", confidence = 0.95,
                      grid_size_scale = 0.079, grid_size = NULL,
                      plot_SN = FALSE, est_cp_loc = TRUE)
# grid_size defined
result <- SNSeg_Multi(ts, paras_to_test = "mean", confidence = 0.99,
                      grid_size_scale = 0.05, grid_size = 65,
                      plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (multivariate mean) of each segment
SNSeg_estimate(result)
# SN-based test statistic segmentation plot
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
par(mfrow=c(2,3))
plot(result, cpts.col = 'red')

Segmentation for Multivariate Covariance

library(mvtnorm)
set.seed(10)
reptime <- 2
d <- 4
n <- 1000
sigma_cross <- list(exchange_cor_matrix(d,0.2),
                    2*exchange_cor_matrix(d,0.5),
                    4*exchange_cor_matrix(d,0.5))
rho_sets <- c(0.3,0.3,0.3)
mean_shift <- c(0,0,0) # with mean change
cp_sets <- round(c(0,n/3,2*n/3,n))
ts <- MAR_MTS_Covariance(n, reptime, rho_sets, cp_sets, sigma_cross)
noCP <- length(cp_sets)-2
no_seg <- length(cp_sets)-1
for(rep_index in 1:reptime){
  for(index in 1:no_seg){ # Mean shift
    tau1 <- cp_sets[index]+1
    tau2 <- cp_sets[index+1]
    ts[[rep_index]][,tau1:tau2] <- ts[[rep_index]][,tau1:tau2] +
      mean_shift[index]
    }
}
ts <- ts[[1]]

# grid_size undefined
result <- SNSeg_Multi(ts, paras_to_test = "covariance", 
                      confidence = 0.9, grid_size_scale = 0.05, 
                      grid_size = NULL, plot_SN = FALSE, est_cp_loc = FALSE)
# grid_size defined
result <- SNSeg_Multi(ts, paras_to_test = "covariance", 
                      confidence = 0.9, grid_size_scale = 0.05, 
                      grid_size = 81, plot_SN = FALSE,
                      est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (covariance estimate) of each segment
SNSeg_estimate(result)
# SN-based test statistic segmentation plot
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
par(mfrow=c(1,1))
plot(result, cpts.col = 'red')

Examples: SNSeg_HD

The function SNSeg_HD performs change-points estimation for high-dimensional time series (dimension greater than 10) using the change in high-dimensional means. For usage examples of SNSeg_HD, we provide the followig code:

n <- 600
p <- 100
nocp <- 5
cp_sets <- round(seq(0,nocp+1,1)/(nocp+1)*n)
num_entry <- 5
kappa <- sqrt(4/5) # Wang et al(2020)
mean_shift <- rep(c(0,kappa),100)[1:(length(cp_sets)-1)]
set.seed(1)
ts <- matrix(rnorm(n*p,0,1),n,p)
no_seg <- length(cp_sets)-1
for(index in 1:no_seg){ # Mean shift
  tau1 <- cp_sets[index]+1
  tau2 <- cp_sets[index+1]
  ts[tau1:tau2,1:num_entry] <- ts[tau1:tau2,1:num_entry] +
    mean_shift[index] # sparse change
}
# SN segmentation plot 
# grid_size undefined (plot the first 3 time series)
result <- SNSeg_HD(ts, confidence = 0.9, grid_size_scale = 0.05,
                   grid_size = NULL, plot_SN = FALSE, est_cp_loc = TRUE,
                   ts_index = c(1:3))
# grid_size defined (plot the 1st, 3rd and 5th time series)
result <- SNSeg_HD(ts, confidence = 0.9, grid_size_scale  = 0.05,
                   grid_size = 52, plot_SN = FALSE, est_cp_loc = TRUE,
                   ts_index = c(1,3,5))
# Estimated change-point locations
result$est_cp
# Parameter estimates (high-dimensional means) of each segment
summary.stat <- SNSeg_estimate(result)
# SN-based test statistic segmentation plot
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
                       critical_loc = TRUE)
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red', ts_index = c(1:3))

We note that only the selected time series were plotted by setting values for ts_index.