Array operations with the gRbase package

Søren Højsgaard

Introduction

This note describes some operations on arrays in R. These operations have been implemented to facilitate implementation of graphical models and Bayesian networks in R.

Arrays/tables in R

The documentation of R states the following about arrays: An array in R can have one, two or more dimensions. It is simply a vector which is stored with additional attributes giving the dimensions (attribute “dim”) and optionally names for those dimensions (attribute “dimnames”). A two-dimensional array is the same thing as a matrix. One-dimensional arrays often look like vectors, but may be handled differently by some functions.

Cross classified data - contingency tables

Arrays appear for example in connection with cross classified data. The array hec below is an excerpt of the HairEyeColor array in R:

hec <- c(32, 53, 11, 50, 10, 25, 36, 66, 9, 34, 5, 29) 
dim(hec) <- c(2, 3, 2)
dimnames(hec) <- list(Hair = c("Black", "Brown"), 
                      Eye = c("Brown", "Blue", "Hazel"), 
                      Sex = c("Male", "Female"))
hec
#> , , Sex = Male
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    32   11    10
#>   Brown    53   50    25
#> 
#> , , Sex = Female
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    36    9     5
#>   Brown    66   34    29

Above, hec is an array because it has a dim attribute. Moreover, hec also has a dimnames attribute naming the levels of each dimension. Notice that each dimension is given a name.

Printing arrays can take up a lot of space. Alternative views on an array can be obtained with ftable() or by converting the array to a dataframe with as.data.frame.table(). We shall do so in the following.

##flat <- function(x) {ftable(x, row.vars=1)}
flat <- function(x, n=4) {as.data.frame.table(x) |> head(n)}
hec |> flat()
#>    Hair   Eye  Sex Freq
#> 1 Black Brown Male   32
#> 2 Brown Brown Male   53
#> 3 Black  Blue Male   11
#> 4 Brown  Blue Male   50

An array with named dimensions is called a named array in this package. The functionality described below relies heavily on arrays having named dimensions. A check for an object being a named array is provided by is.named.array():

is.named.array(hec)
#> [1] TRUE
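
To make the notion of a named array concrete, the following base R sketch checks that an object is an array whose dimnames list itself carries names. The helper is_named_array_sketch() is a hypothetical name introduced here for illustration; it is not the gRbase implementation, which may apply further checks.

```r
# Hypothetical helper, for illustration only (not gRbase's is.named.array())
is_named_array_sketch <- function(x) {
  dn <- dimnames(x)
  is.array(x) && !is.null(dn) && !is.null(names(dn)) && all(nzchar(names(dn)))
}

hec <- array(c(32, 53, 11, 50, 10, 25, 36, 66, 9, 34, 5, 29),
             dim = c(2, 3, 2),
             dimnames = list(Hair = c("Black", "Brown"),
                             Eye  = c("Brown", "Blue", "Hazel"),
                             Sex  = c("Male", "Female")))
plain <- array(1:4, dim = c(2, 2))   # no dimnames at all

is_named_array_sketch(hec)    # TRUE
is_named_array_sketch(plain)  # FALSE
```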

Defining arrays

Arrays can be defined with standard R code as shown above. Another way is to use tabNew() from gRbase. This function is flexible with respect to the input; for example:

dn <- list(Hair=c("Black", "Brown"), Eye=~Brown:Blue:Hazel, Sex=~Male:Female)
counts <- c(32, 53, 11, 50, 10, 25, 36, 66, 9, 34, 5, 29)
z3 <- tabNew(~Hair:Eye:Sex, levels=dn, values=counts)
z4 <- tabNew(c("Hair", "Eye", "Sex"), levels=dn, values=counts)

Default dimnames are generated when only the number of levels of each variable is given:

z5 <- tabNew(~Hair:Eye:Sex, levels=c(2, 3, 2), values = counts)
dimnames(z5) |> str()
#> List of 3
#>  $ Hair: chr [1:2] "1" "2"
#>  $ Eye : chr [1:3] "1" "2" "3"
#>  $ Sex : chr [1:2] "1" "2"

Using tabNew(), arrays can be normalized to sum to one in two ways: 1) over the first variable for each configuration of all other variables and 2) over all configurations. For example:

z6 <- tabNew(~Hair:Eye:Sex, levels=c(2, 3, 2), values=counts, normalize="first")
z6 |> flat()
#>   Hair Eye Sex  Freq
#> 1    1   1   1 0.376
#> 2    2   1   1 0.624
#> 3    1   2   1 0.180
#> 4    2   2   1 0.820

Operations on arrays

In the following we shall denote the dimnames (or variables) of the array hec by \(H\), \(E\) and \(S\) and we let \((h,e,s)\) denote a configuration of these variables. The contingency table above shall be denoted by \(T_{HES}\) and we shall refer to the \((h,e,s)\)-entry of \(T_{HES}\) as \(T_{HES}(h,e,s)\).

Normalizing an array

An existing array can be normalized with tabNormalize(). As with tabNew(), the entries can be normalized to sum to one in two ways: 1) over the first variable for each configuration of all other variables and 2) over all configurations. For example:

tabNormalize(z5, "first") |> flat()
#>   Hair Eye Sex  Freq
#> 1    1   1   1 0.376
#> 2    2   1   1 0.624
#> 3    1   2   1 0.180
#> 4    2   2   1 0.820
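
The two normalizations can also be sketched in base R, which may help clarify what is being divided by what. This is an illustration under the assumption that "first" means dividing by the sum over the first variable within each configuration of the remaining variables; it uses prop.table() rather than gRbase.

```r
hec <- array(c(32, 53, 11, 50, 10, 25, 36, 66, 9, 34, 5, 29),
             dim = c(2, 3, 2),
             dimnames = list(Hair = c("Black", "Brown"),
                             Eye  = c("Brown", "Blue", "Hazel"),
                             Sex  = c("Male", "Female")))

# 1) Over the first variable (Hair) for each (Eye, Sex) configuration
first <- prop.table(hec, margin = 2:3)

# 2) Over all configurations
overall <- hec / sum(hec)

round(first[, "Brown", "Male"], 3)  # Black 0.376, Brown 0.624
apply(first, 2:3, sum)              # every (Eye, Sex) cell sums to 1
sum(overall)                        # 1
```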

Subsetting an array – slicing

We can subset arrays (this will also be called "slicing") in different ways. Notice that the result is not necessarily an array. Slicing can be done using standard R code or using tabSlice(). The virtue of tabSlice() comes from the flexibility when specifying the slice.

The following leads from the original \(2\times 3 \times 2\) array to a \(2 \times 2\) array by cutting away the Sex=Male and Eye=Brown slice of the array:

tabSlice(hec, slice=list(Eye=c("Blue", "Hazel"), Sex="Female"))
#>        Eye
#> Hair    Blue Hazel
#>   Black    9     5
#>   Brown   34    29
## Notice: levels can be written as numerics
## tabSlice(hec, slice=list(Eye=2:3, Sex="Female"))

We may also regard the result above as a \(2 \times 2 \times 1\) array:

tabSlice(hec, slice=list(Eye=c("Blue", "Hazel"), Sex="Female"), drop=FALSE)
#> , , Sex = Female
#> 
#>        Eye
#> Hair    Blue Hazel
#>   Black    9     5
#>   Brown   34    29

If slicing leads to a one dimensional array, the output will by default not be an array but a vector (without a dim attribute). However, the result can be forced to be a 1-dimensional array:

## A vector:
t1 <- tabSlice(hec, slice=list(Hair=1, Sex="Female")); t1
#> Brown  Blue Hazel 
#>    36     9     5
## A 1-dimensional array:
t2 <- tabSlice(hec, slice=list(Hair=1, Sex="Female"), as.array=TRUE); t2 
#> Eye
#> Brown  Blue Hazel 
#>    36     9     5
## A higher dimensional array (in which some dimensions only have one level)
t3 <- tabSlice(hec, slice=list(Hair=1, Sex="Female"), drop=FALSE); t3
#> , , Sex = Female
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    36    9     5

The difference between the last two forms can be clarified:

t2 |> flat()
#>     Eye Freq
#> 1 Brown   36
#> 2  Blue    9
#> 3 Hazel    5
t3 |> flat()
#>    Hair   Eye    Sex Freq
#> 1 Black Brown Female   36
#> 2 Black  Blue Female    9
#> 3 Black Hazel Female    5

Collapsing and inflating arrays - tabMarg() and tabExpand()

Collapsing: The \(HE\)-marginal array \(T_{HE}\) of \(T_{HES}\) is the array with values $$ T_{HE}(h,e) = \sum_s T_{HES}(h,e,s) $$

Inflating: The "opposite" operation is to extend an array. For example, we can extend \(T_{HE}\) to have a third dimension, e.g. Sex. That is, $$ \tilde T_{SHE}(s,h,e) = T_{HE}(h,e) $$ so \(\tilde T_{SHE}(s,h,e)\) is constant as a function of \(s\).

With gRbase we can collapse arrays with:

he <- tabMarg(hec, c("Hair", "Eye"))
he
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    68   20    15
#>   Brown   119   84    54
## Alternatives
tabMarg(hec, ~Hair:Eye)
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    68   20    15
#>   Brown   119   84    54
tabMarg(hec, c(1, 2))
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    68   20    15
#>   Brown   119   84    54
hec %a_% ~Hair:Eye
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    68   20    15
#>   Brown   119   84    54
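
For comparison, the marginalization \(T_{HE}(h,e) = \sum_s T_{HES}(h,e,s)\) can be written in base R with apply() or margin.table(). This is only a sketch of the operation itself and says nothing about how tabMarg() is implemented.

```r
hec <- array(c(32, 53, 11, 50, 10, 25, 36, 66, 9, 34, 5, 29),
             dim = c(2, 3, 2),
             dimnames = list(Hair = c("Black", "Brown"),
                             Eye  = c("Brown", "Blue", "Hazel"),
                             Sex  = c("Male", "Female")))

# Sum over Sex (dimension 3), keeping Hair and Eye (dimensions 1 and 2)
he_apply  <- apply(hec, c(1, 2), sum)
he_margin <- margin.table(hec, c(1, 2))

he_apply["Black", "Brown"]   # 68 = 32 + 36
all(he_apply == he_margin)   # TRUE
```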

Notice that collapsing is a projection in the sense that applying the operation again does not change anything (except possibly changing the order of variables):

he1 <- tabMarg(hec, c("Hair", "Eye"))
he2 <- tabMarg(he1, c("Eye", "Hair"))
tabEqual(he1, he2)
#> [1] TRUE

Expand an array by adding additional dimensions with tabExpand():

extra.dim <- list(Sex=c("Male", "Female"))
tabExpand(he, extra.dim) 
#> , , Sex = Male
#> 
#>        Hair
#> Eye     Black Brown
#>   Brown    68   119
#>   Blue     20    84
#>   Hazel    15    54
#> 
#> , , Sex = Female
#> 
#>        Hair
#> Eye     Black Brown
#>   Brown    68   119
#>   Blue     20    84
#>   Hazel    15    54
## Alternatives
he %a^% extra.dim
#> , , Sex = Male
#> 
#>        Hair
#> Eye     Black Brown
#>   Brown    68   119
#>   Blue     20    84
#>   Hazel    15    54
#> 
#> , , Sex = Female
#> 
#>        Hair
#> Eye     Black Brown
#>   Brown    68   119
#>   Blue     20    84
#>   Hazel    15    54

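Notice that expanding and then collapsing does not bring us back to where we started: because collapsing sums over the added dimension, each entry is multiplied by the number of levels of that dimension (here 2):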

(he %a^% extra.dim) %a_% c("Hair", "Eye")
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black   136   40    30
#>   Brown   238  168   108
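
The effect of expanding and then summing out the added dimension can be reproduced in base R. The sketch below builds the expanded array by repeating the Hair-by-Eye table once per level of Sex; the construction via rep() is an illustrative choice and not necessarily what tabExpand() does internally (in particular, the dimension order here is Hair, Eye, Sex).

```r
hec <- array(c(32, 53, 11, 50, 10, 25, 36, 66, 9, 34, 5, 29),
             dim = c(2, 3, 2),
             dimnames = list(Hair = c("Black", "Brown"),
                             Eye  = c("Brown", "Blue", "Hazel"),
                             Sex  = c("Male", "Female")))
he  <- apply(hec, c(1, 2), sum)
sex <- c("Male", "Female")

# Expand: the same Hair x Eye table is repeated for every level of Sex
expanded <- array(rep(he, times = length(sex)),
                  dim = c(dim(he), length(sex)),
                  dimnames = c(dimnames(he), list(Sex = sex)))

# Collapse again: summing over Sex multiplies each entry by the number of levels
collapsed <- apply(expanded, c(1, 2), sum)
all(collapsed == length(sex) * he)  # TRUE
```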

Permuting an array – tabPerm()

A reorganization of the table can be made with tabPerm() (similar to aperm()), but tabPerm() allows for a formula and for variable abbreviation:

tabPerm(hec, ~Eye:Sex:Hair) |> flat()
#>     Eye    Sex  Hair Freq
#> 1 Brown   Male Black   32
#> 2  Blue   Male Black   11
#> 3 Hazel   Male Black   10
#> 4 Brown Female Black   36

Alternative forms (the first two also work for aperm()):

tabPerm(hec, c("Eye", "Sex", "Hair"))
#> , , Hair = Black
#> 
#>        Sex
#> Eye     Male Female
#>   Brown   32     36
#>   Blue    11      9
#>   Hazel   10      5
#> 
#> , , Hair = Brown
#> 
#>        Sex
#> Eye     Male Female
#>   Brown   53     66
#>   Blue    50     34
#>   Hazel   25     29
tabPerm(hec, c(2, 3, 1)) 
#> , , Hair = Black
#> 
#>        Sex
#> Eye     Male Female
#>   Brown   32     36
#>   Blue    11      9
#>   Hazel   10      5
#> 
#> , , Hair = Brown
#> 
#>        Sex
#> Eye     Male Female
#>   Brown   53     66
#>   Blue    50     34
#>   Hazel   25     29
tabPerm(hec, ~Ey:Se:Ha) 
#> , , Hair = Black
#> 
#>        Sex
#> Eye     Male Female
#>   Brown   32     36
#>   Blue    11      9
#>   Hazel   10      5
#> 
#> , , Hair = Brown
#> 
#>        Sex
#> Eye     Male Female
#>   Brown   53     66
#>   Blue    50     34
#>   Hazel   25     29
tabPerm(hec, c("Ey", "Se", "Ha"))
#> , , Hair = Black
#> 
#>        Sex
#> Eye     Male Female
#>   Brown   32     36
#>   Blue    11      9
#>   Hazel   10      5
#> 
#> , , Hair = Brown
#> 
#>        Sex
#> Eye     Male Female
#>   Brown   53     66
#>   Blue    50     34
#>   Hazel   25     29
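
Permuting by (possibly abbreviated) variable names can be sketched on top of aperm() by translating names into positions with pmatch(). The helper perm_sketch() is a hypothetical name used only for this illustration and is not part of gRbase.

```r
hec <- array(c(32, 53, 11, 50, 10, 25, 36, 66, 9, 34, 5, 29),
             dim = c(2, 3, 2),
             dimnames = list(Hair = c("Black", "Brown"),
                             Eye  = c("Brown", "Blue", "Hazel"),
                             Sex  = c("Male", "Female")))

# Hypothetical helper: map (abbreviated) names to positions, then call aperm()
perm_sketch <- function(x, vars) {
  pos <- pmatch(vars, names(dimnames(x)))
  stopifnot(!anyNA(pos))   # every name must match exactly one dimension
  aperm(x, pos)
}

p1 <- perm_sketch(hec, c("Eye", "Sex", "Hair"))
p2 <- perm_sketch(hec, c("Ey", "Se", "Ha"))

names(dimnames(p1))             # "Eye" "Sex" "Hair"
p1["Brown", "Male", "Black"]    # 32
identical(p1, p2)               # TRUE
```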

Equality - tabEqual()

Two arrays are defined to be identical 1) if they have the same dimnames and 2) if, possibly after a permutation, all values are identical (up to a small numerical difference):

hec2 <- tabPerm(hec, 3:1)
tabEqual(hec, hec2)
#> [1] TRUE
## Alternative
hec %a==% hec2
#> [1] TRUE

Aligning - tabAlign()

We can align one array according to the ordering of another:

hec2 <- tabPerm(hec, 3:1)
tabAlign(hec2, hec)
#> , , Sex = Male
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    32   11    10
#>   Brown    53   50    25
#> 
#> , , Sex = Female
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    36    9     5
#>   Brown    66   34    29
## Alternative:
tabAlign(hec2, dimnames(hec))
#> , , Sex = Male
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    32   11    10
#>   Brown    53   50    25
#> 
#> , , Sex = Female
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    36    9     5
#>   Brown    66   34    29
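
Alignment and equality "up to a permutation" can be sketched in base R by matching dimension names and permuting before comparing. The helpers align_sketch() and equal_sketch() are hypothetical names for illustration; the comparison uses all.equal(), which tolerates small numerical differences.

```r
hec <- array(c(32, 53, 11, 50, 10, 25, 36, 66, 9, 34, 5, 29),
             dim = c(2, 3, 2),
             dimnames = list(Hair = c("Black", "Brown"),
                             Eye  = c("Brown", "Blue", "Hazel"),
                             Sex  = c("Male", "Female")))
hec2 <- aperm(hec, 3:1)   # order Sex, Eye, Hair

# Permute 'a' so that its variables appear in the same order as in 'b'
align_sketch <- function(a, b) {
  aperm(a, match(names(dimnames(b)), names(dimnames(a))))
}

# Equal if the variable sets agree and the values agree after alignment
equal_sketch <- function(a, b) {
  if (!setequal(names(dimnames(a)), names(dimnames(b)))) return(FALSE)
  isTRUE(all.equal(align_sketch(a, b), b))
}

names(dimnames(align_sketch(hec2, hec)))  # "Hair" "Eye" "Sex"
equal_sketch(hec2, hec)                   # TRUE
```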

Multiplication, addition etc: \(+\), \(-\), \(*\), \(/\)

The product of two arrays \(T_{HE}\) and \(T_{HS}\) is defined to be the array \(\tilde T_{HES}\) with entries $$ \tilde T_{HES}(h,e,s)= T_{HE}(h,e) T_{HS}(h,s) $$

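The sum, difference and quotient are defined similarly. These operations are carried out with tabAdd(), tabSubt(), tabMult() and tabDiv(). Note that in the code below hs is computed from the Hair:Eye margin and therefore coincides with he, so all results are two-dimensional: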

hs <- tabMarg(hec, ~Hair:Eye)
tabMult(he, hs)
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black  4624  400   225
#>   Brown 14161 7056  2916
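
Since hs above is computed from the Hair:Eye margin, the example does not show a product of arrays over different variable sets. The following base R sketch illustrates the definition \(\tilde T_{HES}(h,e,s) = T_{HE}(h,e)\,T_{HS}(h,s)\) with a genuine Hair-by-Sex margin; the loop over Sex is an illustrative choice, not the gRbase implementation.

```r
hec <- array(c(32, 53, 11, 50, 10, 25, 36, 66, 9, 34, 5, 29),
             dim = c(2, 3, 2),
             dimnames = list(Hair = c("Black", "Brown"),
                             Eye  = c("Brown", "Blue", "Hazel"),
                             Sex  = c("Male", "Female")))
he <- apply(hec, c(1, 2), sum)   # Hair x Eye
hs <- apply(hec, c(1, 3), sum)   # Hair x Sex

# Result lives on the union of the variables: Hair x Eye x Sex
prod_hes <- array(0, dim = dim(hec), dimnames = dimnames(hec))
for (s in dimnames(hec)$Sex) {
  # hs[, s] has one value per Hair level and is recycled down the rows of he
  prod_hes[, , s] <- he * hs[, s]
}

prod_hes["Black", "Brown", "Male"]   # 68 * 53 = 3604
```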

Available operations:

tabAdd(he, hs) 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black   136   40    30
#>   Brown   238  168   108
tabSubt(he, hs)
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black     0    0     0
#>   Brown     0    0     0
tabMult(he, hs)
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black  4624  400   225
#>   Brown 14161 7056  2916
tabDiv(he, hs) 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black     1    1     1
#>   Brown     1    1     1
tabDiv0(he, hs) ## Convention 0/0 = 0
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black     1    1     1
#>   Brown     1    1     1

Shortcuts:

## Alternative
he %a+% hs
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black   136   40    30
#>   Brown   238  168   108
he %a-% hs
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black     0    0     0
#>   Brown     0    0     0
he %a*% hs
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black  4624  400   225
#>   Brown 14161 7056  2916
he %a/% hs
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black     1    1     1
#>   Brown     1    1     1
he %a/0% hs ## Convention 0/0 = 0
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black     1    1     1
#>   Brown     1    1     1

Multiplication and addition of multiple arrays (or a list of arrays) are accomplished with tabProd() and tabSum() (much like prod() and sum()):

es <- tabMarg(hec, ~Eye:Sex)
tabSum(he, hs, es)  
#> , , Sex = Male
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black   221  101    65
#>   Brown   323  229   143
#> 
#> , , Sex = Female
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black   238   83    64
#>   Brown   340  211   142
## tabSum(list(he, hs, es))

Lists of arrays are processed with tabListAdd() and tabListMult():

tabListAdd(list(he, hs, es))
#> , , Sex = Male
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black   221  101    65
#>   Brown   323  229   143
#> 
#> , , Sex = Female
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black   238   83    64
#>   Brown   340  211   142
tabListMult(list(he, hs, es))
#> , , Sex = Male
#> 
#>        Eye
#> Hair      Brown   Blue  Hazel
#>   Black  393040  24400   7875
#>   Brown 1203685 430416 102060
#> 
#> , , Sex = Female
#> 
#>        Eye
#> Hair      Brown   Blue Hazel
#>   Black  471648  17200  7650
#>   Brown 1444422 303408 99144

An array as a probability density - tabDist()

If an array consists of non-negative numbers then it may be regarded as an (unnormalized) discrete multivariate density. With this view, the following examples should be self-explanatory:

tabDist(hec, marg=~Hair:Eye)
#>        Eye
#> Hair    Brown   Blue  Hazel
#>   Black 0.189 0.0556 0.0417
#>   Brown 0.331 0.2333 0.1500
tabDist(hec, cond=~Sex) 
#> , , Sex = Male
#> 
#>        Eye
#> Hair    Brown   Blue  Hazel
#>   Black 0.177 0.0608 0.0552
#>   Brown 0.293 0.2762 0.1381
#> 
#> , , Sex = Female
#> 
#>        Eye
#> Hair    Brown   Blue  Hazel
#>   Black 0.201 0.0503 0.0279
#>   Brown 0.369 0.1899 0.1620
tabDist(hec, marg=~Hair, cond=~Sex) 
#>        Sex
#> Hair     Male Female
#>   Black 0.293  0.279
#>   Brown 0.707  0.721
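
The marginal and conditional distributions above can be reproduced in base R by first marginalizing and then normalizing, which makes the two steps explicit. This is a sketch of the computation, not of how tabDist() is implemented.

```r
hec <- array(c(32, 53, 11, 50, 10, 25, 36, 66, 9, 34, 5, 29),
             dim = c(2, 3, 2),
             dimnames = list(Hair = c("Black", "Brown"),
                             Eye  = c("Brown", "Blue", "Hazel"),
                             Sex  = c("Male", "Female")))

# Marginal distribution of (Hair, Eye): sum out Sex, then divide by the total
p_he <- apply(hec, c(1, 2), sum) / sum(hec)

# Distribution of Hair given Sex: sum out Eye, then normalize within each Sex
hs <- apply(hec, c(1, 3), sum)
p_h_given_s <- prop.table(hs, margin = 2)

round(p_he["Black", "Brown"], 3)   # 0.189
round(p_h_given_s, 3)              # Male: 0.293, 0.707; Female: 0.279, 0.721
```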

Miscellaneous - tabSliceMult()

Multiply values in a slice by some number and all other values by another number:

tabSliceMult(es, list(Sex="Female"), val=10, comp=0)
#>        Sex
#> Eye     Male Female
#>   Brown    0   1020
#>   Blue     0    430
#>   Hazel    0    340

Examples

A Bayesian network

A classical example of a Bayesian network is the "sprinkler example", see e.g. https://en.wikipedia.org/wiki/Bayesian_network:

"Suppose that there are two events which could cause grass to be wet: either the sprinkler is on or it is raining. Also, suppose that the rain has a direct effect on the use of the sprinkler (namely that when it rains, the sprinkler is usually not turned on). Then the situation can be modeled with a Bayesian network."

We specify conditional probabilities \(p(r)\), \(p(s|r)\) and \(p(w|s,r)\) as follows (notice that the vertical conditioning bar (\(|\)) is indicated by the horizontal underscore in the variable names):

yn <- c("y","n")
lev <- list(rain=yn, sprinkler=yn, wet=yn)
r <- tabNew(~rain, levels=lev, values=c(.2, .8))
s_r <- tabNew(~sprinkler:rain, levels = lev, values = c(.01, .99, .4, .6))
w_sr <- tabNew( ~wet:sprinkler:rain, levels=lev, 
             values=c(.99, .01, .8, .2, .9, .1, 0, 1))
r 
#> rain
#>   y   n 
#> 0.2 0.8
s_r  |> flat()
#>   sprinkler rain Freq
#> 1         y    y 0.01
#> 2         n    y 0.99
#> 3         y    n 0.40
#> 4         n    n 0.60
w_sr |> flat()
#>   wet sprinkler rain Freq
#> 1   y         y    y 0.99
#> 2   n         y    y 0.01
#> 3   y         n    y 0.80
#> 4   n         n    y 0.20

The joint distribution \(p(r,s,w)=p(r)p(s|r)p(w|s,r)\) can be obtained with tabProd():

joint <- tabProd(r, s_r, w_sr); joint |> flat()
#>   wet sprinkler rain    Freq
#> 1   y         y    y 0.00198
#> 2   n         y    y 0.00002
#> 3   y         n    y 0.15840
#> 4   n         n    y 0.03960

What is the probability that it rains given that the grass is wet? We find \(p(r,w)=\sum_s p(r,s,w)\) and then \(p(r|w)=p(r,w)/p(w)\). This can be done in various ways, for example with tabDist():

tabDist(joint, marg=~rain, cond=~wet)
#>     wet
#> rain     y      n
#>    y 0.358 0.0718
#>    n 0.642 0.9282
## Alternative:
rw <- tabMarg(joint, ~rain + wet)
tabDiv(rw, tabMarg(rw, ~wet))
## or
rw %a/% (rw %a_% ~wet)
## Alternative:
x <- tabSliceMult(rw, slice=list(wet="y")); x
#>     wet
#> rain     y n
#>    y 0.160 0
#>    n 0.288 0
tabDist(x, marg=~rain)
#> rain
#>     y     n 
#> 0.358 0.642
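
The same calculation can be traced step by step in base R, which may help in checking the numbers: build the joint table from the three conditional tables, sum out the sprinkler, and normalize within each value of wet. The object names below (p_r, p_s_r, p_w_sr and so on) are chosen for this sketch only.

```r
yn <- c("y", "n")
p_r    <- c(y = .2, n = .8)
p_s_r  <- matrix(c(.01, .99, .4, .6), nrow = 2,
                 dimnames = list(sprinkler = yn, rain = yn))
p_w_sr <- array(c(.99, .01, .8, .2, .9, .1, 0, 1), dim = c(2, 2, 2),
                dimnames = list(wet = yn, sprinkler = yn, rain = yn))

# Joint p(w, s, r) = p(r) p(s | r) p(w | s, r), filled in one (s, r) column at a time
joint <- p_w_sr
for (r in yn) for (s in yn) {
  joint[, s, r] <- p_w_sr[, s, r] * p_s_r[s, r] * p_r[r]
}

# p(r, w): sum out sprinkler; then p(r | w): normalize within each column (wet)
p_rw <- apply(joint, c(3, 1), sum)
p_r_given_w <- prop.table(p_rw, margin = 2)

round(p_r_given_w, 3)   # column wet = y: 0.358, 0.642
```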

Iterative Proportional Scaling (IPS)

We consider the three-way lizard data from gRbase:

data(lizard, package="gRbase")
lizard |> flat()
#>   diam height species Freq
#> 1  <=4  >4.75   anoli   32
#> 2   >4  >4.75   anoli   11
#> 3  <=4 <=4.75   anoli   86
#> 4   >4 <=4.75   anoli   35

Consider the two-factor log-linear model for the lizard data. Under the model the expected counts have the form $$ \log m(d,h,s)= a_1(d,h)+a_2(d,s)+a_3(h,s) $$ If we let \(n(d,h,s)\) denote the observed counts, the likelihood equations are: Find \(m(d,h,s)\) such that $$ m(d,h)=n(d,h), \quad m(d,s)=n(d,s), \quad m(h,s)=n(h,s) $$ where \(m(d,h)=\sum_s m(d,h,s)\) etc. The updates are as follows: For the first term we have

$$ m(d,h,s) \leftarrow m(d,h,s) \frac{n(d,h)}{m(d,h)} $$

After iterating, the updates will no longer change the fit and we will have equality: \(m(d,h,s) = m(d,h,s) \frac{n(d,h)}{m(d,h)}\). Summing over \(s\) shows that the equation \(m(d,h)=n(d,h)\) is satisfied.

A rudimentary implementation of iterative proportional scaling for log-linear models is straightforward:

myips <- function(indata, glist){
    fit   <- indata
    fit[] <-  1
    ## List of sufficient marginal tables
    md    <- lapply(glist, function(g) tabMarg(indata, g))
    n_iter <- 4
    n_generators <- length(glist)
    for (i in 1:n_iter){
        for (j in 1:n_generators){
            mf  <- tabMarg(fit, glist[[j]])
            # adj <- tabDiv( md[[ j ]], mf)
            # fit <- tabMult( fit, adj )
            ## or
            adj <- md[[ j ]] %a/% mf
            fit <- fit %a*% adj
        }
    }
    pearson <- sum((fit - indata)^2 / fit)
    list(pearson=pearson, fit=fit)
}

glist <- list(c("species", "diam"),c("species", "height"),c("diam", "height"))

fm1 <- myips(lizard, glist)
fm1$pearson
#> [1] 665
fm1$fit |> flat()
#>   species diam height Freq
#> 1   anoli  <=4  >4.75 32.8
#> 2    dist  <=4  >4.75 60.2
#> 3   anoli   >4  >4.75 10.2
#> 4    dist   >4  >4.75 41.8

The Pearson statistic reported by myips() is misleading: fit has variable order (species, diam, height) whereas lizard has order (diam, height, species), so fit - indata subtracts values from mismatched cells. Aligning the fit with the data first, e.g. with tabAlign(fit, indata), avoids this. For comparison, the same model fitted with loglin() gives:

fm2 <- loglin(lizard, glist, fit=TRUE)
#> 4 iterations: deviation 0.00962
fm2$pearson
#> [1] 0.151
fm2$fit |> flat()
#>   diam height species Freq
#> 1  <=4  >4.75   anoli 32.8
#> 2   >4  >4.75   anoli 10.2
#> 3  <=4 <=4.75   anoli 85.2
#> 4   >4 <=4.75   anoli 35.8
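
The update \(m \leftarrow m \cdot n(\cdot)/m(\cdot)\) can also be sketched in base R with apply() and sweep(). Because sweep() keeps the variable order of the fitted array unchanged, the Pearson statistic can be computed cell by cell without any alignment step. The function ips_sketch() is a hypothetical name, the hec counts are used so that the example is self-contained, and the fixed number of iterations is an arbitrary choice.

```r
hec <- array(c(32, 53, 11, 50, 10, 25, 36, 66, 9, 34, 5, 29),
             dim = c(2, 3, 2),
             dimnames = list(Hair = c("Black", "Brown"),
                             Eye  = c("Brown", "Blue", "Hazel"),
                             Sex  = c("Male", "Female")))

# Hypothetical helper: IPS with margins given as integer vectors of dimensions
ips_sketch <- function(tab, margins, n_iter = 50) {
  fit <- tab
  fit[] <- 1
  for (i in seq_len(n_iter)) {
    for (m in margins) {
      observed <- apply(tab, m, sum)   # n(.) for this margin
      current  <- apply(fit, m, sum)   # m(.) for this margin
      fit <- sweep(fit, m, observed / current, "*")
    }
  }
  fit
}

margins <- list(c(1, 2), c(1, 3), c(2, 3))   # all two-factor interactions
fit <- ips_sketch(hec, margins)

# Same variable order as hec, so cells line up in the Pearson statistic
pearson <- sum((fit - hec)^2 / fit)
all.equal(apply(fit, c(2, 3), sum), apply(hec, c(2, 3), sum))  # TRUE
```

The last margin in the list is matched exactly after each cycle; the others are matched up to the convergence error of the iteration.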

Some low level functions

For, e.g., a \(2\times 3 \times 2\) array, the entries are ordered such that the first variable varies fastest, so the ordering of the cells is \((1,1,1)\), \((2,1,1)\), \((1,2,1)\), \((2,2,1)\), \((1,3,1)\) and so on. To find the value of such a cell, say, \((j,k,l)\) in the array (which is really just a vector), the cell is mapped into an entry of a vector.

For example, cell \((2,3,1)\) (Hair=Brown, Eye=Hazel, Sex=Male) must be mapped to entry \(6\) in

hec
#> , , Sex = Male
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    32   11    10
#>   Brown    53   50    25
#> 
#> , , Sex = Female
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    36    9     5
#>   Brown    66   34    29
c(hec)
#>  [1] 32 53 11 50 10 25 36 66  9 34  5 29

For illustration we do:

cell2name <- function(cell, dimnames){
    unlist(lapply(1:length(cell),
                  function(m){
                      dimnames[[m]][cell[m]]
                  }))}
cell2name(c(2,3,1), dimnames(hec))
#> [1] "Brown" "Hazel" "Male"

cell2entry and entry2cell

The map from a cell to the corresponding entry is provided by cell2entry(). The reverse operation, going from an entry to a cell (which is much less needed), is provided by entry2cell().

cell2entry(c(2,3,1), dim=c(2, 3, 2))
#> [1] 6
entry2cell(6, dim=c(2, 3, 2))
#> [1] 2 3 1
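
The mapping itself is simple arithmetic: with the first variable varying fastest, dimension \(k\) has stride equal to the product of the sizes of the preceding dimensions. The base R sketch below spells this out; the function names are hypothetical and the inverse simply relies on arrayInd().

```r
# Hypothetical helpers illustrating the index arithmetic (not the gRbase code)
cell2entry_sketch <- function(cell, dim) {
  strides <- cumprod(c(1, dim[-length(dim)]))   # for dim (2,3,2): 1, 2, 6
  1 + sum((cell - 1) * strides)
}
entry2cell_sketch <- function(entry, dim) {
  as.vector(arrayInd(entry, dim))
}

cell2entry_sketch(c(2, 3, 1), dim = c(2, 3, 2))   # 1 + 1*1 + 2*2 + 0*6 = 6
entry2cell_sketch(6, dim = c(2, 3, 2))            # 2 3 1
```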

next_cell()

Given a cell, say \(i=(2,3,1)\) in a \(2\times 3\times 2\) array we often want to find the next cell in the table following the convention that the first factor varies fastest, that is \((1,1,2)\). This is provided by next_cell().

next_cell(c(2,3,1), dim=c(2, 3, 2))
#> [1] 1 1 2
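
Finding the next cell works like an odometer: increase the first index that is not yet at its maximum and reset all earlier indices to 1. The sketch below is a hypothetical base R version; returning NULL when the input is the last cell is an assumption of this sketch and may differ from what next_cell() does.

```r
# Hypothetical helper illustrating the odometer-style increment
next_cell_sketch <- function(cell, dim) {
  for (k in seq_along(cell)) {
    if (cell[k] < dim[k]) {
      cell[k] <- cell[k] + 1
      return(cell)
    }
    cell[k] <- 1   # this index wraps around; carry on to the next dimension
  }
  NULL             # assumption: no next cell after the last one
}

next_cell_sketch(c(2, 3, 1), dim = c(2, 3, 2))   # 1 1 2
next_cell_sketch(c(1, 1, 1), dim = c(2, 3, 2))   # 2 1 1
```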

next_cell_slice() and slice2entry()

Suppose we look at cells for which the index in dimension \(2\) is at level \(3\) (that is Eye=Hazel), i.e. cells of the form \((j,3,l)\). Given such a cell, what is the next cell that also satisfies this constraint? This is provided by next_cell_slice().

next_cell_slice(c(1,3,1), slice_marg=2, dim=c( 2, 3, 2 ))
#> [1] 2 3 1
next_cell_slice(c(2,3,1), slice_marg=2, dim=c( 2, 3, 2 ))
#> [1] 1 3 2

Suppose again that in dimension \(2\) we look at level \(3\). We want to find the entries for the cells of the form \((j,3,l)\). This is provided by slice2entry():

slice2entry(slice_cell=3, slice_marg=2, dim=c( 2, 3, 2 ))
#> [1]  5  6 11 12

To verify that we indeed get the right cells:

r <- slice2entry(slice_cell=3, slice_marg=2, dim=c( 2, 3, 2 ))
lapply(lapply(r, entry2cell, c( 2, 3, 2 )),
       cell2name, dimnames(hec))
#> [[1]]
#> [1] "Black" "Hazel" "Male" 
#> 
#> [[2]]
#> [1] "Brown" "Hazel" "Male" 
#> 
#> [[3]]
#> [1] "Black"  "Hazel"  "Female"
#> 
#> [[4]]
#> [1] "Brown"  "Hazel"  "Female"

fact_grid() - factorial grid

Using the operations above we can obtain the combinations of the factors as a matrix:

head( fact_grid( c(2, 3, 2) ), 6 )
#>      [,1] [,2] [,3]
#> [1,]    1    1    1
#> [2,]    2    1    1
#> [3,]    1    2    1
#> [4,]    2    2    1
#> [5,]    1    3    1
#> [6,]    2    3    1

A similar dataframe can also be obtained with the standard R function expand.grid() (but fact_grid() is faster):

head( expand.grid(list(1:2, 1:3, 1:2)), 6 )
#>   Var1 Var2 Var3
#> 1    1    1    1
#> 2    2    1    1
#> 3    1    2    1
#> 4    2    2    1
#> 5    1    3    1
#> 6    2    3    1

Appendix

More about slicing

Slicing using standard R code can be done as follows:

hec[, 2:3, ]  |> flat()  ## A 2 x 2 x 2 array
#>    Hair   Eye  Sex Freq
#> 1 Black  Blue Male   11
#> 2 Brown  Blue Male   50
#> 3 Black Hazel Male   10
#> 4 Brown Hazel Male   25
hec[1, , 1]             ## A vector
#> Brown  Blue Hazel 
#>    32    11    10
hec[1, , 1, drop=FALSE] ## A 1 x 3 x 1 array
#> , , Sex = Male
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    32   11    10

Programmatically we can do the above as

do.call("[", c(list(hec), list(TRUE, 2:3, TRUE)))  |> flat()
#>    Hair   Eye  Sex Freq
#> 1 Black  Blue Male   11
#> 2 Brown  Blue Male   50
#> 3 Black Hazel Male   10
#> 4 Brown Hazel Male   25
do.call("[", c(list(hec), list(1, TRUE, 1))) 
#> Brown  Blue Hazel 
#>    32    11    10
do.call("[", c(list(hec), list(1, TRUE, 1), drop=FALSE)) 
#> , , Sex = Male
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    32   11    10

gRbase provides two alternatives for each of the three cases above:

tabSlicePrim(hec, slice=list(TRUE, 2:3, TRUE))  |> flat()
#>    Hair   Eye  Sex Freq
#> 1 Black  Blue Male   11
#> 2 Brown  Blue Male   50
#> 3 Black Hazel Male   10
#> 4 Brown Hazel Male   25
tabSlice(hec, slice=list(c(2, 3)), margin=2) |> flat()
#>    Hair   Eye  Sex Freq
#> 1 Black  Blue Male   11
#> 2 Brown  Blue Male   50
#> 3 Black Hazel Male   10
#> 4 Brown Hazel Male   25

tabSlicePrim(hec, slice=list(1, TRUE, 1))  
#> Brown  Blue Hazel 
#>    32    11    10
tabSlice(hec, slice=list(1, 1), margin=c(1, 3)) 
#> Brown  Blue Hazel 
#>    32    11    10

tabSlicePrim(hec, slice=list(1, TRUE, 1), drop=FALSE)  
#> , , Sex = Male
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    32   11    10
tabSlice(hec, slice=list(1, 1), margin=c(1, 3), drop=FALSE) 
#> , , Sex = Male
#> 
#>        Eye
#> Hair    Brown Blue Hazel
#>   Black    32   11    10
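
To show how a slice given as a named list might be turned into a call to "[", here is a small base R sketch built on the do.call() idiom above. The helper slice_sketch() is a hypothetical name; it is not tabSlice() and supports only named slices on named arrays.

```r
hec <- array(c(32, 53, 11, 50, 10, 25, 36, 66, 9, 34, 5, 29),
             dim = c(2, 3, 2),
             dimnames = list(Hair = c("Black", "Brown"),
                             Eye  = c("Brown", "Blue", "Hazel"),
                             Sex  = c("Male", "Female")))

# Hypothetical helper: start with TRUE (keep everything) for each dimension,
# then overwrite the dimensions mentioned in 'slice'
slice_sketch <- function(x, slice, drop = TRUE) {
  idx <- rep(list(TRUE), length(dim(x)))
  names(idx) <- names(dimnames(x))
  stopifnot(all(names(slice) %in% names(idx)))
  idx[names(slice)] <- slice
  do.call("[", c(list(x), unname(idx), list(drop = drop)))
}

s1 <- slice_sketch(hec, list(Eye = c("Blue", "Hazel"), Sex = "Female"))
s1["Black", "Blue"]   # 9
s2 <- slice_sketch(hec, list(Eye = 2:3, Sex = "Female"), drop = FALSE)
dim(s2)               # 2 2 1
```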