PairViz is an R package which provides orderings of objects for visualisation purposes. The problem of constructing an ordering can be regarded as traversing possibly weighted graphs. In this vignette, we focus on graphs and their traversal. Applications to data visualisation are given in accompanying vignettes.
PairViz is installed from CRAN in the usual way.
install.packages("PairViz")
PairViz uses the graph data structure provided by the graph package which is located on the Bioconductor repository rather than CRAN. To install this, use
if (!requireNamespace("graph", quietly = TRUE)){
install.packages("BiocManager")
::install("graph")
BiocManager }
In this vignette, we will also show plots of the graph structure, which requires package igraph from CRAN. The following utility function will be helpful:
requireNamespace("igraph")
if (!requireNamespace("igraph", quietly = TRUE)){
install.packages("igraph")
}
<- function(g,weights=FALSE,layout=igraph::layout_in_circle,
igplot vertex.size=60, vertex.color="lightblue",...){
<- igraph::graph_from_graphnel(as(g, "graphNEL"))
g <- par(mar=c(1,1,1,1))
op if (weights){
<- round(igraph::get.edge.attribute(g,"weight"),3)
ew ::plot.igraph(g,layout=layout,edge.label=ew,vertex.size=vertex.size,vertex.color=vertex.color,...)
igraph
}else
::plot.igraph(g,layout=layout,vertex.size=vertex.size,vertex.color=vertex.color,...)
igraphpar(op)
}
Alternatively, graphs may be plotted using package Rgraphviz which is installed from Bioconductor:
if (!requireNamespace("Rgraphviz", quietly = TRUE)){
install.packages("BiocManager")
::install("Rgraphviz")
BiocManager
}
# For a graph g use
plot(g)
A graph is a mathematical structure made up of nodes, and edges joining those nodes. We consider only graphs where edges are undirected. In some cases, edges are assigned weights, reflecting some measure of importance.
A complete graph is a graph where there is an edge connecting every pair of nodes. Here are two such complete graphs, k4 with four nodes and k5 with five nodes.
suppressPackageStartupMessages(library(PairViz))
<- mk_complete_graph(4)
k4 <- mk_complete_graph(5)
k5 igplot(k4)
igplot(k5)
We will first construct graph traversals that visit every edge of the
graph at least once. The PairViz function eulerian()
gives
such traversals:
eulerian(k5)
#> [1] "1" "2" "3" "1" "4" "2" "5" "3" "4" "5" "1"
Mathematically speaking, an Eulerian path of a graph is a path that
visits every edge of a graph exactly once. An Eulerian tour of a graph
is an Eulerian path that is closed, i.e. ends up at the starting node.
For the graph k5
, one such Eulerian tour goes from 1 ->2
-> 3 -> 1 and so on until it ends back at node 1, as given by
eulerian(k5)
. It is well-known that a graph has an Eulerian
tour if every node has an even number of edges. This condition holds for
a complete graph with an odd number of nodes, such as
k5
.
Similarly, a graph has an Eulerian path if all but two nodes have an
even number of edges. If we remove one edge, say that connecting nodes 1
and 2 from k5
, these two nodes have an odd number of edges
while other nodes are even.
<- removeEdge("1", "2", k5)
k5a igplot(k5a)
eulerian(k5a)
#> [1] "1" "3" "2" "4" "1" "5" "3" "4" "5" "2"
The Eulerian for k5a
starts at one of the odd nodes (here
“1”) and visits all edges ending at “2”, the other odd node.
Most graphs are not Eulerian, that is they do not meet the conditions
for an Eulerian path to exist. The graph k4
for instance,
has four nodes and all have three edges. In this case, any path visiting
all edges must visit some edges more than once. This is what
eulerian(k4)
does:
eulerian(k4)
#> [1] "1" "3" "2" "4" "3" "4" "1" "2"
If you look closely you will see the edge connecting nodes “3” and
“4” is visited twice. An other way to think of it is that extra edges
need to be added to a graph so that all but two nodes become even. For
k4
only one extra edge is needed, and in the example above
that extra edge connects nodes “3” and “4”.
In summary, for any graph, the function eulerian()
returns a path visiting all edges at least once. If the graph is
Eulerian, the returned path visits each edge exactly once. Otherwise,
some edges are visited twice. For more details of this algorithm, see
Hurley and Oldford (2010, 2011).
First construct a graph with weighted edges.
<- LETTERS[1:5]
n <- new("graphNEL",nodes=n)
g <- n[c(1,1,2,2,2,4)]
efrom <- n[c(2:3,3:5,5)]
eto <- c(8,9,5:7,1)
ew <- addEdge(efrom, eto, g, ew) g
To plot the graph use
igplot(g, weights=TRUE,edge.label.color="black")
eulerian(g)
#> [1] "E" "D" "B" "C" "A" "B" "E"
As the graph g
is weighted, the eulerian algorithm
starts at the edge with the lowest weight, and then proceeds to visit
every edge, preferring lower weight edges whenever there is a choice of
edge to be visited next. If the eulerian()
function is
called with the option weighted=FALSE
, then weights are
ignored.
eulerian(g, weighted=FALSE)
#> [1] "A" "B" "D" "E" "B" "C" "A"
In this case, the Eulerian path visits edges going next to the first
available node in nodes(g)
.
For complete graphs, PairViz includes some alternative constructions of Eulerian (or nearly Eulerian) paths.
The function eseq()
uses a recursive algorithm, and
forms a path on 1,..,n by appending extra edges on to the tour on
1,…,(n-2). For more details of the eseq
algorithm, see
Hurley and Oldford (2011).
The function eseqa()
gives a different sequence. For
odd n, the tour starts at 1, then takes steps of size 1,2,…,m
repeatedly, where m is (n-1)/2. For even n, the path constructed is
formed as eseqa(n+1)
, followed by dropping node n+1. (Note
that any permulation of the step sizes 1,2,…,m would do just as
well).
Two other functions are
kntour_drop()
takes an Euler tour on the complete
graph with nodes 1,2,..n as input for odd n, and removes all occurences
of n from the sequence. The result is a nearly Eulerian path on the
complete graph with n-1 nodes.
kntour_add()
takes an Euler tour on the complete
graph with nodes 1,2,..n as input for odd n, and appends a path joining
node n+1 to each of nodes 1,2,…,n . The result is a nearly Eulerian path
on the complete graph with n-1 nodes.
A Hamiltonian path of a graph is a path that visits every node of a graph exactly once. For a general graph, determining whether such a path exists is difficult, and in fact is an NP-hard problem. For a complete graph of n nodes, the path 1,2,…,n is a Hamiltonian path.
Consider a complete graph with an odd number of nodes such as
k5
.
<- rep(1, length(edgeNames(k5)))
ew <- c(1,5,9,10)
s1 <- 5
ew[s1]<- rep("grey40", length(ew))
ec <- "cyan"
ec[s1]igplot(k5, edge.width=ew, edge.color=ec)
<- 1
ew[]<- c(2,6:8)
s2 <- 5
ew[s2]<- "grey40"
ec[] <- "magenta"
ec[s2]igplot(k5, edge.width=ew, edge.color=ec)
The blue hamiltonian path visits the nodes in order 1,2,3,5,4. The
pink hamiltonian path visits the nodes in order 1,3,4,2,5. For a
complete graph with an odd number of nodes such as k5
, it
is possible to construct an Eulerian tour by concatenating together a
number of Hamiltonians. This construction is known as a Hamiltonian
decomposition.
<- 5
ew[]<- "cyan"
ec[s1]<- 3:4
s3 <- "rosybrown1"
ec[s3]igplot(k5, edge.width=ew, edge.color=ec)
igplot(k5, edge.width=ew, edge.color=ec)
In the figure above, the light red edge from 4 to 1 connects the two hamiltonians, and the remaining light red edge from 5 to 1 connects the last node of the pink path to the first node from the blue path to complete the tour.
In PairViz, the two hamiltonian paths above are constructed by
hpaths(5)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 2 3 5 4
#> [2,] 1 3 4 2 5
If you want the eulerian composed of the two hamiltonians, use
hpaths(5, matrix=FALSE)
#> [1] 1 2 3 5 4 1 3 4 2 5 1
The above constructions produce paths on complete graphs with an even number of nodes which are nearly eulerian. Look at
hpaths(6)
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 1 2 6 3 5 4
#> [2,] 2 3 1 4 6 5
#> [3,] 3 4 2 5 1 6
hpaths(6, matrix=FALSE)
#> [1] 1 2 6 3 5 4 2 3 1 4 6 5 3 4 2 5 1 6
Each row of the result of hpaths produces a hamiltonian on k6 (a complete graph with 6 nodes). When the hamiltonians are concatenated, all edges on k6 are visited. With a close inspection, we see the edges connecting the rows (here 4-2 and 5-3) are visted twice. There is no need to connect the first node of row 1 to the last node of row 3, as this edge has already been visited.
Other isomorphic decompositions are obtained by supplying the first Hamiltonian as an argument to hpaths. Then all node labels are rearranged accordingly, as in the code snippet below.
hpaths(1:5)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 2 3 4 5
#> [2,] 1 3 5 2 4
Consider the complete graph k7
whose edge weights are
given by
set.seed(123)
<- mk_complete_graph(7)
k7 <- sample(numEdges(k7),numEdges(k7)) # a vector of edgeweights ew
The easiest way to assign weights to edges is to use a distance matrix:
<- matrix(0,7,7)
d7 lower.tri(d7)] <- ew
d7[upper.tri(d7)]<- t(d7)[upper.tri(d7)]
d7[
d7 #> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 0 15 19 14 3 10 2
#> [2,] 15 0 6 11 5 4 21
#> [3,] 19 6 0 9 13 18 8
#> [4,] 14 11 9 0 1 12 20
#> [5,] 3 5 13 1 0 17 7
#> [6,] 10 4 18 12 17 0 16
#> [7,] 2 21 8 20 7 16 0
# or using the shortcut function edge2dist from PairViz
#d7 <- as.matrix(edge2dist(ew))
Now d7
is a symmetric matrix of distances, where the
values in ew
specify the distances in order 1-2, 1-3, ….,
1-7, 2-7,…, 3-7, …,6-7.
<- mk_complete_graph(d7)
k7 igplot(k7, weights=TRUE,edge.label.color="black", vertex.label.cex=2,vertex.size=30)
# Unfortunately, plot.igraph does not show graph edge weights automatically, you have to
# input them as above. You might want to check that the igraph
# matches that of ew.
::E(igraph::graph_from_graphnel(k7))
igraph#> + 21/21 edges from f03f0d3 (vertex names):
#> [1] 1--2 1--3 1--4 1--5 1--6 1--7 2--3 2--4 2--5 2--6 2--7 3--4 3--5 3--6 3--7
#> [16] 4--5 4--6 4--7 5--6 5--7 6--7
Like hpaths()
, the function
weighted_hpaths()
finds Hamiltonians which when
concatenated visit all edges at least once. However
weighted_hpaths()
uses a greedy algorithm to make the first
Hamiltonian low weight with weights increasing for successive
Hamiltonians.
weighted_hpaths(d7)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 6 2 5 4 1 7 3
#> [2,] 6 5 1 2 3 4 7
#> [3,] 6 1 3 5 7 2 4
# this version returns the eulerian
weighted_hpaths(d7, matrix=FALSE)
#> [1] 4 5 7 1 6 2 3 4 1 5 2 7 3 6 4 2 1 3 5 6 7 4
We can easily calculate the edge weights and their sum for the first Hamiltonian.
<- weighted_hpaths(d7, matrix=FALSE)
o <- o[1:8] # include the 8th to form the tour
o1 <- dist2edge(d7)
d7e # d7e is a vector giving edge weights in order (1,1)... (1,7), (2,3),.. (2,7) etc
<- path_weights(d7e, o1) # the edge weights for o1
h1weights # the same as
cbind(head(o1,-1), o1[-1])]
d7[#> [1] 1 3 2 8 6 4 12
h1weights#> [1] 1 3 2 8 6 4 12
sum(h1weights)
#> [1] 36
The second and third Hamiltonians have edges whose sums are
<- o[8:15]
o2 sum(path_weights(d7e, o2))
#> [1] 84
<- o[15:22]
o3 sum(path_weights(d7e, o3))
#> [1] 111
The weighted_hpaths()
function uses
order_tsp()
, a TSP solver (by default nearest insertion) to
come up with the first Hamiltonian. This algorithm uses heuristics, and
may not find the overall winner. As the graph has just 7 nodes, it is
possible to use brute force evaluation to find the shortest hamiltonian
tour.
order_best(d7,cycle=TRUE)
#> [1] 4 5 1 7 3 2 6
order_tsp(d7,cycle=TRUE)
#> [1] 4 5 7 1 6 2 3
In this example, we can confirm that order_tsp
finds the
shortest Hamiltonian tour. As order_best
is computationally
highly demanding, if you try it for graphs with more that
maxexact
(defaults to 9) nodes, only a sample of
permutations is evaluated.
Consider the graph k7
. At his stage, we have four
algorithms for forming Eulerians which do not use weights. These are
<- eseq(7)
e1 <- eseqa(7)
e2 <- eulerian(7) # same path as eulerian(k7, weighted=FALSE)
e3 <- hpaths(7, matrix=FALSE) h1
The plot of eseq(7)
is coloured to show its recursive
construction. eseq(3)
is in tan, the additional edges added
for eseq(5)
are in blue, while those for
eseq(7)
are in pink.
The plot of eseqa(7)
shows its construction; edges are a
distance 1 (in tan), 2 (in blue) and 3 (in pink) apart repeatedly,
considering the last node 7 and the first to be a distance 1 apart.
In eulerian(7)
, edges connecting to node 1 are in tan,
edges from nodes 2 and 3 are in blue excepting those connecting node 1,
while remaining edges involving nodes 4,5, 6 and 7 only are in pink.
This display illustrates that the eulerian algorithm always moves to the
lowest available node.
In the plot of hpaths(7)
the three concatenated
hamiltonians are coloured blue, pink and tan.
From these displays, if you want an eulerian visiting the low
numbered nodes first, use eseq()
or
eulerian()
. If you want an Eulerian where visits to a node
are spread out, pick eseqa()
or hpaths()
. And
of course, if the Hamiltonian property is important, then
hpaths()
is the best choice.
We have two different Eulerians for an edge-weighted k7
.
These are
<- eulerian(d7) # same path as eulerian(k7)
e4 <- weighted_hpaths(d7, matrix=FALSE) h2
which are plotted below. The colouring scheme used here is the same
as that for the plots of eulerian(7)
and
hpaths(7)
.
Recall that the goal of our traversals of edge-weights graphs was to visit low-weight edges early in the path. To check this, we can find the edges weights for the paths using
<- dist2edge(d7)
d7e path_weights(d7e, e4) # the edge weights for e4
#> [1] 1 3 2 7 5 4 10 14 9 6 11 12 16 8 13 17 18 19 15 21 20
path_weights(d7e, h2) # the edge weights for h2
#> [1] 1 7 2 10 4 6 9 14 3 5 21 8 18 12 11 15 19 13 17 16 20
Plotting these, we see that the edge weights for both e4
and h2
increase as the index increases. The path based on
Hamiltonians is less successful at ordering the edge weights as it must
visit a sequence of Hamiltonians. The edge weights for the first
Hamiltonian are in blue, for the second in magenta and the third in
tan.
C.B. Hurley and R.W. Oldford, Pairwise display of high dimensional information via Eulerian tours and Hamiltonian decompositions. Journal of Computational and Graphical Statistics. 19(10), pp. 861–886, 2010.
C.B. Hurley and R.W. Oldford, Eulerian tour algorithms for data visualization and the PairViz package. Computational Statistics, 26(4), pp 613–633, 2011.