For single-cell data, cell-level network analysis can be performed based on joint similarity in alpha chain sequence and beta chain sequence.
We simulate some toy data to demonstrate the usage.
set.seed(42)
library(NAIR)
dat <- simulateToyData(chains = 2)
head(dat)
#> AlphaSeq BetaSeq Count UMIs SampleID
#> 1 TTGAGGAAATTCG TTGAGGAAATTCGG 3095 4 Sample1
#> 2 GGAGATGAATCGG GGAGATGAATCGG 3057 6 Sample1
#> 3 GTCGGGTAATTGG GTCGGGTAATTGGG 3575 8 Sample1
#> 4 GCCGGGTAATTCG GCCGGGTAATTCGG 3994 7 Sample1
#> 5 GAAAGAGAATTCG GAAAGAGAATTCGG 3670 3 Sample1
#> 6 AGGTGGGAATTCG AGGTGGGAATTCG 4076 5 Sample1
The input data is assumed to have the following format:
Dual-chain network analysis can be performed using buildRepSeqNetwork() (or generateNetworkObjects()) by supplying a length-2 vector to the seq_col parameter:
# Build network based on joint dual-chain similarity
network <- buildRepSeqNetwork(dat,
  seq_col = c("AlphaSeq", "BetaSeq"),  # alpha-chain column first, then beta-chain column
  count_col = "UMIs",
  node_stats = TRUE,
  stats_to_include = "all",
  cluster_stats = TRUE,
  color_nodes_by = "SampleID",
  size_nodes_by = "UMIs",
  node_size_limits = c(0.5, 3)
)
We print the network graph plot with labels added for the largest two clusters:
The list returned by buildRepSeqNetwork() contains the following items:
names(network)
#> [1] "details" "igraph" "adjacency_matrix" "adj_mat_a"
#> [5] "adj_mat_b" "node_data" "cluster_data" "plots"
Notice that the list contains three adjacency matrices: adjacency_matrix corresponds to the network based on joint similarity in both chain sequences, while adj_mat_a corresponds to the network based only on similarity in the alpha-chain sequence (and similarly for adj_mat_b).
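One way to picture how the three matrices relate is the base-R sketch below. It assumes that "joint similarity" means two cells are connected only when both their alpha sequences and their beta sequences fall within the distance cutoff. The toy sequences, the cutoff of 1, the helper chain_adjacency(), and the use of Levenshtein distance via utils::adist() are all illustrative assumptions and not a description of the package's internals.

```r
# Hypothetical toy sequences for four cells (not taken from the simulated data)
alpha <- c("AAAT", "AAAG", "AAAG", "CCCC")
beta  <- c("GGGA", "GGGA", "TTTT", "GGGA")
cutoff <- 1  # assumed distance cutoff for this sketch

# Hypothetical helper: 0/1 adjacency matrix for one chain.
# Two cells are adjacent when their sequences are within `cutoff` edits.
chain_adjacency <- function(seqs, cutoff) {
  d <- utils::adist(seqs)        # pairwise Levenshtein distances
  adj <- (d <= cutoff) * 1L      # 1 where the pair is within the cutoff
  diag(adj) <- 0L                # no self-loops in this sketch
  adj
}

adj_a <- chain_adjacency(alpha, cutoff)  # alpha-chain similarity only
adj_b <- chain_adjacency(beta, cutoff)   # beta-chain similarity only

# Joint network: an edge requires similarity in BOTH chains,
# so the element-wise product acts as a logical AND.
adj_joint <- adj_a * adj_b

adj_joint
```

Under this assumption the joint network can never have more edges than either single-chain network, which is why inspecting adj_mat_a and adj_mat_b separately can show which chain is responsible for a missing edge.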
The cluster-level data contains sequence-based cluster statistics for each of the alpha and beta chain sequences:
head(network$cluster_data)
#> cluster_id node_count mean_A_seq_length mean_B_seq_length mean_degree
#> 1 1 15 12.13 12.87 2.60
#> 2 2 13 13.00 13.08 4.00
#> 3 3 16 13.00 13.94 5.81
#> 4 4 10 12.00 12.00 2.90
#> 5 5 3 13.00 14.00 1.67
#> 6 6 3 13.00 14.00 2.00
#> max_degree A_seq_w_max_degree B_seq_w_max_degree agg_count max_count
#> 1 7 AAAAAAAAATTC AAAAAAAAATTCG 42 6
#> 2 11 GGGGGGGAATTGG GGGGGGGAATTGG 28 6
#> 3 12 GGGGGGGAATTGG GGGGGGGAATTGGG 49 6
#> 4 6 AAAAAGAAATTG AAAAAGAAATTG 39 7
#> 5 2 AGGGGAGAATTGG AGGGGAGAATTGGG 10 5
#> 6 2 AAAAAAGAATTGC AAAAAAGAATTGCG 4 2
#> A_seq_w_max_count B_seq_w_max_count diameter_length global_transitivity
#> 1 AAAAAAAAATTC AAAAAAAAATTC 6 0.2884615
#> 2 GGGGTGGAATTGG GGGGTGGAATTGG 7 0.3802817
#> 3 GGGGAGAAATTGG GGGGAGAAATTGGG 6 0.6328125
#> 4 AAAGAAAAATTG AAAGAAAAATTG 6 0.3750000
#> 5 AGGGGAGAATTGG AGGGGAGAATTGGG 3 0.0000000
#> 6 AGAAAAGAATTGC AGAAAAGAATTGCG 2 1.0000000
#> assortativity edge_density degree_centrality_index closeness_centrality_index
#> 1 -0.16503588 0.1809524 0.3190476 0.4497821
#> 2 -0.15180055 0.2692308 0.3141026 0.4357891
#> 3 -0.08424855 0.3416667 0.3250000 0.4650078
#> 4 -0.33425414 0.3111111 0.3555556 0.4889192
#> 5 -1.00000000 0.6666667 0.3333333 1.0000000
#> 6 NaN 1.0000000 0.0000000 0.0000000
#> eigen_centrality_index eigen_centrality_eigenvalue
#> 1 0.6385488 3.680389
#> 2 0.6131393 4.419380
#> 3 0.5291669 7.257172
#> 4 0.6107669 3.750958
#> 5 0.5857864 1.414214
#> 6 0.0000000 2.000000
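Assuming cluster_data behaves like an ordinary data frame, as the head() output suggests, it can be ranked with base R. The sketch below re-enters the node counts and mean sequence lengths from the six rows printed above into a small data frame (the name cluster_head is hypothetical) and picks the two largest of those six clusters. With a real network object, the same calls could be applied to network$cluster_data directly.

```r
# Values copied from the six printed rows of head(network$cluster_data)
cluster_head <- data.frame(
  cluster_id        = 1:6,
  node_count        = c(15, 13, 16, 10, 3, 3),
  mean_A_seq_length = c(12.13, 13.00, 13.00, 12.00, 13.00, 13.00),
  mean_B_seq_length = c(12.87, 13.08, 13.94, 12.00, 14.00, 14.00)
)

# Rank clusters from largest to smallest by node count
ranked <- cluster_head[order(cluster_head$node_count, decreasing = TRUE), ]

# Keep the two largest clusters among the rows shown
top_two <- head(ranked, 2)
top_two$cluster_id
```

Since only the first six rows are reproduced here, the result reflects the largest clusters among those rows, not necessarily the largest in the full network.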
The remainder of the output and customization follows the general case for buildRepSeqNetwork().