R/qtl2ggplot Vignette

Brian S. Yandell

2024-03-15

Whole Genome Allele Scan

Load example Diversity Outbred (DO) data from the web. For convenience, we attach package ‘ggplot2’ for the autoplot function. Functions from ‘qtl2’ are explicitly referenced with the prefix qtl2::.

library(qtl2ggplot)
library(ggplot2)

Download ‘qtl2’ cross2 object.

DOex <- 
  qtl2::read_cross2(
    file.path(
      "https://raw.githubusercontent.com/rqtl",
       "qtl2data/master/DOex",
       "DOex.zip"))

With multiple alleles, it is useful to examine an additive allele model. Download pre-calculated allele probabilities (~5 MB) as follows:

tmpfile <- tempfile()
file <- paste0("https://raw.githubusercontent.com/rqtl/",
               "qtl2data/master/DOex/DOex_alleleprobs.rds")
download.file(file, tmpfile)
apr <- readRDS(tmpfile)
unlink(tmpfile)

Alternatively, calculate these directly.

pr <- qtl2::calc_genoprob(DOex, error_prob=0.002)
apr <- qtl2::genoprob_to_alleleprob(pr)
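
As a rough illustration of what collapsing allele pair probabilities to additive allele probabilities means, the following base R sketch builds a made-up probability vector over the 36 allele pairs for one individual at one position, then computes each of the 8 allele probabilities as half the expected allele count. The data and variable names are hypothetical; this is a sketch of the idea, not the implementation in ‘qtl2’.

```r
# Toy sketch (hypothetical data): collapse 36 allele-pair probabilities
# to 8 additive allele probabilities for one individual at one position.
founders <- LETTERS[1:8]

# All 36 unordered allele pairs: AA, AB, BB, AC, ...
idx <- which(upper.tri(diag(8), diag = TRUE), arr.ind = TRUE)
pairs <- paste0(founders[idx[, "row"]], founders[idx[, "col"]])

# Made-up allele-pair probabilities that sum to 1.
set.seed(1)
geno_prob <- runif(length(pairs))
geno_prob <- setNames(geno_prob / sum(geno_prob), pairs)

# Each allele's probability is half its expected count: a homozygote
# contributes its full probability, a heterozygote contributes half.
allele_prob <- sapply(founders, function(a) {
  count <- (substr(pairs, 1, 1) == a) + (substr(pairs, 2, 2) == a)
  sum(geno_prob * count / 2)
})

round(allele_prob, 3)
sum(allele_prob)  # the allele probabilities also sum to 1
```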

Genome allele scan.

scan_apr <- qtl2::scan1(apr, DOex$pheno)

Summary of peaks.

qtl2::find_peaks(scan_apr, DOex$pmap)
##   lodindex       lodcolumn chr      pos       lod
## 1        1 OF_immobile_pct   2 96.84223 10.173313
## 2        1 OF_immobile_pct   3 15.02006  5.971503
## 3        1 OF_immobile_pct   X 74.57257  6.939151

New summary method:

summary(scan_apr, DOex$pmap)
## # A tibble: 3 × 5
##   pheno           chr     pos   lod marker            
##   <chr>           <fct> <dbl> <dbl> <chr>             
## 1 OF_immobile_pct 2      96.8 10.2  backupUNC020000070
## 2 OF_immobile_pct 3      15.0  5.97 backupUNC030729939
## 3 OF_immobile_pct X      74.6  6.94 UNC200000454

The basic plot of genome scan,

plot(scan_apr, DOex$pmap)

and the grammar of graphics (ggplot2) version.

autoplot(scan_apr, DOex$pmap)

Genome Allele Scan for Chr 2

Subset to chr 2.

DOex <- DOex[,"2"]
apr <- subset(apr, chr = "2")

Scan chromosome and summarize peak

scan_apr <- qtl2::scan1(apr, DOex$pheno)
qtl2::find_peaks(scan_apr, DOex$pmap)
##   lodindex       lodcolumn chr      pos      lod
## 1        1 OF_immobile_pct   2 96.84223 10.17331
plot(scan_apr, DOex$pmap)

autoplot(scan_apr, DOex$pmap)

Examine coefficients for the 8 alleles

coefs <- qtl2::scan1coef(apr, DOex$pheno)

New summary method:

summary(coefs, scan_apr, DOex$pmap)
## # A tibble: 1 × 14
##   pheno       chr     pos   lod marker     A     B     C     D     E     F     G
##   <chr>       <fct> <dbl> <dbl> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 OF_immobil… 2      96.8  10.2 backu… -6.93 -3.80 -6.27 -5.97 -30.1 -14.3 -3.87
## # ℹ 2 more variables: H <dbl>, intercept <dbl>
plot(coefs, DOex$pmap, 1:8, col = qtl2::CCcolors)

autoplot(coefs, DOex$pmap)

Plot allele effects over LOD scan.

plot(coefs, DOex$pmap, 1:8, col = qtl2::CCcolors, scan1_output = scan_apr)

autoplot(coefs, DOex$pmap, scan1_output = scan_apr,
         legend.position = "none")

Examine just some of the founder effects, without centering.

plot(coefs, DOex$pmap, c(5,8), col = qtl2::CCcolors[c(5,8)])

autoplot(coefs, DOex$pmap, c(5,8))

autoplot(coefs, DOex$pmap, c(5,8), facet = "geno")

plot(coefs, DOex$pmap, 4:5, col = qtl2::CCcolors[4:5], scan1_output = scan_apr)

autoplot(coefs, DOex$pmap, 4:5, scan1_output = scan_apr, legend.position = "none")

SNP Association Mapping

For SNP association mapping, be sure to use the genotype allele pair probabilities pr rather than the additive model allele probabilities apr. Download pre-calculated genotype probabilities (~19 MB) and subset to Chr 2.

tmpfile <- tempfile()
file <- paste0("https://raw.githubusercontent.com/rqtl/",
               "qtl2data/master/DOex/DOex_genoprobs.rds")
download.file(file, tmpfile)
pr <- readRDS(tmpfile)
unlink(tmpfile)
pr <- subset(pr, chr = "2")

Or, alternatively, calculate directly using the subsetted DOex.

pr <- qtl2::calc_genoprob(DOex, error_prob=0.002)

Download SNP information from web.

filename <- file.path("https://raw.githubusercontent.com/rqtl",
                      "qtl2data/master/DOex", 
                      "c2_snpinfo.rds")
tmpfile <- tempfile()
download.file(filename, tmpfile, quiet=TRUE)
snpinfo <- readRDS(tmpfile)
unlink(tmpfile)

Or, alternatively, use the query function approach.

snpdb_file <- system.file("extdata", "cc_variants_small.sqlite", package="qtl2")
query_variant <- qtl2::create_variant_query_func(snpdb_file)
snpinfo <- query_variant("2", 96.5, 98.5)

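The SNP routines in ‘qtl2ggplot’ can distinguish types of SNP variants. To show how variants get plotted, we artificially add a type column to snpinfo: 5000 randomly placed entries are assigned a non-SNP type (about 20% of them DEL) and the rest are labeled snp.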

variants <- c("snp","indel","SV","INS","DEL","INV")
snpinfo$type <- 
  factor(
    sample(
      c(sample(variants[-1], 5000, replace = TRUE),
        rep("snp", nrow(snpinfo) - 5000))),
    variants)

Perform SNP association mapping. It is possible to use qtl2::scan1snps instead, which bundles these three routines, but we want to have the SNP probabilities for later use.

snpinfo <- qtl2::index_snps(DOex$pmap, snpinfo)
snppr <- qtl2::genoprob_to_snpprob(pr, snpinfo)
scan_snppr <- qtl2::scan1(snppr, DOex$pheno)

Plot results.

plot(scan_snppr, snpinfo, drop_hilit = 1.5)

autoplot(scan_snppr, snpinfo, drop_hilit = 1.5)

Plot just the subset of distinct SNPs.

plot(scan_snppr, snpinfo, show_all_snps=FALSE, drop_hilit = 1.5)

autoplot(scan_snppr, snpinfo, show_all_snps=FALSE, drop_hilit = 1.5)

Highlight the top SNPs (with LOD within 1.5 of the maximum). Show as open circles of size 1.

plot(scan_snppr, snpinfo, drop_hilit=1.5, cex=1, pch=1)

autoplot(scan_snppr, snpinfo, drop_hilit=1.5, cex=2)

Strain Distribution Pattern (SDP) Scan

SNP association mapping is more useful with plots that emphasize the strain distribution pattern (SDP); these separate out SNPs based on their SDP and plot the top patterns. For instance, sdp = 52 corresponds to pattern ABDGH:CEF. That is, the SNP genotype "AA" resulting from qtl2::genoprob_to_snpprob applied to pr corresponds to those of the 36 allele pairs with both alleles drawn from the reference (ref) set ABDGH (15 pairs: AA, AB, AD, AG, AH, BB, BD, BG, BH, DD, DG, DH, GG, GH, HH), "BB" has both alleles from the alternate (alt) set CEF (6 pairs: CC, CE, CF, EE, EF, FF), and "AB" has one allele from each set, the heterozygous (het) case (15 pairs: AC, AE, ..., HF). There are 255 possible SDPs, but only a few (4 in our example) need be examined carefully. One can think of these as a subset of markers for a genome scan, where interest is only in those SNPs following a particular SDP; as with genome scans, we can fill in for missing data. That is, only a few SNPs may show a particular pattern, but key differences might be seen nearby if we impute SNPs of the same pattern. Here we highlight SDPs in SNPs within 3 of the maximum LOD and connect them with lines.

autoplot(scan_snppr, snpinfo, patterns="all", drop_hilit=3, cex=2)
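
The SDP encoding described above can be checked with a short base R sketch. It assumes the convention implied by the example (sdp = 52 is ABDGH:CEF): founder A is the lowest bit, and a set bit means the founder carries the alternate allele. The helper name sdp_to_pattern_sketch is hypothetical and is not a function from ‘qtl2’ or ‘qtl2ggplot’.

```r
# Sketch: decode an SDP integer into a "ref:alt" founder pattern.
# Bit i (counting from 0) set means founder i + 1 carries the alternate allele.
sdp_to_pattern_sketch <- function(sdp, founders = LETTERS[1:8]) {
  bits <- (sdp %/% 2^(seq_along(founders) - 1)) %% 2 == 1
  paste0(paste(founders[!bits], collapse = ""), ":",
         paste(founders[bits], collapse = ""))
}

sdp_to_pattern_sketch(52)  # "ABDGH:CEF"
sdp_to_pattern_sketch(48)  # "ABCDGH:EF"

# Count the allele pairs behind each SNP genotype for pattern ABDGH:CEF.
n_ref <- 5  # founders A, B, D, G, H
n_alt <- 3  # founders C, E, F
pair_counts <- c(ref = choose(n_ref, 2) + n_ref,   # 15
                 alt = choose(n_alt, 2) + n_alt,   # 6
                 het = n_ref * n_alt)              # 15
sum(pair_counts)  # 36 allele pairs in total
```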

Highlight only top SDP patterns in SNPs.

autoplot(scan_snppr, snpinfo, patterns="hilit", drop_hilit=3, cex=2)

Looking at all SNPs is more useful than just focusing on mapped SNPs. Compare the plot below, which sets show_all_snps = FALSE.

autoplot(scan_snppr, snpinfo, patterns="hilit", drop_hilit=3, cex=2,
     show_all_snps = FALSE)

Genes in Peak Region

Download gene information for DOex from the web via RDS.

filename <- file.path("https://raw.githubusercontent.com/rqtl",
                      "qtl2data/master/DOex", 
                      "c2_genes.rds")
tmpfile <- tempfile()
download.file(filename, tmpfile, quiet=TRUE)
gene_tbl <- readRDS(tmpfile)
unlink(tmpfile)

Or, alternatively, use the query function approach.

dbfile <- system.file("extdata", "mouse_genes_small.sqlite", package="qtl2")
query_genes <- qtl2::create_gene_query_func(dbfile, filter="(source=='MGI')")
gene_tbl <- query_genes("2", 96.5, 98.5)

Plot genes. These can be aligned with the SNP association map or SDP scans.

qtl2::plot_genes(gene_tbl, xlim = c(96,99))

ggplot_genes(gene_tbl)

Multiple Phenotypes

Plot routines (except scan patterns for now) can accommodate multiple phenotypes. At present, it is best to stick to under 10. A second phenotype, asin, is artificially created here for illustration purposes.

Create an artificial second phenotype as the arcsine square root of the first one (expressed as a proportion).

DOex$pheno <- cbind(DOex$pheno, 
                    asin = asin(sqrt(DOex$pheno[,1] / 100)))
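
As a small worked example of this transformation, the sketch below applies the same formula to a few made-up percentages (not values from DOex). The arcsine square root maps percentages in [0, 100] to angles in [0, pi/2].

```r
# Worked example with hypothetical percentages (not taken from DOex).
pct <- c(0, 25, 50, 100)
angle <- asin(sqrt(pct / 100))
angle
# 0, pi/6, pi/4 and pi/2, i.e. approximately 0.000 0.524 0.785 1.571
```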

Genome scans for multiple phenotypes

Redo genome allele scans on both phenotypes.

scan_apr <- qtl2::scan1(apr, DOex$pheno)
qtl2::find_peaks(scan_apr, DOex$pmap)
##   lodindex       lodcolumn chr      pos       lod
## 1        1 OF_immobile_pct   2 96.84223 10.173313
## 2        2            asin   2 96.84223  9.422816

Similar summary using new summary method:

summary(scan_apr, DOex$pmap)
## # A tibble: 2 × 5
##   pheno           chr     pos   lod marker            
##   <chr>           <fct> <dbl> <dbl> <chr>             
## 1 OF_immobile_pct 2      96.8 10.2  backupUNC020000070
## 2 asin            2      96.8  9.42 backupUNC020000070
plot(scan_apr, DOex$pmap, 1)
plot(scan_apr, DOex$pmap, 2, add = TRUE, col = "red")

autoplot(scan_apr, DOex$pmap, 1:2)

autoplot(scan_apr, DOex$pmap, 1:2, facet="pheno", scales = "free_x", shape = "free_x")

SNP scans for multiple phenotypes

Redo SNP scans on both phenotypes.

scan_snppr <- qtl2::scan1(snppr, DOex$pheno)

Using new summary method. The summary includes a range (min and max) for pos, as there could be multiple SNPs across a range of positions.

summary(scan_snppr, DOex$pmap, snpinfo)
## # A tibble: 8 × 7
##   pheno           max_pos min_pos   lod   sdp pattern   snp_id 
##   <chr>             <dbl>   <dbl> <dbl> <int> <chr>     <chr>  
## 1 OF_immobile_pct    97.0    96.9  7.31    52 ABDGH:CEF 3 SNPs 
## 2 OF_immobile_pct    96.8    96.8  7.04    48 ABCDGH:EF 12 SNPs
## 3 asin               96.8    96.8  6.55    48 ABCDGH:EF 12 SNPs
## 4 asin               97.0    96.9  6.49    52 ABDGH:CEF 3 SNPs 
## 5 OF_immobile_pct    98.2    96.9  5.99    16 ABCDFGH:E 25 SNPs
## 6 OF_immobile_pct    96.9    96.9  5.97    20 ABDFGH:CE 9 SNPs 
## 7 asin               97.0    96.9  5.56    56 ABCGH:DEF 88 SNPs
## 8 asin               97.0    96.9  5.32    60 ABGH:CDEF 69 SNPs

Plot results.

plot(scan_snppr, snpinfo, lodcolumn=1, cex=1, pch=1, drop_hilit = 1.5)

plot(scan_snppr, snpinfo, lodcolumn=2, cex=1, pch=1, drop_hilit = 1.5)

autoplot(scan_snppr, snpinfo, 1:2, facet="pheno",
         drop_hilit = 1.5)

plot(scan_snppr, snpinfo, lodcolumn=1, cex=1, pch=1, 
     show_all_snps = FALSE, drop_hilit = 1.5)

plot(scan_snppr, snpinfo, lodcolumn=2, cex=1, pch=1, 
     show_all_snps = FALSE, drop_hilit = 1.5)

autoplot(scan_snppr, snpinfo, 1:2, show_all_snps = FALSE, facet="pheno",
         cex=2, drop_hilit = 1.5)

Note that in the autoplot (using qtl2ggplot), the hilit points for the second trait are fewer than with the plot (using package ‘qtl2’). This is because the maxlod for the faceted autoplot is across both traits, and the other points for the second trait are too low.

autoplot(scan_snppr, snpinfo, 2, show_all_snps = FALSE, facet="pheno",
         cex=2, drop_hilit = 1.5)
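
The effect of a shared maximum can be seen in a small base R sketch. The LOD values below are made up, and the comparison rule (at least the maximum minus drop_hilit) is an assumption for illustration rather than the exact rule used by either package.

```r
# Toy sketch (hypothetical LOD values): highlighting threshold computed
# jointly across traits versus separately for each trait.
lod <- cbind(trait1 = c(7.3, 7.0, 6.4, 5.9, 2.1),
             trait2 = c(6.5, 6.4, 5.6, 5.3, 1.8))
drop_hilit <- 1.5

# Joint threshold: one maximum taken over both columns.
joint <- lod >= max(lod) - drop_hilit

# Per-trait threshold: each column compared with its own maximum.
per_trait <- sweep(lod, 2, apply(lod, 2, max) - drop_hilit, ">=")

colSums(joint)      # trait1 = 4, trait2 = 2
colSums(per_trait)  # trait1 = 4, trait2 = 4
```

With the shared maximum, the second trait keeps only the two points that come within 1.5 of the first trait's peak.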

Distinguish high values by color but leave others gray.

autoplot(scan_snppr, snpinfo, 1:2,show_all_snps = FALSE,
         facet_var = "pheno", drop_hilit = 2,
         col=8, col_hilit=1:2, cex=2) +
  geom_hline(yintercept = max(scan_snppr) - 2, col = "darkgrey", linetype = "dashed")

SDP scans for multiple phenotypes

autoplot(scan_snppr, snpinfo, 2, patterns = "all",
             cex=2, drop_hilit=2)

autoplot(scan_snppr, snpinfo, 1:2, patterns = "all", cex=2,
             facet = "pheno", drop_hilit=3)

autoplot(scan_snppr, snpinfo, 1:2, patterns = "hilit", cex=2,
             drop_hilit=3, facet = "pheno", scales = "free")

autoplot(scan_snppr, snpinfo, 1:2, patterns = "hilit",
         show_all_snps = TRUE, cex=2,
         drop_hilit=3, facet = "pattern")

LOD peaks for multiple phenotypes

(peaks <- qtl2::find_peaks(scan_apr, DOex$pmap, drop = 1.5))
##   lodindex       lodcolumn chr      pos       lod    ci_lo    ci_hi
## 1        1 OF_immobile_pct   2 96.84223 10.173313 91.60175 103.8778
## 2        2            asin   2 96.84223  9.422816 91.60175 103.8778
qtl2::plot_peaks(peaks, DOex$pmap)

ggplot_peaks(peaks, DOex$pmap)
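
The ci_lo and ci_hi columns come from the drop = 1.5 argument. A simplified base R sketch of a 1.5-LOD drop interval follows, using a made-up LOD curve. It simply takes the range of positions whose LOD is within 1.5 of the maximum; qtl2::find_peaks may treat interval endpoints and multiple peaks differently.

```r
# Toy sketch (hypothetical positions and LOD values): a 1.5-LOD drop interval.
pos <- c(90, 92, 94, 96, 98, 100, 102, 104)
lod <- c(7.9, 8.8, 9.6, 10.2, 9.4, 8.9, 8.5, 6.0)
drop <- 1.5

# Positions whose LOD is within `drop` of the maximum.
inside <- lod >= max(lod) - drop

ci <- c(peak  = pos[which.max(lod)],
        ci_lo = min(pos[inside]),
        ci_hi = max(pos[inside]))
ci
# peak = 96, ci_lo = 92, ci_hi = 100
```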

Coefficients for multiple phenotypes

out <- listof_scan1coef(apr, DOex$pheno, center = TRUE)

New summary method:

summary(out, scan_apr, DOex$pmap)
## # A tibble: 2 × 14
##   pheno   chr     pos   lod marker      A       B       C      D       E       F
##   <chr>   <fct> <dbl> <dbl> <chr>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
## 1 OF_imm… 2      96.8 10.2  backu… -7.49  -4.36   -6.83   -6.53  -30.6   -14.8  
## 2 asin    2      96.8  9.42 backu… -0.108 -0.0661 -0.0936 -0.109  -0.392  -0.209
## # ℹ 3 more variables: G <dbl>, H <dbl>, intercept <dbl>
ggplot2::autoplot(out, DOex$pmap, scales = "free")


Coefficients for 36 allele pairs

This last section shows some very noisy images of coefficients for the 36 allele pairs. Generally, these will not be useful unless the cross is quite large. See also package ‘qtl2pattern’.

QTL effects for 36 allele pair model. Note that they are quite unstable, and the 36 allele pair max LOD is far from the peak for the additive (haplotype) model. Only showing effects with at least one E allele. Plots are truncated at +/-100 for viewability. Note also that ‘qtl2ggplot’ routines have some centering built in.

Find coefficients for 36 allele pair genome scan.

coefs36 <- qtl2::scan1coef(pr, DOex$pheno)
## Warning in qtl2::scan1coef(pr, DOex$pheno): Considering only the first
## phenotype.

All 36 allele pair QTL effects.

plot(coefs36, DOex$pmap, 1:36, col = 1:36, ylim=c(-100,100))

autoplot(coefs36, DOex$pmap, ylim=c(-100,100), colors = NULL, legend.position = "none")

The autoplot is centered by default (so mean across all alleles is mean of trait) to make coefficient plots easier to view. This can be turned off with the hidden center option.

autoplot(coefs36, DOex$pmap, ylim=c(-100,100), center = FALSE, 
         colors = NULL, legend.position = "none")

Only 8 allele pair QTL effects that contain E.

tmp <- qtl2ggplot:::modify_object(coefs36, 
                    coefs36[, stringr::str_detect(dimnames(coefs36)[[2]], "E")])
autoplot(tmp, DOex$pmap, ylim=c(-100,100))