sffdr Package Vignette

Andrew J. Bass and Chris Wallace

2024-11-29

Overview

The sffdr package implements the surrogate functional false discovery rate (sfFDR) procedure which integrates GWAS summary statistics of related traits to increase power within the fFDR framework. The inputs into sffdr are a set of p-values from a GWAS of interest and a set of p-values (or test statistics) from one or many informative GWAS. There are a few key quantities estimated by the package: the functional q-value, functional local FDR, and functional p-value.

The quantities estimated by sffdr can be used for a variety of significance analyses in genome-wide studies while incorporating informative data: the functional q-values provide positive FDR (related to FDR) control, the functional local FDRs can be used for post-GWAS analysis such as functional fine mapping, and the functional p-values provide type I error rate control.

Citing this package

The methods implemented in this package are described in:

Bass AJ, Wallace C. Exploiting pleiotropy to enhance variant discovery with functional false discovery rates. medRxiv; 2024.

Note that this work is an extension of the functional FDR methodology and the software builds on some of the functions in the fFDR package found at https://github.com/StoreyLab/fFDR.

Getting help

To report any bugs or issues related to usage please report it on GitHub at https://github.com/ajbass/sffdr.

Installation

To install the development version of the package:

# install development version of package
install.packages("devtools")
library("devtools")
devtools::install_github("ajbass/sffdr")

Quick start

Load the sffdr package and example data set:

library(sffdr)
set.seed(123)
data(bmi)
head(sumstats)
#>         bmi      bfp cholesterol triglycerides
#> 1 0.4959680 0.313475   0.0394766      0.148379
#> 2 0.6937320 0.565400   0.8986590      0.298349
#> 3 0.0843257 0.447650   0.7301960      0.715846
#> 4 0.2623540 0.911920   0.1753410      0.775170
#> 5 0.2150660 0.668330   0.0526082      0.603160
#> 6 0.9501450 0.607349   0.2759850      0.867493
p <- sumstats$bmi
z <- as.matrix(sumstats[,-1])

The sumstats data frame contains 10,000 independent p-values for body mass index (BMI), body fat percentage (BFP), cholesterol, and triglycerides. In this example, our primary trait of interest is BMI and the informative traits are BFP, cholesterol, and triglycerides.

We first model the relationship between the functional proportion of true null hypotheses and the summary statistics using the fpi0est function. The function pi0_model can be used to help create the design matrix for fpi0est.

# Create model: choose knots at small quantiles
mpi0 <- pi0_model(z = z,
                  knots = c(0.01, 0.025, 0.05, 0.1))

# Estimation of functional pi0 using the design matrix in mpi0
fpi0 <- fpi0est(p = p,
                z = mpi0$zt,
                pi0_model = mpi0$fmod)

Alternatively, the model can be directly inputted into fpi0est:

# Create design matrix (can include other variables (e.g., MAF) or specify more complicated models)
fmod <- "~ns(bfp, knots = c(0.01, 0.025, 0.05, 0.1))"
fpi0_mod <- fpi0est(p = p,
                    z = mpi0$zt,
                    pi0_model = fmod)

You should look at the p-values in your informative studies to choose the location (quantiles) of the knots. Ideally, it should cover the location of the non-null p-values. In our analyses, we found that knots = c(0.005, 0.01, 0.025, 0.05, 0.1) performed well when there were about 160,000 LD-independent SNPs. Note that the above example data set only has 10,000 p-values.

The functional FDR and p-value quantities can be estimated with the sffdr function:

# apply sfFDR
sffdr_out <- sffdr(p, fpi0 = fpi0$fpi0)   

# plot significance results
plot(sffdr_out, rng = c(0, 1e-6))
#> Scale for x is already present.
#> Adding another scale for x, which will replace the existing scale.


# Functional P-values, Q-values, and local FDR
fp <- sffdr_out$fpvalues
fq <- sffdr_out$fqvalues
flfdr <- sffdr_out$flfdr

The functional q-value (fqvalue) is a measure of significance in terms of the positive FDR (closely related to FDR), the local FDR (flfdr) is a posterior error probability, and the functional p-value (fpvalue) can be interpreted as a standard p-value. See ?sffdr for additional details and input arguments (users may want to change epsilon = min(p) depending on how small the primary study p-values are).

Thus far, we have assumed that the SNPs are independent. Below we show how to specify LD-independent SNPs in the software:

# Boolean to specify which SNPs are independent (e.g., pruning)
# All SNPs are LD-independent in this example data set 
indep_snps <- rep(TRUE, length(p))

# Create model 
mpi0 <- pi0_model(z = z,
                  indep_snps = indep_snps,
                  knots = c(0.01, 0.025, 0.05, 0.1))

# Estimation fpi0 using design matrix from mpi0
fpi0 <- fpi0est(p = p,
                z = mpi0$zt,
                indep_snps = indep_snps,
                pi0_model = mpi0$fmod)

# Estimate FDR quantities and functional p-value
sffdr_out <- sffdr(p,
                   fpi0 = fpi0$fpi0,
                   indep_snps = indep_snps)

Note that the LD-independent SNPs are used for model fitting in sfFDR, and the functional p-values, q-values, and local FDRs are estimated for all SNPs. See ?ffinemap to perform functional fine mapping in a region of interest (assuming a single causal locus) with the functional local FDR.