The sffdr
package implements the surrogate functional
false discovery rate (sfFDR) procedure which integrates GWAS summary
statistics of related traits to increase power within the fFDR
framework. The inputs into sffdr
are a set of p-values from
a GWAS of interest and a set of p-values (or test statistics) from one
or many informative GWAS. There are a few key quantities estimated by
the package: the functional q-value, functional local FDR, and
functional p-value.
The quantities estimated by sffdr
can be used for a
variety of significance analyses in genome-wide studies while
incorporating informative data: the functional q-values provide positive
FDR (related to FDR) control, the functional local FDRs can be used for
post-GWAS analysis such as functional fine mapping, and the functional
p-values provide type I error rate control.
The methods implemented in this package are described in:
Bass AJ, Wallace C. Exploiting pleiotropy to enhance variant discovery with functional false discovery rates. medRxiv; 2024.
Note that this work is an extension of the functional FDR methodology
and the software builds on some of the functions in the
fFDR
package found at https://github.com/StoreyLab/fFDR.
To report any bugs or issues related to usage please report it on GitHub at https://github.com/ajbass/sffdr.
To install the development version of the package:
Load the sffdr
package and example data set:
library(sffdr)
set.seed(123)
data(bmi)
head(sumstats)
#> bmi bfp cholesterol triglycerides
#> 1 0.4959680 0.313475 0.0394766 0.148379
#> 2 0.6937320 0.565400 0.8986590 0.298349
#> 3 0.0843257 0.447650 0.7301960 0.715846
#> 4 0.2623540 0.911920 0.1753410 0.775170
#> 5 0.2150660 0.668330 0.0526082 0.603160
#> 6 0.9501450 0.607349 0.2759850 0.867493
p <- sumstats$bmi
z <- as.matrix(sumstats[,-1])
The sumstats
data frame contains 10,000 independent
p-values for body mass index (BMI), body fat percentage (BFP),
cholesterol, and triglycerides. In this example, our primary trait of
interest is BMI and the informative traits are BFP, cholesterol, and
triglycerides.
We first model the relationship between the functional proportion of
true null hypotheses and the summary statistics using the
fpi0est
function. The function pi0_model
can
be used to help create the design matrix for fpi0est
.
# Create model: choose knots at small quantiles
mpi0 <- pi0_model(z = z,
knots = c(0.01, 0.025, 0.05, 0.1))
# Estimation of functional pi0 using the design matrix in mpi0
fpi0 <- fpi0est(p = p,
z = mpi0$zt,
pi0_model = mpi0$fmod)
Alternatively, the model can be directly inputted into
fpi0est
:
# Create design matrix (can include other variables (e.g., MAF) or specify more complicated models)
fmod <- "~ns(bfp, knots = c(0.01, 0.025, 0.05, 0.1))"
fpi0_mod <- fpi0est(p = p,
z = mpi0$zt,
pi0_model = fmod)
You should look at the p-values in your informative studies to choose
the location (quantiles) of the knots. Ideally, it should cover the
location of the non-null p-values. In our analyses, we found that
knots = c(0.005, 0.01, 0.025, 0.05, 0.1)
performed well
when there were about 160,000 LD-independent SNPs. Note that the above
example data set only has 10,000 p-values.
The functional FDR and p-value quantities can be estimated with the
sffdr
function:
# apply sfFDR
sffdr_out <- sffdr(p, fpi0 = fpi0$fpi0)
# plot significance results
plot(sffdr_out, rng = c(0, 1e-6))
#> Scale for x is already present.
#> Adding another scale for x, which will replace the existing scale.
# Functional P-values, Q-values, and local FDR
fp <- sffdr_out$fpvalues
fq <- sffdr_out$fqvalues
flfdr <- sffdr_out$flfdr
The functional q-value (fqvalue
) is a measure of
significance in terms of the positive FDR (closely related to FDR), the
local FDR (flfdr
) is a posterior error probability, and the
functional p-value (fpvalue
) can be interpreted as a
standard p-value. See ?sffdr
for additional details and
input arguments (users may want to change epsilon = min(p)
depending on how small the primary study p-values are).
Thus far, we have assumed that the SNPs are independent. Below we show how to specify LD-independent SNPs in the software:
# Boolean to specify which SNPs are independent (e.g., pruning)
# All SNPs are LD-independent in this example data set
indep_snps <- rep(TRUE, length(p))
# Create model
mpi0 <- pi0_model(z = z,
indep_snps = indep_snps,
knots = c(0.01, 0.025, 0.05, 0.1))
# Estimation fpi0 using design matrix from mpi0
fpi0 <- fpi0est(p = p,
z = mpi0$zt,
indep_snps = indep_snps,
pi0_model = mpi0$fmod)
# Estimate FDR quantities and functional p-value
sffdr_out <- sffdr(p,
fpi0 = fpi0$fpi0,
indep_snps = indep_snps)
Note that the LD-independent SNPs are used for model fitting in
sfFDR, and the functional p-values, q-values, and local FDRs are
estimated for all SNPs. See ?ffinemap
to perform functional
fine mapping in a region of interest (assuming a single causal locus)
with the functional local FDR.