While coala primarily focuses on the simulation of data, it can also
calculate summary statistics from real data. This is
particularly useful when comparing the statistics of real and simulated
data.
Rather than offering functions to import data directly, coala can
convert the internal formats of other R packages into its own format.
Currently, the PopGenome package is supported, but we plan to support
ape and pegas in the future.
PopGenome provides functions for reading various data formats,
including VCF and FASTA. Please refer
to its documentation for detailed instructions. As an example, we will
read sequence data from a short FASTA file that is included in
coala:
suppressPackageStartupMessages(library(PopGenome))
# Locate the directory with the example FASTA files shipped with coala
fasta <- system.file("example_fasta_files", package = "coala")
data_pg <- readData(fasta, progress_bar_switch = FALSE)
# Mark the two outgroup sequences
data_pg <- set.outgroup(data_pg, c("Individual_Out-1", "Individual_Out-2"))
# Assign five individuals to each of the two populations
individuals <- list(paste0("Individual_1-", 1:5), paste0("Individual_2-", 1:5))
data_pg <- set.populations(data_pg, individuals)
Here, the sequences originate from two populations and an outgroup. The outgroup is required for most summary statistics.
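To illustrate why an outgroup matters, the following base R sketch polarizes a toy alignment: an allele that matches the outgroup is treated as ancestral (0) and any other allele as derived (1). The alignment and the helper `polarize` are hypothetical and are not part of coala or PopGenome. The sketch also assumes a single outgroup sequence and at most two alleles per site, which is a simplification.

```r
# Toy alignment: rows are sequences, columns are sites (hypothetical data).
aln <- rbind(
  ind1 = c("A", "C", "G", "T"),
  ind2 = c("A", "T", "G", "T"),
  ind3 = c("A", "T", "G", "C"),
  out  = c("A", "C", "G", "T")
)

# Hypothetical helper: code each ingroup allele as 0 (ancestral, i.e. equal
# to the outgroup) or 1 (derived), and keep only the sites that are
# polymorphic within the ingroup.
polarize <- function(aln, outgroup) {
  ingroup <- aln[rownames(aln) != outgroup, , drop = FALSE]
  ancestral <- aln[outgroup, ]
  # Compare every ingroup sequence with the outgroup, site by site
  derived <- t(t(ingroup) != ancestral) * 1L
  n_derived <- colSums(derived)
  segregating <- n_derived > 0 & n_derived < nrow(ingroup)
  derived[, segregating, drop = FALSE]
}

snps <- polarize(aln, "out")
snps
```

Without the outgroup, one could only tell that two alleles are present at a site, not which of them is the derived one.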
We can now convert data_pg using the as.segsites function:
library(coala)
segsites <- as.segsites(data_pg)
Next, we calculate summary statistics using calc_sumstats_from_data:
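# Sample sizes of 5, 5 and 2 for the three populations; one locus of length 25
model <- coal_model(c(5, 5, 2), 1, 25) +
  feat_mutation(5) +
  feat_outgroup(3) +             # population 3 is the outgroup
  sumstat_sfs(population = 1)    # site frequency spectrum of population 1
stats <- calc_sumstats_from_data(model, segsites)
stats$sfs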
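As a rough illustration of what a site frequency spectrum summarizes, the base R sketch below counts, for a toy 0/1 matrix, how many sites carry the derived allele in exactly i of the n sampled sequences. The matrix and the helper `toy_sfs` are hypothetical. This follows the common textbook definition and is not taken from coala's implementation, which may differ in detail.

```r
# Hypothetical 0/1 matrix: 5 sampled sequences (rows) at 6 segregating
# sites (columns); 1 marks the derived allele.
snps <- rbind(
  c(1, 0, 0, 1, 0, 1),
  c(0, 0, 0, 1, 0, 1),
  c(0, 1, 0, 1, 0, 0),
  c(0, 0, 1, 0, 0, 1),
  c(0, 0, 0, 0, 1, 1)
)

# Hypothetical helper: entry i counts the sites at which exactly i of the
# n sequences carry the derived allele, for i = 1, ..., n - 1.
toy_sfs <- function(snps) {
  n <- nrow(snps)
  derived_counts <- colSums(snps)
  tabulate(derived_counts, nbins = n)[seq_len(n - 1)]
}

toy_sfs(snps)
# 4 0 1 1
```

Here four sites are singletons, one site has the derived allele in three sequences, and one in four sequences.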
Alternatively, the data_pg object can be passed directly to
calc_sumstats_from_data:
stats <- calc_sumstats_from_data(model, data_pg)
stats$sfs
Please refer to the help pages for as.segsites and
calc_sumstats_from_data for additional information.