ProActive automatically detects regions of gapped and elevated read coverage using a 2D pattern-matching algorithm. ProActive detects, characterizes and visualizes read coverage patterns in both genomes and metagenomes. Optionally, users may provide gene annotations associated with their genome or metagenome in the form of a .gff file. In this case, ProActive will generate an additional output table containing the gene annotations found within the detected regions of gapped and elevated read coverage. Additionally, users can search for gene annotations of interest in the output read coverage plots.
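To give a feel for what pattern matching on coverage means, here is a toy R sketch. It is not ProActive's actual algorithm: it slides a block-shaped elevation across a vector of binned coverage values and keeps the placement whose pattern best fits the data, scored by mean absolute difference. The function name `matchElevation`, the median baseline and the scoring rule are assumptions chosen for illustration.

```r
# Toy sketch of pattern matching on binned read coverage.
# Assumption: this is NOT ProActive's implementation; the name, the median
# baseline and the mean-absolute-difference score are illustrative choices.
matchElevation <- function(coverage, width) {
  n <- length(coverage)
  baseline <- median(coverage)            # "background" coverage level
  best <- list(start = NA_integer_, score = Inf)
  for (start in seq_len(n - width + 1)) {
    idx <- start:(start + width - 1)      # bins covered by the elevation
    pattern <- rep(baseline, n)           # flat pattern at baseline...
    pattern[idx] <- mean(coverage[idx])   # ...with a block raised to the local mean
    score <- mean(abs(coverage - pattern))
    if (score < best$score) best <- list(start = start, score = score)
  }
  best
}

cov <- c(10, 11, 9, 10, 40, 42, 41, 10, 9, 11)
res <- matchElevation(cov, width = 3)
res$start   # 5: the elevation begins at the fifth bin
```

A gap could be sketched the same way by lowering the block instead of raising it; comparing the best scores of different shapes is what lets a pattern-matcher label a region.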
Visualizing read coverage data is important because gaps and elevations in coverage can indicate a variety of biological and non-biological scenarios.
Because the cause of gaps and elevations in read coverage can be ambiguous, ProActive is best used as a screening method to identify genetic regions for further investigation with other tools!
ProActive detects read coverage patterns using a pattern-matching algorithm that operates on pileup files. A pileup file is a file format where each row summarizes the ‘pileup’ of reads at specific genomic locations. Pileup files can be used to generate a rolling mean of read coverages and associated base pair positions, which reduces data size while preserving read coverage patterns. ProActive requires that input pileup files be generated using a 100 bp window/bin size.
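The binning idea can be sketched in base R. This is an illustration rather than how BBMap or ProActive compute it: it averages per-base coverage over non-overlapping 100 bp bins, which shrinks a contig of N bases to about N/100 rows. The helper name `binCoverage` and the choice to label each bin by its 1-based start coordinate are assumptions.

```r
# Illustrative sketch only (hypothetical helper, not part of ProActive or BBMap):
# average per-base coverage over non-overlapping bins of `binSize` bases.
binCoverage <- function(perBase, binSize = 100) {
  bin <- (seq_along(perBase) - 1) %/% binSize   # 0-based bin index of each base
  means <- tapply(perBase, bin, mean)           # mean coverage per bin
  data.frame(
    position = as.integer(names(means)) * binSize + 1,  # assumed: 1-based bin start
    coverage = as.numeric(means)
  )
}

# 250 bases -> 3 bins (the last one holds only 50 bases)
perBase <- c(rep(10, 100), rep(30, 100), rep(20, 50))
res <- binCoverage(perBase)
# positions 1, 101, 201 with mean coverage 10, 30, 20
```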
Pileup files can be generated by mapping sequencing reads to a metagenome or genome fasta. Read mapping should be performed using a high minimum identity (0.97 or higher) and random mapping of ambiguous reads. The pileup files needed for ProActive are generated from the .bam files produced during read mapping. Some read mappers, like BBMap, can write pileup files directly: `bbmap.sh` produces them through its `bincov` output with the `covbinsize=100` parameter. Otherwise, BBMap’s `pileup.sh` can convert .bam files produced by any read mapper into ProActive-compatible pileup files using the `bincov` output with `binsize=100`.
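Because ProActive requires 100 bp bins, it can be worth sanity-checking a pileup before running it. The helper below is hypothetical (not part of ProActive): it takes contig names and positions as plain vectors, so it makes no assumption about the pileup's column names, and it assumes rows are sorted by position within each contig.

```r
# Hypothetical helper, not part of ProActive: TRUE if, within every contig,
# consecutive positions are exactly `binSize` bp apart.
# Assumes rows are sorted by position within each contig.
hasBinSize <- function(contig, position, binSize = 100) {
  steps <- unlist(lapply(split(position, contig), diff))
  all(steps == binSize)   # all() of an empty vector is TRUE (single-bin contigs)
}

contig <- c("c1", "c1", "c1", "c2", "c2")
hasBinSize(contig, c(100, 200, 300, 100, 200))  # TRUE
hasBinSize(contig, c(100, 200, 250, 100, 200))  # FALSE: one 50 bp step
```

If the pileup is loaded as a data frame, pass its contig-name and position columns to `hasBinSize()`.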
NOTE: For detailed information on input file format, please see the vignette. Users may also use the ‘sampleMetagenomePileup’ and ‘sampleGenomePileup’ files that come pre-loaded with ProActive as a reference.
ProActive optionally accepts a .gff file as input. The .gff file must be associated with the same metagenome or genome used to create your pileup file. The .gff file should be a TSV and should follow the same general format described here.
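Conceptually, finding annotations within a detected region is an interval-overlap test on the GFF start and end columns. A minimal sketch, assuming the nine standard GFF3 columns (seqid, source, type, start, end, score, strand, phase, attributes) and 1-based inclusive coordinates; `featuresInRegion`, the example contig names and the file path are hypothetical, and ProActive's exact .gff requirements are the ones given in its documentation.

```r
# Hypothetical helper, not ProActive's API: return the GFF rows on `contig`
# whose [start, end] interval overlaps [regionStart, regionEnd] (1-based, inclusive).
featuresInRegion <- function(gff, contig, regionStart, regionEnd) {
  hit <- gff$seqid == contig &
    gff$start <= regionEnd &
    gff$end >= regionStart
  gff[hit, , drop = FALSE]
}

# A real file could be read along these lines (placeholder path):
# gff <- read.delim("my_annotations.gff", header = FALSE, comment.char = "#",
#                   quote = "", col.names = c("seqid", "source", "type", "start",
#                   "end", "score", "strand", "phase", "attributes"))
gff <- data.frame(
  seqid      = c("contig_1", "contig_1", "contig_2"),
  source     = "example",
  type       = "CDS",
  start      = c(100, 5000, 200),
  end        = c(900, 5600, 800),
  score      = ".",
  strand     = "+",
  phase      = 0,
  attributes = c("product=ABC transporter", "product=hypothetical protein",
                 "product=flagellin")
)
hits <- featuresInRegion(gff, "contig_1", 4000, 6000)  # only the 5000-5600 CDS
```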
Install ProActive from CRAN with:
```r
install.packages("ProActive")
library(ProActive)
```
Install the development version of ProActive from GitHub with:
```r
if (!require("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("jlmaier12/ProActive")
library(ProActive)
```
```r
library(ProActive)

## Metagenome mode
MetagenomeProActive <- ProActiveDetect(
  pileup = sampleMetagenomePileup,
  mode = "metagenome",
  gffTSV = sampleMetagenomegffTSV
)
#> Preparing input file for pattern-matching...
#> Starting pattern-matching...
#> A quarter of the way done with pattern-matching
#> Half of the way done with pattern-matching
#> Almost done with pattern-matching!
#> Summarizing pattern-matching results
#> Finding gene predictions in elevated or gapped regions of read coverage...
#> Finalizing output
#> Execution time: 2.09secs
#> 0 contigs were filtered out based on low read coverage
#> 0 contigs were filtered out based on length (< minContigLength)
#>
#> Elevation       Gap NoPattern
#>         3         3         1
```
```r
MetagenomePlots <- plotProActiveResults(
  pileup = sampleMetagenomePileup,
  ProActiveResults = MetagenomeProActive
)

MetagenomeGeneMatches <- geneAnnotationSearch(
  ProActiveResults = MetagenomeProActive,
  pileup = sampleMetagenomePileup,
  gffTSV = sampleMetagenomegffTSV,
  geneOrProduct = "product",
  keyWords = c("transport", "chemotaxis")
)
#> Cleaning gff file...
#> Cleaning pileup file...
#> Searching for matching annotations...
#> 3 contigs/chunks have gene annotations that match one or more of the provided keyWords
```
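Keyword searches such as `keyWords = c("transport", "chemotaxis")` amount to text matching on annotation fields. The sketch below shows one way such matching could work on the `product=` tag that many annotation pipelines write into GFF3 attribute strings. It is an illustration under that assumption, not ProActive's implementation, and `productMatches` is a hypothetical name.

```r
# Illustrative sketch (not ProActive's implementation): pull the "product="
# value out of GFF3 attribute strings and flag entries whose product
# contains any of the keywords (case-insensitive).
productMatches <- function(attributes, keyWords) {
  product <- ifelse(grepl("product=", attributes, fixed = TRUE),
                    sub(".*product=([^;]*).*", "\\1", attributes),
                    NA_character_)
  pattern <- paste(keyWords, collapse = "|")   # any keyword may match
  !is.na(product) & grepl(pattern, product, ignore.case = TRUE)
}

attrs <- c("ID=cds1;product=ABC transporter permease",
           "ID=cds2;product=methyl-accepting chemotaxis protein",
           "ID=cds3;product=hypothetical protein",
           "ID=cds4")
productMatches(attrs, c("transport", "chemotaxis"))
#> [1]  TRUE  TRUE FALSE FALSE
```

Because the keywords are joined into a regular expression, characters such as `(` or `.` in a keyword would need escaping.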
```r
## Genome mode
GenomeProActive <- ProActiveDetect(
  pileup = sampleGenomePileup,
  mode = "genome",
  gffTSV = sampleGenomegffTSV
)
#> Preparing input file for pattern-matching...
#> Starting pattern-matching...
#> A quarter of the way done with pattern-matching
#> Half of the way done with pattern-matching
#> Almost done with pattern-matching!
#> Summarizing pattern-matching results
#> Finding gene predictions in elevated or gapped regions of read coverage...
#> Finalizing output
#> Execution time: 29.7secs
#> 0 contigs were filtered out based on low read coverage
#> 0 contigs were filtered out based on length (< minContigLength)
#>
#> Elevation       Gap NoPattern
#>        25         3        21
```
```r
GenomePlots <- plotProActiveResults(
  pileup = sampleGenomePileup,
  ProActiveResults = GenomeProActive
)

GenomeGeneMatches <- geneAnnotationSearch(
  ProActiveResults = GenomeProActive,
  pileup = sampleGenomePileup,
  gffTSV = sampleGenomegffTSV,
  geneOrProduct = "product",
  keyWords = c("ribosomal"),
  inGapOrElev = TRUE,
  bpRange = 5000
)
#> Cleaning gff file...
#> Cleaning pileup file...
#> Searching for matching annotations...
#> 8 contigs/chunks have gene annotations that match one or more of the provided keyWords
```