Get the strand information of all windows from bam files
Source:R/getStrandFromBamFile.R
getStrandFromBamFile.Rd
Get the number of positive/negative reads/coverage of all slding windows from the bam input files
Usage
getStrandFromBamFile(
files,
sequences,
mapqFilter = 0,
yieldSize = 1e+06,
winWidth = 1000L,
winStep = 100L,
readProp = 0.5,
paired
)
Arguments
- files
the input bam files. Your bamfiles should be sorted and have their index files located at the same path.
- sequences
character vector used to restrict analysed alignments to a subset of chromosomes (i.e. sequences) within the provided bam file. These correspond to chromosomes/scaffolds of the reference genome to which the reads were mapped. If absent, the whole bam file will be read. NB: This must match the chromosomes as defined in your reference genome. If the reference chromosomes were specified using the 'chr' prefix, ensure the supplied vector matches this specification.
- mapqFilter
every read that has mapping quality below
mapqFilter
will be removed before any analysis.- yieldSize
by default is 1e6, i.e. the bam file is read by block of reads whose size is defined by this parameter. It is used to pass to same parameter of the scanBam function.
- winWidth
the width of the sliding window, 1000 by default.
- winStep
the step length to sliding the window, 100 by default.
- readProp
A read is considered to be included in a window if at least
readProp
of it is in the window. Specified as a proportion. 0.5 by default.- paired
if TRUE then the input bamfile will be considered as paired-end reads. If missing, 100 thousands first reads will be inspected to test if the input bam file in paired-end or single-end.
Value
a DataFrame object containing the number of positive/negative reads and coverage of each window sliding across the bam file. The returned DataFrame has 10 columns:
Type: can be either SE if the input file contains single-end reads, or R1/R2 if the input file contains paired-end reads.
Seq: the reference sequence (chromosome/scaffold) that the reads were mapped to.
Start: the start position of the sliding window.
End: the end position of the sliding window.
NbPos/NbNeg: number of positive/negative reads that overlap the sliding window.
CovPos/CovNeg: number of bases coming from positive/negative reads that were mapped in the sliding window.
MaxCoverage: the maximum coverage within the sliding window.
File: the name of the input file.
Details
This function moves along the specified chromosomes (i.e. sequences) using a sliding window approach, and counts the number of reads in each window which align to the +/- strands of the reference genome. As well as the number of reads, the total coverage for each strand is also returned for each window, representing the total number of bases covered.
Average coverage for the entire window can be simply calculated by dividing the total coverage by the window size.
Examples
file <- system.file('extdata','s1.sorted.bam',package = 'strandCheckR')
win <- getStrandFromBamFile(file,sequences='10')
#> Testing paired end by checking the first 1e+05 reads of file /__w/_temp/Library/strandCheckR/extdata/s1.sorted.bam
#> Your bam file is single end
#> Reading file /__w/_temp/Library/strandCheckR/extdata/s1.sorted.bam
#> Read sequences 10
win
#> DataFrame with 1165 rows and 10 columns
#> Type Seq Start End NbPos NbNeg CovPos CovNeg MaxCoverage
#> <Rle> <Rle> <numeric> <numeric> <Rle> <Rle> <Rle> <Rle> <Rle>
#> 1 SE 10 7696701 7697700 0 17 0 393 17
#> 2 SE 10 7696801 7697800 0 17 0 393 17
#> 3 SE 10 7696901 7697900 0 17 0 393 17
#> 4 SE 10 7697001 7698000 0 17 0 393 17
#> 5 SE 10 7697101 7698100 0 17 0 393 17
#> ... ... ... ... ... ... ... ... ... ...
#> 1161 SE 10 7873401 7874400 0 2 0 100 2
#> 1162 SE 10 7873501 7874500 0 2 0 100 2
#> 1163 SE 10 7873601 7874600 0 2 0 78 2
#> 1164 SE 10 7893101 7894100 1 0 50 0 1
#> 1165 SE 10 7893201 7894200 0 2 50 100 2
#> File
#> <Rle>
#> 1 /__w/_temp/Library/s..
#> 2 /__w/_temp/Library/s..
#> 3 /__w/_temp/Library/s..
#> 4 /__w/_temp/Library/s..
#> 5 /__w/_temp/Library/s..
#> ... ...
#> 1161 /__w/_temp/Library/s..
#> 1162 /__w/_temp/Library/s..
#> 1163 /__w/_temp/Library/s..
#> 1164 /__w/_temp/Library/s..
#> 1165 /__w/_temp/Library/s..