Skip to contents

Get the number of positive/negative reads/coverage of all slding windows from the bam input files

Usage

getStrandFromBamFile(
  files,
  sequences,
  mapqFilter = 0,
  yieldSize = 1e+06,
  winWidth = 1000L,
  winStep = 100L,
  readProp = 0.5,
  paired
)

Arguments

files

the input bam files. Your bamfiles should be sorted and have their index files located at the same path.

sequences

character vector used to restrict analysed alignments to a subset of chromosomes (i.e. sequences) within the provided bam file. These correspond to chromosomes/scaffolds of the reference genome to which the reads were mapped. If absent, the whole bam file will be read. NB: This must match the chromosomes as defined in your reference genome. If the reference chromosomes were specified using the 'chr' prefix, ensure the supplied vector matches this specification.

mapqFilter

every read that has mapping quality below mapqFilter will be removed before any analysis.

yieldSize

by default is 1e6, i.e. the bam file is read by block of reads whose size is defined by this parameter. It is used to pass to same parameter of the scanBam function.

winWidth

the width of the sliding window, 1000 by default.

winStep

the step length to sliding the window, 100 by default.

readProp

A read is considered to be included in a window if at least readProp of it is in the window. Specified as a proportion. 0.5 by default.

paired

if TRUE then the input bamfile will be considered as paired-end reads. If missing, 100 thousands first reads will be inspected to test if the input bam file in paired-end or single-end.

Value

a DataFrame object containing the number of positive/negative reads and coverage of each window sliding across the bam file. The returned DataFrame has 10 columns:

Type: can be either SE if the input file contains single-end reads, or R1/R2 if the input file contains paired-end reads.

Seq: the reference sequence (chromosome/scaffold) that the reads were mapped to.

Start: the start position of the sliding window.

End: the end position of the sliding window.

NbPos/NbNeg: number of positive/negative reads that overlap the sliding window.

CovPos/CovNeg: number of bases coming from positive/negative reads that were mapped in the sliding window.

MaxCoverage: the maximum coverage within the sliding window.

File: the name of the input file.

Details

This function moves along the specified chromosomes (i.e. sequences) using a sliding window approach, and counts the number of reads in each window which align to the +/- strands of the reference genome. As well as the number of reads, the total coverage for each strand is also returned for each window, representing the total number of bases covered.

Average coverage for the entire window can be simply calculated by dividing the total coverage by the window size.

Examples

file <- system.file('extdata','s1.sorted.bam',package = 'strandCheckR')
win <- getStrandFromBamFile(file,sequences='10')
#> Testing paired end by checking the first 1e+05 reads of file /__w/_temp/Library/strandCheckR/extdata/s1.sorted.bam
#> Your bam file is single end
#> Reading file /__w/_temp/Library/strandCheckR/extdata/s1.sorted.bam
#> Read sequences 10
win
#> DataFrame with 1165 rows and 10 columns
#>       Type   Seq     Start       End NbPos NbNeg CovPos CovNeg MaxCoverage
#>      <Rle> <Rle> <numeric> <numeric> <Rle> <Rle>  <Rle>  <Rle>       <Rle>
#> 1       SE    10   7696701   7697700     0    17      0    393          17
#> 2       SE    10   7696801   7697800     0    17      0    393          17
#> 3       SE    10   7696901   7697900     0    17      0    393          17
#> 4       SE    10   7697001   7698000     0    17      0    393          17
#> 5       SE    10   7697101   7698100     0    17      0    393          17
#> ...    ...   ...       ...       ...   ...   ...    ...    ...         ...
#> 1161    SE    10   7873401   7874400     0     2      0    100           2
#> 1162    SE    10   7873501   7874500     0     2      0    100           2
#> 1163    SE    10   7873601   7874600     0     2      0     78           2
#> 1164    SE    10   7893101   7894100     1     0     50      0           1
#> 1165    SE    10   7893201   7894200     0     2     50    100           2
#>                        File
#>                       <Rle>
#> 1    /__w/_temp/Library/s..
#> 2    /__w/_temp/Library/s..
#> 3    /__w/_temp/Library/s..
#> 4    /__w/_temp/Library/s..
#> 5    /__w/_temp/Library/s..
#> ...                     ...
#> 1161 /__w/_temp/Library/s..
#> 1162 /__w/_temp/Library/s..
#> 1163 /__w/_temp/Library/s..
#> 1164 /__w/_temp/Library/s..
#> 1165 /__w/_temp/Library/s..