bamtofastq¶
bedtools bamtofastq
is a conversion utility for extracting FASTQ records
from sequence alignments in BAM format.
Note
If you are using CRAM as input, you will need to specify the full path describing the location of the relevant reference genome in FASTA format via the CRAM_REFERENCE environment variable. For example:
export CRAM_REFERENCE=/path/to/ref/g1k_v37_decoy.fa
Usage and option summary¶
Usage:
bedtools bamtofastq [OPTIONS] -i <BAM> -fq <FASTQ>
(or):
bamToFastq [OPTIONS] -i <BAM> -fq <FASTQ>
Option | Description |
---|---|
-fq2 | FASTQ for second end. Used if BAM contains paired-end data.
BAM should be sorted by query name
(samtools sort -n -o aln.qsort.bam aln.bam ) if creating
paired FASTQ with this option. |
-tags | Create FASTQ based on the mate info in the BAM R2 and Q2 tags. |
Default behavior¶
By default, each alignment in the BAM file is converted to a FASTQ record
in the -fq
file. The order of the records in the resulting FASTQ exactly
follows the order of the records in the BAM input file.
$ bedtools bamtofastq -i NA18152.bam -fq NA18152.fq
$ head -8 NA18152.fq
@NA18152-SRR007381.35051
GGAGACATATCATATAAGTAATGCTAGGGTGAGTGGTAGGAAGTTTTTTCATAGGAGGTGTATGAGTTGGTCGTAGCGGAATCGGGGGTATGCTGTTCGAATTCATAAGAACAGGGAGGTTAGAAGTAGGGTCTTGGTGACAAAATATGTTGTATAGAGTTCAGGGGAGAGTGCGTCATATGTTGTTCCTAGGAAGATTGTAGTGGTGAGGGTGTTTATTATAATAATGTTTGTGTATTCGGCTATGAAGAATAGGGCGAAGGGGCCTGCGGCGTATTCGATGTTGAAGCCTGAGACTAGTTCGGACTCCCCTTCGGCAAGGTCGAA
+
<<<;;<;<;;<;;;;;;;;;;;;<<<:;;;;;;;;;;;;;;;;::::::;;;;<<;;;;;;;;;;;;;;;;;;;;;;;;;;;;<<<<<;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<<;;;;;:;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<<<;;;;;;;;;;<<<<<<<<;;;;;;;;;:;;;;;;;;;;;;;;;;;;;:;;;;8;;8888;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;8966689666666299866669:899
@NA18152-SRR007381.637219
AATGCTAGGGTGAGTGGTAGGAAGTTTTTTCATAGGAGGTGTATGAGTTGGTCGTAGCGGAATCGGGGGTATGCTGTTCGAATTCATAAGAACAGGGAGGTTAGAAGTAGGGTCTTGGTGACAAAATATGTTGTATAGAGTTCAGGGGAGAGTGCGTCATATGTTGTTCCTAGGAAGATTGTAGTGGTGAGGGTGTTTATTATAATAATGTTTGTGTATTCGGCTATGAAGAATAGGGCGAAGGGGCCTGCGGCGTATTCGATGTTGAAGCCTGAGACTAGTTCGGACTCCCCTTCCGGCAAGGTCGAA
+
<<<<<<<<<<;;<;<;;;;<<;<888888899<;;;;;;<;;;;;;;;;;;;;;;;;;;;;;;;<<<<<;;;;;;;;;<;<<<<<;;;;;;;;;;;;;<<<<;;;;;;;:::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<<<<;;;;;;;;;;;;;;;;;;;;;;;<;;;;;;;;;;;;;;;;;;;;;;<888<;<<;;;;<<<<<<;;;;;<<<<<<<<;;;;;;;;;:;;;;888888899:::;;8;;;;;;;;;;;;;;;;;;;99;;99666896666966666600;96666669966
-fq2
Creating two FASTQ files for paired-end sequences.¶
If your BAM alignments are from paired-end sequence data, one can use the
-fq2
option to create two distinct FASTQ output files — one for
end 1 and one for end 2.
Note
When using this option, it is required that the BAM
file is sorted/grouped by the read name. This keeps the resulting records
in the two output FASTQ files in the same order. One can sort the BAM
file by query name with samtools sort -n -o aln.qsort.bam aln.bam
.
$ samtools sort -n -o aln.qsort.bam aln.bam
$ bedtools bamtofastq -i aln.qsort.bam \
-fq aln.end1.fq \
-fq2 aln.end2.fq
$ head -8 aln.end1.fq
@SRR069529.2276/1
CAGGGAGAAGGAGGTAGGAAAGAGAAAGGACCAGGGAGGGGCGCATACACAGGACGCTCCGTGCGGTGATAGCAGCACCACACTGTGTTCAGTCGTCTGGC
+
=;@>==###############################################################################################
@SRR069529.2406/1
GCTGGGAAAAGGATTCAGGATGTTGGTTTCTATCTTTGAGTTGCTGCTGTGCGGCTGTCCCTACACTCGCAGTACCCCTCGGACACCGTCTACTGTGGAGG
+
=5@><<:?<?
$ head -8 aln.end2.fq
@SRR069529.2276/2
AGACCCAGAGAGGGACAGGATCTGTCCCAGATCATAAAATAGGGGGAGTGCTCCGTAGAGGCGTGCGCGGTGGCACCGTGCAGTAGTACGGGTGAGCGGGG
+
#####################################################################################################
@SRR069529.2406/2
TTCCCTACCCCTGGGGTCAGGGACTACAGCCAAGGGGAGAACTTTAGCAAGTAGACGTTAGTTATTTTGATTCCAGTGGGGACGCGCGTGTAGCGAGTTGT
+
@>=AABB?AAACABBA>@?AAAA>B@@AB@AA:B@AA@??#############################################################