cutadapt+BWA-MEM-mis1-i2,2-gapex2,2+filter_c5.0_c4.15+idxStats

Created: 2020-06-01 22:02:22      Last updated: 2020-06-01 22:19:49

This workflow is designed to genotype trinucleotide repeats from sequencing reads that span the repeat and its flanks.

Detailed description as follows:

1. Input FASTQ sequencing read files (Galaxy tool: Input dataset)

Input files for the workflow are single-end MiSeq reads (R1, produced using the protocol from Ciosi et al. 2018) or PacBio reads of insert (ROI), in FASTQ format.

2. Removing the Illumina sequencing adaptor sequence from the 3’ end of the MiSeq reads (Galaxy tool: Cutadapt 1.16)

Cutadapt removes the sequencing adaptor sequence (custom 3’ adapter sequence used: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC) from the 3’ end of the forward reads (R1, produced using the protocol from Ciosi et al. 2018), allowing a maximum error rate of 0.39.
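As a rough illustration of the trimming settings above, the sketch below builds an argument list for a stand-alone cutadapt run and computes how many errors the stated error rate tolerates for a given match length. It assumes the command-line flags `-a`, `-e` and `-o` correspond to the Galaxy tool's settings, and that the number of allowed errors is the error rate times the match length, rounded down. The function names and file names are hypothetical.

```python
import math

ADAPTER = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # 3' adapter quoted in step 2
MAX_ERROR_RATE = 0.39                          # error rate quoted in step 2


def allowed_errors(match_length, error_rate=MAX_ERROR_RATE):
    """Errors tolerated for an adapter match of the given length
    (assumed rule: error rate times match length, rounded down)."""
    return math.floor(error_rate * match_length)


def cutadapt_command(fastq_in, fastq_out):
    """Build an argument list for a stand-alone cutadapt run (hypothetical helper)."""
    return [
        "cutadapt",
        "-a", ADAPTER,              # 3' adapter to remove
        "-e", str(MAX_ERROR_RATE),  # maximum error rate
        "-o", fastq_out,            # trimmed reads
        fastq_in,
    ]


if __name__ == "__main__":
    print(" ".join(cutadapt_command("reads_R1.fastq", "reads_R1.trimmed.fastq")))
    print(allowed_errors(len(ADAPTER)))
```

Under this rule a full-length match of the 33-base adapter tolerates 12 errors, while a 2-base partial match at the very end of a read tolerates none.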

2’. QC of sequencing adaptor removal (Galaxy tool: FastQC version 0.71)

FastQC is run after adaptor removal to check that trimming has worked as expected.

3. Input the fasta file containing the synthetic reference sequence(s) for BWA-MEM alignment

Multiple synthetic reference sequences containing different numbers of repeats are typically used (typically variable numbers of CAG and/or CCG repeats to genotype the HTT exon one trinucleotide repeat).
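One way such a set of synthetic references could be generated is sketched below. The flanking sequences, the naming scheme and the function names are placeholders chosen for illustration; they are not the sequences used by the workflow authors. Real references would use the actual flanks of the locus and may vary CAG and CCG counts jointly.

```python
def synthetic_references(left_flank, right_flank, repeat_unit, repeat_counts):
    """Return (name, sequence) pairs, one per repeat count.

    All arguments are supplied by the caller; nothing here encodes the
    real HTT flanking sequences.
    """
    records = []
    for n in repeat_counts:
        name = f"{repeat_unit}{n}"  # hypothetical naming scheme, e.g. CAG17
        sequence = left_flank + repeat_unit * n + right_flank
        records.append((name, sequence))
    return records


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{sequence}\n" for name, sequence in records)


if __name__ == "__main__":
    # Placeholder flanks, for illustration only.
    refs = synthetic_references("AAAA", "TTTT", "CAG", range(15, 18))
    print(to_fasta(refs), end="")
```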

3’. BWA-MEM aligns reads to one or more reference sequences (Galaxy tool: Map with BWA-MEM version 0.7.17.1)

For MiSeq reads, BWA-MEM parameters are kept at their defaults except for the following, which make gap-related costs higher than mismatch-related costs:

Penalty for mismatch: 1

Gap open penalties for deletions and insertions: 2,2

Gap extension penalties: 2,2
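To make the effect of these settings concrete, the sketch below computes gap and mismatch costs, assuming BWA-MEM's documented rule that a gap of length k costs (gap open) + k × (gap extension). It also builds an argument list for a stand-alone `bwa mem` run, assuming the flags `-B`, `-O` and `-E` correspond to the three Galaxy settings listed above. Function and file names are hypothetical.

```python
MISMATCH_PENALTY = 1  # "Penalty for mismatch"
GAP_OPEN = 2          # "Gap open penalties" (same value for deletions and insertions)
GAP_EXTEND = 2        # "Gap extension penalties"


def gap_cost(length, gap_open=GAP_OPEN, gap_extend=GAP_EXTEND):
    """Cost of one gap of the given length: open + length * extend."""
    return gap_open + gap_extend * length


def bwa_mem_command(reference_fasta, reads_fastq):
    """Build an argument list for a stand-alone bwa mem run (hypothetical helper)."""
    return [
        "bwa", "mem",
        "-B", str(MISMATCH_PENALTY),
        "-O", f"{GAP_OPEN},{GAP_OPEN}",
        "-E", f"{GAP_EXTEND},{GAP_EXTEND}",
        reference_fasta,
        reads_fastq,
    ]


if __name__ == "__main__":
    print(gap_cost(1))           # single-base gap
    print(gap_cost(3))           # gap of one trinucleotide repeat unit
    print(3 * MISMATCH_PENALTY)  # three mismatches, for comparison
```

With these values a single-base gap costs 4 and a gap of one repeat unit costs 8, against 1 per mismatch, so an alignment that needs a gap is penalised more heavily than one that needs a few mismatches.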

4. BAM converted to SAM (Galaxy tool: BAM-to-SAM version 2.0.1)

Alignment output BAM files are converted to SAM format using BAM-to-SAM.

5. Reads are filtered out from the SAM file (Galaxy tool: Filter version 1.1.0)

Reads are filtered out of the SAM file if they have a MAPQ score of 0 (obtained for reads aligned equally well to more than one reference sequence) and/or if their alignment did not start in the sequence flanking the CAG repeat on the 5’ side. The second criterion is probably only relevant if reads have been generated by PCR and the 5’ end of the reference sequence corresponds to the PCR primer used to amplify the locus sequenced.

Filtering criteria used, where c5 is the MAPQ and c4 is the position of the start of the alignment relative to the reference sequence:

c5>0 and c4<15 for HTT repeat reads generated with the forward PCR primer 31329 (5’-ATGAAGGCCTTCGAGTCCCTCAAGTCCTTC-3’)

c5>0 and c4<130 for HTT repeat reads generated with the forward PCR primer MS1F (5’-GCCCAGAGCCCCATTCATTG-3’)

Number of header lines to skip = (number of reference sequences considered for the alignment) + 2
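The filtering rule in this step can be sketched in plain Python as follows. This is an illustration of the logic, not the Galaxy Filter tool itself. It assumes that the skipped header lines are passed through unchanged and that every remaining line is a tab-separated SAM record with the alignment start in column 4 and the MAPQ in column 5. The function name and the `max_start` parameter are hypothetical.

```python
def filter_sam_lines(lines, n_refs, max_start=15):
    """Keep header lines and reads with MAPQ > 0 and start position < max_start.

    lines     : SAM lines, header first
    n_refs    : number of reference sequences used for the alignment
    max_start : 15 for primer 31329, 130 for primer MS1F (values from step 5)
    """
    header_count = n_refs + 2  # header lines to skip, as stated in step 5
    kept = []
    for index, line in enumerate(lines):
        if index < header_count:
            kept.append(line)  # assumption: header is passed through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        start = int(fields[3])  # c4: 1-based start of the alignment
        mapq = int(fields[4])   # c5: mapping quality
        if mapq > 0 and start < max_start:
            kept.append(line)
    return kept
```

For reads amplified with primer MS1F, the same function would be called with `max_start=130`.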

5’. QC of read filtering from SAM (Galaxy tool: FastQC version 0.71)

FastQC is run on the filtered SAM file to check that read filtering has worked as expected.

6. Generate an alignment report table from the SAM file (Galaxy tool: IdxStats version 2.0.1)

IdxStats is run on each alignment SAM file to generate an alignment report table.

7. Columns of the IdxStats alignment report table are removed to only keep the columns that contain the reference sequence identifier and the number of reads aligned to each of these reference sequences (Galaxy tool: Cut columns from a table version 1.0.2).

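Keeps columns 1 and 3 of the initial IdxStats output (reference sequence name and number of mapped reads); the remaining columns (reference sequence length and number of unmapped reads) are discarded.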
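The column selection in step 7 can be sketched as below. The sketch relies on the standard samtools idxstats layout, which is tab-separated with reference name, sequence length, mapped reads and unmapped reads in that order, and returns only the reference identifier and the mapped-read count. The function name and the example values are hypothetical.

```python
def keep_name_and_mapped(idxstats_text):
    """Return (reference name, mapped read count) pairs from idxstats text."""
    rows = []
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        # fields: name, sequence length, mapped reads, unmapped reads
        rows.append((fields[0], int(fields[2])))
    return rows


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = "CAG17\t120\t350\t0\nCAG18\t123\t12\t0\n*\t0\t0\t5\n"
    for name, mapped in keep_name_and_mapped(example):
        print(f"{name}\t{mapped}")
```

The reference with the highest count in the resulting table is the natural candidate for the dominant repeat length in the sample.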

 

Visible output files produced by the workflow:

two FastQC read quality reports (txt and html) post-cutadapt (output of step 2’)

a SAM file of aligned reads for visualising repeat genotypes (output of step 4)

one FastQC read quality report (txt) post-filtering of SAM file (output of step 5’)

two tab-delimited files with the number of reads aligned to each synthetic reference sequence considered for the alignment (output of steps 7 and 8)

Workflow Components

Inputs (15)
Steps (10)
Outputs (21)

Workflow Type

Galaxy

License

All versions of this Workflow are not licensed.

Version 1 (of 1)
