SmallRnaPipe

Why did we build this pipeline?

This pipeline relies on two programs, srnaMapper and mmquant, that no other small RNA analysis pipeline uses. Existing pipelines analyse only microRNAs, which are one category of small RNAs, and they do not handle reads that align to multiple locations on the genome in a satisfactory way. Together, srnaMapper and mmquant make this pipeline well suited to reproducible small RNA analyses.

Arguments

Option	Parameter(s)	Description	Requirement
`--reads`	`fastq1 ....gz`	Input `fastq` file(s)	Required
`--genome`	`genome.fa`	Genome file in FASTA format	Required
`--index`	`directory/prefix`	Genome index built by `bwa`, given as directory and prefix (built by the pipeline if omitted)	Optional
`--annotation`	`annotation.(gtf/gff)`	Input reference annotation file	Required
`--mature`	`mature.fa`	Input reference mature RNA file	Optional
`--hairpin`	`hairpin.fa`	Input reference hairpin RNA file	Optional
`--species`	`species`	Species name on miRBase	Optional
`--o_species`	`other_species`	miRBase name of another species that is related to yours	Optional
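As an illustration of how these options combine, the sketch below assembles a run that passes the three required options together with the optional miRBase ones. It assumes that `main_script1.nf` (the script named in the usage section) accepts all of them in a single call. Every file name and both species values are hypothetical placeholders. `--index` is left out on purpose, so that the pipeline builds the bwa index itself.

```shell
# Sketch only: file names and species values are hypothetical placeholders.
# Assumption: main_script1.nf accepts the optional miRBase options in the same call.
cmd="nextflow run main_script1.nf \
--reads ech1.fastq \
--genome genome1.fa \
--annotation ann1.gff3 \
--mature mature.fa \
--hairpin hairpin.fa \
--species species \
--o_species other_species"

# Print the command instead of executing it; run it with: eval "$cmd"
echo "$cmd"
```

Printing the command first makes it easy to check the option list before launching a long run.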

Workflow

Program	Action	Inputs	Command	Outputs
FastQC	Checks the quality of the input FASTQ files and produces a detailed quality report	FASTQ file (reads)	`fastqc ech1.fastq ...`	An HTML summary and a ZIP file used by MultiQC
trim_galore	Trims the adapters from the reads	FASTQ file (reads)	`trim_galore ech1.fastq ...`	A `*report.txt` file for MultiQC and a trimmed FASTQ file for the next step
prinseq-lite	Separates the "good" (high-complexity) reads from the "bad" (low-complexity) reads	Trimmed FASTQ file (reads)	`prinseq-lite.pl -fastq ech1.trimmed.fastq ... -lc_method dust -lc_threshold 7`	Two FASTQ files: one with the bad reads and one with the good reads for the next step
BWA	Builds a bwa index for your genome if you do not provide one	Your genome FASTA file and a prefix (the genome file name without `.fa`)	`bwa index -p $prefix $genome.fa`	Five files that make up the index
srnaMapper	Maps the reads to the reference genome	The index, the clean reads and the prefix	`srnaMapper -r $reads_clean -g directory/prefix_of_index -o $prefix.sam`	One SAM file per reads file, for the next step
mmquant	Quantifies the expression of the sRNAs	The annotation file (GFF, GFF3 or GTF) and the BAM/SAM file	`mmquant -a $annotation_file -r bam/sam -o prefix.tsv`	A table giving, for each annotation, the number of times it was found in the BAM/SAM file
MultiQC	Aggregates the reports into a single overview of the data and results	FastQC reports, trimming reports	`multiqc --config $config` (a specific config file is used to control the display order)	An HTML report
Bowtie index	Builds the Bowtie index of your genome for miRDeep2	Your genome FASTA file and a prefix (the genome file name without `.fa`)	`bowtie-build $genome.fa $prefix`	Several `.ebwt` files that make up the Bowtie index
mapper.pl	The mapping program of miRDeep2	Your cleaned FASTQ files and your Bowtie index (the prefix)	`mapper.pl $reads -e -h -i -j -m -p $index_base -s ${prefix_of_your_reads}_collapsed.fa -t ${prefix_of_your_reads}_reads_vs_refdb.arf -o 4`	An `.arf` file and a `.fa` file for miRDeep2
miRDeep2	Annotates your reads	Your mapped files (mapped with mapper.pl, not srnaMapper), the name of your species, the hairpin FASTA of your species, the mature FASTA of your species and the mature FASTA of another species	`miRDeep2.pl $file_in_mapper.pl.fa $file_in_mapper.pl.arf $mature $mature_for_other_species $hairpin.fa -t $species_of_your_FASTQ_files`	An HTML file that summarizes the analysis
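To show how the quantification steps of the table chain together, here is a dry-run sketch that prints the commands in order instead of running them. The commands and flags are those listed in the table. The sample, genome and annotation names are hypothetical, and so are the intermediate file names (`ech1.clean.fastq`, `ech1.sam`, `ech1.tsv`): the names that each tool really writes may differ, and the Nextflow script wires them together for you.

```shell
# Dry-run sketch of the quantification branch (FastQC to mmquant).
# All file names below are hypothetical placeholders.
sample="ech1"
genome="genome1.fa"
annotation="ann1.gff3"
prefix="${genome%.fa}"   # index prefix: genome file name without .fa

# Print the command; replace the body with "$@" to execute it instead.
step() { echo "$*"; }

quantification_branch() {
  step fastqc "${sample}.fastq"
  step trim_galore "${sample}.fastq"
  step prinseq-lite.pl -fastq "${sample}.trimmed.fastq" -lc_method dust -lc_threshold 7
  step bwa index -p "$prefix" "$genome"
  # Assumed name for the "good" reads kept after the complexity filter.
  step srnaMapper -r "${sample}.clean.fastq" -g "$prefix" -o "${sample}.sam"
  step mmquant -a "$annotation" -r "${sample}.sam" -o "${sample}.tsv"
}

quantification_branch
```

The miRDeep2 branch (`bowtie-build`, `mapper.pl`, `miRDeep2.pl`) follows the same pattern and starts from the same cleaned reads.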

Usage: a typical command line

nextflow run main_script1.nf --reads ech1.fastq --annotation ann1.gff3 --index index1 --genome genome1.fa

Be careful: the index must have been built by bwa, so index1 must be the prefix that you chose when you built the index with bwa.
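A quick way to catch a wrong prefix before launching the run is to check that the five files written by `bwa index -p <prefix>` (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) all exist. The two helper functions below are hypothetical and not part of the pipeline; they are only a minimal sketch of such a check.

```shell
# Hypothetical helpers, not part of SmallRnaPipe.

# Derive the default prefix from a genome file: its name without .fa.
genome_prefix() {
  basename "$1" .fa
}

# Succeed only if the five files written by `bwa index -p <prefix>` all exist.
bwa_index_complete() {
  for idx_ext in amb ann bwt pac sa; do
    [ -f "$1.$idx_ext" ] || return 1
  done
  return 0
}
```

For example, `bwa_index_complete index1 && nextflow run main_script1.nf --reads ech1.fastq --annotation ann1.gff3 --index index1 --genome genome1.fa` only starts the run when the index is complete.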
