SA

Work at NIST: a Snakemake workflow to extract short tandem repeats (STRs) from whole-genome sequencing (WGS) data for Genome in a Bottle (GIAB) samples, plus an R script for downstream processing and concordance analysis.

Bioinformatics pipeline

The Snakemake workflow described here generates sequences for both haplotypes, given a BED file of loci and a VCF file. Optionally, it can also convert the multi-line FASTA output into single-line FASTA (one sequence line per record).
The pipeline image shows the entire bioinformatics pipeline, including applications that use targeted forensic markers.
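The optional FASTA conversion can be illustrated with a minimal Python sketch. It is not the code used in `vcf2seq_v2.smk`; the function name `fasta_to_single_line` is hypothetical. It assumes standard FASTA input in which each header line starts with `>` and the sequence lines that follow it belong to that record.

```python
from typing import Iterable, Iterator


def fasta_to_single_line(lines: Iterable[str]) -> Iterator[str]:
    """Yield header and sequence lines, joining each record's sequence onto one line.

    Hypothetical helper for illustration; not taken from the workflow itself.
    """
    header = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                yield header
                yield "".join(chunks)
            header = line
            chunks = []
        else:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header
        yield "".join(chunks)
```

For example, `fasta_to_single_line([">a", "ACG", "TT", ">b", "GG"])` yields `">a"`, `"ACGTT"`, `">b"`, `"GG"`. Because it is a generator, it can stream a large file (e.g. `for out in fasta_to_single_line(open(path)): ...`) without loading every record into memory.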

In the particular application for which this workflow was used, the input was a VCF file from a de novo assembly generated by the GIAB team at NIST. However, the Snakemake workflow can be applied to any VCF file and any set of loci of interest.
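The core idea of building both haplotype sequences for a locus from a BED interval and a VCF file can be shown with a minimal Python sketch. This is not the workflow's actual implementation, which is not shown here and may differ. The names `Variant` and `haplotype_sequences` are hypothetical. The sketch assumes that `ref_seq` holds the full chromosome sequence (index 0 = position 1), that genotypes are phased and diploid (e.g. `0|1`), and that variants do not overlap. It also relies on the BED convention (0-based, half-open) and the VCF convention (1-based `POS`).

```python
from dataclasses import dataclass


@dataclass
class Variant:
    pos: int          # 1-based VCF POS
    ref: str          # REF allele
    alts: list[str]   # ALT alleles
    gt: str           # phased diploid genotype, e.g. "0|1" (assumption)


def haplotype_sequences(ref_seq: str, bed_start: int, bed_end: int,
                        variants: list[Variant]) -> tuple[str, str]:
    """Apply phased variants to the BED interval [bed_start, bed_end).

    Hypothetical sketch: variants not fully inside the interval, or
    overlapping an already-applied variant, are skipped.
    """
    ordered = sorted(variants, key=lambda v: v.pos)
    haps = []
    for hap_idx in (0, 1):
        pieces = []
        cursor = bed_start  # next reference base (0-based) not yet copied
        for v in ordered:
            start = v.pos - 1            # VCF 1-based -> 0-based
            end = start + len(v.ref)
            if start < cursor or end > bed_end:
                continue  # outside the locus or overlapping a prior variant
            alleles = v.gt.split("|")
            if len(alleles) != 2:
                raise ValueError(f"expected a phased diploid GT, got {v.gt!r}")
            field = alleles[hap_idx]
            if field == ".":
                continue  # missing allele: keep the reference bases
            idx = int(field)
            allele = v.ref if idx == 0 else v.alts[idx - 1]
            pieces.append(ref_seq[cursor:start])  # reference up to the variant
            pieces.append(allele)                 # this haplotype's allele
            cursor = end
        pieces.append(ref_seq[cursor:bed_end])    # remaining reference bases
        haps.append("".join(pieces))
    return haps[0], haps[1]
```

For a reference `"ACGTACGTAC"`, the BED interval `2-8` covers `"GTACGT"`. A heterozygous insertion `T>TT` at position 4 (`0|1`) combined with a homozygous SNV `G>A` at position 7 (`1|1`) gives the two haplotypes `"GTACAT"` and `"GTTACAT"`. A production tool would also check that `REF` matches the reference and handle unphased or overlapping calls.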

Pipeline Image

To run the workflow with 32 cores (-c32), use

snakemake -s vcf2seq_v2.smk -c32

For a dry run (-n) that also prints the shell commands (-p), use

snakemake -nps vcf2seq_v2.smk -c32

To view a summary of the workflow's output files, use

snakemake -s vcf2seq_v2.smk -c32 --summary

Languages

R 64.6%, Python 35.4%


Created April 25, 2023
Updated April 28, 2023