mzytnicki/pancons
Find conserved regions at base pair resolution
PanCons
Description
PanCons aims at assessing genome conservation, based on a pangenome graph.
It is a successor of PanSel that works at base-pair resolution.
Installation
Download the code and run make.
Usage
In its simplest form:
pancons -i input.gfa -r reference_name > output.bed 2> output.log
Compulsory parameters:
-i string: file name in GFA format
-r string: reference path name (should be in the GFA)
Input specific parameters:
W-lines specific parameters:
-p int: if the GFA uses W lines, the reference HapIndex (usually '0' for haploid, default: 0)
-c string: if the GFA uses W lines, the reference SeqId (usually the chromosome name)
P-lines specific parameters:
-c string: if the GFA uses P lines, the reference name used in the output BED file
Optional parameters:
-n float/int: an integer sets the minimum number of non-reference paths; a float sets the minimum fraction of non-reference paths (default: 0.9)
-z int: max insertion size in nodes (default: 1000)
Other:
-h: print this help and exit
-v: print version number to stderr
Input GFA file
The GFA format is not standardized yet. PanCons needs minimal information in order to deduce chromosomes from paths.
- The GFA file should not be rGFA, and should contain segments (S) and full-length paths (P) or walks (W). Other lines are unused.
- The GFA file can store one or several chromosomes.
- The reference path name should be the second field of a P line (PathName) or a W line (SampleId) in the GFA file.
- If the GFA file contains one chromosome, you can provide the name of this chromosome (to appear in the output file) using parameter -c.
- If the GFA file contains several chromosomes, only W lines are supported, and the name of the chromosome should be the fourth field (SeqId).
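To check a GFA file against the requirements above before running PanCons, a short Python sketch such as the following may help. It is not part of PanCons: the function name and the toy records are hypothetical, and it only assumes the tab-separated layout of P lines (PathName in the second field) and W lines (SampleId, HapIndex and SeqId in the second to fourth fields).

```python
import io
from collections import Counter


def summarize_gfa_paths(handle):
    """List candidate reference names found in P and W lines."""
    p_names = []             # PathName of each P line
    w_keys = []              # (SampleId, HapIndex, SeqId) of each W line
    record_types = Counter() # number of lines per record type
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        record_types[fields[0]] += 1
        if fields[0] == "P" and len(fields) > 1:
            p_names.append(fields[1])
        elif fields[0] == "W" and len(fields) > 3:
            w_keys.append((fields[1], fields[2], fields[3]))
    return p_names, w_keys, record_types


# Toy GFA content, made up for illustration only.
example = io.StringIO(
    "H\tVN:Z:1.1\n"
    "S\t1\tACGT\n"
    "S\t2\tTT\n"
    "P\tsampleA#1#chr1\t1+,2+\t*\n"
    "W\tsampleB\t0\tchr1\t0\t6\t>1>2\n"
)
p_names, w_keys, record_types = summarize_gfa_paths(example)
print("P line names:", p_names)
print("W line (SampleId, HapIndex, SeqId):", w_keys)
print("Record types:", dict(record_types))
```

For a real file, pass an open file handle instead of the io.StringIO object. The listed names are the values to choose from for -r, -p and -c.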
Example
If you have a P-line file and the second field of the reference P line is GCF_019614135.1#1#NZ_CP080645.1, the parameters are:
-r GCF_019614135.1 -p 1 -c NZ_CP080645.1
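The mapping used in this example can be written as a small Python sketch. It assumes the path name follows the sample#haplotype#contig pattern (often called PanSN naming) with exactly three parts; the function name is hypothetical and is not part of PanCons.

```python
def pansn_to_options(path_name: str, delimiter: str = "#") -> list[str]:
    """Split a sample#haplotype#contig name into -r, -p and -c values."""
    parts = path_name.split(delimiter)
    if len(parts) != 3:
        raise ValueError(f"expected three '{delimiter}'-separated parts: {path_name}")
    sample, haplotype, contig = parts
    return ["-r", sample, "-p", haplotype, "-c", contig]


print(" ".join(pansn_to_options("GCF_019614135.1#1#NZ_CP080645.1")))
# -r GCF_019614135.1 -p 1 -c NZ_CP080645.1
```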
Output BED file
The output is a BED file, where the last field is the conservation score (0 is not conserved at all, and 1 is totally identical).
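As an illustration of how this output might be post-processed, the Python sketch below computes a length-weighted mean score and the number of bases at or above a chosen score. It assumes only the standard BED layout (name, 0-based start, exclusive end in the first three fields) and the score in the last field, as described above. The function name, the 0.9 threshold and the toy intervals are hypothetical.

```python
import io


def summarize_bed(handle, threshold=0.9):
    """Return (total bases, length-weighted mean score, bases >= threshold)."""
    total = 0
    weighted = 0.0
    conserved = 0
    for line in handle:
        # Skip empty lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        score = float(fields[-1])  # conservation score, between 0 and 1
        length = end - start
        total += length
        weighted += score * length
        if score >= threshold:
            conserved += length
    mean = weighted / total if total else float("nan")
    return total, mean, conserved


# Toy intervals, made up for illustration only.
example = io.StringIO("chr1\t0\t10\t1.0\nchr1\t10\t20\t0.5\n")
total, mean, conserved = summarize_bed(example)
print(total, mean, conserved)
```

Weighting by interval length keeps the summary meaningful whether the file reports one line per base or one line per longer region.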
Testing PanCons
You can use PanCons on a small sample, provided by the BubbleGun package:
wget -c https://zenodo.org/record/7937947/files/ecoli50.gfa.zst
./pancons -i <( zstd -d -c ecoli50.gfa.zst ) -r GCF_019614135.1 -p 1 -c NZ_CP080645.1 > ecoli.bed 2> ecoli.log
Support
Please contact matthias.zytnicki@inrae.fr with any questions you may have.
License
GNU General Public License v3.0.