mzytnicki/pancons

Find conserved regions at base pair resolution

PanCons

Description

PanCons assesses genome conservation from a pangenome graph.
It is a successor to PanSel and works at base-pair resolution.

Installation

Download the code and run make.

Usage

In its simplest form:

pancons -i input.gfa -r reference_name > output.bed 2> output.log

Compulsory parameters:
  -i string: file name in GFA format
  -r string: reference path name (should be in the GFA)
Input specific parameters:
  W-lines specific parameters:
    -p int: if the GFA uses W lines, the reference HapIndex (usually '0' for haploid, default: 0)
    -c string: if the GFA uses W lines, the reference SeqId (usually the chromosome name)
  P-lines specific parameters:
    -c string: if the GFA uses P lines, reference name used in the output BED format
Optional parameters:
  -n float/int: int = min # non-reference paths, float = min fraction of non-reference paths (default: 0.9)
  -z int: max insertion size in nodes (default: 1000)
Other:
  -h: print this help and exit
  -v: print version number to stderr
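
The -n parameter accepts either an absolute count or a fraction. The Python sketch below illustrates one possible reading of that rule. It is not taken from the PanCons source: the function name min_supporting_paths is hypothetical, and both the test for a decimal point and the rounding up are assumptions.

```python
import math

def min_supporting_paths(n_arg: str, n_non_reference: int) -> int:
    """Turn a -n style value into a minimum number of non-reference paths."""
    if "." in n_arg:
        # Assumption: a value with a decimal point is a fraction, rounded up.
        fraction = float(n_arg)
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be between 0 and 1")
        return math.ceil(fraction * n_non_reference)
    # Otherwise the value is taken as an absolute count.
    return int(n_arg)

print(min_supporting_paths("0.9", 49))  # fraction of 49 non-reference paths
print(min_supporting_paths("5", 49))    # absolute count
```

Under these assumptions, the default of 0.9 would require support from most non-reference paths, whatever the number of genomes in the graph.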

Input GFA file

The GFA format is not standardized yet. PanCons needs minimal information in order to deduce chromosomes from paths.

  • The GFA file should not be rGFA, and should contain segments (S) and full-length paths (P) or walks (W). Other line types are ignored.
  • The GFA file can store one or several chromosomes.
  • The reference path name should be the second field of a P line (PathName) or a W line (SampleId) in the GFA file.
  • If the GFA file contains one chromosome, you can provide the name of this chromosome (to appear in the output file) using parameter -c.
  • If the GFA file contains several chromosomes, only W lines are supported, and the name of the chromosome should be the fourth field (SeqId).
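
Before choosing -r, -p and -c, it can help to check which line types a GFA file contains and which names it uses. The Python sketch below is one way to do that. It is not part of PanCons, and the function name summarize_gfa and the tiny example graph are made up for illustration. It relies only on the field order described above: PathName is the second field of a P line, and SampleId, HapIndex and SeqId are the second, third and fourth fields of a W line.

```python
import io
from collections import Counter

def summarize_gfa(handle):
    """Count line types and collect candidate reference names from a GFA stream."""
    counts = Counter()
    p_names = []  # PathName (second field) of each P line
    w_keys = []   # (SampleId, HapIndex, SeqId) of each W line
    for line in handle:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[0]] += 1
        if fields[0] == "P" and len(fields) > 1:
            p_names.append(fields[1])
        elif fields[0] == "W" and len(fields) > 3:
            w_keys.append((fields[1], fields[2], fields[3]))
    return counts, p_names, w_keys

# Tiny made-up GFA, used only for illustration.
example = (
    "H\tVN:Z:1.1\n"
    "S\t1\tACGT\n"
    "S\t2\tTT\n"
    "W\tsampleA\t0\tchr1\t0\t6\t>1>2\n"
    "W\tsampleB\t0\tchr1\t0\t4\t>1\n"
)
counts, p_names, w_keys = summarize_gfa(io.StringIO(example))
print(dict(counts))
print(w_keys)
```

For a real file, pass an open file handle instead of the io.StringIO example. A file with several distinct SeqId values and no W lines would not match the constraints listed above.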

Example

If you have a P-line file and the second field of the reference P line is GCF_019614135.1#1#NZ_CP080645.1, the parameters are:

-r GCF_019614135.1 -p 1 -c NZ_CP080645.1
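
The mapping from the path name to the three parameters can be sketched in Python as follows. This is an illustration, not PanCons code: the helper name split_path_name is hypothetical, and it assumes the name has exactly three parts separated by '#', in the order sample, haplotype index, sequence name, as in the example above.

```python
def split_path_name(path_name, sep="#"):
    """Split 'sample#hap#seq' into the matching -r, -p and -c arguments."""
    parts = path_name.split(sep)
    if len(parts) != 3:
        # Assumption: only three-part names are handled in this sketch.
        raise ValueError(f"expected 3 '{sep}'-separated parts, got {len(parts)}")
    sample, hap_index, seq_id = parts
    return ["-r", sample, "-p", hap_index, "-c", seq_id]

args = split_path_name("GCF_019614135.1#1#NZ_CP080645.1")
print(" ".join(args))
```

Names that do not follow this three-part layout raise an error here, and the parameters must then be chosen by hand.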

Output BED file

The output is a BED file, where the last field is the conservation score (0 is not conserved at all, and 1 is totally identical).
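
As an illustration of how the score can be used downstream, the Python sketch below lists the intervals whose conservation score falls below a threshold. It assumes only what is stated above, plus the standard BED layout: tab-separated lines starting with chromosome, start and end, with the score in the last field. The function name low_conservation, the threshold of 0.5, and the example lines are made up and are not real PanCons output.

```python
import io

def low_conservation(handle, threshold=0.5):
    """Yield (chrom, start, end, score) for intervals scoring below threshold."""
    for line in handle:
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        score = float(fields[-1])  # the score is the last field
        if score < threshold:
            yield chrom, start, end, score

# Made-up BED lines, used only for illustration.
example = "chr1\t0\t10\t1.0\nchr1\t10\t12\t0.25\nchr1\t12\t30\t0.9\n"
hits = list(low_conservation(io.StringIO(example)))
print(hits)
```

Reading the score from the last field rather than from a fixed column keeps the sketch independent of how many columns the output file has.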

Testing PanCons

You can test PanCons on a small sample, provided by the BubbleGun package:

wget -c https://zenodo.org/record/7937947/files/ecoli50.gfa.zst
./pancons -i <( zstd -d -c ecoli50.gfa.zst ) -r GCF_019614135.1 -p 1 -c NZ_CP080645.1 > ecoli.bed 2> ecoli.log

Support

Please contact matthias.zytnicki@inrae.fr for any question you may have.

License

GNU General Public License v3.0.

Languages

C++ 99.6%, Makefile 0.4%

Created January 5, 2026
Updated January 6, 2026