raysas/genetic-linkage-gwas
group project - application of family-based and population-based genetic analyses to both a monogenic disease (Clouston disease) and a complex multifactorial disease (Rheumatoid Arthritis)
NGS Linkage and GWAS Analysis
This repository contains the analyses performed as part of a Next-Generation Sequencing (NGS) methodologies project, focusing on the genetic investigation of two diseases with distinct genetic architectures:
- Clouston disease (monogenic, autosomal dominant)
- Rheumatoid arthritis (complex, multifactorial)
The project combines genetic linkage analysis and association studies using standard statistical genetics tools.
tools
- R (paramlink, qqman)
- PLINK
- MERLIN
- FBAT
repository structure
Simplified structure for project submission; the report is in report/.
.
├── code/
│   └── main.ipynb
├── assets/
├── report/
│   └── NGS_Methodologies_Report.pdf
└── README.md
Project Overview
Clouston Disease (Monogenic Disorder)
For Clouston disease, family-based analyses were performed to identify the disease-associated locus.
1. Genetic Linkage Analysis (LOD score)
- Parametric linkage analysis using the LOD score method
- Analysis conducted on chromosome 13 using microsatellite markers
- Evaluation of different recombination fractions (θ values)
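To illustrate how a LOD score varies with θ, here is a minimal Python sketch of the textbook two-point formula for phase-known, fully informative meioses. It is not the pedigree likelihood computation that paramlink performs; the function name `lod_score` and the meiosis counts are hypothetical and chosen only for illustration.

```python
import math


def lod_score(theta, recombinants, non_recombinants):
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    where k is the number of recombinants and n the total number of meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + non_recombinants
    # Likelihood of the data if the marker is linked at distance theta.
    # Python evaluates 0.0 ** 0 as 1.0, so theta = 0 with no recombinants works.
    likelihood_linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    # Likelihood under free recombination (theta = 0.5).
    likelihood_unlinked = 0.5 ** n
    if likelihood_linked == 0.0:
        # A recombinant was observed, which is impossible at theta = 0.
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


# Hypothetical data: 10 informative meioses with no recombinants.
thetas = [0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
curve = [(t, lod_score(t, recombinants=0, non_recombinants=10)) for t in thetas]
best_theta, best_lod = max(curve, key=lambda pair: pair[1])
print(f"max LOD = {best_lod:.2f} at theta = {best_theta}")
```

With no observed recombinants the curve peaks at θ = 0 and falls to 0 at θ = 0.5. A maximum LOD above 3 is the conventional threshold for declaring linkage.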
Example family structure used in the analysis:
Figure 1: Example pedigree used for linkage analysis. Affected individuals are shown in black.
LOD score curve for a representative marker:
Figure 2: LOD score as a function of recombination fraction (θ). The maximum LOD score is observed at θ ≈ 0.
2. Familial Association Analysis (TDT)
Following linkage analysis, candidate variants within the GJB6 gene were tested using the Transmission Disequilibrium Test (TDT).
Figure 3: TDT results for SNPs within the GJB6 gene. SNP rs76179836 shows significant association.
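As a rough illustration of the statistic behind the TDT, the sketch below computes the classic McNemar-type chi-square from counts of alleles transmitted and not transmitted by heterozygous parents to affected offspring. The function name `tdt` and the counts are hypothetical; they are not taken from the project data, and the software used in the project may apply a more general family-based test.

```python
import math


def tdt(transmitted, untransmitted):
    """TDT chi-square (1 df) and p-value for one allele.

    transmitted:   times heterozygous parents passed the allele to an affected child
    untransmitted: times heterozygous parents passed the other allele instead
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: allele transmitted 30 times, not transmitted 10 times.
chi2, p = tdt(transmitted=30, untransmitted=10)
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")
```

Under the null hypothesis of no linkage or no association, a heterozygous parent transmits either allele with probability 0.5, so a strong imbalance between the two counts is evidence against the null.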
Linkage disequilibrium verification using Ensembl:
Figure 4: Linkage disequilibrium (r²) between rs76179836 and surrounding variants, indicating the SNP is likely causal.
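For readers unfamiliar with r², the following sketch shows how it is defined for two biallelic loci from a haplotype frequency and the two allele frequencies. The function name `ld_r_squared` and the frequencies are hypothetical; in the project the values were read from Ensembl rather than computed this way.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    for freq in (p_a, p_b):
        if not 0.0 < freq < 1.0:
            raise ValueError("allele frequencies must be strictly between 0 and 1")
    # D measures the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


# Hypothetical frequencies: the two alleles always occur together (perfect LD).
perfect = ld_r_squared(p_ab=0.3, p_a=0.3, p_b=0.3)
# Hypothetical frequencies: the haplotype occurs at its independence expectation.
independent = ld_r_squared(p_ab=0.09, p_a=0.3, p_b=0.3)
print(f"perfect LD: r2 = {perfect:.2f}, independence: r2 = {independent:.2f}")
```

An r² close to 1 means the two variants carry nearly the same information, so association at one cannot be statistically separated from association at the other.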
Rheumatoid Arthritis (Complex Disease)
For Rheumatoid Arthritis (RA), both linkage and genome-wide association analyses were performed.
1. Non-Parametric Linkage Analysis (sib-pairs)
- Analysis conducted using MERLIN
- Identification of suggestive and significant linkage regions
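The idea behind sib-pair linkage can be sketched with a much simpler statistic than the NPL scores MERLIN computes: the mean proportion of alleles shared identical by descent (IBD) by affected sib pairs, compared with its null expectation of 0.5. The sketch assumes IBD status is known without ambiguity for every pair, which real data rarely allow; the function name `mean_ibd_sharing_z` and the counts are hypothetical.

```python
import math


def mean_ibd_sharing_z(ibd_counts):
    """Z statistic for excess IBD sharing among affected sib pairs at one locus.

    ibd_counts: one entry per sib pair, the number of alleles shared IBD (0, 1 or 2).
    Under no linkage, pairs share 0, 1, 2 alleles with probabilities 1/4, 1/2, 1/4,
    so the shared proportion has mean 1/2 and variance 1/8.
    """
    n = len(ibd_counts)
    if n == 0:
        raise ValueError("at least one sib pair is required")
    if any(count not in (0, 1, 2) for count in ibd_counts):
        raise ValueError("IBD counts must be 0, 1 or 2")
    mean_proportion = sum(ibd_counts) / (2.0 * n)
    return (mean_proportion - 0.5) / math.sqrt(0.125 / n)


# Hypothetical data: 100 affected sib pairs at a single marker.
pairs = [2] * 40 + [1] * 50 + [0] * 10
z = mean_ibd_sharing_z(pairs)
print(f"z = {z:.2f}")
```

A large positive z indicates that affected siblings share the region more often than expected by chance, which is the signal a non-parametric linkage scan looks for along each chromosome.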
LOD score plot for chromosome X:
Figure 5: Non-parametric linkage (NPL) analysis for chromosome X. The red line indicates the significance threshold.
LOD score plot for chromosome 6:
Figure 6: Suggestive linkage region on chromosome 6 around 50–53 cM.
2. Genome-Wide Association Study (GWAS)
- GWAS performed using PLINK
- Allelic and genotypic association tests
- Bonferroni correction applied for multiple testing
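The allelic test and the Bonferroni correction listed above can be sketched in a few lines of Python. This is an uncorrected Pearson chi-square on a 2x2 table of allele counts, written for illustration rather than as a reproduction of PLINK's output; the function names, the allele counts and the number of tested SNPs are all hypothetical.

```python
import math


def allelic_chi2(case_a1, case_a2, control_a1, control_a2):
    """1-df Pearson chi-square on allele counts in cases versus controls."""
    row_case = case_a1 + case_a2
    row_control = control_a1 + control_a2
    col_a1 = case_a1 + control_a1
    col_a2 = case_a2 + control_a2
    if min(row_case, row_control, col_a1, col_a2) == 0:
        raise ValueError("every row and column of the table must be non-empty")
    n = row_case + row_control
    cross = case_a1 * control_a2 - case_a2 * control_a1
    chi2 = n * cross ** 2 / (row_case * row_control * col_a1 * col_a2)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical SNP: allele counts 60/40 in cases and 40/60 in controls.
chi2, p = allelic_chi2(case_a1=60, case_a2=40, control_a1=40, control_a2=60)
# Hypothetical scan size: 500,000 SNPs tested.
threshold = bonferroni_threshold(alpha=0.05, n_tests=500_000)
significant = p < threshold
print(f"chi2 = {chi2:.2f}, p = {p:.4g}, threshold = {threshold:.1e}, significant = {significant}")
```

The example shows why a nominally small p-value can still fail a genome-wide threshold: dividing α by the number of tests makes the per-SNP cutoff several orders of magnitude stricter than 0.05.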
Manhattan plots of GWAS results:
Figure 7: Manhattan plots for allelic (left) and genotypic (right) association tests. No variants surpass the Bonferroni threshold.






