09 · Stratified LD Score Regression (S-LDSR)

Overview

This pipeline uses Stratified LD Score Regression (S-LDSR) to test whether cell-type-specific eQTL loci (defined by SuSiE fine-mapping) are enriched for the heritability of five neuropsychiatric GWAS traits. It generates cell-type-specific LD score annotations from SuSiE credible sets, computes LD scores against hg38 reference panels, and runs stratified regression against munged GWAS summary statistics.

Workflow Logic

graph TD
    classDef snazzy fill:#f1f1f1,stroke:#333,stroke-width:2px,color:#000;
    classDef out fill:#e1f5fe,stroke:#01579b,color:#000;

    A(get_hg38_refs) --> C(lift_hapmap3_snps)
    B(get_hg19_refs) --> C
    A --> D(ldsr_ld_scores_hg38)
    C --> D
    E[SuSiE credible sets] --> F(make_annot)
    F --> D
    G[Munged GWAS .sumstats.gz] --> H(ldsr_strat_hg38_bl_v12)
    D --> H
    A --> H
    H --> I(ldsr_strat_summary)
    I --> J(ldsr_report)

    class A,B,C,D,E,F,G,H,I snazzy
    class J out

get_hg38_refs: Downloads the S-LDSR hg38 reference panel (1000 Genomes baseline v1.2 LD scores, weights, frequency files, BIM files).
get_hg19_refs: Downloads the corresponding hg19 reference panel (required for HapMap3 SNP liftover).
lift_hapmap3_snps: Lifts the HapMap3 SNP list from hg19 to hg38 coordinates using CrossMap, producing the SNP set used for LD score computation.
make_annot: For each cell type and chromosome, converts SuSiE credible set SNPs into LDSR-compatible binary annotation files — one for MaxCPP (highest-PIP SNP per credible set) and one for CS95 (all SNPs in 95% credible sets).
ldsr_ld_scores_hg38: Computes stratified LD scores for each cell type × annotation type × chromosome combination against the 1000 Genomes hg38 BIM reference.
ldsr_strat_hg38_bl_v12: Runs stratified regression (cell-type annotation + baseline v1.2 model) for each cell type × annotation × GWAS combination.
ldsr_strat_summary: Aggregates all .results files into a single summary TSV per annotation type.
ldsr_report: Renders an RMarkdown HTML report of enrichment coefficients and p-values across all cell types and traits.
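
The two LDSC-based rules above (ldsr_ld_scores_hg38 and ldsr_strat_hg38_bl_v12) can be pictured as thin wrappers around `ldsc.py`. The sketch below only builds the argument lists and does not run anything. The flags are standard LDSC options, but every path, the function names, the 1 cM window, the use of thin annotation files, and the order of the `--ref-ld-chr` prefixes are assumptions for illustration, not values taken from this pipeline.

```python
from typing import List


def ld_score_cmd(bfile: str, annot: str, snplist: str, out: str,
                 thin: bool = True) -> List[str]:
    """Argument list for one partitioned LD score job (one chromosome)."""
    cmd = [
        "ldsc.py", "--l2",
        "--bfile", bfile,         # PLINK prefix (.bed/.bim/.fam) for one chromosome
        "--ld-wind-cm", "1",      # assumed window size
        "--annot", annot,         # annotation file for this cell type and chromosome
        "--print-snps", snplist,  # restrict printed LD scores to the lifted HapMap3 SNPs
        "--out", out,
    ]
    if thin:
        # Assumption: the annotation file holds only annotation columns
        # (no CHR/BP/SNP/CM columns).
        cmd.append("--thin-annot")
    return cmd


def strat_regression_cmd(sumstats: str, annot_prefix: str, baseline_prefix: str,
                         weights_prefix: str, frq_prefix: str, out: str) -> List[str]:
    """Argument list for one stratified regression (annotation + baseline model)."""
    return [
        "ldsc.py", "--h2", sumstats,
        # Assumption: the cell-type annotation is listed before the baseline model.
        "--ref-ld-chr", f"{annot_prefix},{baseline_prefix}",
        "--w-ld-chr", weights_prefix,
        "--frqfile-chr", frq_prefix,
        "--overlap-annot",
        "--print-coefficients",
        "--out", out,
    ]


# Hypothetical paths, for illustration only.
example = ld_score_cmd("plink/chr22", "annot/celltype.22.annot.gz",
                       "hm3.snplist", "ldscores/celltype.22")
print(" ".join(example))
```

Returning a list rather than a single string keeps each argument separate, which makes it easier to inspect or to pass to `subprocess.run` without shell quoting problems.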

Note

LDSR environment activation

LDSC requires Python 2.7, whereas Snakemake runs on Python 3, so the two cannot share an environment. Both the ldsr_ld_scores_hg38 and ldsr_strat_hg38_bl_v12 rules therefore activate the ldsr conda environment through a conda shell hook at the start of their shell block:

eval "$(/apps/languages/miniforge3/24.3.0-0/bin/conda shell.bash hook)"
conda activate ldsr

Note

Coordinate consistency

All analyses use hg38 reference panels throughout. GWAS summary statistics aligned to hg19 were lifted to hg38 in the 07-prep-GWAS pipeline, so no further coordinate conversion is required here.

Annotation Types

Annotation	Definition	Biological rationale
MaxCPP	SNP with the highest posterior inclusion probability (PIP) in each credible set, per cell type	Fine-mapped eQTL signal; high specificity
CS95	All SNPs in 95% credible sets per cell type	Broader fine-mapped window; higher sensitivity
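
As a rough illustration of how the two annotation types differ, the sketch below derives both binary vectors from the same credible-set rows. It is a minimal example, not the pipeline's make_annot code: the function name, the `(credible_set_id, snp_id, pip)` tuple layout, and the SNP identifiers are all hypothetical. The one firm constraint it respects is that LDSC expects annotation rows in the same order as the SNPs in the reference BIM file.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (credible_set_id, snp_id, pip)
Row = Tuple[str, str, float]


def build_annotations(bim_snps: List[str], cs_rows: Iterable[Row]) -> Dict[str, List[int]]:
    """Return binary MaxCPP and CS95 vectors aligned to the BIM SNP order."""
    cs95 = set()   # every SNP that appears in any credible set
    best = {}      # credible set id -> (pip, snp) of its top SNP
    for cs_id, snp, pip in cs_rows:
        cs95.add(snp)
        # Strict '>' means the first SNP seen wins when PIPs are tied.
        if cs_id not in best or pip > best[cs_id][0]:
            best[cs_id] = (pip, snp)
    maxcpp = {snp for _, snp in best.values()}
    return {
        "MaxCPP": [int(s in maxcpp) for s in bim_snps],
        "CS95": [int(s in cs95) for s in bim_snps],
    }


# Toy data, for illustration only.
bim = ["rs1", "rs2", "rs3", "rs4", "rs5"]
rows = [("cs1", "rs1", 0.2), ("cs1", "rs2", 0.7), ("cs2", "rs4", 0.95)]
annot = build_annotations(bim, rows)
print(annot["MaxCPP"])  # [0, 1, 0, 1, 0]
print(annot["CS95"])    # [1, 1, 0, 1, 0]
```

In the toy data, credible set cs1 contributes two SNPs to CS95 but only its top SNP (rs2) to MaxCPP, which is the specificity versus sensitivity trade-off described in the table.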

Run Composition

The pipeline fans out across four dimensions (cell type, annotation type, chromosome, and GWAS trait), with each phase spanning three of them:

Phase	Jobs	Dimensions
`make_annot`	836	19 cell types × 2 annot types × 22 chr
`ldsr_ld_scores_hg38`	836	19 × 2 × 22
`ldsr_strat_hg38_bl_v12`	228	19 × 2 × 6 GWAS

The SLURM profile’s jobs: 500 cap and qos=maxjobs500 limit the pipeline to 500 concurrent jobs, so the 836 jobs in each of the annotation and LD score phases run in large parallel batches without saturating the queue.
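
The job counts can be checked by enumerating the wildcard combinations directly. The sketch below uses placeholder cell-type and GWAS names (the real identifiers are not listed in this section) and assumes the 19 × 2 × 22 and 19 × 2 × 6 products implied by the 836 and 228 totals.

```python
from itertools import product
from math import ceil

# Placeholder names; only the counts matter here.
CELL_TYPES = [f"celltype_{i}" for i in range(1, 20)]  # 19 cell types
ANNOTS = ["MaxCPP", "CS95"]                            # 2 annotation types
CHROMS = list(range(1, 23))                            # autosomes 1-22
GWAS = [f"gwas_{i}" for i in range(1, 7)]              # 6 GWAS

# Per-chromosome phases: make_annot and ldsr_ld_scores_hg38.
per_chr_jobs = list(product(CELL_TYPES, ANNOTS, CHROMS))
# Regression phase: ldsr_strat_hg38_bl_v12.
regression_jobs = list(product(CELL_TYPES, ANNOTS, GWAS))

MAX_CONCURRENT = 500  # the SLURM profile's job cap
waves = ceil(len(per_chr_jobs) / MAX_CONCURRENT)

print(len(per_chr_jobs))     # 836
print(len(regression_jobs))  # 228
print(waves)                 # 2
```

With a cap of 500, each 836-job phase needs at least two scheduling waves, whereas the 228 regression jobs fit within a single wave.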

Technical Requirements

Category	Detail
Software	LDSC v1.0.1
Environment	`ldsr` conda (Python 2.7)
Container	`r_eqtl.sif` (make_annot, report); `ubuntu_22.04.sif` (HapMap3 liftover)
Reference	1000G hg38 baseline v1.2; hg38 PLINK BIM files; HapMap3 SNP list
Input	SuSiE credible sets (08-susie); munged GWAS (07-prep-GWAS)
Output	Per-cell-type enrichment `.results` files; summary TSV; HTML report

Resource Profile

Individual jobs are lightweight; the pipeline derives its throughput from parallelism:

Rule	Threads	RAM	Walltime
`make_annot`	1	5 GB	30 min
`ldsr_ld_scores_hg38`	1	10 GB	1 h
`ldsr_strat_hg38_bl_v12`	1	10 GB	1 h
`ldsr_report`	1	10 GB	30 min