08 · SuSiE Fine-mapping

Overview

This pipeline performs Bayesian fine-mapping of cis-eQTL signals using SuSiE (Sum of Single Effects), applied to all significant eGenes identified by TensorQTL. For each eGene, SuSiE models the association signal as a sum of single effects, each attributable to one causal variant, and returns a 95% credible set for each effect: a set of SNPs that contains the causal variant for that effect with ≥95% posterior probability. These credible sets are the primary output consumed by downstream S-LDSR (pipeline 09) to create the MaxCPP annotation.
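
To make the credible set definition concrete, the following is a minimal sketch of how a 95% credible set can be formed from the posterior probabilities of one single effect: rank SNPs by probability and accumulate until the requested coverage is reached. The function name `credible_set` and the toy probabilities are illustrative assumptions, not code from this pipeline, and the sketch omits the purity filter that susieR additionally applies to credible sets.

```python
def credible_set(alpha, coverage=0.95):
    """Return (sorted SNP indices, covered mass) for the smallest set of SNPs
    whose summed posterior probability reaches `coverage`."""
    # Rank SNP indices from most to least probable.
    order = sorted(range(len(alpha)), key=lambda i: alpha[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += alpha[i]
        if total >= coverage:
            break
    return sorted(chosen), total


# Toy posterior for one effect across five SNPs (hypothetical values, sums to 1).
alpha = [0.01, 0.62, 0.30, 0.05, 0.02]
cs, mass = credible_set(alpha)
print(cs, round(mass, 2))
```

Here the two strongest SNPs cover only 0.92 of the posterior mass, so a third SNP is needed to reach the 95% threshold.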

Workflow Logic

graph TD
    classDef snazzy fill:#f1f1f1,stroke:#333,stroke-width:2px,color:#000;
    classDef out fill:#e1f5fe,stroke:#01579b,color:#000;

    A[TensorQTL perm output] --> B(get_sig_eGenes)
    C[Pseudobulk BED] --> D(prep_susie_gene_meta)
    C --> E(prep_susie_input)
    F[TensorQTL covariates] --> E
    G[Final VCF] --> H(vcf_to_dosage)
    B --> I(run_susie)
    D --> I
    E --> I
    H --> I
    I --> J(merge_susie)
    J --> K(sort_susie)
    K --> L(susie_report)

    class A,B,C,D,E,F,G,H,I,J,K snazzy
    class L out

get_sig_eGenes: Extracts significant eGenes (FDR < 0.05) from TensorQTL permutation output for each cell type. These define the set of loci to be fine-mapped.
prep_susie_gene_meta: Creates a gene metadata file (chromosome, TSS position, gene ID) from the pseudobulk BED file for each cell type, used by run_susie to define the cis window.
vcf_to_dosage: Converts the imputed VCF to a bgzipped, tabix-indexed allele dosage matrix (CHROM, POS, REF, ALT + per-sample dosage), the genotype format expected by susie_run.R.
prep_susie_input: Subsets the pseudobulk expression BED and covariate file to overlapping samples, and writes a sample list for the fine-mapping run.
run_susie: Runs SuSiE fine-mapping in parallel batches (25 batches per cell type), using a ±1 Mb cis window around each gene's TSS. Per-gene outputs include credible set SNPs (.cred.txt), high-PIP credible set SNPs (.cred.hp.txt), and full SNP-level posterior statistics (.snp.txt).
merge_susie: Concatenates the 25 per-batch output files into a single merged file for each cell type and output suffix (.cred.hp.txt, .cred.txt, .snp.txt).
sort_susie: Sorts the merged high-PIP credible set file by chromosome and position, then bgzips for efficient downstream access.
susie_report: Renders an RMarkdown HTML report summarising credible set sizes, PIP distributions, and eGene counts per cell type.
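
The first two steps above (selecting significant eGenes and defining each gene's cis window) can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's own code: the column names `phenotype_id` and `qval`, the gene IDs, and the clamping of the window start at position 1 are all assumptions, and the input is a toy in-memory table instead of real TensorQTL output.

```python
import csv
import io

# Toy stand-in for a TensorQTL permutation table (assumed column names).
PERM_TSV = "phenotype_id\tqval\nGENE_A\t0.001\nGENE_B\t0.20\nGENE_C\t0.05\n"


def significant_egenes(rows, fdr=0.05, q_col="qval"):
    """Keep rows whose q-value is strictly below the FDR threshold."""
    return [r for r in rows if float(r[q_col]) < fdr]


def cis_window(tss, flank=1_000_000):
    """Return (start, end) of a +/- `flank` bp window around the TSS.
    The start is clamped at 1 (an assumption; the real script may differ)."""
    return max(1, tss - flank), tss + flank


rows = list(csv.DictReader(io.StringIO(PERM_TSV), delimiter="\t"))
egenes = [r["phenotype_id"] for r in significant_egenes(rows)]
print(egenes)
print(cis_window(2_000_000))
```

Note that a gene with a q-value of exactly 0.05 is excluded, matching the strict `FDR < 0.05` threshold described for get_sig_eGenes.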

Parallelisation Strategy

Fine-mapping is the most computationally intensive step of the pipeline. Each cell type’s eGenes are split into 25 batches and submitted as independent SLURM jobs, enabling all cell types to be processed in parallel. A ±1 Mb cis window is used for all genes.

Phase	Jobs per cell type	Total jobs (19 cell types)
`run_susie`	25	475
`merge_susie`	3 (one per suffix)	57
`sort_susie`	1	19
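
The job counts in the table follow directly from crossing cell types with batches or output suffixes. The sketch below reproduces that arithmetic; the cell-type names are placeholders, and the constant names are illustrative rather than taken from the pipeline config.

```python
from itertools import product

N_CELL_TYPES = 19
SUSIE_BATCHES = 25
SUFFIXES = ["cred.hp.txt", "cred.txt", "snp.txt"]

# Placeholder cell-type names; the real names come from the pipeline config.
cell_types = [f"celltype_{i:02d}" for i in range(1, N_CELL_TYPES + 1)]

# One run_susie job per (cell type, batch index), with 1-based batch indices.
run_jobs = list(product(cell_types, range(1, SUSIE_BATCHES + 1)))
# One merge_susie job per (cell type, output suffix).
merge_jobs = list(product(cell_types, SUFFIXES))
# One sort_susie job per cell type.
sort_jobs = list(cell_types)

print(len(run_jobs), len(merge_jobs), len(sort_jobs))
```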

Output Files

For each cell type, three output file types are produced after merging:

File	Contents
`{cell_type}.mrgd.srtd.susie.cred.hp.txt.gz`	High-PIP SNPs (one row per credible set, sorted by position) — primary output for downstream pipelines
`{cell_type}.mrgd.susie.cred.txt.gz`	All SNPs in 95% credible sets
`{cell_type}.mrgd.susie.snp.txt.gz`	Full SNP-level posterior inclusion probabilities

Technical Requirements

Category	Detail
Software	SuSiE (`susieR` v24.01.1)
Container	`susier.sif` (`run_susie`); `r_eqtl.sif` (prep + report rules)
Input	TensorQTL permutation output (05-tensorqtl); pseudobulk BED + covariates (05-tensorqtl); final VCF (04-geno-post)
Output	Per-cell-type sorted, merged credible set files; HTML summary report

Resource Profile

Rule	Threads	RAM	Walltime
`get_sig_eGenes`	1	10 GB	30 min
`vcf_to_dosage`	1	10 GB	2 h
`prep_susie_input`	1	10 GB	30 min
`run_susie` (per batch)	1	20 GB	4 h
`merge_susie`	1	5 GB	30 min
`sort_susie`	1	5 GB	30 min
`susie_report`	1	20 GB	1 h

Note

Batch indexing

SuSiE batch indices run from 1 to susie_batches (currently 25). The run_susie rule passes both the batch index and total batch count to susie_run.R, which internally partitions the eGene list and processes only the assigned slice. This avoids loading all eGenes into memory simultaneously.
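
One way such a partition could work is sketched here: each 1-based batch index maps to a contiguous, near-equal slice of the eGene list, so the slices are disjoint and together cover every gene. How susie_run.R actually partitions the list is not described here, so the contiguous scheme and the function name `batch_slice` are assumptions.

```python
def batch_slice(items, batch_index, n_batches):
    """Return the slice of `items` assigned to a 1-based `batch_index`.
    The first `len(items) % n_batches` batches receive one extra item."""
    if not 1 <= batch_index <= n_batches:
        raise ValueError("batch_index must be in 1..n_batches")
    base, extra = divmod(len(items), n_batches)
    k = batch_index - 1  # zero-based position of this batch
    start = k * base + min(k, extra)
    size = base + (1 if k < extra else 0)
    return items[start:start + size]


# 103 hypothetical eGenes split into 25 batches: three batches of 5, then 4 each.
genes = [f"gene_{i}" for i in range(103)]
batches = [batch_slice(genes, b, 25) for b in range(1, 26)]
print([len(b) for b in batches])
```

Because each job derives its slice from only the index and the total count, no coordination between jobs is needed.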

Note

Writing full SNP output

The write_full_susie config flag (currently TRUE) controls whether the full SNP-level output (.snp.txt) is written for every gene. This substantially increases storage use but is required for post-hoc interrogation of PIP distributions. Set to FALSE to write credible set files only.
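
As a rough illustration of the flag's effect, the sketch below lists the merged files one might expect per cell type, using the file name patterns from the Output Files table. It assumes that with the flag set to FALSE no merged .snp.txt file is produced; the cell type `Astro` and the helper `expected_outputs` are hypothetical.

```python
# File name patterns taken from the Output Files table.
TEMPLATES = {
    "cred.hp": "{cell_type}.mrgd.srtd.susie.cred.hp.txt.gz",
    "cred": "{cell_type}.mrgd.susie.cred.txt.gz",
    "snp": "{cell_type}.mrgd.susie.snp.txt.gz",
}


def expected_outputs(cell_type, write_full_susie=True):
    """List merged output files expected for one cell type.
    Assumes the full SNP file is absent when the flag is off."""
    keys = ["cred.hp", "cred"]
    if write_full_susie:
        keys.append("snp")
    return [TEMPLATES[k].format(cell_type=cell_type) for k in keys]


print(expected_outputs("Astro", write_full_susie=False))
```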