graph TD
classDef snazzy fill:#f1f1f1,stroke:#333,stroke-width:2px,color:#000;
classDef out fill:#e1f5fe,stroke:#01579b,color:#000;
A[1000G hg38 PLINK files] --> B(create_ld_matrices)
C[FUSION .pos + .wgt.RDat files] --> D(copy_fusion_weights)
D --> E(run_ctwas)
B --> E
F[GWAS hg38 TSV] --> E
G[SMR 1000G BIM] --> E
E --> H(ctwas_report)
class A,B,C,D,E,F,G snazzy
class H out
12 · Causal TWAS (cTWAS)
Overview
This pipeline runs cTWAS (causal Transcriptome-Wide Association Study) to jointly model gene expression and GWAS variants, distinguishing genes with direct evidence of causal effects on disease from those with only indirect (LD-mediated) associations. It uses the FUSION weights generated in pipeline 11 and GWAS summary statistics from pipeline 07.
Workflow Logic
- create_ld_matrices: Pre-computes LD matrices and variant information files from the 1000G hg38 PLINK reference panel. These are shared across all cell types and GWAS, so this rule runs once.
- copy_fusion_weights: Copies only genuine (non-zero-byte) FUSION weight files from pipeline 11 into a clean per-cell-type directory. Zero-byte stub files, created for genes where FUSION found insufficient heritability, cause ctwas::preprocess_weights() to crash, so they must be excluded before cTWAS runs.
- run_ctwas: Runs the full cTWAS analysis for each cell type × GWAS combination using ctwas_run.R. It fits the cTWAS mixture model, estimates prior probabilities for SNP and gene effects, and outputs posterior inclusion probabilities (PIPs) for all genes.
- ctwas_report: Renders an RMarkdown HTML report summarising cTWAS gene discoveries across cell types and GWAS traits, including locus plots where available.
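The dependencies between the four rules can be written down as a small graph and ordered with Python's standard `graphlib` module. This is only an illustration of the execution order that Snakemake derives on its own. The dictionary is transcribed from the workflow diagram, and nothing here is part of the pipeline code.

```python
from graphlib import TopologicalSorter

# Map each rule to the rules it depends on (transcribed from the workflow diagram).
dependencies = {
    "create_ld_matrices": set(),
    "copy_fusion_weights": set(),
    "run_ctwas": {"create_ld_matrices", "copy_fusion_weights"},
    "ctwas_report": {"run_ctwas"},
}

# static_order() yields every rule after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The two preparatory rules have no dependency on each other, so they may appear in either order. run_ctwas always follows both, and ctwas_report comes last.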
Stub file filtering
FUSION produces zero-byte .wgt.RDat files for genes it skips (insufficient cis-SNP heritability). The copy_fusion_weights rule filters these out by reading the .pos file — which only lists genes with real weight files — and copying only those files:
awk 'NR>1 {print $2}' {input} | \
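while read -r wgt; do
    cp -v {params.src_dir}/"$wgt" {params.dest_dir}/
done

This ensures cTWAS only sees valid weight files and avoids preprocess_weights() errors. Note that inside a Snakemake shell directive, literal braces in the awk program must be doubled ({{print $2}}) so that Snakemake does not treat them as a placeholder.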
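For testing the filter outside the pipeline, the same logic can be sketched in Python. This is an illustrative sketch, not the pipeline's code. The function name `copy_listed_weights` is hypothetical, and the .pos file is assumed to be whitespace-delimited with a header row and the weight file name in the second column, matching the awk command above. The explicit zero-byte check is an added safeguard that the original rule does not perform.

```python
import shutil
from pathlib import Path


def copy_listed_weights(pos_file, src_dir, dest_dir):
    """Copy weight files named in a .pos file, skipping missing or empty ones.

    Assumes a header row and the weight file name in column 2.
    Returns the names of the files that were copied.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    with open(pos_file) as handle:
        next(handle, None)  # skip the header row (awk: NR>1)
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # ignore blank or malformed lines
            src = src_dir / fields[1]
            # Added safeguard (an assumption, not in the original rule):
            # skip files that are missing or zero-byte even if listed.
            if not src.is_file() or src.stat().st_size == 0:
                continue
            shutil.copy2(src, dest_dir / src.name)
            copied.append(src.name)
    return copied
```

Called as `copy_listed_weights("weights.pos", "src", "dest")` with placeholder paths, it returns the list of copied file names, which makes it easy to log how many genes survive the filter for each cell type.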
Key Config Parameters
# cTWAS uses the SMR 1000G reference BIM for variant information
# LD matrices are computed from ldsr_hg38_refs plink files
# GWAS input: hg38 TSV (not munged sumstats)

The run_ctwas rule is parameterised by cell_type and gwas wildcards, producing one output RDS per combination.
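To illustrate the wildcard expansion, the sketch below enumerates one output path per cell type and GWAS pair, much as Snakemake's `expand()` would. The cell-type names, GWAS names, and path pattern are placeholders invented for illustration, not the pipeline's real values.

```python
from itertools import product


def ctwas_targets(cell_types, gwas_traits,
                  pattern="results/{cell_type}/{gwas}_ctwas.rds"):
    """Return one output path per cell_type x gwas combination.

    The default pattern is a hypothetical layout, not the pipeline's actual one.
    """
    return [
        pattern.format(cell_type=cell_type, gwas=gwas)
        for cell_type, gwas in product(cell_types, gwas_traits)
    ]


# Placeholder names for illustration only.
targets = ctwas_targets(["cellA", "cellB"], ["trait1", "trait2", "trait3"])
print(len(targets))  # 6
print(targets[0])    # results/cellA/trait1_ctwas.rds
```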
Technical Requirements
| Category | Detail |
|---|---|
| Software | cTWAS R package |
| Container | twas.sif (all rules; contains cTWAS + FUSION dependencies) |
| Input | FUSION .wgt.RDat weights + .pos files (11-TWAS-weights); GWAS hg38 TSVs (07-prep-GWAS); SMR 1000G BIM (10-SMR) |
| Output | Per-cell-type × GWAS cTWAS .rds result files; HTML report |
Resource Profile
| Rule | Threads | RAM | Walltime |
|---|---|---|---|
| create_ld_matrices | 5 | 20 GB | 3 d |
| copy_fusion_weights | 1 | 5 GB | 30 min |
| run_ctwas | 6 | 96 GB | 3 d |
| ctwas_report | 1 | 10 GB | 1 h |
Run Composition
| Phase | Jobs | Dimensions |
|---|---|---|
| create_ld_matrices | 1 | Once |
| copy_fusion_weights | 19 | 19 cell types |
| run_ctwas | 114 | 19 cell types × 6 GWAS |
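
The job counts follow directly from the two dimensions. A quick arithmetic check, using a hypothetical helper `job_counts`, is sketched below. The table does not list a ctwas_report row, so the total covers only the three phases shown.

```python
def job_counts(n_cell_types, n_gwas):
    """Jobs per phase for the three phases listed in the Run Composition table."""
    return {
        "create_ld_matrices": 1,              # shared across all cell types and GWAS
        "copy_fusion_weights": n_cell_types,  # one job per cell type
        "run_ctwas": n_cell_types * n_gwas,   # one job per cell type x GWAS pair
    }


counts = job_counts(n_cell_types=19, n_gwas=6)
print(counts["run_ctwas"])   # 114
print(sum(counts.values()))  # 134
```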