06 · eQTL Replication

Overview

This pipeline assesses the replication of our single-nucleus eQTL discoveries against four published brain eQTL and chromatin accessibility datasets. Replication is quantified using the π₁ statistic (proportion of true positives, i.e. non-nulls, among nominally significant associations in the reference dataset). The pipeline also tests for enrichment of our eQTL SNPs within open chromatin peaks from fetal brain snATAC-seq data.

Workflow Logic

graph TD
    classDef snazzy fill:#f1f1f1,stroke:#333,stroke-width:2px,color:#000;
    classDef out fill:#e1f5fe,stroke:#01579b,color:#000;

    A(dwnld_obrien) --> E(pi1_enrich_obrien)
    B(dwnld_bryois) --> F(pi1_enrich_bryois)
    C(dwnld_ziffra) --> G(snp_lookup)
    C --> H(atac_enrich)
    D(dwnld_wen) --> I(pi1_enrich_wen)
    J[TensorQTL perm output] --> E
    J --> F
    J --> I
    J --> K(pi1_enrich_fugita)
    J --> L(pi1_enrich_internal)
    J --> M(cat_nom_qtl)
    M --> E
    M --> F
    M --> I
    M --> K
    M --> L
    G --> H
    E --> N(pi1_enrichment_report)
    F --> N
    G --> N
    H --> N
    I --> N
    K --> N
    L --> N

    class A,B,C,D,E,F,G,H,I,J,K,L,M snazzy
    class N out

Data Downloads

All four reference datasets are downloaded automatically by local Snakemake rules:

Rule	Dataset	Method	PMID
`dwnld_obrien`	O’Brien 2018 — adult bulk brain eQTLs	`curl` + unzip	30419947
`dwnld_bryois`	Bryois 2022 — adult single-cell eQTLs	JSON manifest + `curl` loop	35915177
`dwnld_ziffra`	Ziffra 2021 — fetal snATAC-seq peaks	`wget`	34616060
`dwnld_wen`	Wen 2024 — developmental bulk brain eQTLs	JSON manifest + `curl` loop	38781368

The Bryois and Wen datasets use a JSON manifest strategy: a pre-built JSON file maps file names to download URLs, and a jq/curl loop downloads each file individually, skipping already-present files.

Analysis Steps

cat_nom_qtl
Combines per-chromosome nominal Parquet output from TensorQTL into a single gzip-compressed TSV per cell type using replication_cat_nom_qtl.py. Required as input for all π₁ calculations.

snp_lookup
For each cell type, creates a SNP coordinate lookup table (from permutation-pass top eQTL SNPs) to enable matching against Ziffra ATAC-seq peak coordinates. Requires internet access — designed to be run locally or on a node with outbound network access (uses the LDlink API for LD proxy lookup).

atac_enrich
Tests whether top eQTL SNPs (and their LD proxies) are significantly enriched within fetal brain open chromatin peaks from Ziffra et al. 2021.

pi1_enrich_*
For each reference dataset, calculates the π₁ statistic as a measure of replication signal. Runs for O’Brien, Bryois (cross-matched by cell type), Wen, Fugita, and internally across our own cell types.

pi1_enrichment_report
Renders an RMarkdown HTML report aggregating all π₁ and ATAC enrichment results across all cell types and datasets.

Note

Internal replication

The pi1_enrich_internal rule also cross-replicates eQTL results between our own cell types (e.g. Glu-UL vs Glu-DL), providing a within-study replication measure that is useful for interpreting cell-type specificity.

Technical Requirements

Category	Detail
Container	`r_eqtl.sif` (all analysis rules)
Env modules	`compiler/gnu/7/3.0`, `jq` (for Bryois/Wen downloads)
Input	TensorQTL nominal Parquet + permutation results (from 05-tensorqtl)
Output	Per-cell-type π₁ enrichment RDS files; HTML report

Resource Profile

Rule	Threads	RAM	Walltime
Download rules	1	5 GB	variable
`cat_nom_qtl`	1	10 GB	1h
`pi1_enrich_*`	4	20 GB	1h
`pi1_enrichment_report`	1	10 GB	1h

Run Composition

The π₁ enrichment rules iterate over cell types, reference datasets, and covariate parameter combinations. The total job count is dominated by the cross-cell-type comparisons:

Rule	Approx. jobs	Dimensions
`cat_nom_qtl`	20	20 cell types
`pi1_enrich_obrien` / `_wen`	20 each	20 cell types
`pi1_enrich_bryois`	160	20 × 8 Bryois cell types
`pi1_enrich_internal`	400	20 × 20 cell types
`pi1_enrich_fugita`	140	20 × 7 Fugita cell types