05 · TensorQTL eQTL Discovery

Overview

This pipeline performs cell-type-specific eQTL mapping using TensorQTL, a GPU-accelerated implementation of the FastQTL permutation framework. It runs four mapping modes (nominal, permutation and independent for cis-eQTLs, plus trans) across all 19 cell types, preceded by expression, covariate and genotype preparation steps.


Workflow Logic

graph TD
    classDef snazzy fill:#f1f1f1,stroke:#333,stroke-width:2px,color:#000;
    classDef out fill:#e1f5fe,stroke:#01579b,color:#000;

    A[Pseudobulk BED + PCA covariates] --> B(prep_tensorQTL_input)
    B --> C(zip_pblk_cnts)
    B --> D(split_covariates)
    E[Final VCF] --> F(convert_genotypes)
    C --> G(tensorqtl_nom)
    D --> G
    F --> G
    C --> H(tensorqtl_perm)
    D --> H
    F --> H
    H --> I(tensorqtl_independent)
    C --> I
    D --> I
    F --> I
    C --> J(tensorqtl_trans)
    D --> J
    F --> J
    H --> K(tensorqtl_report)
    I --> K
    J --> K

    class A,B,C,D,E,F,G,H,I,J,K snazzy
    class G,H,I,J out
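
The edges of the graph above can also be read as a dependency map. The sketch below transcribes them into a Python dictionary and derives one valid run order with the standard-library `graphlib` module (Python 3.9+). It is only an illustration of the ordering constraints; the workflow manager resolves the real execution order itself, and the dictionary name `DEPENDENCIES` is an assumption made for this example.

```python
from graphlib import TopologicalSorter

# Each rule maps to the set of rules whose outputs it consumes,
# transcribed from the edges of the workflow graph.
CORE_INPUTS = {"zip_pblk_cnts", "split_covariates", "convert_genotypes"}

DEPENDENCIES = {
    "zip_pblk_cnts": {"prep_tensorQTL_input"},
    "split_covariates": {"prep_tensorQTL_input"},
    "tensorqtl_nom": set(CORE_INPUTS),
    "tensorqtl_perm": set(CORE_INPUTS),
    "tensorqtl_independent": CORE_INPUTS | {"tensorqtl_perm"},
    "tensorqtl_trans": set(CORE_INPUTS),
    "tensorqtl_report": {
        "tensorqtl_perm",
        "tensorqtl_independent",
        "tensorqtl_trans",
    },
}


def execution_order(dependencies):
    """Return one valid order in which every rule runs after its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
for position, rule in enumerate(order, start=1):
    print(position, rule)
```

Rules that appear only as inputs (prep_tensorQTL_input and convert_genotypes) are picked up automatically, so the order covers all nine rules in the numbered list.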

  1. prep_tensorQTL_input: Prepares per-cell-type normalised pseudobulk expression matrices (BED format) and covariate files. Merges genotype PCs and expression PCs, and normalises counts (quantile normalisation). Runs via tensorqtl_prep_input_files.R in the r_eqtl container.
  2. zip_pblk_cnts: bgzip-compresses and tabix-indexes the expression BED file required by TensorQTL.
  3. convert_genotypes: Converts the final filtered VCF to PLINK2 PGEN format.
  4. split_covariates: Generates covariate files for each combination of genotype PC count and expression PC count, enabling parameter sweep optimisation.
  5. tensorqtl_nom: Nominal pass. Computes all SNP–gene association p-values within a cis window of 500 kb either side of the TSS. Output is stored as per-chromosome Parquet files.
  6. tensorqtl_perm: Permutation pass. Fits an adaptive permutation model (1,000–10,000 permutations) per gene to derive empirical p-values and FDR-corrected eGene calls.
  7. tensorqtl_independent: Stepwise conditional analysis on permutation pass results to identify independent secondary eQTL signals per gene.
  8. tensorqtl_trans: Trans-eQTL mapping. Tests SNP–gene associations genome-wide.
  9. tensorqtl_report: Renders an RMarkdown HTML summary of eQTL discovery results across all cell types.
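
To give a feel for what "adaptive" means in the permutation pass (step 6), here is a toy, pure-Python sketch of an adaptive permutation test for a single SNP–gene pair. It is not TensorQTL's implementation: the real pass works on the best association per gene across the whole cis window and runs on the GPU. The stopping rule (`min_exceed`) and all function names are assumptions made for illustration; only the 1,000 and 10,000 bounds come from this page.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return abs(cov) / math.sqrt(var_x * var_y)


def adaptive_perm_pvalue(genotypes, expression, perm_min=1000,
                         perm_max=10000, min_exceed=100, seed=0):
    """Empirical p-value with early stopping (toy version).

    Expression values are shuffled across samples. After at least
    perm_min permutations, the loop stops as soon as min_exceed permuted
    statistics have reached the observed one (an assumed rule); otherwise
    it runs to perm_max. Returns (p_value, permutations_used).
    """
    rng = random.Random(seed)
    observed = abs_corr(genotypes, expression)
    shuffled = list(expression)
    exceed = 0
    n_perm = 0
    while n_perm < perm_max:
        rng.shuffle(shuffled)
        n_perm += 1
        if abs_corr(genotypes, shuffled) >= observed:
            exceed += 1
        if n_perm >= perm_min and exceed >= min_exceed:
            break
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_perm + 1), n_perm


# Made-up data: 20 samples whose expression rises with genotype dosage.
genotypes = [0] * 6 + [1] * 8 + [2] * 6
expression = [float(i) for i in range(20)]
p_value, used = adaptive_perm_pvalue(
    genotypes, expression, perm_min=100, perm_max=500, min_exceed=10
)
print(p_value, used)
```

The design point is that genes with no signal stop early, while genes with a strong signal use the full permutation budget, which is where the extra resolution in the p-value matters.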

Note

Singularity Container

TensorQTL runs inside the tensorqtl.sif Singularity container and uses PyTorch for GPU-accelerated matrix operations during permutation testing. The permutation pass is the most computationally intensive step; GPU acceleration reduces runtime from days to hours per cell type.

The container is distributed as a Docker image (francois4/tensorqtl). You can convert it to a .sif file by running:

# From repo root dir
singularity pull tensorqtl_latest.sif docker://francois4/tensorqtl
mv tensorqtl*.sif resources/containers/tensorqtl.sif
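
Once the image is in place, a quick way to confirm that PyTorch can see a GPU is a check along the following lines. This is a minimal sketch: the function name `pick_device` is hypothetical, and it falls back to "cpu" when PyTorch is not importable so that it also runs outside the container.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

Saved as, say, check_gpu.py (a hypothetical file name), it could be run with `singularity exec --nv resources/containers/tensorqtl.sif python3 check_gpu.py`, assuming `python3` is on the container's path. The `--nv` flag is what exposes the host's NVIDIA driver inside the container.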

Wildcard Structure

Each TensorQTL job is parameterised by four wildcards, enabling full factorial exploration of analysis parameters:

| Wildcard | Values | Purpose |
| --- | --- | --- |
| cell_type | 19 (7 broad + 12 subtypes) | Cell population |
| norm_method | quantile | Expression normalisation |
| geno_pc | 4 | Genotype PCs as covariates |
| exp_pc | 40 | Expression PCs as covariates |

Additional values can be uncommented in config/config.yaml (e.g. exp_pcs: [10,20,30,40,50]) to run a full covariate optimisation sweep.
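
To see how quickly the job count grows in a sweep, the factorial expansion can be sketched with `itertools.product`. The cell-type names below are placeholders (the real names are not listed on this page), and `expand_jobs` is a hypothetical helper, not part of the pipeline.

```python
from itertools import product


def expand_jobs(cell_types, norm_methods, geno_pcs, exp_pcs):
    """Return one wildcard dictionary per parameter combination."""
    return [
        {"cell_type": c, "norm_method": n, "geno_pc": g, "exp_pc": e}
        for c, n, g, e in product(cell_types, norm_methods, geno_pcs, exp_pcs)
    ]


# Placeholder names standing in for the 19 real cell types.
cell_types = [f"cell_type_{i:02d}" for i in range(1, 20)]

default_jobs = expand_jobs(cell_types, ["quantile"], [4], [40])
sweep_jobs = expand_jobs(cell_types, ["quantile"], [4], [10, 20, 30, 40, 50])

print(len(default_jobs))  # 19 jobs per mapping mode with the defaults
print(len(sweep_jobs))    # 95 jobs per mapping mode with five exp_pcs values
```

Each mapping mode is run once per combination, so a five-value exp_pcs sweep multiplies the GPU workload by five.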


Key Config Parameters

tensorQTL:
  window: 500000       # Cis window (bp either side of TSS)
  perm_min: 1000       # Minimum permutations per gene
  perm_max: 10000      # Maximum permutations per gene
  geno_pcs: 4          # Genotype PCs included as covariates
  exp_pcs: [40]        # Expression PCs included as covariates
  norm_methods: ["quantile"]

Output Files

| Rule | Output format | Description |
| --- | --- | --- |
| tensorqtl_nom | .cis_qtl_pairs.{chr}.parquet | All nominal cis associations |
| tensorqtl_perm | .cis_qtl.txt.gz | Top eQTL per gene + empirical p-values |
| tensorqtl_independent | .cis_independent_qtl.txt.gz | Conditionally independent signals |
| tensorqtl_trans | .trans_qtl_pairs.txt.gz | Trans associations |
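
For a quick look at the permutation-pass output without extra dependencies, a gzipped table can be read with the standard library. The sketch below assumes the file is tab-separated and has `phenotype_id` and `qval` columns; check the header of your own files, since neither assumption is confirmed on this page. The toy file written here contains made-up values purely so the example runs end to end.

```python
import csv
import gzip
import os
import tempfile


def to_float(text):
    """Parse a number, mapping unparseable entries such as "NA" to NaN."""
    try:
        return float(text)
    except ValueError:
        return float("nan")


def read_egenes(path, fdr=0.05, qval_column="qval"):
    """Return rows of a gzipped, tab-separated table with q-value <= fdr."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # NaN compares False, so rows without a usable q-value are skipped.
        return [row for row in reader if to_float(row[qval_column]) <= fdr]


# Toy stand-in for a .cis_qtl.txt.gz file (values are invented).
toy_path = os.path.join(tempfile.mkdtemp(), "toy.cis_qtl.txt.gz")
with gzip.open(toy_path, "wt", newline="") as handle:
    handle.write("phenotype_id\tvariant_id\tpval_beta\tqval\n")
    handle.write("GENE_A\tvar1\t1e-8\t0.001\n")
    handle.write("GENE_B\tvar2\t0.4\t0.6\n")
    handle.write("GENE_C\tvar3\tNA\tNA\n")

hits = read_egenes(toy_path)
print([row["phenotype_id"] for row in hits])
```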

Technical Requirements

| Category | Detail |
| --- | --- |
| eQTL tool | TensorQTL (PyTorch GPU backend) |
| Container | tensorqtl.sif (eQTL runs); r_eqtl.sif (input prep); gtex_eqtl.sif (bgzip/tabix); r_eqtl.sif (report) |
| Env modules | plink/2.0 (genotype conversion) |
| Input | Pseudobulk BED (from 02-scanpy); filtered VCF + PCA covariates (from 04-geno-post) |

Resource Profile

| Rule | Threads | RAM | Walltime |
| --- | --- | --- | --- |
| prep_tensorQTL_input | 1 | 6 GB | 5h |
| zip_pblk_cnts | 1 | 5 GB | 1h |
| split_covariates | 1 | 6 GB | 5h |
| tensorqtl_nom | 10 | 100 GB | 5h |
| tensorqtl_perm | 10 | 100 GB | 5h |
| tensorqtl_independent | 10 | 100 GB | 5h |
| tensorqtl_trans | 10 | 100 GB | 5h |