```mermaid
graph TD
classDef snazzy fill:#f1f1f1,stroke:#333,stroke-width:2px,color:#000;
classDef out fill:#e1f5fe,stroke:#01579b,color:#000;
A[Pseudobulk BED + PCA covariates] --> B(prep_tensorQTL_input)
B --> C(zip_pblk_cnts)
B --> D(split_covariates)
E[Final VCF] --> F(convert_genotypes)
C --> G(tensorqtl_nom)
D --> G
F --> G
C --> H(tensorqtl_perm)
D --> H
F --> H
H --> I(tensorqtl_independent)
C --> I
D --> I
F --> I
C --> J(tensorqtl_trans)
D --> J
F --> J
H --> K(tensorqtl_report)
I --> K
J --> K
class A,B,C,D,E,F,G,H,I,J,K snazzy
class G,H,I,J out
```
05 · TensorQTL eQTL Discovery
Overview
This pipeline performs cell-type-specific eQTL mapping (cis and trans) using TensorQTL, a GPU-accelerated implementation of the FastQTL permutation framework. It runs four mapping modes (nominal, permutation, independent, and trans) across all 19 cell types, preceded by expression, covariate, and genotype preparation steps.
Workflow Logic
- prep_tensorQTL_input: Prepares per-cell-type normalised pseudobulk expression matrices (BED format) and covariate files. Merges genotype PCs with expression PCs and normalises counts (quantile normalisation). Runs via `tensorqtl_prep_input_files.R` in the `r_eqtl` container.
- zip_pblk_cnts: bgzip-compresses and tabix-indexes the expression BED file, as required by TensorQTL.
- convert_genotypes: Converts the final filtered VCF to PLINK2 PGEN format.
- split_covariates: Generates a covariate file for each combination of genotype PC count and expression PC count, enabling a parameter sweep to optimise the number of covariates.
- tensorqtl_nom: Nominal pass. Computes p-values for all SNP–gene pairs within a cis window of 500 kb either side of each gene's TSS; output is stored as per-chromosome Parquet files.
- tensorqtl_perm: Permutation pass. Runs an adaptive number of permutations per gene (1,000–10,000) to derive empirical p-values and FDR-corrected eGene calls.
- tensorqtl_independent: Stepwise conditional analysis on the permutation-pass results to identify independent secondary eQTL signals per gene.
- tensorqtl_trans: Trans-eQTL mapping, testing SNP–gene associations genome-wide.
- tensorqtl_report: Renders an RMarkdown HTML summary of eQTL discovery results across all cell types.
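The quantile-normalisation step in prep_tensorQTL_input can be illustrated with the textbook algorithm. This is a pure-Python sketch under stated assumptions, not the pipeline's R implementation: ties are broken by input order rather than averaged, any further transforms the R script may apply are omitted, and the toy counts are made up.

```python
from typing import List


def quantile_normalise(matrix: List[List[float]]) -> List[List[float]]:
    """Quantile-normalise a genes x samples matrix given as a list of rows.

    Every sample (column) is mapped onto one shared reference distribution:
    the mean, across samples, of the k-th smallest value. Ties are broken
    by input order rather than averaged (a simplifying assumption).
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]

    # Reference distribution: average of each rank across the sorted samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / n_samples for k in range(n_genes)]

    normed = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        # Gene indices ordered by this sample's values (stable sort).
        order = sorted(range(n_genes), key=col.__getitem__)
        for rank, gene_idx in enumerate(order):
            normed[gene_idx][j] = reference[rank]
    return normed


# Toy counts: 4 genes (rows) x 3 samples (columns).
counts = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normed = quantile_normalise(counts)
```

After normalisation every sample holds exactly the same set of values (here 2, 3, about 4.67 and about 5.67), so differences in overall distribution between samples are removed while each gene's within-sample rank is kept.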
Singularity Container
TensorQTL runs inside the tensorqtl.sif Singularity container and uses PyTorch for GPU-accelerated matrix operations during permutation testing. The permutation pass is the most computationally intensive step; GPU acceleration reduces runtime from days to hours per cell type.
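The cost of the permutation pass comes from recomputing the gene-level statistic once per permutation. The sketch below shows the idea for a single gene in plain Python: keep the variants within the 500 kb cis window, take the strongest absolute correlation across them, and compare it with the same statistic computed on permuted expression. It is a sketch under assumptions: the `target_hits` stopping rule and all toy data are hypothetical, and TensorQTL itself runs these computations in batches on the GPU and additionally fits a beta distribution to the permutation null rather than relying only on the raw empirical p-value.

```python
import math
import random
from typing import List, Sequence, Tuple

WINDOW = 500_000  # bp either side of the TSS (window in config/config.yaml)


def cis_indices(tss: int, positions: Sequence[int], window: int = WINDOW) -> List[int]:
    """Indices of variants within `window` bp of the TSS (inclusive bounds assumed)."""
    return [i for i, pos in enumerate(positions) if abs(pos - tss) <= window]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant or constant expression
    return sxy / math.sqrt(sxx * syy)


def max_abs_r(genotypes: Sequence[Sequence[float]], expression: Sequence[float]) -> float:
    """Gene-level statistic: strongest |r| across the gene's cis variants."""
    return max(abs(pearson_r(g, expression)) for g in genotypes)


def adaptive_perm_pvalue(
    genotypes: Sequence[Sequence[float]],
    expression: Sequence[float],
    perm_min: int = 1000,
    perm_max: int = 10000,
    target_hits: int = 10,  # hypothetical early-stopping rule
    seed: int = 0,
) -> Tuple[float, int]:
    """Empirical p-value from an adaptive number of expression permutations.

    After perm_min permutations, stop as soon as target_hits permuted
    statistics reach the observed one; otherwise continue up to perm_max.
    """
    rng = random.Random(seed)
    observed = max_abs_r(genotypes, expression)
    shuffled = list(expression)
    hits = n_perm = 0
    while n_perm < perm_max:
        rng.shuffle(shuffled)  # break the genotype-expression link
        n_perm += 1
        if max_abs_r(genotypes, shuffled) >= observed:
            hits += 1
        if n_perm >= perm_min and hits >= target_hits:
            break
    return (hits + 1) / (n_perm + 1), n_perm


# Toy data: 12 samples; two variants, only the first lies in the cis window.
positions = [1_200_000, 2_000_000]
dosages = [[0, 1, 2] * 4, [2, 0, 1] * 4]
keep = cis_indices(tss=1_000_000, positions=positions)
cis_geno = [dosages[i] for i in keep]

strong = [g + 0.01 * i for i, g in enumerate(dosages[0])]  # tracks the cis variant
null = [5, 3, 5, 1, 7, 1, 2, 9, 2, 4, 8, 4]  # exactly uncorrelated with it

p_strong, n_strong = adaptive_perm_pvalue(cis_geno, strong, perm_min=50, perm_max=200)
p_null, n_null = adaptive_perm_pvalue(cis_geno, null, perm_min=50, perm_max=200)
```

The null gene stops after the minimum of 50 permutations, while the strongly associated gene uses the full 200 and reaches the smallest attainable p-value range. Scaled to 10,000 permutations for every tested gene in 19 cell types, this per-permutation recomputation is what GPU batching accelerates.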
The container is available as a Docker image here. To convert it to a `.sif` file, run:
```bash
# From repo root dir
singularity pull tensorqtl_latest.sif docker://francois4/tensorqtl
mv tensorqtl_latest.sif resources/containers/tensorqtl.sif
```
Wildcard Structure
Each TensorQTL job is parameterised by four wildcards, enabling full factorial exploration of analysis parameters:
| Wildcard | Values | Purpose |
|---|---|---|
| cell_type | 19 (7 broad + 12 subtypes) | Cell population |
| norm_method | quantile | Expression normalisation |
| geno_pc | 4 | Genotype PCs as covariates |
| exp_pc | 40 | Expression PCs as covariates |
Additional values can be uncommented in config/config.yaml (e.g. exp_pcs: [10,20,30,40,50]) to run a full covariate optimisation sweep.
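Because the design is full factorial, the job count is the product of the number of values per wildcard. A minimal sketch, assuming each of the four mapping rules runs once per wildcard combination; the cell-type names are hypothetical placeholders:

```python
from itertools import product

# Hypothetical placeholder names; the pipeline uses 7 broad + 12 subtype labels.
cell_types = [f"celltype_{i:02d}" for i in range(1, 20)]
norm_methods = ["quantile"]
geno_pcs = [4]
exp_pcs = [40]
mapping_rules = ["tensorqtl_nom", "tensorqtl_perm", "tensorqtl_independent", "tensorqtl_trans"]


def n_combinations(*value_lists) -> int:
    """Number of jobs in a full factorial design over the given value lists."""
    return len(list(product(*value_lists)))


default_jobs = n_combinations(cell_types, norm_methods, geno_pcs, exp_pcs, mapping_rules)
sweep_jobs = n_combinations(cell_types, norm_methods, geno_pcs, [10, 20, 30, 40, 50], mapping_rules)
print(default_jobs, sweep_jobs)  # 76 380
```

If the workflow is driven by Snakemake, as the rule and wildcard terminology suggests, its `expand()` helper builds the same Cartesian product by default, so uncommenting five `exp_pcs` values multiplies the mapping workload fivefold.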
Key Config Parameters
```yaml
tensorQTL:
  window: 500000        # Cis window (bp either side of TSS)
  perm_min: 1000        # Minimum permutations per gene
  perm_max: 10000       # Maximum permutations per gene
  geno_pcs: 4           # Genotype PCs included as covariates
  exp_pcs: [40]         # Expression PCs included as covariates
  norm_methods: ["quantile"]
```
Output Files
| Rule | Output format | Description |
|---|---|---|
| tensorqtl_nom | `.cis_qtl_pairs.{chr}.parquet` | All nominal cis associations |
| tensorqtl_perm | `.cis_qtl.txt.gz` | Top eQTL per gene + empirical p-values |
| tensorqtl_independent | `.cis_independent_qtl.txt.gz` | Conditionally independent signals |
| tensorqtl_trans | `.trans_qtl_pairs.txt.gz` | Trans associations |
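eGenes are typically called by applying an FDR threshold to the gene-level p-values in `.cis_qtl.txt.gz`. The procedure this pipeline uses is not stated here, so the sketch below uses Benjamini-Hochberg as a simple stand-in; the column names (`phenotype_id`, `pval_beta`), the 0.05 threshold, and the toy file contents are all assumptions.

```python
import csv
import gzip
import os
import tempfile
from typing import List


def bh_qvalues(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=pvals.__getitem__)
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def call_egenes(path: str, pcol: str = "pval_beta", fdr: float = 0.05) -> List[str]:
    """Read a gzipped tab-separated per-gene table and return genes passing FDR."""
    with gzip.open(path, "rt", newline="") as fh:
        rows = list(csv.DictReader(fh, delimiter="\t"))
    qvals = bh_qvalues([float(r[pcol]) for r in rows])
    return [r["phenotype_id"] for r, q in zip(rows, qvals) if q <= fdr]


# Toy file with made-up values (hypothetical gene and variant IDs).
demo_path = os.path.join(tempfile.mkdtemp(), "toy.cis_qtl.txt.gz")
with gzip.open(demo_path, "wt", newline="") as fh:
    writer = csv.writer(fh, delimiter="\t")
    writer.writerow(["phenotype_id", "variant_id", "pval_beta"])
    writer.writerows([
        ["geneA", "var1", 0.001],
        ["geneB", "var2", 0.01],
        ["geneC", "var3", 0.04],
        ["geneD", "var4", 0.5],
    ])

egenes = call_egenes(demo_path)
print(egenes)  # ['geneA', 'geneB']
```

geneC has a raw p-value below 0.05 but is not called, because its adjusted value (0.04 × 4 / 3 ≈ 0.053) exceeds the threshold.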
Technical Requirements
| Category | Detail |
|---|---|
| eQTL tool | TensorQTL (PyTorch GPU backend) |
| Container | tensorqtl.sif (eQTL runs); r_eqtl.sif (input prep, report); gtex_eqtl.sif (bgzip/tabix) |
| Env modules | plink/2.0 (genotype conversion) |
| Input | Pseudobulk BED (from 02-scanpy); filtered VCF + PCA covariates (from 04-geno-post) |
Resource Profile
| Rule | Threads | RAM | Walltime |
|---|---|---|---|
| prep_tensorQTL_input | 1 | 6 GB | 5 h |
| zip_pblk_cnts | 1 | 5 GB | 1 h |
| split_covariates | 1 | 6 GB | 5 h |
| tensorqtl_nom | 10 | 100 GB | 5 h |
| tensorqtl_perm | 10 | 100 GB | 5 h |
| tensorqtl_independent | 10 | 100 GB | 5 h |
| tensorqtl_trans | 10 | 100 GB | 5 h |
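For capacity planning, the table can be turned into simple arithmetic. This sketch assumes a hypothetical node with 64 threads and 512 GB RAM; GPU requests are not listed in the table and are ignored here, and walltimes are requested limits rather than measured runtimes.

```python
# Per-rule requests from the resource table: (threads, RAM in GB, walltime in hours).
RESOURCES = {
    "prep_tensorQTL_input": (1, 6, 5),
    "zip_pblk_cnts": (1, 5, 1),
    "split_covariates": (1, 6, 5),
    "tensorqtl_nom": (10, 100, 5),
    "tensorqtl_perm": (10, 100, 5),
    "tensorqtl_independent": (10, 100, 5),
    "tensorqtl_trans": (10, 100, 5),
}
N_CELL_TYPES = 19


def max_concurrent(rule: str, node_threads: int, node_ram_gb: int) -> int:
    """Jobs of one rule that fit on a node, limited by threads or RAM (GPUs ignored)."""
    threads, ram_gb, _ = RESOURCES[rule]
    return min(node_threads // threads, node_ram_gb // ram_gb)


# Upper bound on requested walltime for the four mapping rules at the default config.
mapping_rules = [r for r in RESOURCES if r.startswith("tensorqtl_")]
requested_hours = sum(RESOURCES[r][2] for r in mapping_rules) * N_CELL_TYPES

print(max_concurrent("tensorqtl_perm", node_threads=64, node_ram_gb=512))  # 5
print(requested_hours)  # 380
```

On such a node the mapping rules are RAM-bound (5 jobs at 100 GB each) rather than thread-bound (6 jobs at 10 threads each).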