TorchGWAS runs GWAS 1,700x faster using GPU acceleration

arXiv cs.DC April 24, 2026

⚡Analyze 20,480 phenotypes in 20 minutes on a single A100 GPU...

Deep Dive

Modern bioinformatics workflows in imaging and representation learning now routinely generate thousands to tens of thousands of quantitative phenotypes from a single cohort. Running genome-wide association analyses trait by trait in these settings creates a severe computational bottleneck. Established tools such as fastGWA are highly effective for individual traits but were not designed for phenotype-rich screening, in which the same genotype matrix is reused across a large panel of traits. TorchGWAS addresses this gap by using GPU hardware to run association tests for many phenotypes in parallel.
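Because every phenotype is tested against the same genotype matrix, the per-marker statistics for an entire panel can be computed with one matrix product rather than a loop over traits. The NumPy sketch below illustrates that idea for marginal linear regression. It is not TorchGWAS's implementation: the function name `batched_linear_gwas` is hypothetical, and the sketch assumes covariates have already been regressed out of both genotypes and phenotypes. The same array operations map directly onto PyTorch tensors, which is what would let this kind of computation run on a GPU.

```python
import numpy as np


def batched_linear_gwas(G: np.ndarray, Y: np.ndarray, n_covariates: int = 0):
    """Marginal linear regression of every phenotype on every marker (hypothetical helper).

    G: (n_samples, n_markers) genotype dosages, covariates already regressed out.
    Y: (n_samples, n_phenotypes) phenotypes, covariates already regressed out.
    n_covariates: number of non-intercept covariates removed beforehand (reduces df).
    Returns (beta, t), each of shape (n_markers, n_phenotypes).
    """
    n = G.shape[0]
    # Center columns so each per-pair regression includes an intercept.
    Gc = G - G.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    g_ss = np.sum(Gc ** 2, axis=0)  # (n_markers,) sum of squares per marker
    y_ss = np.sum(Yc ** 2, axis=0)  # (n_phenotypes,) sum of squares per phenotype

    # A single matrix product covers all marker-phenotype pairs; G is reused for every trait.
    cross = Gc.T @ Yc  # (n_markers, n_phenotypes)

    beta = cross / g_ss[:, None]  # OLS slope for each pair
    r = cross / np.sqrt(g_ss[:, None] * y_ss[None, :])  # Pearson correlation for each pair
    df = n - 2 - n_covariates  # residual degrees of freedom
    # t-statistic of the slope, written in terms of the correlation.
    t = r * np.sqrt(df / np.clip(1.0 - r ** 2, 1e-12, None))
    return beta, t


# Small synthetic demo: phenotype 0 is driven by marker 3.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(500, 50)).astype(float)
Y = rng.normal(size=(500, 8))
Y[:, 0] += 0.5 * G[:, 3]
beta, t = batched_linear_gwas(G, Y)
print(int(np.argmax(np.abs(t[:, 0]))))  # index of the strongest marker for phenotype 0
```

The slope and t-statistic for each pair match an ordinary two-parameter regression, so the speedup comes purely from batching, not from changing the statistical model.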

In a head-to-head benchmark using 8.9 million markers and 23,000 samples, fastGWA required approximately 100 seconds per phenotype on an AMD EPYC 7763 64-core CPU. On a single NVIDIA A100 GPU, TorchGWAS completed 2,048 phenotypes in 10 minutes and 20,480 phenotypes in 20 minutes, a 300- to 1,700-fold throughput improvement over fastGWA. The framework supports NumPy, PLINK, and BGEN genotype inputs, automatically aligns phenotype and covariate tables by sample identifier, and performs covariate adjustment internally. Its current public release provides stable Python and command-line workflows for linear GWAS and multivariate phenotype screening, along with tutorials and benchmark scripts.
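The sample alignment and covariate adjustment described above can be sketched as two small steps: intersect the sample identifiers and reorder every table to a common order, then remove each column's least-squares fit on an intercept plus the covariates. This is a minimal NumPy sketch under those assumptions, not TorchGWAS's internal code. `align_by_sample_id` and `residualize` are hypothetical names, and the QR-based projection is one standard way to perform the adjustment; the source does not say which method TorchGWAS uses.

```python
import numpy as np


def align_by_sample_id(geno_ids, pheno_ids, covar_ids):
    """Row indices into each table for samples present in all three (hypothetical helper).

    The common order follows the genotype table; samples missing from any table are dropped.
    """
    pheno_pos = {sid: i for i, sid in enumerate(pheno_ids)}
    covar_pos = {sid: i for i, sid in enumerate(covar_ids)}
    rows = [
        (gi, pheno_pos[sid], covar_pos[sid])
        for gi, sid in enumerate(geno_ids)
        if sid in pheno_pos and sid in covar_pos
    ]
    if not rows:
        raise ValueError("no samples shared by genotype, phenotype and covariate tables")
    g_idx, p_idx, c_idx = (np.array(col, dtype=int) for col in zip(*rows))
    return g_idx, p_idx, c_idx


def residualize(M: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Remove from each column of M its least-squares fit on [intercept, C] (hypothetical helper).

    Assumes the design matrix [1, C] has full column rank.
    """
    X = np.column_stack([np.ones(C.shape[0]), C])
    Q, _ = np.linalg.qr(X)  # orthonormal basis for the covariate space
    return M - Q @ (Q.T @ M)


# Demo: three tables with shuffled and partially missing samples.
geno_ids = ["s1", "s2", "s3", "s4"]
pheno_ids = ["s3", "s1", "s4"]
covar_ids = ["s4", "s3", "s1", "s9"]
g_idx, p_idx, c_idx = align_by_sample_id(geno_ids, pheno_ids, covar_ids)
# All three index arrays now select samples s1, s3, s4, in that order.
```

Residualizing both the genotypes and the phenotypes against the aligned covariates before the association step yields the same per-marker slope as including the covariates in every regression (the Frisch–Waugh–Lovell theorem).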

Key Points

TorchGWAS processes 20,480 phenotypes in 20 minutes on a single A100 GPU, versus ~100 seconds per phenotype for fastGWA on a 64-core CPU
Achieves 300- to 1,700-fold throughput increase for large-scale phenotype screening
Supports NumPy, PLINK, and BGEN inputs with automatic covariate adjustment
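The fold range follows from the benchmark figures if fastGWA's total runtime is extrapolated linearly from its reported ~100 seconds per phenotype. The source does not spell out the calculation, so treat this as a consistency check under that assumption; the helper name `fold_speedup` is hypothetical.

```python
# Benchmark figures quoted in the article.
FASTGWA_SECONDS_PER_PHENOTYPE = 100  # approximate, AMD EPYC 7763 64-core CPU

# (number of phenotypes, TorchGWAS wall-clock minutes on one NVIDIA A100)
TORCHGWAS_RUNS = [(2_048, 10), (20_480, 20)]


def fold_speedup(n_phenotypes: int, gpu_minutes: float,
                 cpu_seconds_per_phenotype: float = FASTGWA_SECONDS_PER_PHENOTYPE) -> float:
    """Ratio of extrapolated CPU time (linear in phenotype count) to measured GPU time."""
    cpu_seconds = n_phenotypes * cpu_seconds_per_phenotype
    gpu_seconds = gpu_minutes * 60
    return cpu_seconds / gpu_seconds


for n, minutes in TORCHGWAS_RUNS:
    print(f"{n:,} phenotypes: {fold_speedup(n, minutes):,.0f}x")
# 2,048 phenotypes: 341x
# 20,480 phenotypes: 1,707x
```

The larger panel shows the bigger gain because the GPU run time only doubled while the phenotype count grew tenfold.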

Why It Matters

Makes genome-wide screening of thousands of traits practical, unlocking faster discoveries in imaging genetics and phenomics.
