prewas: data pre-processing for more informative bacterial GWAS



The prewas R package creates a binary SNP matrix from a whole genome alignment. The SNP matrix has the following features: (1) multi-line representation of multiallelic sites, (2) multi-line representation of SNPs located in overlapping genes, and (3) a choice of reference allele. Users can also collapse SNPs into genes, in which case the output is a binary gene matrix. Output from prewas is intended as input to bacterial GWAS tools such as hogwash.
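
To make the idea of a multi-line representation concrete, the sketch below builds a binary SNP matrix from a toy allele table in base R and then collapses it into a binary gene matrix. It does not use the prewas API and is not the package's implementation: the data, the helper `binarize_site`, the `snp_to_gene` mapping, and the choice of the most common allele as the reference are all assumptions made for illustration.

```r
# Toy allele table (hypothetical data): rows are variant sites, columns are samples.
alleles <- matrix(
  c("A", "A", "G", "T",   # site1 is multiallelic (A, G, T)
    "C", "T", "C", "C"),  # site2 is biallelic (C, T)
  nrow = 2, byrow = TRUE,
  dimnames = list(c("site1", "site2"), paste0("sample", 1:4))
)

# Hypothetical helper: split one site into one binary row per non-reference allele.
# Assumption: the reference allele is the most common allele at the site
# (ties are broken alphabetically).
binarize_site <- function(site_alleles, site_name) {
  counts <- table(site_alleles)
  ref <- names(counts)[which.max(counts)]
  alts <- setdiff(names(counts), ref)
  if (length(alts) == 0) return(NULL)  # invariant site: nothing to encode
  out <- do.call(rbind, lapply(alts, function(alt) as.integer(site_alleles == alt)))
  rownames(out) <- paste0(site_name, "_", alts)
  colnames(out) <- names(site_alleles)
  out
}

# Binary SNP matrix: the multiallelic site1 becomes two rows (site1_G, site1_T).
snp_mat <- do.call(
  rbind,
  lapply(rownames(alleles), function(s) binarize_site(alleles[s, ], s))
)

# Hypothetical SNP-to-gene annotation used to collapse SNPs into genes.
snp_to_gene <- c(site1_G = "geneA", site1_T = "geneA", site2_T = "geneB")

# A gene is coded 1 in a sample if any of its SNPs is present in that sample.
gene_mat <- rowsum(snp_mat, group = snp_to_gene[rownames(snp_mat)])
gene_mat[gene_mat > 0] <- 1L

print(snp_mat)
print(gene_mat)
```

In this toy case the three-allele site yields two binary rows, so each alternative allele can be tested separately, while the gene-level matrix records only whether a sample carries any variant in the gene.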


To install prewas follow these commands in R:


Note: this package depends on R (>= 3.5.0).


prewas is described in the paper “prewas: data pre-processing for more informative bacterial GWAS”. The R scripts and data for the paper’s figures and analyses can be found in the manuscript analysis repository.

A tutorial explaining how to use the package can be found in the vignette.


Katie Saund, Stephanie Thiede, and Zena Lapp contributed to this code.