Multivariate analysis for two or more traits

Frederick Boehm

2019-11-01

The goal of this vignette is to introduce the functions that enable d-variate (for \(d \ge 2\)) QTL analysis for d traits that each univariately map to a shared region of the genome. One may then ask whether the d traits share a single, pleiotropic QTL. The alternative hypothesis is that there is more than one QTL giving rise to the phenotype-genotype associations.

is_inst <- function(pkg) {
    nzchar(system.file(package = pkg))
}
qtl2_indic <- is_inst("qtl2")
library(qtl2pleio)
library(qtl2)

We also load the qtl2 package with the library command.

Reading data from qtl2data repository on github

We’ll consider the DOex data in the qtl2data repository. We’ll download the DOex.zip file before calculating founder allele dosages.

file <- paste0("https://raw.githubusercontent.com/rqtl/",
               "qtl2data/master/DOex/DOex.zip")
DOex <- read_cross2(file)

Let’s calculate the founder allele dosages from the 36-state genotype probabilities.

probs <- calc_genoprob(DOex)
pr <- genoprob_to_alleleprob(probs)

We now have an allele probabilities object stored in pr.

Kinship calculations

For our statistical model, we need a kinship matrix. Although we don’t have genome-wide data - since we have allele probabilities for only 3 chromosomes - let’s calculate a kinship matrix using “leave-one-chromosome-out”. In practice, one would want to use allele probabilities from a full genome-wide set of markers.

calc_kinship(probs = pr, type = "loco") -> kinship

Simulating phenotypes with qtl2pleio::sim1

The function to simulate phenotypes in qtl2pleio is sim1. By examining its help page, we see that it takes five arguments. The help page also gives the dimensions of the inputs.

# set up the design matrix, X
pp <- pr[[2]] #we'll work with Chr 3's genotype data
dim(pp)
## [1] 261   8 102

We prepare a block-diagonal design matrix X that contains two nonzero blocks on the diagonal, one for each trait. We use here a function from the gemma2 R package to set up the needed matrix.

#Next, we prepare a design matrix X
X <- gemma2::stagger_mats(pp[ , , 50], pp[ , , 50], pp[ , , 50])
dim(X)
## [1] 783  24

\(X\) is a block-diagonal matrix, with 3 diagonal blocks of equal dimension.

# assemble B matrix of allele effects
B <- matrix(data = rep(rep(c(-1, 1), each = 4), times = 3), nrow = 8, ncol = 3, byrow = FALSE)
# verify that B is what we want:
B
##      [,1] [,2] [,3]
## [1,]   -1   -1   -1
## [2,]   -1   -1   -1
## [3,]   -1   -1   -1
## [4,]   -1   -1   -1
## [5,]    1    1    1
## [6,]    1    1    1
## [7,]    1    1    1
## [8,]    1    1    1
# set.seed to ensure reproducibility
set.seed(2018-01-30)
# call to sim1
Ypre <- sim1(X = X, B = B, Vg = diag(3), Ve = diag(3), kinship = kinship[[2]])
Y <- matrix(Ypre, nrow = 261, ncol = 3, byrow = FALSE)
rownames(Y) <- rownames(pp)
colnames(Y) <- c("tr1", "tr2", "tr3")

Let’s perform univariate QTL mapping for each trait (ie, each column) in the Y matrix.

s1 <- scan1(genoprobs = pr, pheno = Y, kinship = kinship)

Here is a plot of the results.

plot(s1, DOex$pmap$`3`)
plot(s1, DOex$pmap$`3`, lod = 2, col ="violetred", add=TRUE)
plot(s1, DOex$pmap$`3`, lod = 3, col ="green", add=TRUE)
legend("topleft", colnames(s1), lwd = 2, col=c("darkslateblue", "violetred", "green"), bg="gray92")

find_peaks(s1, map = DOex$pmap, threshold = 8)
##   lodindex lodcolumn chr       pos       lod
## 1        1       tr1   3  82.77806 18.595713
## 2        1       tr1   X  38.53903  8.674291
## 3        2       tr2   2 169.53615  9.150889
## 4        2       tr2   3  82.77806 16.931654
## 5        3       tr3   3  82.77806 21.632313

Perform three-dimensional scan as first step in pleiotropy v separate QTL hypothesis test

We now have the inputs that we need to do a pleiotropy vs. separate QTL test. We have the founder allele dosages for one chromosome, i.e., Chr 3, in the R object pp, the matrix of two trait measurements in Y, and a LOCO-derived kinship matrix. We also specify, via the start_snp argument, the starting point for the two-dimensional scan within the array of founder allele dosages. Here, we choose the 38th marker in the array as the starting point. Via the n_snp argument, we specify the number of markers to include in the two-dimensional scan. Here, we input 15, so that we fit the trivariate linear mixed effects model at 151515 = 3375 ordered triples of markers. In practice, we usually use between 100 and 300 markers for most bivariate scans.

Lastly, we specify the number of cores to use, with the n_cores argument. We set it to 1 here, to ensure that the vignette can be run by CRAN. However, in practice, you may wish to increase the number of cores to accelerate computing.

start_index <- 43
out <- scan_pvl(probs = pp,
                pheno = Y,
                kinship = kinship$`3`,
                start_snp = start_index,
                n_snp = 15, n_cores = 1
                )
## starting covariance matrices estimation with data from 261 subjects.
## covariance matrices estimation completed.

The number of cores available will vary by computer. For example, on my Macbook pro computer, with 16GB RAM, I have access to 8 cores. If I use all 8, I can’t do other computing tasks, so I often set n_cores to 7.

To check how many cores are available on your computer, if you’re using a Mac or linux operating system, run this code.

parallel::detectCores()

Create a profile LOD plot to visualize results of two-dimensional scan

To visualize results from our two-dimensional scan, we calculate profile LOD for each trait. The code below makes use of the R package ggplot2 to plot profile LODs over the scan region.

## # A tibble: 3,375 x 4
##    Var1               Var2               Var3               log10lik
##    <chr>              <chr>              <chr>                 <dbl>
##  1 backupUNC030474070 backupUNC030474070 backupUNC030474070    -560.
##  2 backupUNC030474244 backupUNC030474070 backupUNC030474070    -560.
##  3 UNC030103315       backupUNC030474070 backupUNC030474070    -560.
##  4 UNC030107226       backupUNC030474070 backupUNC030474070    -559.
##  5 JAX00527615        backupUNC030474070 backupUNC030474070    -558.
##  6 UNC030484421       backupUNC030474070 backupUNC030474070    -559.
##  7 JAX00109598        backupUNC030474070 backupUNC030474070    -558.
##  8 backupUNC030619800 backupUNC030474070 backupUNC030474070    -558.
##  9 UNC030314351       backupUNC030474070 backupUNC030474070    -559.
## 10 backupUNC030491543 backupUNC030474070 backupUNC030474070    -559.
## # … with 3,365 more rows

We see that out is a tibble, as expected. The first three columns contain the marker ids for each ordered triple of markers. The last column contains the log-likelihood values.

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Joining, by = "marker"

Calculate the likelihood ratio test statistic for pleiotropy v separate QTL

We use the function calc_lrt_tib to calculate the likelihood ratio test statistic value for the specified traits and specified genomic region.

## [1] 0

Bootstrap analysis to get p-values

The calibration of test statistic values to get p-values uses bootstrap methods because we don’t know the theoretical distribution of the test statistic under the null hypothesis. Thus, we use a bootstrap approach to obtain an empirical distribution of test statistic values under the null hypothesis of the presence of one pleiotropic locus.

We will use the function boot_pvl from our package qtl2pleio.

We use a parametric bootstrap strategy in which we first use the studied phenotypes to infer the values of model parameters. Once we have the inferred values of the model parameters, we simulate phenotypes from the pleiotropy model (with the inferred parameter values).

A natural question that arises is “which marker’s allele probabilities do we use when simulating phenotypes?” We use the marker that, under the null hypothesis, i.e., under the pleiotropy constraint, yields the greatest value of the log-likelihood.

Before we call boot_pvl, we need to identify the index (on the chromosome under study) of the marker that maximizes the likelihood under the pleiotropy constraint. To do this, we use the qtl2pleio function find_pleio_peak_tib.

## log10lik8 
##        50

The argument nboot_per_job indicates the number of bootstrap samples that will be created and analyzed. Here, we set nboot_per_job = 10, so we expect to see returned a numeric vector of length 10, where each entry is a LRT statistic value from a distinct bootstrap sample.

Finally, we determine a bootstrap p-value in the usual method. We treat the bootstrap samples’ test statistics as an empirical distribution of the test statistic under the null hypothesis of pleiotropy. Thus, to get a p-value, we want to ask “What is the probability, under the null hypothesis, of observing a test statistic value that is at least as extreme as that which we observed?”

## [1] 0 0
## [1] 1

In practice, one would want to use many more bootstrap samples to achieve an empirical distribution that is closer to the theoretical distribution of the test statistic under the null hypothesis.

However, if one wants to perform analyses with a reasonable number - say 400 - bootstrap samples, this will take a very long time - many days - on a single laptop computer. We have used a series of computer clusters that are coordinated by the University of Wisconsin-Madison’s Center for High-throughput Computing (http://chtc.cs.wisc.edu). We typically are able to analyze 1000 bootstrap samples in less than 24 hours with this service.

Session info

## ─ Session info ──────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.6.1 (2019-07-05)
##  os       macOS Sierra 10.12.6        
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  C                           
##  ctype    en_US.UTF-8                 
##  tz       America/New_York            
##  date     2019-11-01                  
## 
## ─ Packages ──────────────────────────────────────────────────────────────
##  package     * version  date       lib source                    
##  assertthat    0.2.1    2019-03-21 [3] CRAN (R 3.6.0)            
##  backports     1.1.5    2019-10-02 [3] CRAN (R 3.6.0)            
##  bit           1.1-14   2018-05-29 [3] CRAN (R 3.6.0)            
##  bit64         0.9-7    2017-05-08 [3] CRAN (R 3.6.0)            
##  blob          1.2.0    2019-07-09 [3] CRAN (R 3.6.0)            
##  callr         3.3.2    2019-09-22 [3] CRAN (R 3.6.0)            
##  cli           1.1.0    2019-03-19 [3] CRAN (R 3.6.0)            
##  colorspace    1.4-1    2019-03-18 [3] CRAN (R 3.6.0)            
##  crayon        1.3.4    2017-09-16 [3] CRAN (R 3.6.0)            
##  data.table    1.12.6   2019-10-18 [3] CRAN (R 3.6.0)            
##  DBI           1.0.0    2018-05-02 [3] CRAN (R 3.6.0)            
##  desc          1.2.0    2018-05-01 [3] CRAN (R 3.6.0)            
##  devtools      2.2.1    2019-09-24 [3] CRAN (R 3.6.0)            
##  digest        0.6.22   2019-10-21 [3] CRAN (R 3.6.1)            
##  dplyr       * 0.8.3    2019-07-04 [3] CRAN (R 3.6.0)            
##  ellipsis      0.3.0    2019-09-20 [3] CRAN (R 3.6.0)            
##  evaluate      0.14     2019-05-28 [3] CRAN (R 3.6.0)            
##  fansi         0.4.0    2018-10-05 [3] CRAN (R 3.6.0)            
##  fs            1.3.1    2019-05-06 [3] CRAN (R 3.6.0)            
##  gemma2        0.1.1    2019-10-01 [3] CRAN (R 3.6.0)            
##  ggplot2     * 3.2.1    2019-08-10 [3] CRAN (R 3.6.0)            
##  glue          1.3.1    2019-03-12 [3] CRAN (R 3.6.0)            
##  gtable        0.3.0    2019-03-25 [3] CRAN (R 3.6.0)            
##  htmltools     0.4.0    2019-10-04 [3] CRAN (R 3.6.0)            
##  jsonlite      1.6      2018-12-07 [3] CRAN (R 3.6.0)            
##  knitr         1.25     2019-09-18 [3] CRAN (R 3.6.0)            
##  labeling      0.3      2014-08-23 [3] CRAN (R 3.6.0)            
##  lattice       0.20-38  2018-11-04 [3] CRAN (R 3.6.1)            
##  lazyeval      0.2.2    2019-03-15 [3] CRAN (R 3.6.0)            
##  magrittr      1.5      2014-11-22 [3] CRAN (R 3.6.0)            
##  MASS          7.3-51.4 2019-03-31 [3] CRAN (R 3.6.1)            
##  Matrix        1.2-17   2019-03-22 [3] CRAN (R 3.6.1)            
##  memoise       1.1.0    2017-04-21 [3] CRAN (R 3.6.0)            
##  munsell       0.5.0    2018-06-12 [3] CRAN (R 3.6.0)            
##  pillar        1.4.2    2019-06-29 [3] CRAN (R 3.6.0)            
##  pkgbuild      1.0.6    2019-10-09 [3] CRAN (R 3.6.0)            
##  pkgconfig     2.0.3    2019-09-22 [3] CRAN (R 3.6.0)            
##  pkgload       1.0.2    2018-10-29 [3] CRAN (R 3.6.0)            
##  prettyunits   1.0.2    2015-07-13 [3] CRAN (R 3.6.0)            
##  processx      3.4.1    2019-07-18 [3] CRAN (R 3.6.0)            
##  ps            1.3.0    2018-12-21 [3] CRAN (R 3.6.0)            
##  purrr         0.3.3    2019-10-18 [3] CRAN (R 3.6.1)            
##  qtl2        * 0.21-14  2019-10-17 [3] Github (rqtl/qtl2@c0111ba)
##  qtl2pleio   * 1.2.3    2019-11-01 [1] local                     
##  R6            2.4.0    2019-02-14 [3] CRAN (R 3.6.0)            
##  Rcpp          1.0.2    2019-07-25 [3] CRAN (R 3.6.0)            
##  remotes       2.1.0    2019-06-24 [3] CRAN (R 3.6.0)            
##  rlang         0.4.0    2019-06-25 [3] CRAN (R 3.6.0)            
##  rmarkdown     1.16     2019-10-01 [3] CRAN (R 3.6.0)            
##  rprojroot     1.3-2    2018-01-03 [3] CRAN (R 3.6.0)            
##  RSQLite       2.1.2    2019-07-24 [3] CRAN (R 3.6.0)            
##  scales        1.0.0    2018-08-09 [3] CRAN (R 3.6.0)            
##  sessioninfo   1.1.1    2018-11-05 [3] CRAN (R 3.6.0)            
##  stringi       1.4.3    2019-03-12 [3] CRAN (R 3.6.0)            
##  stringr       1.4.0    2019-02-10 [3] CRAN (R 3.6.0)            
##  testthat      2.2.1    2019-07-25 [3] CRAN (R 3.6.0)            
##  tibble        2.1.3    2019-06-06 [3] CRAN (R 3.6.0)            
##  tidyselect    0.2.5    2018-10-11 [3] CRAN (R 3.6.0)            
##  usethis       1.5.1    2019-07-04 [3] CRAN (R 3.6.0)            
##  utf8          1.1.4    2018-05-24 [3] CRAN (R 3.6.0)            
##  vctrs         0.2.0    2019-07-05 [3] CRAN (R 3.6.0)            
##  withr         2.1.2    2018-03-15 [3] CRAN (R 3.6.0)            
##  xfun          0.10     2019-10-01 [3] CRAN (R 3.6.0)            
##  yaml          2.2.0    2018-07-25 [3] CRAN (R 3.6.0)            
##  zeallot       0.1.0    2018-01-28 [3] CRAN (R 3.6.0)            
## 
## [1] /private/var/folders/zk/7w7wqy6x2832p1mqb40w1vmw0000gn/T/Rtmp42x75s/Rinst26a07c182339
## [2] /private/var/folders/zk/7w7wqy6x2832p1mqb40w1vmw0000gn/T/RtmpspduLp/temp_libpath42c5ff0c794
## [3] /Library/Frameworks/R.framework/Versions/3.6/Resources/library