# Using hypr for linear regression

## Background

hypr is a package for translating between experimental (null) hypotheses, hypothesis matrices, and contrast matrices, as used to code factor contrasts in linear regression models. It can derive contrasts from hypotheses and vice versa. The first step, defining the hypotheses, is independent of the package itself and requires some background knowledge of null hypothesis significance testing (NHST). This vignette shows two examples of deriving contrasts and using them in statistical analyses.
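
The translation between hypotheses and contrasts rests on a matrix inversion: when the set of hypotheses includes an intercept, the hypothesis matrix is square, and the contrast matrix (with its intercept column) is its inverse; inverting again recovers the hypotheses. The base-R sketch below illustrates this for treatment coding of a four-level factor, the coding used in the first regression of this vignette. It does not call hypr and is not meant as its internal implementation; the names `hmat_trt` and `cmat_trt` are hypothetical.

```r
# Hypothetical example (base R only, not hypr's internals): treatment coding
# of a four-level factor. One row per null hypothesis, one column per mean.
hmat_trt <- rbind(
  Intercept = c( 1, 0, 0, 0),  # H0: mu1 = 0
  X1        = c(-1, 1, 0, 0),  # H0: mu2 - mu1 = 0
  X2        = c(-1, 0, 1, 0),  # H0: mu3 - mu1 = 0
  X3        = c(-1, 0, 0, 1)   # H0: mu4 - mu1 = 0
)
colnames(hmat_trt) <- paste0("mu", 1:4)

# Hypotheses -> contrasts: invert the square hypothesis matrix.
cmat_trt <- solve(hmat_trt)
cmat_trt  # intercept column of 1s; the other columns equal contr.treatment(4)

# Contrasts -> hypotheses: inverting again recovers the hypothesis matrix.
solve(cmat_trt)
```

For hypothesis sets without an intercept row the matrix is not square, and a generalized inverse takes the place of `solve()`.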

For a general introduction to hypr, see the hypr-intro vignette:

```r
vignette("hypr-intro", package = "hypr")
```

## Simulated dataset

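For the examples in this vignette, we use a simulated dataset with one factor X with four levels mu1, mu2, mu3, and mu4 (five observations per level, condition means 10, 20, 10, and 40, standard deviation 10). Setting `empirical = TRUE` in `MASS::mvrnorm()` makes the sample means and standard deviations match these values exactly, so the regression estimates below come out as round numbers: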

```r
set.seed(123)
M <- c(mu1 = 10, mu2 = 20, mu3 = 10, mu4 = 40) # condition means
N <- 5    # observations per condition
SD <- 10  # within-condition standard deviation
simdat <- do.call(rbind, lapply(names(M), function(x) {
  data.frame(X = x, DV = as.numeric(MASS::mvrnorm(N, unname(M[x]), SD^2, empirical = TRUE)))
}))
simdat$X <- factor(simdat$X)
```
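
The effect of `empirical = TRUE` can be seen for a single condition in isolation. The sketch below draws five values with mean 10 and SD 10 once with and once without the argument; the variable names `x_emp` and `x_pop` are illustrative only.

```r
set.seed(123)
# Five draws from a normal distribution with mean 10 and variance 10^2.
# With empirical = TRUE, mu and Sigma are the *sample* mean and variance ...
x_emp <- as.numeric(MASS::mvrnorm(5, mu = 10, Sigma = 10^2, empirical = TRUE))
# ... with the default empirical = FALSE they are population values.
x_pop <- as.numeric(MASS::mvrnorm(5, mu = 10, Sigma = 10^2))

c(mean = mean(x_emp), sd = sd(x_emp))  # 10 and 10, up to floating-point error
c(mean = mean(x_pop), sd = sd(x_pop))  # random; varies from sample to sample
```
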

```r
contrasts(simdat$X)
```

```
##     [,1] [,2] [,3]
## mu1    0    0    0
## mu2    1    0    0
## mu3    0    1    0
## mu4    0    0    1
```

```r
round(coef(summary(lm(DV ~ X, data = simdat))), 3)
```

```
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)       10      4.472   2.236    0.040
## X1                10      6.325   1.581    0.133
## X2                 0      6.325   0.000    1.000
## X3                30      6.325   4.743    0.000
```

The linear regression returns the expected estimates: the intercept is the mean of the baseline condition mu1 (10), and the three coefficients X1 to X3 are the differences between each of the other three conditions and the baseline (10, 0, and 30).

## Example: Sum contrast coding

A sum contrast, as used for ANOVA, with four levels corresponds to the following null hypotheses:

\begin{align}
H_{0_1}:& \; \mu_1 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} \\
H_{0_2}:& \; \mu_2 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} \\
H_{0_3}:& \; \mu_3 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4}
\end{align}

We rewrite them in hypr:

```r
library(hypr)
sumC <- hypr(mu1 ~ (mu1+mu2+mu3+mu4)/4,
             mu2 ~ (mu1+mu2+mu3+mu4)/4,
             mu3 ~ (mu1+mu2+mu3+mu4)/4)
sumC
```

```
## hypr object containing 3 null hypotheses:
## H0.1: 0 = 3/4*mu1 - 1/4*mu2 - 1/4*mu3 - 1/4*mu4
## H0.2: 0 = 3/4*mu2 - 1/4*mu1 - 1/4*mu3 - 1/4*mu4
## H0.3: 0 = 3/4*mu3 - 1/4*mu1 - 1/4*mu2 - 1/4*mu4
##
## Hypothesis matrix (transposed):
##     [,1] [,2] [,3]
## mu1  3/4 -1/4 -1/4
## mu2 -1/4  3/4 -1/4
## mu3 -1/4 -1/4  3/4
## mu4 -1/4 -1/4 -1/4
##
## Contrast matrix:
##     [,1] [,2] [,3]
## mu1    1    0    0
## mu2    0    1    0
## mu3    0    0    1
## mu4   -1   -1   -1
```

We next assign the contrast matrix to the factor X:

```r
contrasts(simdat$X) <- contr.hypothesis(sumC)
contrasts(simdat$X)
```

```
##     [,1] [,2] [,3]
## mu1    1    0    0
## mu2    0    1    0
## mu3    0    0    1
## mu4   -1   -1   -1
```

Alternatively, you can set the contrasts directly, without creating an intermediate hypr object:

```r
contrasts(simdat$X) <- contr.hypothesis(
  mu1 ~ (mu1+mu2+mu3+mu4)/4,
  mu2 ~ (mu1+mu2+mu3+mu4)/4,
  mu3 ~ (mu1+mu2+mu3+mu4)/4
)
contrasts(simdat$X)
```

```
##     [,1] [,2] [,3]
## mu1    1    0    0
## mu2    0    1    0
## mu3    0    0    1
## mu4   -1   -1   -1
```
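
The contrast matrix that hypr prints for the sum hypotheses can be checked by hand: it is the generalized (Moore-Penrose) inverse of the hypothesis matrix. The sketch below writes the three hypotheses as rows of a matrix `H` (hypr prints its transpose) and uses the closed form `t(H) %*% solve(H %*% t(H))`, which is valid because `H` has full row rank; `MASS::ginv(H)` should give the same matrix. This is plain base R rather than a call into hypr, and `H` and `C` are illustrative names.

```r
# Sum hypotheses, one row per null hypothesis (columns mu1..mu4).
# hypr prints the transpose of this matrix.
H <- rbind(
  H0.1 = c( 3/4, -1/4, -1/4, -1/4),  # mu1 = grand mean
  H0.2 = c(-1/4,  3/4, -1/4, -1/4),  # mu2 = grand mean
  H0.3 = c(-1/4, -1/4,  3/4, -1/4)   # mu3 = grand mean
)
colnames(H) <- paste0("mu", 1:4)

# Moore-Penrose inverse of a full-row-rank matrix: t(H) %*% (H t(H))^-1
C <- t(H) %*% solve(H %*% t(H))
round(C, 10)        # equals contr.sum(4)
round(H %*% C, 10)  # 3 x 3 identity
```

The identity `H %*% C` means that each contrast column targets exactly one of the three hypotheses.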

Finally, we run the linear regression:

```r
round(coef(summary(lm(DV ~ X, data = simdat))), 3)
```

```
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)       20      2.236   8.944     0.00
## X1               -10      3.873  -2.582     0.02
## X2                 0      3.873   0.000     1.00
## X3               -10      3.873  -2.582     0.02
```
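
With sum coding, the intercept is the grand mean (20), and X1 to X3 are the deviations of mu1, mu2, and mu3 from it (-10, 0, and -10). The sketch below reproduces both the estimates and the standard errors by hand from the hypothesis matrix with an intercept row. To stay independent of MASS, it uses a deterministic stand-in for `simdat` with the same cell means and a within-condition SD of exactly 10; in a balanced one-factor design the estimates depend only on the cell means and the standard errors only on the pooled residual SD and N, so the results match. The names `dat`, `H`, `est_by_hand`, and `se_by_hand` are illustrative.

```r
# Illustrative stand-in for simdat (not the simulated data itself): same cell
# means, N = 5 observations per level, within-condition SD of exactly 10.
M <- c(mu1 = 10, mu2 = 20, mu3 = 10, mu4 = 40)
N <- 5
SD <- 10
z <- c(-2, -1, 0, 1, 2)
z <- (z - mean(z)) / sd(z)  # standardized: mean 0, SD 1
dat <- data.frame(
  X = factor(rep(names(M), each = N)),
  DV = unname(rep(M, each = N)) + SD * rep(z, times = length(M))
)
contrasts(dat$X) <- contr.sum(4)
fit <- lm(DV ~ X, data = dat)

# Hypothesis matrix with an intercept row: the inverse of the cell-level
# design matrix (intercept column plus sum contrasts).
H <- solve(cbind(1, contr.sum(4)))

# Each coefficient is a weighted sum of cell means, with the weights in H ...
est_by_hand <- drop(H %*% M)               # 20, -10, 0, -10
# ... so in a balanced design its SE is sigma * sqrt(sum of squared weights / N),
# where sigma is the residual SD (exactly SD = 10 here).
se_by_hand <- SD * sqrt(rowSums(H^2) / N)  # about 2.236, 3.873, 3.873, 3.873

cbind(lm = coef(fit), by_hand = est_by_hand)
cbind(lm = sqrt(diag(vcov(fit))), by_hand = se_by_hand)
```
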