MODEL FORMULAE
This is a short tutorial on writing model formulae for ANOVA and regression analyses. It will be linked to from those tutorials, but you are welcome to read it just for kicks if you'd like.
R functions such as aov( ), lm( ), and glm( ) use a formula interface to specify the variables to be included in the analysis. The formula determines the model that will be built (and tested) by the R procedure. The basic format of such a formula is...
response variable ~ explanatory variables
The tilde should be read "is modeled by" or "is modeled as a function of." The trick is in how the explanatory variables are given.
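A quick way to see that a formula is a description, not a computation, is to build one at the R console and take it apart. This is a minimal sketch using only base R. The names y, x, and z are placeholders and do not need to exist as data for the formula to be created.

```r
# The tilde builds a formula object without evaluating y, x, or z,
# so none of these variables has to exist yet.
f <- y ~ x + z

class(f)       # "formula"
length(f)      # 3: the `~` itself, the left-hand side, the right-hand side
f[[2]]         # y      (the response variable)
f[[3]]         # x + z  (the explanatory variables)
all.vars(f)    # "y" "x" "z"

# The same formula can also be built from a character string.
g <- as.formula("y ~ x + z")
identical(all.vars(g), all.vars(f))   # TRUE
```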
A basic regression analysis would be formulated this way...
y ~ x
...where "x" is the explanatory variable or IV, and "y" is the response variable or DV. Additional explanatory variables would be added in as follows...
y ~ x + z
...which would make this a multiple regression with two predictors. This raises a critical issue that must be understood to get model formulae correct: symbols used as mathematical operators in other contexts do not have their usual mathematical meaning inside model formulae. The following table lists the meaning of these symbols when used in a formula.

symbol | example | meaning |
---|---|---|
+ | + x | include this variable |
- | - x | delete this variable |
: | x : z | include the interaction between these variables |
* | x * z | include these variables and the interactions between them |
/ | x / z | nesting: include z nested within x |
\| | x \| z | conditioning: include x given z (used in lattice graphics and mixed-model formulas, not by lm( ) or aov( )) |
^ | (u + v + w)^3 | include these variables and all interactions up to three-way |
poly | poly(x,3) | polynomial regression: orthogonal polynomials |
Error | Error(a/b) | specify the error term (recognized by aov( )) |
I | I(x*z) | as is: include a new variable consisting of these variables multiplied |
1 | - 1 | intercept: delete the intercept (regress through the origin) |
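One way to check what an operator in the table actually does is to ask R which terms a formula expands to. The sketch below uses base R's terms( ) and model.matrix( ); the helper term_labels( ) is a hypothetical name defined here for convenience, not a built-in function.

```r
# Hypothetical helper: list the model terms a formula expands to.
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(y ~ x + z)          # "x" "z"
term_labels(y ~ x:z)            # "x:z"
term_labels(y ~ x * z)          # "x" "z" "x:z"
term_labels(y ~ x * z - x:z)    # "x" "z"
term_labels(y ~ x / z)          # "x" "x:z"  (z nested within x)
term_labels(y ~ (u + v + w)^2)  # "u" "v" "w" "u:v" "u:w" "v:w"

# The intercept is tracked separately; "- 1" removes it.
attr(terms(y ~ x), "intercept")      # 1
attr(terms(y ~ x - 1), "intercept")  # 0

# Inside a formula, x * z means "main effects plus interaction";
# I() makes * act as ordinary multiplication, giving one new term.
term_labels(y ~ I(x * z))       # "I(x * z)"

# poly(x, 3) is a single term that supplies three orthogonal columns.
d <- data.frame(x = 1:10)
p <- poly(d$x, 3)
round(crossprod(p), 10)                     # 3 x 3 identity matrix
ncol(model.matrix(~ poly(x, 3), data = d))  # 4: intercept + 3 columns
```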
You may have noticed already that some formula structures can be specified in more than one way...
y ~ u + v + w + u:v + u:w + v:w + u:v:w
y ~ u * v * w
y ~ (u + v + w)^3
All three of these specify a model in which the variables "u", "v", "w", and all the interactions between them are included. Any of these formats...
y ~ u + v + w + u:v + u:w + v:w
y ~ u * v * w - u:v:w
y ~ (u + v + w)^2
...would delete the three-way interaction.
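To confirm that the alternative spellings really are the same model, you can compare the terms each one expands to. This is a small base-R sketch; term_labels( ) is again a hypothetical helper defined only for this check.

```r
# Hypothetical helper: list the model terms a formula expands to.
term_labels <- function(f) attr(terms(f), "term.labels")

full <- list(
  y ~ u + v + w + u:v + u:w + v:w + u:v:w,
  y ~ u * v * w,
  y ~ (u + v + w)^3
)
no_three_way <- list(
  y ~ u + v + w + u:v + u:w + v:w,
  y ~ u * v * w - u:v:w,
  y ~ (u + v + w)^2
)

lapply(full, term_labels)          # each: "u" "v" "w" "u:v" "u:w" "v:w" "u:v:w"
lapply(no_three_way, term_labels)  # each: "u" "v" "w" "u:v" "u:w" "v:w"

# Within each group, every formula expands to the same terms.
length(unique(lapply(full, term_labels)))          # 1
length(unique(lapply(no_three_way, term_labels)))  # 1
```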
The nature of the variables, whether binary, categorical (factors), or numerical, will determine the nature of the analysis. For example, if "u" and "v" are factors...
y ~ u + v
...dictates an analysis of variance (without the interaction term). If "u" and "v" are numerical, the same formula would dictate a multiple regression. If "u" is numerical and "v" is a factor, then an analysis of covariance is dictated.
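The sketch below applies the single formula y ~ u + v to three small made-up data frames (simulated with rnorm( ), so the numbers mean nothing) and looks at the design matrix R builds in each case. Factors are expanded into dummy (treatment-coded) columns, while numeric variables each get one slope column, which is what turns the same formula into an ANOVA, a regression, or an ANCOVA.

```r
set.seed(1)
n <- 12
y <- rnorm(n)   # simulated response, for illustration only

# Case 1: u and v are both factors
d_fac <- data.frame(y = y,
                    u = factor(rep(c("a", "b", "c"), each = 4)),
                    v = factor(rep(c("lo", "hi"), times = 6)))
# Case 2: u and v are both numeric
d_num <- data.frame(y = y, u = rnorm(n), v = rnorm(n))
# Case 3: u numeric (the covariate), v a factor
d_mix <- data.frame(y = y, u = rnorm(n), v = d_fac$v)

f <- y ~ u + v   # one formula for all three cases

colnames(model.matrix(f, data = d_fac))  # "(Intercept)" "ub" "uc" "vlo"
colnames(model.matrix(f, data = d_num))  # "(Intercept)" "u" "v"
colnames(model.matrix(f, data = d_mix))  # "(Intercept)" "u" "vlo"

# The calls look alike; the analysis follows from the variable types.
summary(aov(f, data = d_fac))   # two-way ANOVA without interaction
summary(lm(f, data = d_num))    # multiple regression
anova(lm(f, data = d_mix))      # ANCOVA: covariate u, factor v
```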
That ought to do it for now. Specific examples will appear in the tutorials devoted to specific analyses.