Sunday, May 14, 2017

A gentle introduction to Generalized Linear Models in R

What are generalized linear models?

Generalized linear models (glm) are an extension of linear models, used when the errors do not follow a normal distribution. In previous posts I’ve discussed linear models (lm), their use, and their interpretation.

To recap, linear models (lm) model a response variable that depends on one or more independent variables. Regular linear models rest on several assumptions, and a particularly important one is that the errors are normally distributed. Errors are the differences between the observed and predicted values of the response variable.
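To make the definition of errors concrete, here is a minimal R sketch using simulated data (the values of `x` and `y` are invented for illustration). It computes the errors by hand as observed minus predicted values and checks that they match what `residuals()` returns for an `lm` fit.

```r
# Hypothetical simulated data: a linear trend plus normal noise
set.seed(42)
x <- runif(30, min = 0, max = 10)
y <- 1 + 0.8 * x + rnorm(30)

# Fit an ordinary linear model
fit <- lm(y ~ x)

# Errors (residuals) computed by hand: observed minus predicted values
predicted <- fitted(fit)
errors <- y - predicted

# They agree with the residuals stored in the fitted model
print(all.equal(unname(errors), unname(residuals(fit))))
```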

Let’s use an example. Suppose you are modeling the effect of seedling density on seedling herbivory, both measured as continuous variables.
If you assume a linear relationship between the two variables, a linear model will produce a linear equation that lets you predict how much herbivory a plant will experience based on its local density. Such equations are usually written in the form:

\[ y = \alpha + \beta_1 x \]

Here \(y\) is herbivory, \(x\) is density, \(\alpha\) is the intercept and \(\beta_1\) is the regression coefficient, which describes the linear effect of density on herbivory. If \(\beta_1 > 0\), an increase in density has a positive effect on herbivory (i.e., herbivory increases). This relationship is usually displayed as a regression line relating the dependent and independent variables.
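As a rough illustration of how \(\alpha\) and \(\beta_1\) are obtained in R, the sketch below fits `lm()` to simulated density and herbivory values. The variable names `seed_density` and `herbivory`, and all the numbers, are hypothetical and not from a real study.

```r
# Hypothetical simulated data: seedling density and herbivory
set.seed(123)
seed_density <- runif(40, min = 1, max = 20)
herbivory <- 5 + 1.5 * seed_density + rnorm(40, sd = 3)

# Fit y = alpha + beta_1 * x by ordinary least squares
fit <- lm(herbivory ~ seed_density)

# Extract the intercept (alpha) and the regression coefficient (beta_1)
alpha <- coef(fit)[["(Intercept)"]]
beta1 <- coef(fit)[["seed_density"]]
print(c(alpha = alpha, beta1 = beta1))

# A positive beta_1 means herbivory increases with density
if (beta1 > 0) message("Higher density is associated with more herbivory")

# Predicted herbivory for a plant at a local density of 10
pred <- predict(fit, newdata = data.frame(seed_density = 10))
print(pred)
```

The prediction is simply \(\alpha + \beta_1 \times 10\), so `predict()` is doing the same arithmetic as plugging the estimates into the equation.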

The errors are the differences (dotted lines) between the observed values (the dots in the previous figure) and the regression line. When the two variables are linearly related and the model’s assumptions hold, the errors should follow a normal distribution with a mean of zero.
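One way to check this in R is sketched below with simulated data (the numbers are made up). Note that the residuals of a least-squares fit that includes an intercept always average to zero, so the informative check is their shape: here a Shapiro-Wilk test via `shapiro.test()` and a normal Q-Q plot via `qqnorm()` and `qqline()`.

```r
# Hypothetical simulated data with normally distributed errors
set.seed(7)
x <- runif(100, min = 0, max = 10)
y <- 3 + 0.5 * x + rnorm(100, sd = 1)

fit <- lm(y ~ x)
res <- residuals(fit)

# Mean of the residuals: essentially zero by construction for OLS with an intercept
print(mean(res))

# Shapiro-Wilk test of normality (null hypothesis: residuals are normal)
sw <- shapiro.test(res)
print(sw)

# Visual check: points should lie close to the line if residuals are normal
qqnorm(res)
qqline(res)
```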