End of chapter questions for elementary statistical modelling, chapter 4

C4.Q1 Fit a model that directly estimates how different each of the two non-scurred horn types (normal and polled) is from the scurred horn type.

Recall from Using R, chapter 2, that horn types come in three classes, coded by letters: scurred (small deformed horns) have a value of S, polled (horns absent) have a value of P, and normal horns get a value of N. What we need is a factor variable where the default factor level is forced to be S (recall that, unless otherwise specified, the default factor level will be the lowest alphanumeric value, so in this case N for normal). We can override this:

# make up new variable SHD for
# "Scurred Horn Default"
unicorns$SHD<-factor(unicorns$Horn,
  levels=c("S","N","P"))

With this factor ordering, the two estimated contrasts are differences relative to scurred horns:

summary(lm(BirthWt~SHD,
     data=unicorns))$coefficients
##                Estimate Std. Error    t value   Pr(>|t|)
## (Intercept)  2.21185771 0.03617225 61.1479186 0.00000000
## SHDN         0.07973749 0.04331560  1.8408491 0.06592812
## SHDP        -0.02171347 0.05385107 -0.4032134 0.68687404

As we have worked out previously, scurred babies are the ones with the intermediate mean birth mass. You can quickly see that this is represented in our results here: the contrast is positive for normal-horned baby unicorns (which are heavier at birth than scurred ones) and negative for polled unicorns (which are the lightest at birth).
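
As an aside, spelling out every level is not the only way to change the default factor level. The sketch below compares that approach with base R's relevel(), which moves one named level to the front and leaves the rest in their usual order. The toy data frame, its simulated birth weights, and the variable name SHD2 are made up for illustration, so the numbers will not match the unicorns output above.

```r
# Hypothetical toy data standing in for the unicorns data frame
set.seed(1)
toy <- data.frame(
  Horn = rep(c("N", "P", "S"), times = c(20, 10, 15)),
  stringsAsFactors = FALSE
)
# Simulated birth weights (made-up effect sizes, for illustration only)
toy$BirthWt <- 2.2 + 0.08 * (toy$Horn == "N") -
  0.02 * (toy$Horn == "P") + rnorm(nrow(toy), sd = 0.2)

# Option A: spell out the full level order, as in the solution
toy$SHD <- factor(toy$Horn, levels = c("S", "N", "P"))

# Option B: keep the default ordering, then move "S" to the front
toy$SHD2 <- relevel(factor(toy$Horn), ref = "S")

# Both factors should now have the same level order
levels(toy$SHD)
levels(toy$SHD2)

# ...and so both models should give the same estimates
coefA <- coef(lm(BirthWt ~ SHD, data = toy))
coefB <- coef(lm(BirthWt ~ SHD2, data = toy))
all.equal(unname(coefA), unname(coefB))
```

Option B is handy when a factor has many levels and you only care which one is the baseline.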

C4.Q2 Manually re-code horn type into numeric variables containing zeros and ones, so as to recover the same differences as in question 1. Note that this will apply the basic principle that is illustrated in chapter 3, but to a predictor variable with three levels, rather than two.

– Hint 1: you will need to create two new variables.

– Hint 2: if you want to enter another variable into a regression model formula, just combine whatever predictors you want with a plus sign (there is a lot to learn about this operation, but that you can add a new variable in with a plus sign is all you need to know for this problem).

I won’t provide inordinate description, as the solution is very much like that for one of the questions in the previous chapter.

# make indicator variables for being 
# either normal or polled
unicorns$IsNormal<-(unicorns$Horn=="N")+0
unicorns$IsPolled<-(unicorns$Horn=="P")+0
# now fit a model with these predictors
newModel<-lm(BirthWt~IsNormal+IsPolled,data=unicorns)
# see the coefficient estimates
summary(newModel)$coefficients
##                Estimate Std. Error    t value   Pr(>|t|)
## (Intercept)  2.21185771 0.03617225 61.1479186 0.00000000
## IsNormal     0.07973749 0.04331560  1.8408491 0.06592812
## IsPolled    -0.02171347 0.05385107 -0.4032134 0.68687404

Check back to the answer for the previous question. You’ll see that these coefficient estimates match exactly.
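
If you would like to see why the match is exact, one option is to look at the columns lm() builds from the factor behind the scenes, using model.matrix(). The sketch below does this on a made-up toy data frame (simulated values, not the real unicorns data), and also checks that the intercept is simply the mean birth weight of the reference (scurred) group.

```r
# Hypothetical toy data standing in for the unicorns data frame
set.seed(2)
toy <- data.frame(
  Horn = rep(c("N", "P", "S"), times = c(20, 10, 15)),
  stringsAsFactors = FALSE
)
# Simulated birth weights (made-up effect sizes, for illustration only)
toy$BirthWt <- 2.2 + 0.08 * (toy$Horn == "N") -
  0.02 * (toy$Horn == "P") + rnorm(nrow(toy), sd = 0.2)

# Factor with scurred as the default level, plus manual indicators
toy$SHD <- factor(toy$Horn, levels = c("S", "N", "P"))
toy$IsNormal <- (toy$Horn == "N") + 0
toy$IsPolled <- (toy$Horn == "P") + 0

# The design matrix R constructs from the factor
X <- model.matrix(~ SHD, data = toy)
colnames(X)

# Its non-intercept columns are the same zeros and ones we made by hand
sameDesign <- all(X[, "SHDN"] == toy$IsNormal) &&
  all(X[, "SHDP"] == toy$IsPolled)
sameDesign

# So the two models must give the same estimates
factorFit <- lm(BirthWt ~ SHD, data = toy)
manualFit <- lm(BirthWt ~ IsNormal + IsPolled, data = toy)
sameCoefs <- isTRUE(all.equal(unname(coef(factorFit)),
                              unname(coef(manualFit))))
sameCoefs

# The intercept is the mean of the reference (scurred) group
scurredMean <- mean(toy$BirthWt[toy$Horn == "S"])
all.equal(unname(coef(manualFit)[1]), scurredMean)
```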