End of chapter questions for elementary statistical modelling, chapter 1

C1.Q1 Use the function lm() to model the mean number of fledglings produced by blue tits. What is the estimate of the mean? What is the standard error of the mean?

First use lm() to fit an intercept-only model to the data. Then view the coefficient table using summary():

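# number of fledglings from each of seven blue tit nests
fledglings<-c(4,3,0,6,3,0,2)
# intercept-only model: the intercept estimates the mean
fledge_mod<-lm(fledglings~1)
# table of estimates, standard errors, t values and p values
summary(fledge_mod)$coefficients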
##             Estimate Std. Error  t value   Pr(>|t|)
## (Intercept) 2.571429  0.8123201 3.165536 0.01942827

The estimate of the mean and its standard error are the first two entries in the (Intercept) row of the table. The mean is 2.57 and its standard error is 0.81.
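As a quick check (a sketch, not part of the original answer), the least-squares estimate of a single constant is the sample mean, so the intercept should match mean() exactly. The snippet also pulls the two numbers out of the coefficient table by row and column name rather than by position; the variable names `coef_table`, `mean_hat` and `se_hat` are just illustrative.

```r
# Data and intercept-only model, as in C1.Q1
fledglings <- c(4, 3, 0, 6, 3, 0, 2)
fledge_mod <- lm(fledglings ~ 1)

# Extract the estimate and its standard error by row and column name
coef_table <- summary(fledge_mod)$coefficients
mean_hat <- coef_table["(Intercept)", "Estimate"]
se_hat <- coef_table["(Intercept)", "Std. Error"]

# The least-squares intercept is the sample mean
all.equal(mean_hat, mean(fledglings))  # TRUE

# Rounded to two decimal places: mean 2.57, se 0.81
round(c(mean = mean_hat, se = se_hat), 2)
```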

C1.Q2 What is the standard deviation of fledgling reproductive success?

Don’t over-think:

sd(fledglings)
## [1] 2.149197
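If it helps to see what sd() is doing, here is a small sketch that computes the sample standard deviation from its definition (with n - 1 in the denominator). It also shows that, for an intercept-only model, the residual standard error reported by sigma() has n - 1 degrees of freedom and so equals the same quantity. Variable names are illustrative.

```r
fledglings <- c(4, 3, 0, 6, 3, 0, 2)
n <- length(fledglings)

# Sample standard deviation from its definition (n - 1 in the denominator)
sd_by_hand <- sqrt(sum((fledglings - mean(fledglings))^2) / (n - 1))

# For an intercept-only model the residual standard error uses n - 1
# degrees of freedom, so it equals the sample standard deviation
fledge_mod <- lm(fledglings ~ 1)
sd_from_model <- sigma(fledge_mod)

c(by_hand = sd_by_hand, sd = sd(fledglings), model = sd_from_model)
```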

C1.Q3 Calculate the standard error of mean reproductive success, without relying on the lm() and summary() functions to do it for you.

This requires the equation for the standard error of the mean, which is the standard deviation divided by the square root of the sample size:

# mean
mean(fledglings)
## [1] 2.571429
# standard error of the mean
sd(fledglings)/sqrt(length(fledglings))
## [1] 0.8123201

Run help(length) if you don’t see how the denominator gives the square root of the sample size.
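Written out with the numbers from C1.Q2 (standard deviation s = 2.149197, sample size n = 7), the calculation is:

```latex
$$
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{2.149197}{\sqrt{7}} \approx \frac{2.149197}{2.645751} \approx 0.8123
$$
```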

C1.Q4 Make a sensible statement about the range of plausible values for the mean number of fledglings produced by blue tits in my area.

The question is really asking you to generate a confidence interval:

# make a variable for our SE for convenience
se<-sd(fledglings)/sqrt(length(fledglings))
# 95% confidence interval
mean(fledglings)+c(-1,+1)*1.96*se
## [1] 0.9792812 4.1635760

We can therefore say, roughly and with 95% confidence, that a reasonable range for the mean number of blue tit fledglings produced per nest, in the conditions experienced by the species around my house, is between 0.98 and 4.16.
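The multiplier 1.96 above is the large-sample value from the normal distribution, which is one reason the statement is only rough. With just seven observations, a common refinement is to take the multiplier from the t distribution with n - 1 = 6 degrees of freedom, which is what confint() does for an lm fit. A sketch of both routes (variable names are illustrative):

```r
fledglings <- c(4, 3, 0, 6, 3, 0, 2)
fledge_mod <- lm(fledglings ~ 1)
n <- length(fledglings)
se <- sd(fledglings) / sqrt(n)

# 97.5% quantile of the t distribution with n - 1 = 6 degrees of freedom
t_crit <- qt(0.975, df = n - 1)
t_crit  # about 2.45, noticeably larger than 1.96

# t-based 95% confidence interval, by hand
ci_by_hand <- mean(fledglings) + c(-1, 1) * t_crit * se

# confint() on the intercept-only model gives the same interval
ci_confint <- confint(fledge_mod, level = 0.95)
ci_by_hand
ci_confint
```

This interval runs from roughly 0.58 to 4.56, somewhat wider than the 1.96-based one, reflecting the extra uncertainty from estimating the standard deviation from so few nests.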

C1.Q5 It has been reported from a very large study of tens of thousands of nests that the mean number of fledglings produced by blue tits is 5.22. What can you say about the null hypothesis that blue tits in the general area around my house have average fledgling production for the species?

By default, the p value that summary() reports for the intercept tests the null hypothesis that the intercept is zero. Given the huge reported sample size behind the species value of 5.22, we can treat that value as an exactly known fact, because the statistical noise in this estimate will be tiny. We can co-opt the default test to test for a difference from 5.22 by subtracting this value from the data, so that an intercept of zero in the new model corresponds to a mean of exactly 5.22.

# deviations of the data from the null hypothesis
fledglings_null_hyp_deviations<-fledglings-5.22
# now model these deviations
new_mod<-lm(fledglings_null_hyp_deviations~1)
summary(new_mod)$coefficients
##              Estimate Std. Error   t value   Pr(>|t|)
## (Intercept) -2.648571  0.8123201 -3.260502 0.01723796

So, if the blue tits around my house were in fact perfectly average for the species, the probability of observing a deviation from the species average at least as large as the one we observed is 0.017. Because this is below the conventional threshold of 0.05, we would reject this null hypothesis.
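The intercept-only model on shifted data is the same calculation as a one-sample t test against a known mean, which base R provides as t.test() with its `mu` argument. The sketch below checks that both give the same t statistic and two-sided p value, and also computes them by hand as (sample mean - 5.22) / SE with n - 1 degrees of freedom. Variable names such as `mu0` are illustrative.

```r
fledglings <- c(4, 3, 0, 6, 3, 0, 2)
mu0 <- 5.22  # species mean, treated as known exactly

# One-sample t test of H0: mean = 5.22 (two-sided by default)
tt <- t.test(fledglings, mu = mu0)

# The same statistic by hand: (sample mean - mu0) / SE, with n - 1 df
n <- length(fledglings)
t_by_hand <- (mean(fledglings) - mu0) / (sd(fledglings) / sqrt(n))
p_by_hand <- 2 * pt(-abs(t_by_hand), df = n - 1)

c(t_test = unname(tt$statistic), by_hand = t_by_hand)
c(t_test = tt$p.value, by_hand = p_by_hand)
```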

C1.Q6 How strong do you think your result for the previous question is in relation to the null hypothesis that the fledgling production of blue tits around my house is average for the species?

This evidence is very weak, or at least the scope for interpreting anything biological from these data is minimal. You should be aware of this regardless of whether you got the answer right for the previous question. There are extremely few data: just seven nests. You could argue that the blue tits around my house are statistically significantly less productive than the species average. There is truth to this, but it could easily just reflect the fact that my neighbors have a cat. That is what I mean when I say there is, at best, very little scope for any kind of meaningful interpretation of the data. Statistical results are one thing, but thinking about where the data come from in the first place matters too.