last updated: 2021-11-02

Regression

The t-test and t-distribution are widely considered to be at the very foundation of statistics. Who would believe they were invented to make great beer better?

-The t-test was invented by William Gosset (‘Student’) at the Guinness Brewery

T-test

 

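The t-test is a foundational tool for scientists. It can:

  • Compare the means of two independent samples (2-sample)

  • Compare a single sample's mean to a known value (1-sample)

  • Test for a consistent difference between paired observations (paired sample)…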

What you will learn

 

  • The question of the t-test
  • Data and assumptions
  • Graphing
  • Tests and alternatives
  • Practice exercises

The question of the t-test

“2 sample test”

The main question: did these two samples come from populations with different means?
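A minimal R sketch of a two-sample test, assuming we compare sepal length between setosa and versicolor in the built-in iris data (the same example used later in this section). The variable names are illustrative. Note that t.test() defaults to Welch's unequal-variance version; adding var.equal = TRUE gives the classic pooled-SD Student test.

```r
# Keep only setosa and versicolor, and drop the unused 'virginica' level
# (the formula interface needs a grouping factor with exactly 2 levels)
two_sp <- droplevels(iris[iris$Species %in% c("setosa", "versicolor"), ])

# Two-sample t-test: do the two species have different mean sepal lengths?
res2 <- t.test(Sepal.Length ~ Species, data = two_sp)
res2
```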

The question of the t-test

“1 sample test”

The main question: did this one sample come from a population with a known mean?
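A minimal one-sample sketch in R. The reference value mu = 5 is a hypothetical "known mean" chosen only for illustration; in practice it would come from theory or prior data.

```r
# Sepal lengths of setosa only
setosa_sl <- iris$Sepal.Length[iris$Species == "setosa"]

# One-sample t-test against a hypothetical known population mean of 5
res1 <- t.test(setosa_sl, mu = 5)
res1
```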

The question of the t-test

“paired sample test”

Is there a consistent difference between paired sample observations?
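Iris has no paired measurements, so this sketch swaps in R's built-in sleep data (extra hours of sleep for the same 10 patients under two drugs) as an assumed example. It also shows that a paired test is equivalent to a one-sample test of the within-pair differences against zero.

```r
# Split by drug and order by patient ID so row i of each table is the same patient
g1 <- sleep[sleep$group == "1", ]
g2 <- sleep[sleep$group == "2", ]
g1 <- g1[order(g1$ID), ]
g2 <- g2[order(g2$ID), ]
stopifnot(identical(g1$ID, g2$ID))  # pairs line up

# Paired t-test on the two drugs
res_paired <- t.test(g1$extra, g2$extra, paired = TRUE)
res_paired

# Same test written as a one-sample test on the differences
res_diff <- t.test(g1$extra - g2$extra, mu = 0)
```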

Data and assumptions

 

Formal assumptions

  • Gaussian residuals (for EACH SAMPLE)

  • Homoscedasticity (equal variances in the populations being compared)

  • Independence of observations
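For reference, these are the standard textbook forms of the two-sample statistic, written with sample means $\bar{x}_i$, sample SDs $s_i$ and sample sizes $n_i$. Student's version relies on the equal-variance assumption and uses a pooled SD $s_p$. Welch's version drops that assumption and uses an approximate (Welch-Satterthwaite) degrees of freedom $\nu$.

```latex
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad
\mathrm{df} = n_1 + n_2 - 2

t_{\text{Welch}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad
\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
```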

Data and assumptions

Informal assumptions (the ones we are responsible for evaluating)

  • Gaussian distribution (for EACH SAMPLE)

  • Homoscedasticity (equal variances; Student's t-test pools the two SDs on this assumption, while Welch's t-test, the default in R's t.test(), estimates each variance separately when they differ)

  • Independence of observations (if observations come in natural pairs, e.g. the same subject measured twice, a paired test may be appropriate)
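One possible way to check these assumptions in R, sketched on the setosa/versicolor subset of iris. Shapiro-Wilk and the F test of variances are common choices but not the only ones. Formal tests tend to have little power in small samples and to flag trivial departures in large ones, so they are best read alongside plots.

```r
two_sp <- droplevels(iris[iris$Species %in% c("setosa", "versicolor"), ])

# Normality: test EACH sample separately, not the pooled data
sw <- tapply(two_sp$Sepal.Length, two_sp$Species,
             function(x) shapiro.test(x)$p.value)
sw

# Equal variances: F test on the ratio of the two sample variances
vt <- var.test(Sepal.Length ~ Species, data = two_sp)
vt$p.value

# Student (pooled SD) vs Welch (separate variances, R's default)
student <- t.test(Sepal.Length ~ Species, data = two_sp, var.equal = TRUE)
welch   <- t.test(Sepal.Length ~ Species, data = two_sp)
c(student_df = unname(student$parameter), welch_df = unname(welch$parameter))
```

Welch's degrees of freedom are never larger than Student's n1 + n2 - 2, which is how it becomes more conservative when variances differ.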

Data and assumptions

 

Example of mean human height by sex

Data and assumptions

Iris data

data(iris)
names(iris)
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
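A quick check of how the iris rows are laid out, which explains why the next step can keep rows 1 to 100 and then call droplevels(). This is a sketch. Row position is one way to subset; filtering on Species is an alternative.

```r
# 50 flowers per species
table(iris$Species)

# Rows 1-50 are setosa and rows 51-100 are versicolor;
# 'virginica' remains as an empty factor level until droplevels() is used
table(iris$Species[1:100])
```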

Data and assumptions

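# Keep the first 100 rows (setosa and versicolor) and columns 1 and 5
iris2 <- iris[1:100, c(1, 5)]
# Drop the now-empty 'virginica' factor level
iris2$Species <- droplevels(iris2$Species)
# Boxplot of sepal length by species, with the raw points jittered on top
boxplot(Sepal.Length ~ Species, data = iris2)
stripchart(Sepal.Length ~ Species, data = iris2,
           pch = 16, col = 'red', vertical = TRUE,
           add = TRUE, method = 'jitter')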
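The boxplot is easier to interpret next to the numbers a t-test actually uses: each group's mean, SD and sample size. A minimal sketch with tapply(), rebuilding the same two-species subset:

```r
iris2 <- droplevels(iris[1:100, c("Sepal.Length", "Species")])

# Per-group summaries that feed into the t statistic
grp_mean <- tapply(iris2$Sepal.Length, iris2$Species, mean)
grp_sd   <- tapply(iris2$Sepal.Length, iris2$Species, sd)
grp_n    <- table(iris2$Species)

data.frame(species = names(grp_mean),
           mean = round(as.vector(grp_mean), 3),
           sd   = round(as.vector(grp_sd), 3),
           n    = as.vector(grp_n))
```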

Data and assumptions

# Pooling both species into one histogram hides each sample's own distribution
hist(iris2$Sepal.Length,
     main = 'wrong way to examine distribution')

Data and assumptions

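# Stack two plots vertically, one histogram per species
par(mfrow = c(2, 1))

# Rows 1-50 are setosa and rows 51-100 are versicolor; a shared xlim keeps them comparable
hist(iris2$Sepal.Length[1:50], xlim = c(4, 7), main = 'setosa')
hist(iris2$Sepal.Length[51:100], xlim = c(4, 7), main = 'versicolor')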
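Normal Q-Q plots are another common per-sample check for the Gaussian assumption and are often easier to read than histograms at n = 50. A sketch using base R's qqnorm() and qqline(), again one panel per species:

```r
iris2 <- droplevels(iris[1:100, c("Sepal.Length", "Species")])

op <- par(mfrow = c(1, 2))  # two panels side by side; old settings saved
qq_list <- list()
for (sp in levels(iris2$Species)) {
  x <- iris2$Sepal.Length[iris2$Species == sp]
  # Points close to the reference line suggest an approximately Gaussian sample
  qq_list[[sp]] <- qqnorm(x, main = paste("Normal Q-Q:", sp))
  qqline(x, col = "red")
}
par(op)  # restore the previous layout
```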

Data and assumptions

Data and assumptions

Slice out perch

Live coding