R Bootcamp 2.2 Sampling & Distributions

last updated: 2021-10-11

Sampling and distributions

A curve has been found representing the frequency distribution of standard deviations of samples drawn from a normal population.

-Gosset. 1908, Biometrika 6:25.

What you will learn

Use of the histogram
Gaussian ain’t normal
Poisson
Binomial
Diagnosing the distribution
Practice exercises

Use of the histogram

# Let's simulate some fake weight data for 10,000 cats
set.seed(42)
cats <- rnorm(n = 10000, mean = 4, sd = 0.5)

Use of the histogram

Bars are counts of observations
‘Bins’ are non-overlapping
Shape is diagnostic
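
The points above can be sketched with the simulated cat weights. This is a minimal illustration, not necessarily the plot shown on the original slide; the object name `h`, the plot labels, and `breaks = 30` are assumptions chosen for the example.

```r
# Simulated weights for 10,000 cats, as in the example above
set.seed(42)
cats <- rnorm(n = 10000, mean = 4, sd = 0.5)

# Draw the histogram; breaks = 30 is only a suggestion to hist(),
# which may pick a nearby "pretty" number of bins
h <- hist(cats,
          breaks = 30,
          main = "Simulated cat weights",
          xlab = "Weight")

# Bar heights are counts of observations per bin
head(h$counts)

# Bins do not overlap, so every observation is counted exactly once
sum(h$counts) == length(cats)
```

The value returned by `hist()` stores the bin edges in `h$breaks` and the counts in `h$counts`, which is handy when you want to inspect the bars rather than just look at them.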

Gaussian ain’t normal

Things you can measure with continuous precision

The Gaussian is sometimes referred to as the ‘normal’ distribution
The name implies it is typical
The Gaussian ain’t necessarily typical!
Described by mean and std. dev. (the Gaussian parameters)
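
To see how the two parameters describe the curve, here is a rough sketch that simulates Gaussian data and overlays the theoretical density. The variable name `w` and the parameter values (mean 4, std. dev. 0.5, borrowed from the cat example) are illustrative assumptions.

```r
# Simulate from a Gaussian with known parameters
set.seed(42)
w <- rnorm(n = 10000, mean = 4, sd = 0.5)

# The sample estimates should land close to the parameters
mean(w)
sd(w)

# Roughly 68% of Gaussian values fall within 1 std. dev. of the mean
mean(abs(w - 4) < 0.5)

# Histogram on the density scale, with the theoretical curve on top
hist(w, freq = FALSE,
     main = "Gaussian sample with theoretical density",
     xlab = "w")
curve(dnorm(x, mean = 4, sd = 0.5), add = TRUE, lwd = 2)
```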

Gaussian ain’t normal

Mean

# Data
myvar <- c(1,4,8,3,5,3,8,4,5,6)

# Mean the "hard" way
(myvar.mean <- sum(myvar)/length(myvar))

## [1] 4.7

# Mean the easy way
mean(myvar)

## [1] 4.7

Gaussian ain’t normal

Standard Deviation

# Variance the "hard" way
# (NB this is the sample variance with [n-1])
(myvar.var <- sum((myvar - myvar.mean)^2) / (length(myvar) - 1))

## [1] 4.9

# Variance the easy way 
var(myvar)

## [1] 4.9

# Std dev the easy way
sqrt(var(myvar))

## [1] 2.213594
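
Base R also provides `sd()`, which returns the sample standard deviation (with the n-1 denominator) in one step. A short check, reusing the same data:

```r
# Same data as above
myvar <- c(1, 4, 8, 3, 5, 3, 8, 4, 5, 6)

# Std dev in one call
sd(myvar)

# It agrees with the square root of the sample variance
all.equal(sd(myvar), sqrt(var(myvar)))
```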

Poisson

Counts of rare events (like deaths from being kicked by a horse in the Prussian army…)

Usually low mean value
Described by a single parameter \(\lambda\)
\(\lambda\) is both the mean and the variance (so the std. dev. is \(\sqrt{\lambda}\))
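
For a Poisson variable the mean and the variance both equal \(\lambda\). A quick simulation makes this easy to check; the sample size and the name `counts` are arbitrary choices for the sketch.

```r
# Simulate many Poisson counts with lambda = 3
set.seed(42)
counts <- rpois(n = 10000, lambda = 3)

# Mean and variance should both be close to lambda
mean(counts)
var(counts)

# The std. dev. should be close to sqrt(lambda)
sd(counts)
sqrt(3)
```

Comparing the sample mean with the sample variance is a cheap first check on whether count data are plausibly Poisson.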

Poisson

set.seed(42)
mypois <- rpois(n = 100, lambda = 3)
hist(mypois,
     main = "Ewes with triplets",
     xlab = "Count of Triplets")

Binomial

Counts of events with exactly two outcomes, one of which might be a “success” (like ‘heads’ or ‘tails’, live or die, diseased or healthy, etc.)

Sometimes we are interested in the probability of success
Described by 2 parameters: the probability of success, \(p\), and the number of trials, \(n\)
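
A minimal binomial sketch, assuming a fair coin flipped 10 times per experiment. The names `n_trials`, `p_success`, and `heads`, and the choice of 1,000 experiments, are illustrative and not from the original slides.

```r
# Two parameters: number of trials and probability of success
n_trials  <- 10
p_success <- 0.5

# Simulate 1,000 experiments; each value is the number of heads in 10 flips
set.seed(42)
heads <- rbinom(n = 1000, size = n_trials, prob = p_success)

# One bar per possible count, 0 to 10
hist(heads,
     breaks = seq(-0.5, n_trials + 0.5, by = 1),
     main = "Heads in 10 coin flips",
     xlab = "Count of heads")

# The expected count is n_trials * p_success = 5
mean(heads)

# Exact probability of exactly 5 heads: choose(10, 5) / 2^10
dbinom(5, size = n_trials, prob = p_success)
```

Note that in `rbinom()` the argument `n` is the number of simulated experiments, while `size` is the number of trials within each experiment.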

Diagnosing the distribution

A very common task faced when handling data is “diagnosing the distribution”. Just like a human doctor diagnosing an ailment, you examine the evidence, consider the alternatives, judge the context, and take a guess.

Diagnosing the distribution

Expectation based on the type of data
Graph the data and look
Compare the data with the expected theoretical distribution and with several other known distributions
Try a transformation (e.g. to ‘coerce’ the data to Gaussian)
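
The steps above might look like this in practice. This is only a sketch under stated assumptions: the data are hypothetical, simulated here as right-skewed values with `rlnorm()`, and the log transformation is one common choice rather than a general recipe.

```r
# Hypothetical right-skewed measurements (simulated for illustration)
set.seed(42)
y <- rlnorm(n = 1000, meanlog = 0, sdlog = 1)

# Graph the data and look
hist(y, main = "Raw data", xlab = "y")

# Compare with a Gaussian: points far from the line suggest a poor fit
qqnorm(y, main = "Q-Q plot, raw data")
qqline(y)

# Try a transformation and look again
log_y <- log(y)
hist(log_y, main = "Log-transformed data", xlab = "log(y)")
qqnorm(log_y, main = "Q-Q plot, log-transformed data")
qqline(log_y)

# Crude numeric check of symmetry: mean minus median
mean(y) - median(y)          # clearly positive for right-skewed data
mean(log_y) - median(log_y)  # near zero after the transformation
```

The plots carry most of the diagnostic weight here; the mean-versus-median comparison is just a quick sanity check on skew.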

Live coding