Ice cream sales and forest fires are correlated because both occur more often in the summer heat. But, correlation does not imply causation.
-Nate Silver
last updated: 2021-10-19
Graph a correlation with a scatterplot
Given some data:
plot(x = veg, y = arth,
xlab = 'Vegetation biomass', ylab = 'Arth. abundance',
main = 'A positive correlation',
pch = 16, col = 'blue')
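The call above assumes that two numeric vectors of equal length, veg and arth, already exist in the workspace. The slides do not show where they come from, so the following is only a stand-in sketch: the values are simulated, and the slope, range, and noise level are invented for illustration. The sample size of 99 is inferred from df = 97 in the cor.test() output further down.

```r
# Hypothetical stand-in data; these are NOT the values behind the slides.
set.seed(1)                                # make the simulation reproducible
n    <- 99                                 # df = 97 in the cor.test() output implies 99 pairs
veg  <- runif(n, min = 0, max = 50)        # simulated vegetation biomass
arth <- 5 + 0.4 * veg + rnorm(n, sd = 8)   # simulated arthropod abundance

# Same scatterplot call as above, now with data to draw
plot(x = veg, y = arth,
     xlab = 'Vegetation biomass', ylab = 'Arth. abundance',
     main = 'A positive correlation',
     pch = 16, col = 'blue')
```

Because the data are simulated, the correlation computed from them will not reproduce the 0.6056694 reported in the slides.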
The correlation coefficient is the covariance of 2 numeric variables divided by the product of their standard deviations
-1 ≤ r ≤ 1
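Written out for paired samples $(x_i, y_i)$, $i = 1, \dots, n$, the definition reads as follows. The symbols $\bar{x}$, $\bar{y}$ (sample means) and $s_x$, $s_y$ (sample standard deviations) are notation introduced here, not taken from the slides. The factor $n - 1$ appears in both the covariance and the standard deviations, so it cancels in the second form.

```latex
r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```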
# The 'hard' way
# (sample) covariance
cov_veg_arth <- sum( (veg-mean(veg))*(arth-mean(arth))) /
(length(veg) - 1 )
# r
(r_arth_veg <- cov_veg_arth / (sd(veg) * sd(arth)))
## [1] 0.6056694
Assumptions of the Pearson correlation coefficient:
linear relationship between variables
Gaussian distribution for each variable
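One informal way to check these two assumptions is sketched below. The data are simulated stand-ins (hypothetical, not the slide data), and the choice of a lowess smoother, Q-Q plots, and Shapiro-Wilk tests is just one reasonable option among several.

```r
# Hypothetical stand-in data; replace with your own vectors.
set.seed(1)
veg  <- rnorm(99, mean = 25, sd = 10)
arth <- 5 + 0.4 * veg + rnorm(99, sd = 5)

# Linearity: look for a straight-line trend rather than a curve
plot(veg, arth, pch = 16)
lines(lowess(veg, arth), col = 'red')   # smoother as a visual guide

# Normality of each variable: Q-Q plots and Shapiro-Wilk tests
qqnorm(veg);  qqline(veg)
qqnorm(arth); qqline(arth)
sw_veg  <- shapiro.test(veg)
sw_arth <- shapiro.test(arth)

# Small P-values suggest a departure from a Gaussian distribution
c(veg = sw_veg$p.value, arth = sw_arth$p.value)
```

The plots are usually more informative than the tests alone, since with large samples Shapiro-Wilk can flag departures too small to matter.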
# The 'easy' way
cor(veg, arth)
## [1] 0.6056694
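The 'hard' and 'easy' calculations should agree for any pair of numeric vectors. A minimal sketch of that check, using simulated stand-in data (hypothetical, not the slide data) and the built-in cov() in place of the hand-written covariance:

```r
# Hypothetical stand-in data
set.seed(1)
veg  <- rnorm(99, mean = 25, sd = 10)
arth <- 5 + 0.4 * veg + rnorm(99, sd = 5)

# 'Hard' way, using the built-in sample covariance
r_hard <- cov(veg, arth) / (sd(veg) * sd(arth))

# 'Easy' way
r_easy <- cor(veg, arth)

# Compare up to floating-point tolerance
all.equal(r_hard, r_easy)
```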
A range of correlation magnitudes and signs
The pairs() function is useful for EDA
data(iris)
# pairs plot
pairs(iris[ , 1:4], pch = 16,
col = iris$Species) # Set color to species...
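The pairs plot shows each relationship visually. As a companion, cor() applied to the same four columns returns the full matrix of pairwise Pearson coefficients. The name cor_mat is just an illustrative choice.

```r
data(iris)

# Pairwise Pearson correlations among the four numeric measurements
cor_mat <- cor(iris[ , 1:4])

# Round for readability when exploring
round(cor_mat, 2)
```

The matrix is symmetric with 1s on the diagonal, so only the upper or lower triangle needs reading.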
cor.test() function
cor.test(veg, arth)
##
##  Pearson's product-moment correlation
##
## data:  veg and arth
## t = 7.4966, df = 97, p-value = 3.101e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4637006 0.7173146
## sample estimates:
##       cor
## 0.6056694
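To see where the numbers in this output come from, the sketch below recomputes the t statistic, P-value, and confidence interval from r and df alone. It uses the standard formulas for a Pearson correlation test (a t statistic with n - 2 degrees of freedom and a confidence interval based on Fisher's z transformation). The variable names are illustrative.

```r
# Values copied from the cor.test() output above
r  <- 0.6056694
df <- 97
n  <- df + 2                 # df = n - 2 for a Pearson correlation test

# t statistic and two-sided P-value
t_stat <- r * sqrt(df / (1 - r^2))
p_val  <- 2 * pt(-abs(t_stat), df = df)

# 95% confidence interval via Fisher's z transformation
z    <- atanh(r)
se_z <- 1 / sqrt(n - 3)
ci   <- tanh(z + c(-1, 1) * qnorm(0.975) * se_z)

round(t_stat, 4)
signif(p_val, 4)
round(ci, 4)
```

Small differences in the last digits are possible because r is entered here rounded to seven decimal places.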
Reporting results (NEVER PASTE RAW OUTPUT)
We found a significant positive correlation between vegetation biomass and arthropod abundance (Pearson’s r = 0.61, df = 97, P < 0.0001).
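Rather than retyping numbers by hand, the pieces of such a sentence can be pulled from the object that cor.test() returns. This is a minimal sketch on simulated stand-in data (hypothetical, not the slide data). The names ct, p_txt, and report are illustrative, and the 0.0001 cutoff simply mirrors the sentence above.

```r
# Hypothetical stand-in data
set.seed(1)
veg  <- rnorm(99, mean = 25, sd = 10)
arth <- 5 + 0.4 * veg + rnorm(99, sd = 5)

ct <- cor.test(veg, arth)

# Report very small P-values as an inequality, otherwise to 4 decimals
p_txt <- if (ct$p.value < 0.0001) {
  'P < 0.0001'
} else {
  sprintf('P = %.4f', ct$p.value)
}

# Assemble the parenthetical part of the results sentence
report <- sprintf("Pearson's r = %.2f, df = %d, %s",
                  ct$estimate, as.integer(ct$parameter), p_txt)
report
```

In an R Markdown document the same values can be placed in the text with inline code, which keeps the reported statistics in sync with the analysis.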