Ice cream sales and forest fires are correlated because both occur more often in the summer heat. But, correlation does not imply causation.
-Nate Silver
last updated: 2021-10-19
Graph a correlation with a scatterplot
Given some data:
plot(x = veg, y = arth, xlab = 'Vegetation biomass', ylab = 'Arth. abundance', main = 'A positive correlation', pch = 16, col = 'blue')
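The slides do not show how veg and arth were created. To run the code in this section, a stand-in data set could be simulated along these lines. Everything here is an assumption for illustration: the seed, means, slope, and noise level are made up, and only the sample size (99, implied by df = 97 in the cor.test() output) comes from the slides. The resulting numbers will not match the output shown.

```r
# Hypothetical stand-in data: the original veg and arth values are not shown.
set.seed(42)                                # make the simulation reproducible
n <- 99                                     # df = 97 in the slides implies n = 99
veg  <- rnorm(n, mean = 50, sd = 10)        # made-up vegetation biomass
arth <- 5 + 0.5 * veg + rnorm(n, sd = 6)    # made-up arthropod abundance

length(veg)      # number of observations
cor(veg, arth)   # positive, but not the 0.6056694 reported for the real data
```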
The correlation coefficient (r) is the covariance of two numeric variables divided by the product of their standard deviations.
-1 ≤ r ≤ 1
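Written out, the definition looks like this, using x and y as generic names for the two variables, s_x and s_y for their sample standard deviations, and n for the number of paired observations (the symbols are chosen here for illustration):

```latex
r_{xy} \;=\; \frac{\operatorname{cov}(x, y)}{s_x \, s_y}
       \;=\; \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x \, s_y}
```

Dividing by the standard deviations removes the units, which is why r is bounded by -1 and 1.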
# The 'hard' way
# (sample) covariance
cov_veg_arth <- sum((veg - mean(veg)) * (arth - mean(arth))) / (length(veg) - 1)
# r
(r_arth_veg <- cov_veg_arth / (sd(veg) * sd(arth)))
## [1] 0.6056694
Assumptions:
linear relationship between variables
Gaussian distribution for each variable
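One rough way to probe the Gaussian assumption is a Shapiro-Wilk test on each variable, while linearity is usually judged by eye from the scatterplot. The sketch below uses two columns of the built-in iris data as stand-ins, not the veg and arth data. The choice of test is an assumption made here, not something the slides prescribe.

```r
# Illustrative normality check on built-in data (not the veg/arth data)
data(iris)

# Shapiro-Wilk test: a small p-value suggests a departure from normality
sw_sepal <- shapiro.test(iris$Sepal.Length)
sw_petal <- shapiro.test(iris$Petal.Length)

sw_sepal$p.value
sw_petal$p.value
```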
# The 'easy' way
cor(veg, arth)
## [1] 0.6056694
A range of correlation magnitudes and signs
The pairs() function is useful for exploratory data analysis (EDA)
data(iris)
# pairs plot
pairs(iris[ , 1:4], pch = 16, col = iris$Species) # Set color to species...
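A numeric companion to the pairs plot is the correlation matrix for the same four columns. This is a minimal sketch; the name cor_mat is chosen here for illustration.

```r
data(iris)

# Pairwise Pearson correlations among the four numeric columns
cor_mat <- cor(iris[, 1:4])

# Round for easier reading alongside the pairs plot
round(cor_mat, 2)
```

Each off-diagonal cell corresponds to one panel of the pairs plot, so the two can be read side by side.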
The cor.test() function
cor.test(veg, arth)
##
##  Pearson's product-moment correlation
##
## data:  veg and arth
## t = 7.4966, df = 97, p-value = 3.101e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4637006 0.7173146
## sample estimates:
##       cor
## 0.6056694
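To see where these numbers come from, the t statistic, p-value, and confidence interval can be recomputed by hand from r and df alone. The sketch below copies r and df from the output above and uses the standard t test for a Pearson correlation and the Fisher z-transformation for the interval. Variable names are chosen here for illustration.

```r
# Values copied from the cor.test() output
r  <- 0.6056694
df <- 97            # degrees of freedom = n - 2
n  <- df + 2        # 99 paired observations

# t statistic for H0: true correlation = 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

t_stat
p_val
ci
```

The results should agree with the cor.test() output up to rounding.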
Reporting results (NEVER PASTE RAW OUTPUT)
We found a significant positive correlation between vegetation biomass and arthropod abundance (Pearson’s r = 0.61, df = 97, P < 0.0001).
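Rather than copying numbers by hand, the pieces of such a sentence can be pulled from the object that cor.test() returns. The sketch below uses two columns of the built-in iris data as stand-ins for veg and arth. The 0.0001 reporting threshold and the names res and report are assumptions chosen here for illustration.

```r
# Illustrative data: built-in iris, not the veg/arth data
data(iris)
res <- cor.test(iris$Sepal.Length, iris$Petal.Length)

# Assemble the statistics needed for a results sentence
report <- sprintf(
  "Pearson's r = %.2f, df = %d, P %s",
  res$estimate,                  # sample correlation
  as.integer(res$parameter),     # degrees of freedom
  if (res$p.value < 0.0001) "< 0.0001" else sprintf("= %.4f", res$p.value)
)
report
```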