last update: 2021-08-04

R Space

 

Data should be tidy, with mise en place (everything in its place)

What you will learn

 

  • “Tidy Data” concept
  • Common data file types
  • Excel, data setup, and the Data Dictionary
  • Getting data into R
  • Manipulating variables
  • Practice exercises

“Tidy Data” concept

 


 

Data should be in a standard, accessible, reproducible, non-haphazard format:

  • One column per variable

  • One row per observational unit

  • Thought given to sensible data “codification” (how values are recorded)

  • An explanation of each variable

  • Consistent variable naming conventions
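As a minimal sketch of what “one column per variable, one row per observational unit” means in practice, base R's reshape() can turn an untidy “wide” table into a tidy one. The plots, years, and yields below are made up for illustration; in the wide table the variable year is hidden inside the column names.

```r
# Hypothetical untidy table: "year" is buried in the column names
wide <- data.frame(
  plot       = c("A", "B"),
  yield_2019 = c(5.1, 4.8),
  yield_2020 = c(5.4, 5.0)
)

# Reshape to tidy "long" form: one column per variable (plot, year, yield)
# and one row per observational unit (one plot in one year)
tidy <- reshape(
  wide,
  direction = "long",
  varying   = c("yield_2019", "yield_2020"),
  v.names   = "yield",
  timevar   = "year",
  times     = c(2019, 2020),
  idvar     = "plot"
)
rownames(tidy) <- NULL  # drop the "A.2019"-style row names that reshape() adds
tidy
```

The tidy version has four rows, one per plot-year, so adding a third year means adding rows rather than inventing a new column.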

Common data file types

 

Delimited text files

  • .csv, .txt, tab-delimited, others

  • First row holds the variable names

  • Problem: no built-in place to explain what each variable means
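A minimal sketch of the “first row holds the variable names” convention, using a small made-up CSV written to a temporary file so the example runs anywhere:

```r
# Write a tiny, made-up tidy CSV to a temporary file
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "plot,year,yield",
  "A,2019,5.1",
  "B,2019,4.8"
), csv_file)

# read.csv() defaults to header = TRUE, so the first row becomes the column names
dat <- read.csv(csv_file)
names(dat)
```

Nothing in the file itself says what yield measures or in which units, which is exactly the explanation problem listed above.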

 

Excel is okay, but the data must be tidy

Avoid proprietary formats (e.g. SPSS, Genstat)

Data formats

Untidy data in Excel


Tidy data in Excel


Data dictionary
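One way to keep a data dictionary next to the data in R is a second, small data frame with one row per variable. This is only a sketch: the dictionary columns variable, description, and units, and all of the values, are hypothetical.

```r
# Hypothetical tidy data
dat <- data.frame(
  plot  = c("A", "B"),
  year  = c(2019, 2019),
  yield = c(5.1, 4.8)
)

# Hypothetical data dictionary: one row per variable in `dat`
dict <- data.frame(
  variable    = c("plot", "year", "yield"),
  description = c("Field plot identifier", "Harvest year", "Grain yield"),
  units       = c(NA, NA, "t/ha")
)

# Check that every data column is documented, and every entry is used
undocumented <- setdiff(names(dat), dict$variable)
unused       <- setdiff(dict$variable, names(dat))
stopifnot(length(undocumented) == 0, length(unused) == 0)

# Save both as plain CSV so they stay readable outside R
data_file <- tempfile(fileext = ".csv")
dict_file <- tempfile(fileext = ".csv")
write.csv(dat,  data_file, row.names = FALSE)
write.csv(dict, dict_file, row.names = FALSE)
```

Keeping the dictionary as its own tidy table works around the limitation that a delimited text file has nowhere to store variable explanations.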


Tidy csv


Getting data into R

 

Step 1: Set your working directory, either with code or with the RStudio menus

Working directory menus
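For the code route in Step 1, getwd() and setwd() are the base R functions. In this minimal sketch tempdir() stands in for your project folder (an assumption so the example runs anywhere); substitute your own path.

```r
# Where is R currently looking for files?
old_wd <- getwd()
print(old_wd)

# Point R at the folder that holds your data files.
# tempdir() is only a stand-in: use your own path, e.g. setwd("path/to/project")
setwd(tempdir())
getwd()

# Files listed here can be read by name alone, e.g. read.csv("mydata.csv")
list.files()

# Restore the original working directory
setwd(old_wd)
```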



 

Different data file types require different functions

 

For .csv files: read.csv()

For other delimited files: read.table() (set header and sep to match the file)

For Excel files: library(openxlsx); read.xlsx()

Many other import functions exist
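A sketch of the read.table() and read.xlsx() routes, again using made-up data written to temporary files. The Excel part runs only if the openxlsx package is installed (install.packages("openxlsx")); openxlsx::write.xlsx() is used here just to create a demo workbook to read back.

```r
# Tab-delimited text: tell read.table() about the header row and the separator
txt_file <- tempfile(fileext = ".txt")
writeLines(c(
  "plot\tyear\tyield",
  "A\t2019\t5.1",
  "B\t2019\t4.8"
), txt_file)
dat_txt <- read.table(txt_file, header = TRUE, sep = "\t")

# Excel: openxlsx::read.xlsx() reads the first sheet by default
if (requireNamespace("openxlsx", quietly = TRUE)) {
  xlsx_file <- tempfile(fileext = ".xlsx")
  openxlsx::write.xlsx(dat_txt, xlsx_file)  # demo workbook only
  dat_xlsx <- openxlsx::read.xlsx(xlsx_file)
  print(dat_xlsx)
}
```

Unlike read.csv(), read.table() defaults to header = FALSE, so forgetting header = TRUE turns the variable names into a row of data.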

Manipulating variables

 

  • The use of the names() function

  • The use of the $ operator for data frames

  • The use of the str() function for data frames

  • The use of the index operator [ , ]

  • The use of the attach() function
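A short sketch of each item in the list, on a made-up data frame (the column names site, year, and yield are hypothetical):

```r
dat <- data.frame(
  site  = c("A", "B", "C"),
  year  = c(2019, 2019, 2020),
  yield = c(5.1, 4.8, 5.4)
)

# names(): list the variables, or rename one
names(dat)
names(dat)[3] <- "yield_t_ha"

# $: extract one variable as a vector, or create a new one
dat$yield_t_ha
dat$high <- dat$yield_t_ha > 5

# str(): compact overview of each variable's type and first values
str(dat)

# [ , ]: rows before the comma, columns after it
dat[2, ]                                        # second row, all columns
dat[, "site"]                                   # one column, by name
dat[dat$year == 2019, c("site", "yield_t_ha")]  # filter rows, pick columns

# attach(): columns become visible by bare name.
# It works on a copy, so later changes to dat are not seen until you
# detach() and attach() again; with() avoids that bookkeeping.
attach(dat)
mean(yield_t_ha)
detach(dat)
with(dat, mean(yield_t_ha))
```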

Live coding
