Data cleaning
Resources
§1. Dataframe manipulation
Listing variables
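This is a minimal base-R sketch for listing the variables of a data frame. The data frame `df` and its columns are hypothetical toy data. `names()`, `ncol()` and `str()` are standard base R. In the tidyverse, dplyr's `glimpse()` is a common alternative to `str()`.

```r
# Hypothetical toy data frame, used only for illustration
df <- data.frame(
  id    = 1:4,
  name  = c("Ana", "Ben", "Cleo", "Dev"),
  score = c(82.5, NA, 91.0, 67.25)
)

names(df)  # column names as a character vector
ncol(df)   # number of columns
str(df)    # one line per column: its type and first few values
```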
Printing and viewing
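To inspect a data frame without flooding the console, print only part of it. The sketch below uses base R on a hypothetical 10-row data frame.

```r
# Hypothetical toy data frame with 10 rows
df <- data.frame(x = 1:10, y = letters[1:10])

head(df, 3)       # first 3 rows
tail(df, 2)       # last 2 rows
print(df[4:5, ])  # explicit print of a row slice (needed inside functions and loops, which do not auto-print)
# View(df)        # spreadsheet-style viewer; only meaningful in an interactive session such as RStudio
```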
Logical expressions
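Logical expressions are evaluated element by element and return TRUE, FALSE, or NA; filtering relies on them. This small sketch uses a hypothetical numeric vector `x` that includes a missing value, to show how NA propagates.

```r
x <- c(3, NA, 7, 12)

x > 5             # FALSE NA TRUE TRUE: any comparison with NA gives NA
x > 5 & x < 10    # element-wise AND: FALSE NA TRUE FALSE
x < 5 | x > 10    # element-wise OR
!is.na(x)         # negation; is.na() is the safe way to test for NA (x == NA is always NA)
x %in% c(3, 12)   # membership test; returns FALSE rather than NA for missing values
# && and || are the scalar versions, intended for single conditions such as if ()
```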
Filtering
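Filtering keeps the rows where a logical condition is TRUE. This is a minimal base-R sketch on a hypothetical toy data frame; dplyr's `filter()` is the tidyverse equivalent. The main trap it illustrates is how each approach treats rows where the condition evaluates to NA.

```r
# Hypothetical toy data frame; Ben's score is missing
df <- data.frame(
  name  = c("Ana", "Ben", "Cleo", "Dev"),
  score = c(82.5, NA, 91.0, 67.25)
)

# which() turns the condition into row numbers and silently drops NA;
# a bare df[df$score > 80, ] would instead return an extra all-NA row for Ben
high <- df[which(df$score > 80), ]

# subset() also drops rows where the condition is NA
high2 <- subset(df, score > 80)
```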
Creating new variables
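To create a new column, assign to a column name that does not exist yet. The right-hand side is computed element by element. The sketch below uses base R on hypothetical data; dplyr's `mutate()` does the same job in the tidyverse.

```r
# Hypothetical toy data frame
df <- data.frame(score = c(82.5, NA, 91.0, 67.25))

df$passed    <- df$score >= 70                        # logical column; NA stays NA
df$score_pct <- df$score / 100                        # arithmetic on an existing column
df$grade     <- ifelse(df$score >= 90, "A", "not A")  # vectorised if/else; NA stays NA
```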
Keeping, dropping, and ordering columns
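You can keep, drop, or reorder columns by indexing with their names. This base-R sketch uses a hypothetical data frame; dplyr's `select()` and `relocate()` cover the same ground.

```r
# Hypothetical toy data frame; notes = NA is recycled to every row
df <- data.frame(id = 1:3, name = c("a", "b", "c"), score = c(5, 9, 7), notes = NA)

kept      <- df[, c("id", "score")]                   # keep columns by name
dropped   <- df[, setdiff(names(df), "notes")]        # drop columns by name
reordered <- df[, c("score", "id", "name", "notes")]  # reorder by listing names in the new order
one_col   <- df[, "score", drop = FALSE]              # drop = FALSE keeps a one-column data frame, not a vector
df$notes  <- NULL                                     # drop a column in place
```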
Grouping variables
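Grouping splits the rows of a data frame according to the values of one variable. In base R, `split()` does this literally and returns a list of data frames. dplyr's `group_by()` instead marks the groups so that later verbs act on each group. Here is a sketch with hypothetical data.

```r
# Hypothetical toy data frame
df <- data.frame(team = c("x", "y", "x", "y", "x"), pts = c(2, 6, 4, 1, 3))

groups <- split(df, df$team)  # named list with one data frame per team
names(groups)                 # "x" "y"
sapply(groups, nrow)          # number of rows in each group
```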
Summarizing data
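Summaries collapse a variable to a few numbers, either overall or per group. This base-R sketch uses hypothetical team data; in dplyr you would use `group_by()` followed by `summarise()`.

```r
# Hypothetical toy data frame
df <- data.frame(team = c("x", "y", "x", "y", "x"), pts = c(2, 6, 4, 1, 3))

summary(df$pts)                                          # minimum, quartiles, median, mean, maximum
by_mean <- aggregate(pts ~ team, data = df, FUN = mean)  # data frame with one row per team
by_sum  <- tapply(df$pts, df$team, sum)                  # named vector of per-team sums
by_mean
by_sum
```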
Joining dataframes
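`merge()` joins two data frames on a shared key. The `all`, `all.x` and `all.y` arguments decide which unmatched rows are kept. The sketch below uses two hypothetical tables; dplyr's `inner_join()`, `left_join()` and `full_join()` are the tidyverse counterparts.

```r
# Hypothetical toy tables sharing the key column "id"
people <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cleo"))
scores <- data.frame(id = c(1, 3, 4), score = c(82.5, 91, 70))

inner <- merge(people, scores, by = "id")                # only ids present in both: 1 and 3
left  <- merge(people, scores, by = "id", all.x = TRUE)  # every person; score is NA for id 2
full  <- merge(people, scores, by = "id", all = TRUE)    # every id from either table
```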
TBD
TBD
TBD
§2. Types of variables
Numerical variables
Rounding
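Base R has several rounding functions, and they differ in direction and precision. One detail is easy to miss: `round()` sends exact halves to the even digit, so it does not round half up. The values below are arbitrary examples.

```r
round(1234.5678, 2)       # 1234.57   (2 decimal places)
signif(1234.5678, 3)      # 1230      (3 significant digits)
floor(-1.5)               # -2        (toward -Inf)
ceiling(-1.5)             # -1        (toward +Inf)
trunc(-1.5)               # -1        (toward zero)
round(c(0.5, 1.5, 2.5))   # 0 2 2     (exact halves go to the even digit)
```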
Winsorization and trimming
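Winsorization replaces values beyond chosen cutoffs with the cutoffs themselves. Trimming drops those values instead. This is a minimal base-R sketch that assumes the cutoffs are sample quantiles (5th and 95th percentiles by default). `winsorize()` and `trim()` are hypothetical helpers written for this example, not functions from a package. Some definitions clamp to the nearest retained observation rather than to the quantile, which can give slightly different numbers.

```r
# Hypothetical helper: clamp values outside the chosen quantiles to those quantiles.
# NA values stay NA.
winsorize <- function(x, probs = c(0.05, 0.95)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Hypothetical helper: keep only the non-missing values inside the chosen quantiles
trim <- function(x, probs = c(0.05, 0.95)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  x[!is.na(x) & x >= q[1] & x <= q[2]]
}

x <- c(1:9, 100)             # arbitrary example with one outlier
winsorize(x, c(0.1, 0.9))    # 1 becomes 1.9 and 100 becomes 18.1 (default quantile type 7)
trim(x, c(0.1, 0.9))         # 2 3 4 5 6 7 8 9
mean(x, trim = 0.1)          # base R trimmed mean: drops 10% of observations from each end
```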
String variables
library(stringr)
activities <- c("running", "dancing", "reading")
pattern <- "read"
str_subset(activities, pattern) # returns the strings that match the pattern: "reading"
str_detect(activities, pattern) # returns a logical vector: FALSE FALSE TRUE
str_which(activities, pattern)  # returns the index (or indices) of matching strings: 3
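string <- "Contact: string@gmail.com or character@gmail.com"
pattern <- "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+" # simple email-like pattern
str_extract(string, pattern)     # extracts the first match: "string@gmail.com"
str_extract_all(string, pattern) # extracts all matches, returned as a list: "string@gmail.com" "character@gmail.com"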
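Besides matching and extracting, cleaning string variables often means normalising whitespace and case before any comparison. This is a short stringr sketch. `str_squish()`, `str_to_lower()` and `str_replace_all()` are stringr functions; the messy sample values and the choice to strip hyphens are illustrative assumptions.

```r
library(stringr)

raw <- c("  Running ", "DANCING", "reading  now", "Read-ing")  # hypothetical messy values

clean <- str_squish(raw)                 # trim both ends and collapse internal runs of spaces
clean <- str_to_lower(clean)             # normalise case
clean <- str_replace_all(clean, "-", "") # remove hyphens
clean                                    # "running" "dancing" "reading now" "reading"
str_detect(clean, "^read")               # FALSE FALSE TRUE TRUE
```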
Factor variables
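A factor stores a categorical variable as integer codes plus a set of levels. This base-R sketch uses a hypothetical vector of sizes. It shows the default alphabetical level order, how to impose a meaningful order, and how to discard levels that no longer occur after subsetting.

```r
sizes <- c("small", "large", "medium", "small", "large")  # hypothetical data

f <- factor(sizes)             # levels sorted alphabetically by default
levels(f)                      # "large" "medium" "small"

f2 <- factor(sizes, levels = c("small", "medium", "large"))  # explicit level order
as.integer(f2)                 # underlying integer codes: 1 3 2 1 3
table(f2)                      # counts per level, in level order: 2 1 2

f3 <- droplevels(f2[f2 != "medium"])  # subset, then remove the now-unused level
levels(f3)                     # "small" "large"
```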
TBD
TBD
TBD