Data cleaning
Data manipulation
Summarizing dataset
Listing variables
Printing
Logical expressions
Filtering
Keeping/Dropping variables
Grouping variables
Summarizing data
Working with Variables
Numerical variables
String variables
library(stringr)

activities <- c("running", "dancing", "reading")
pattern <- "read"
str_subset(activities, pattern) # return the strings that match the pattern
str_detect(activities, pattern) # return a logical vector, one element per string
str_which(activities, pattern) # return the indices of the matching strings

string <- "Contact: string@gmail.com or character@gmail.com"
pattern <- "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+"
str_extract(string, pattern) # extract the first match
str_extract_all(string, pattern) # extract all matches (returns a list)
Date and time variables
Data transformations
Normalization
TBD
Winsorization
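Winsorization limits the influence of extreme values by replacing observations beyond chosen percentiles with the percentile values themselves, rather than dropping them. The following is a minimal sketch in base R, not a definitive implementation: the helper name `winsorize`, the 5th/95th percentile cutoffs, and the sample vector are assumptions made for illustration.

```r
# Hypothetical helper: cap a numeric vector at the given lower and upper quantiles.
# The 5%/95% defaults are an assumption; choose cutoffs that suit your data.
winsorize <- function(x, probs = c(0.05, 0.95)) {
  limits <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  # Values below the lower limit are raised to it, values above the upper limit are lowered to it.
  # Missing values stay missing.
  pmin(pmax(x, limits[1]), limits[2])
}

# Illustrative data with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1000)
w <- winsorize(x)
w
```

The vector keeps its length and ordering; only the tails are pulled in toward the cutoffs.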
Creating codebooks
TBD
Codes and identifiers
Geographic identifiers
FIPS codes:
- Federal Information Processing System (FIPS) Codes for States and Counties
- Federal Information Processing Standard state code - Wikipedia
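
State FIPS codes have two digits and county codes have three, so a full county identifier is a five-digit string. When these codes are read in as numbers, the leading zeros are lost (for example, `01` becomes `1`). The sketch below shows one way to restore them in base R, assuming the codes arrived as numeric columns. The variable names and sample values are illustrative only.

```r
# Illustrative values only: FIPS codes that lost their leading zeros on import.
state_fips  <- c(1, 6, 36)    # Alabama, California, New York
county_fips <- c(1, 37, 61)   # county codes within each state

# Pad to the standard widths: 2 digits for states, 3 digits for counties.
state_chr  <- sprintf("%02d", as.integer(state_fips))
county_chr <- sprintf("%03d", as.integer(county_fips))

# Concatenate into the 5-digit county identifier.
fips <- paste0(state_chr, county_chr)
fips
# → "01001" "06037" "36061"
```

Storing the result as a character vector keeps the zeros intact in later merges. `stringr::str_pad()` with `pad = "0"` is an alternative if the codes are already strings.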