Data description

Resources

Introduction

Useful Packages

Sample description

TBD

Summary of variables

All variables

Using skimr

skimr::skim(dataset_xxx)

Using summarytools

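# if_label, path_html and name must be defined beforehand
data_summary <- summarytools::dfSummary(
    dataset_xxx,
    varnumbers = TRUE,       # show variable numbers
    labels.col = if_label,   # TRUE/FALSE: show the variable-label column
    graph.magnif = 1, 
    valid.col = FALSE,       # hide the count of valid values
    na.col = TRUE,           # show the count of missing values
    style = "grid", 
    plain.ascii = FALSE,
    max.string.width = 25,
    split.table = 30,
    tmp.img.dir = "/tmp"     # directory for temporary graph images
)
# Write the summary to an HTML file
summarytools::view(
    data_summary,
    footnote = NA,
    file = file.path(path_html, paste0(name, "_summary.html"))
)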

Continuous variables

# R
hist(vector_name)

// Stata
su <var>
su <var>, d // equivalent to `summarize, detail`
codebook <var>
histogram <var>
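
On the R side, some of the statistics reported by Stata's `summarize, detail` can be obtained with base functions. The following is a minimal sketch, assuming a numeric vector with missing values; the vector `income` and its values are made up for illustration.

```r
# Hypothetical numeric variable with one missing value
income <- c(4200, 3900, NA, 5100, 4800, 15000)

# Min, quartiles, mean, max and the number of NAs
print(summary(income))

# Selected percentiles, ignoring missing values
pctiles <- quantile(
  income,
  probs = c(0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99),
  na.rm = TRUE
)
print(pctiles)

# Standard deviation and number of non-missing observations
income_sd <- sd(income, na.rm = TRUE)
income_n  <- sum(!is.na(income))
```

Most base summary functions return `NA` when the input contains missing values, so `na.rm = TRUE` has to be passed explicitly.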

Categorical variables

table(dataset_xxx$col_name)
dplyr::count(dataset_xxx, col_name)
janitor::tabyl(dataset_xxx, col_name)
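
If the extra packages are not available, a frequency table with shares can be sketched in base R. The vector `region` below is a made-up example, not part of any dataset referenced here.

```r
# Hypothetical categorical variable with one missing value
region <- c("north", "south", "north", NA, "east", "north")

# Counts per category; useNA = "ifany" adds a row for missing values
counts <- table(region, useNA = "ifany")

# Shares in percent, computed over all observations (including NA)
shares <- round(100 * prop.table(counts), 1)

print(counts)
print(shares)
```

Without `useNA`, `table()` silently drops missing values, which can make the shares look larger than they are.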

Missing values

finalfit::missing_plot(dataset_xxx) # R
misstable summarize // Stata; see also `misstable patterns`
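
Before plotting, it can help to tabulate missingness per variable. This is a minimal base R sketch; the data frame `toy` is hypothetical and only stands in for the real dataset.

```r
# Hypothetical data frame standing in for the real dataset
toy <- data.frame(
  age    = c(34, NA, 51, 29, NA),
  income = c(4200, 3900, NA, 5100, 4800),
  region = c("north", "south", NA, "east", "west")
)

# Number and share (in percent) of missing values per variable
n_missing   <- colSums(is.na(toy))
pct_missing <- round(100 * colMeans(is.na(toy)), 1)

missing_summary <- data.frame(
  variable    = names(toy),
  n_missing   = n_missing,
  pct_missing = pct_missing,
  row.names   = NULL
)
print(missing_summary)

# Number of rows without any missing value
n_complete <- sum(complete.cases(toy))
```

Comparing `n_complete` with `nrow(toy)` gives a quick sense of how many observations a complete-case analysis would drop.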

Correlation analysis

TBD

  • Pearson's correlation coefficient (\(\rho\))

  • Rank correlation coefficients (e.g., Spearman's \(r_s\), Kendall's \(\tau\))

Correlation coefficients
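
Pearson's coefficient and the two most common rank coefficients (Spearman and Kendall) can all be computed with base R's `cor()`. The sketch below uses made-up vectors `x` and `y` and a hypothetical data frame `toy_num`; none of them come from the dataset discussed here.

```r
# Hypothetical paired observations
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 6, 5)

# Pearson measures linear association; Spearman and Kendall are rank-based
pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")
kendall  <- cor(x, y, method = "kendall")

# Correlation matrix for several numeric columns.
# use = "pairwise.complete.obs" drops missing values pair by pair
# instead of returning NA for any pair that involves a missing value.
toy_num <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 4, 5, 8, 10),
  c = c(9, 7, NA, 3, 1)
)
cor_matrix <- cor(toy_num, use = "pairwise.complete.obs", method = "spearman")
print(cor_matrix)
```

With pairwise deletion, each cell of the matrix may be based on a different number of observations, which is worth reporting alongside the coefficients.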

Visualization

Patterns across dimensions

TBD

Subgroup comparisons

Summary across waves

Case studies

TBD