
Statistics

This page reviews concepts from statistics that are useful for econometrics.

Related pages: Probability

§1. Sampling

Calibration

(Sample reweighting)

Methods

§2. Statistics

Statistics for random scalars

Let \(X^n = (X_i)_{i=1}^n\) denote an i.i.d. random sample of size \(n\).

Sample mean (an unbiased estimator of true mean):

\[\begin{equation} \overline{X^n} = \frac{1}{n}\sum_{i=1}^n X_i \end{equation}\]

Sample variance (an unbiased estimator of true variance):

\[\begin{equation} \widehat{Var}(X^n) = \frac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X^n })^2 \end{equation}\]

Sample standard deviation: \begin{equation} \widehat{SD}(X^n) = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X^n })^2} \end{equation}
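A minimal sketch of these three estimators in Python (assuming NumPy; the simulated data are purely illustrative). Note that `ddof=1` selects the \(n-1\) divisor used above.

```python
# Minimal sketch of the scalar sample statistics above (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=500)   # i.i.d. sample X^n

mean = x.mean()                                # sample mean
var = x.var(ddof=1)                            # sample variance, 1/(n-1) divisor
sd = x.std(ddof=1)                             # sample standard deviation

# The same variance written out explicitly:
n = x.size
var_manual = ((x - mean) ** 2).sum() / (n - 1)
assert np.isclose(var, var_manual)
```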

Sampling distribution of statistics

Sampling variance of the sample mean: \begin{align} \operatorname{Var}(\overline{X^n}) & = \operatorname{Var}( \frac{1}{n}\sum_{i=1}^{n}X_i ) \nonumber \\ & = \frac{1}{n^2} \operatorname{Var}(\sum_{i=1}^{n}X_i) \nonumber \\ & = \frac{1}{n^2} \sum_{i=1}^{n}\operatorname{Var}(X_i) \nonumber \\ & = \frac{n \sigma^2 }{n^2} \nonumber \\ & = \frac{ \sigma^2 }{n} \end{align}

where \(\sigma^2\) is the true variance of \(X_i\).

Standard error of the sample mean: \begin{equation} \operatorname{SE}(\overline{X^n}) = \frac{\sigma}{\sqrt{n}} \end{equation}
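A quick simulation sketch of these two results (assuming NumPy; the distribution, sample size, and replication count are illustrative): the variance of the sample mean across repeated samples should be close to \(\sigma^2/n\).

```python
# Simulation check that Var(sample mean) is close to sigma^2 / n.
import numpy as np

rng = np.random.default_rng(1)
sigma, n, reps = 3.0, 100, 20_000

# Draw `reps` independent samples of size n and record each sample mean.
means = rng.normal(loc=0.0, scale=sigma, size=(reps, n)).mean(axis=1)

print(means.var())          # simulated sampling variance of the mean
print(sigma**2 / n)         # theoretical value: 0.09
print(sigma / np.sqrt(n))   # standard error of the sample mean: 0.3
```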

Statistics for a weighted sample

Let \(X_w^n\) denote an i.i.d. random sample \((X_i)_{i=1}^n\) with analytical weights \((w_i)_{i=1}^n\).

Weighted sample mean: \begin{equation} \overline{X_w^n } = \frac{ \sum_{i=1}^n w_i X_i }{ \sum_{i=1}^n w_i } \end{equation}

Weighted sample variance: \begin{equation} \widehat{\operatorname{Var}}(X_w^n) = \frac{ \sum_{i=1}^n w_i (X_i-\overline{X_w^n })^2 }{ \sum_{i=1}^n w_i - \frac{\sum_{i=1}^n w_i^2 }{\sum_{i=1}^n w_i} } \end{equation}

Sampling variance of the weighted sample mean: \begin{align} \operatorname{Var}(\overline{X_w^n}) &= \operatorname{Var}(\frac{ \sum_{i=1}^n w_i X_i }{ \sum_{i=1}^n w_i }) \nonumber \\ &= \frac{1}{(\sum_{i=1}^n w_i)^2} \operatorname{Var}(\sum_{i=1}^{n}w_iX_i) \nonumber \\ &= \frac{1}{(\sum_{i=1}^n w_i)^2} \sum_{i=1}^{n} w_i^2 \operatorname{Var}(X_i) \nonumber \\ &= \frac{\sum_{i=1}^n w_i^2 }{(\sum_{i=1}^n w_i)^2} \sigma^2 \nonumber \\ &= \frac{ \sigma^2 }{n_\text{eff}} \end{align}

where the effective sample size \(n_\text{eff}\) is given by:

\[\begin{equation*} n_\text{eff} = \frac{(\sum_{i=1}^n w_i)^2}{\sum_{i=1}^n w_i^2 } \end{equation*}\]
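A sketch of the weighted-sample formulas above (assuming NumPy; the data and weights are illustrative). In the last line, the weighted sample variance is plugged in for \(\sigma^2\).

```python
# Sketch of weighted mean, weighted variance, and effective sample size.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
w = rng.uniform(0.5, 2.0, size=200)       # analytical weights (illustrative)

w_sum = w.sum()
mean_w = (w * x).sum() / w_sum            # weighted sample mean

# Weighted sample variance with the corrected denominator.
var_w = (w * (x - mean_w) ** 2).sum() / (w_sum - (w ** 2).sum() / w_sum)

# Effective sample size and sampling variance of the weighted mean.
n_eff = w_sum ** 2 / (w ** 2).sum()
var_of_mean_w = var_w / n_eff             # plug-in estimate of sigma^2 / n_eff
```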

Statistics for random vectors

Let \((X_i)_{i=1}^n\) be i.i.d. random vectors in \(\mathbb{R}^k\). Stack the observations row-wise into the data matrix

\[\begin{equation} X = \begin{pmatrix} X_1^\top \\ X_2^\top \\ \vdots \\ X_n^\top \end{pmatrix} \in \mathbb{R}^{n \times k}. \end{equation}\]

Sample variance-covariance matrix estimator (here with the \(1/n\) divisor), where \(\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i\) denotes the sample mean vector:

\[\begin{equation} \hat{\Sigma} = \frac{1}{n} X^\top X - \bar{X} \bar{X}^\top = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^\top. \end{equation}\]
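A sketch of the two equivalent forms of \(\hat{\Sigma}\) (assuming NumPy; `np.cov` with `bias=True` uses the same \(1/n\) divisor):

```python
# Sketch of the 1/n variance-covariance matrix estimator (illustrative data).
import numpy as np

rng = np.random.default_rng(3)
n, k = 500, 3
X = rng.normal(size=(n, k))               # data matrix, one observation per row

xbar = X.mean(axis=0)                     # sample mean vector, shape (k,)
Sigma_hat = X.T @ X / n - np.outer(xbar, xbar)

# Equivalent centered form, and agreement with np.cov using the 1/n divisor.
Sigma_hat_centered = (X - xbar).T @ (X - xbar) / n
assert np.allclose(Sigma_hat, Sigma_hat_centered)
assert np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True))
```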

§3. Basic asymptotics

Convergence in probability

TBD

Law of Large Numbers (LLN)

Convergence in distribution

TBD

Central Limit Theorem (CLT)

§4. Parameter estimation

Unbiased estimators

TBD

Consistent estimators

TBD

Confidence intervals

§5. Hypothesis testing

  1. Define the null and alternative hypotheses.
  2. Select a test statistic \(T\).
  3. Derive the distribution of the test statistic \(T\) under the null hypothesis (e.g., a t distribution with known degrees of freedom; a normal distribution with known mean and variance).
  4. Select a significance level \(\alpha\) (e.g., 5%).
  5. Compute from the observations the observed value \(t\) of the test statistic \(T\).
  6. Decide to either reject the null hypothesis in favor of the alternative or not reject it.
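A minimal worked example of these steps (assuming NumPy and SciPy; the data and significance level are illustrative): a two-sided one-sample t test of the null hypothesis that the mean of \(X_i\) is \(0\).

```python
# Two-sided one-sample t test of H0: mean = 0 against H1: mean != 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(loc=0.3, scale=1.0, size=50)   # illustrative observations

alpha = 0.05                                  # step 4: significance level
res = stats.ttest_1samp(x, popmean=0.0)       # steps 2-3 and 5: t statistic and p-value

# Under H0 the statistic follows a t distribution with n - 1 degrees of freedom.
reject = res.pvalue < alpha                   # step 6: reject H0 if p-value < alpha
print(res.statistic, res.pvalue, reject)
```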

Null and alternative hypotheses

\(H_0\) and \(H_1\)

Significance levels

TBD

P-value

Type I and type II errors

False positive and false negative.

§6. Order statistics

TBD