Statistics

Statistics studies how to infer properties of the distribution underlying the data we observe.

We assume a sample of observations - a subset of the population of interest for the study.

We focus on parametric statistics, where the population's data follow some distribution (e.g. a Gaussian $\mathcal{N}(\mu, \sigma^2)$). Our goal is to estimate the model parameters from the sample of observed data. Our estimates will be affected by uncertainty due to the limited sample size.

1. Estimation Theory

A sample of data $x_1, \dots, x_n$ may be seen as a realization of a set of random variables $X_1, \dots, X_n$. We assume that a single draw follows a distribution $p(x \mid \theta)$, where $\theta$ are the parameters we wish to estimate.

We will assume that the random variables $X_1, \dots, X_n$ are independent and identically distributed (i.i.d.).

We also assume that sampling is carried out with replacement, so that the $X_i$ are independent.

1.1 Statistic

A statistic is a function of the random sample $X_1, \dots, X_n$, and is itself a random variable:

$$T = t(X_1, \dots, X_n)$$

1.2 Bias

We define the bias of an estimator $\hat{\theta}$ for parameter $\theta$ as:

$$\text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta$$

An estimator is unbiased if its bias is zero, i.e. $\mathbb{E}[\hat{\theta}] = \theta$.

1.2.1 Unbiased Sample Variance

How can we obtain an unbiased estimator for the variance?

We may be tempted to consider

$$\tilde{S}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2,$$

but this can be shown to be biased.

To fix this, we must use Bessel's correction, dividing by $n - 1$ rather than $n$, such that:

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^n (X_i - \bar{X})^2$$

Why?

First, we must show that:

$$\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n X_i^2 - n\bar{X}^2$$

Then, taking expectations, we get:

$$\mathbb{E}\left[\sum_{i=1}^n (X_i - \bar{X})^2\right] = \sum_{i=1}^n \mathbb{E}[X_i^2] - n\,\mathbb{E}[\bar{X}^2] = n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = (n - 1)\sigma^2$$

where we have used that $\mathbb{E}[X^2] = \text{Var}(X) + \mathbb{E}[X]^2$, together with $\text{Var}(\bar{X}) = \sigma^2 / n$.

This implies that $\mathbb{E}[S^2] = \frac{1}{n - 1}\,\mathbb{E}\left[\sum_{i=1}^n (X_i - \bar{X})^2\right] = \sigma^2$, so $S^2$ is unbiased.
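
As a quick sanity check, the bias of the $1/n$ estimator can also be seen empirically. Below is a minimal simulation sketch, assuming NumPy is available; the true distribution and sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 5.0, 2.0, 10, 100_000   # true variance sigma^2 = 4

biased, unbiased = [], []
for _ in range(trials):
    x = rng.normal(mu, sigma, size=n)
    ss = np.sum((x - x.mean()) ** 2)
    biased.append(ss / n)          # divides by n: biased
    unbiased.append(ss / (n - 1))  # Bessel's correction: unbiased

print(f"true variance:            {sigma**2:.3f}")
print(f"mean of 1/n estimate:     {np.mean(biased):.3f}")    # ~ (n-1)/n * sigma^2 = 3.6
print(f"mean of 1/(n-1) estimate: {np.mean(unbiased):.3f}")  # ~ sigma^2 = 4.0
```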

1.3 Efficiency

Suppose we have two unbiased estimators for $\theta$, $\hat{\theta}_1$ and $\hat{\theta}_2$. If we know $\text{Var}(\hat{\theta}_1)$ and $\text{Var}(\hat{\theta}_2)$, we can compare the two estimators based on their variance, i.e. $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if:

$$\text{Var}(\hat{\theta}_1) < \text{Var}(\hat{\theta}_2)$$

If $\hat{\theta}$ is more efficient than any other possible estimator, then we say that $\hat{\theta}$ is efficient.
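
To make this concrete, here is a small simulation sketch (my own illustration, not from the notes) comparing two unbiased estimators of the centre of a Gaussian: the sample mean and the sample median. Both are unbiased here, but the sample mean has the smaller variance, so it is the more efficient of the two.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 50, 20_000

# Both estimators are unbiased for the centre of a symmetric distribution;
# compare their variances over repeated samples.
means   = np.array([rng.normal(0.0, 1.0, n).mean()     for _ in range(trials)])
medians = np.array([np.median(rng.normal(0.0, 1.0, n)) for _ in range(trials)])

print(f"Var(sample mean)   ~ {means.var():.4f}")    # ~ 1/n = 0.020
print(f"Var(sample median) ~ {medians.var():.4f}")  # ~ pi/(2n) ~ 0.031, less efficient
```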

1.4 Consistency

Consistency captures how an estimator behaves as the sample size grows. We say $\hat{\theta}_n$ is consistent for parameter $\theta$ if:

$$\lim_{n \to \infty} P\left(|\hat{\theta}_n - \theta| > \epsilon\right) = 0 \quad \text{for every } \epsilon > 0$$

If $\hat{\theta}_n$ is unbiased and $\lim_{n \to \infty} \text{Var}(\hat{\theta}_n) = 0$, then $\hat{\theta}_n$ is consistent.
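
A minimal sketch of consistency in action, assuming NumPy: the probability that the sample mean lands far from the true mean shrinks as $n$ grows. The exponential distribution and the tolerance $\epsilon = 0.1$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, eps, trials = 3.0, 0.1, 2_000

# Empirical P(|sample mean - mu| > eps): should shrink towards 0 as n grows.
for n in (10, 100, 1_000, 10_000):
    misses = sum(abs(rng.exponential(scale=mu, size=n).mean() - mu) > eps
                 for _ in range(trials))
    print(f"n = {n:>6}:  P(|mean - mu| > {eps}) ~ {misses / trials:.3f}")
```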

2. Maximum Likelihood

  1. The likelihood function $L(\theta) = \prod_{i=1}^n p(x_i \mid \theta)$ is the product of the pmf/pdf viewed as a function of the parameter $\theta$.
  2. We can find the log-likelihood function as $\ell(\theta) = \log L(\theta) = \sum_{i=1}^n \log p(x_i \mid \theta)$.
  3. We can find a $\hat{\theta}$ that maximizes the log-likelihood function by finding the $\theta$ that solves $\frac{\partial \ell(\theta)}{\partial \theta} = 0$.
  4. If $\hat{\theta}$ corresponds to a maximum, i.e. $\frac{\partial^2 \ell(\theta)}{\partial \theta^2}\big|_{\theta = \hat{\theta}} < 0$, then $\hat{\theta}$ is a maximum likelihood estimator (MLE) of $\theta$.

The maximum likelihood principle states that the best estimate for the parameter is the one that maximizes the likelihood of the observed data.
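
As an illustration (not part of the notes), here is a sketch of the MLE for the rate $\lambda$ of an exponential distribution, where the analytic answer $\hat{\lambda} = 1 / \bar{x}$ can be checked against a numerical maximization of the log-likelihood. SciPy's `minimize_scalar` is assumed to be available.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
true_lam = 2.0
x = rng.exponential(scale=1 / true_lam, size=500)

# Log-likelihood of Exponential(lambda):  l(lam) = n*log(lam) - lam * sum(x_i)
def neg_log_likelihood(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

numeric_mle = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50), method="bounded").x
analytic_mle = 1 / x.mean()   # from setting dl/dlam = n/lam - sum(x_i) = 0

print(f"analytic MLE: {analytic_mle:.4f}")
print(f"numeric  MLE: {numeric_mle:.4f}")   # both close to the true rate 2.0
```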

In some cases, we may only know summary statistics of the data. The MLE is then not viable, but we can still estimate the parameters provided we have enough sample moments. We do this with moment matching (the method of moments), equating sample moments to the corresponding population moments.
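
As a hedged sketch of moment matching (my example, not from the notes): for a Gamma distribution with shape $k$ and scale $\theta$, the population moments are $\mathbb{E}[X] = k\theta$ and $\text{Var}(X) = k\theta^2$, so equating them to the sample mean and sample variance gives $\hat{\theta} = S^2 / \bar{x}$ and $\hat{k} = \bar{x}^2 / S^2$.

```python
import numpy as np

rng = np.random.default_rng(4)
k_true, theta_true = 3.0, 2.0
x = rng.gamma(shape=k_true, scale=theta_true, size=10_000)

# Match the first two moments:  mean = k*theta,  var = k*theta^2
xbar, s2 = x.mean(), x.var(ddof=1)   # ddof=1 gives the Bessel-corrected sample variance
theta_hat = s2 / xbar
k_hat = xbar**2 / s2

print(f"k estimate:     {k_hat:.3f}  (true {k_true})")
print(f"theta estimate: {theta_hat:.3f}  (true {theta_true})")
```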

3. Central Limit Theorem

The central limit theorem (CLT) is a general result for sums of random variables. Let $X_1, \dots, X_n$ be independent and identically distributed random variables from any probability distribution with finite mean $\mu$ and finite variance $\sigma^2$. Then:

$$Z_n = \frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty$$

From this, for large $n$, we have the celebrated result:

$$\sum_{i=1}^n X_i \approx \mathcal{N}(n\mu, n\sigma^2),$$

which can also be written as:

$$\bar{X} \approx \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right),$$

where $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ is the sample mean.

3.1 Sample Mean Applications

The CLT implies for large $n$ that the sample mean is approximately normal whatever the distribution of the $X_i$, which is a powerful result:

$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \approx \mathcal{N}(0, 1)$$
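
A quick simulation sketch of this claim, assuming NumPy: sample means of a heavily skewed distribution already look close to Gaussian for moderate $n$. The Exponential(1) choice (mean 1, variance 1) is illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, n, trials = 1.0, 100, 50_000           # Exponential(1): mean 1, variance 1

xbar = rng.exponential(scale=mu, size=(trials, n)).mean(axis=1)
z = (xbar - mu) / (1.0 / np.sqrt(n))       # standardize with sigma / sqrt(n)

# If the CLT approximation holds, z should be close to N(0, 1):
print(f"mean(z)     ~ {z.mean():+.3f}   (expect ~ 0)")
print(f"var(z)      ~ {z.var():.3f}    (expect ~ 1)")
print(f"P(z < 1.96) ~ {np.mean(z < 1.96):.3f}  (expect ~ 0.975)")
```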

3.2 CLT Proof

Given independent and identically distributed random variables $X_1, \dots, X_n$ with mean $\mu$ and variance $\sigma^2$, we can standardize their sum to get:

$$Z_n = \frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{X_i - \mu}{\sigma} = \frac{1}{\sqrt{n}} \sum_{i=1}^n Y_i$$

where $Y_i = \frac{X_i - \mu}{\sigma}$ is a shifted and scaled version of $X_i$ with zero mean and unit variance. Then, the moment generating function of $Z_n$ is:

$$M_{Z_n}(t) = \mathbb{E}\left[e^{tZ_n}\right] = \mathbb{E}\left[e^{\frac{t}{\sqrt{n}}\sum_i Y_i}\right] = \prod_{i=1}^n \mathbb{E}\left[e^{\frac{t}{\sqrt{n}} Y_i}\right] = \left[M_Y\!\left(\frac{t}{\sqrt{n}}\right)\right]^n$$

where $M_Y$ is the mgf of $Y$ (and of each $Y_i$). We can now expand $M_Y\!\left(\frac{t}{\sqrt{n}}\right)$ using Taylor's theorem:

$$M_Y\!\left(\frac{t}{\sqrt{n}}\right) = 1 + \frac{t}{\sqrt{n}}\,\mathbb{E}[Y] + \frac{t^2}{2n}\,\mathbb{E}[Y^2] + o\!\left(\frac{1}{n}\right)$$

where we note that $\mathbb{E}[Y] = 0$ and $\mathbb{E}[Y^2] = 1$. Substituting this into $M_{Z_n}(t)$, we get:

$$M_{Z_n}(t) = \left[1 + \frac{t^2}{2n} + o\!\left(\frac{1}{n}\right)\right]^n \longrightarrow e^{t^2 / 2} \quad \text{as } n \to \infty$$

But $e^{t^2 / 2}$ is the mgf of a standard normal distribution, so $Z_n \xrightarrow{d} \mathcal{N}(0, 1)$.

4. Hypothesis Testing

Consider the hypothesis that a parameter $\theta$ takes a specific value $\theta_0$. We can test this hypothesis with a two-sided test:

$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta \neq \theta_0$$

Alternatively, we can use a one-sided test:

$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta > \theta_0 \quad (\text{or } H_1: \theta < \theta_0)$$

To test the validity of $H_0$, we choose a test statistic $T$ of the data for which we can find the distribution under $H_0$. We must define the test by identifying a rejection region $R$ of low-probability values of $T$ under the assumption that $H_0$ is true, such that:

$$P(T \in R \mid H_0 \text{ true}) = \alpha$$

for some small probability $\alpha$, the significance level. Then, for a sample we calculate the observed statistic $t$, for which:

$$t \in R \Rightarrow \text{reject } H_0, \qquad t \notin R \Rightarrow \text{retain } H_0$$

4.1 Testing Population Mean

Suppose $X_1, \dots, X_n$ are independent and identically distributed random variables from a normal distribution $\mathcal{N}(\mu, \sigma^2)$, where $\sigma^2$ is known. We may wish to test the hypothesis $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$. We can use $\bar{X}$ as the test statistic. Then, under $H_0$, we know both $\mu = \mu_0$ and $\sigma^2$. So, for sample mean $\bar{X}$, we can standardize to get:

$$Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \sim \mathcal{N}(0, 1)$$

where $\sigma / \sqrt{n}$ is the standard error of the mean. By the CLT, the result also holds approximately when the $X_i$ are not normally distributed.

If $Z$ takes extreme values far from zero, we reject the null hypothesis. So, we define our rejection region as the tails of the standard normal distribution:

$$R = \{z : |z| > z_{1 - \alpha/2}\}$$

where $z_{1 - \alpha/2}$ is the $(1 - \alpha/2)$ quantile of $\mathcal{N}(0, 1)$ (e.g. $z_{0.975} \approx 1.96$ for $\alpha = 0.05$).
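
A minimal sketch of this two-sided $z$-test under the stated assumptions (known $\sigma$); the data here are simulated purely for illustration, and SciPy is used only for the normal quantile.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
mu0, sigma, alpha = 5.0, 2.0, 0.05
x = rng.normal(5.4, sigma, size=40)        # simulated data whose true mean differs from mu0

z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
z_crit = norm.ppf(1 - alpha / 2)           # ~ 1.96 for alpha = 0.05

print(f"z = {z:.3f}, critical value = +/-{z_crit:.3f}")
print("reject H0" if abs(z) > z_crit else "retain H0")
```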

4.2 Unknown Variance

We may not know the real variance $\sigma^2$, but we can compute the (bias-corrected) sample variance $S^2$. In this case, we can use the $t$-distribution with $n - 1$ degrees of freedom. Its quantiles are given as a table:

| degrees of freedom $\nu$ | $t_{0.975,\,\nu}$ (two-sided, $\alpha = 0.05$) |
|---|---|
| 1 | 12.706 |
| 5 | 2.571 |
| 10 | 2.228 |
| 30 | 2.042 |
| $\infty$ | 1.960 |

and so on...

If $\sigma^2$ is unknown, then under $H_0$ we have:

$$T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \sim t_{n-1},$$

which uses the $t$-distribution with $n - 1$ degrees of freedom and the bias-corrected sample standard deviation $S$. So, the rejection region is:

$$R = \{t : |t| > t_{1 - \alpha/2,\, n-1}\}$$
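
The same test with unknown variance, sketched below on simulated data; the hand-computed statistic is cross-checked against `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

rng = np.random.default_rng(7)
mu0, alpha = 5.0, 0.05
x = rng.normal(5.6, 2.0, size=25)          # simulated data

s = x.std(ddof=1)                          # bias-corrected sample standard deviation
t_stat = (x.mean() - mu0) / (s / np.sqrt(len(x)))
t_crit = t.ppf(1 - alpha / 2, df=len(x) - 1)

print(f"t = {t_stat:.3f}, critical value = +/-{t_crit:.3f}")
print("reject H0" if abs(t_stat) > t_crit else "retain H0")
print(ttest_1samp(x, popmean=mu0))         # SciPy agrees on the statistic and p-value
```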

4.3 P-Value

It is important to quantify the statistical significance of a result in addition to giving a reject / retain outcome. The p-value of the data is the probability of obtaining a test statistic at least as extreme as the one observed, assuming $H_0$ is true. This is the smallest significance level at which we would still reject $H_0$.

Thus, if we are given a fixed significance level $\alpha$, then $H_0$ is rejected if $p \leq \alpha$.
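
A short sketch of how a two-sided p-value is computed for a $z$ statistic; the observed value 2.31 is made up for illustration.

```python
from scipy.stats import norm

z_obs, alpha = 2.31, 0.05
p_value = 2 * norm.sf(abs(z_obs))   # two-sided: P(|Z| >= |z_obs|) under H0

print(f"p-value = {p_value:.4f}")   # ~ 0.021
print("reject H0" if p_value <= alpha else "retain H0")
```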
