Sampling Distributions and Estimations







8.1  Sampling Variation
A statistic is a random variable whose value depends on which population items happen to be included in a random sample.  Some samples reflect their population well; others don’t. This is called “sampling variation”.  To illustrate this concept, we can draw some random samples from a large population of GMAT scores from MBA applicants.  The population parameters are µ = 520.78; σ = 86.80; N = 2,637.
If we draw 8 random samples from the 2,637 applicants in the population, they might look like this:
[Chart on p. 296]

We get a sample mean from each of these samples, expressed as x-bar (x with a bar over it). Sampling variation is inevitable, but there is a tendency for the sample means to be close to µ (the population mean).
The larger the sample, the closer x-bar gets to µ.  This is the basis for statistical estimation.
An “estimator” is a statistic derived from a sample to infer the value of a population parameter. These are the commonly used estimators and their formulas:

Estimator: Sample mean
Formula: x-bar = (1/n) Σxi, where xi is the ith data value and n is the sample size
Corresponding parameter: µ

Estimator: Sample standard deviation
Formula: s = √[ Σ(xi – x-bar)² / (n – 1) ], where xi is the ith data value and n is the sample size
Corresponding parameter: σ
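To make the formulas concrete, here is a minimal Python sketch that computes both estimators by hand for a small, made-up sample of GMAT scores (the values are illustrative only, not from the text):

```python
import math

# Hypothetical sample of 8 GMAT scores (illustrative values)
sample = [480, 550, 610, 500, 470, 590, 520, 540]
n = len(sample)

# Sample mean: x-bar = (1/n) * Σxi
x_bar = sum(sample) / n

# Sample standard deviation: s = √[ Σ(xi - x-bar)² / (n - 1) ]
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))

print(f"x-bar = {x_bar:.2f}")  # estimates µ
print(f"s     = {s:.2f}")      # estimates σ
```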




8.2 – Estimators and Sampling Distributions
Sampling Distributions and Sampling Error
The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size n is taken.  Like any random variable, it has a probability distribution with a mean and a variance.
Sampling error refers to the difference between an estimate and the corresponding population parameter.  Ex. In the case of the population mean, the sampling error = x-bar – µ.
It exists because different samples will yield different values for x-bar, depending on which population items happen to be included in the sample.  Usually the parameter we are estimating is unknown, so we cannot calculate the sampling error.  We DO know that the sample mean x-bar will correctly estimate µ on average because the sample means that overestimate µ will be offset by those that underestimate µ.
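A quick simulation makes this visible. The sketch below assumes a normal population with the GMAT parameters from 8.1 (µ = 520.78, σ = 86.80) and shows that individual sampling errors are nonzero but average out to roughly zero:

```python
import random

random.seed(1)
mu, sigma, n = 520.78, 86.80, 20

# Draw many samples of size n and record each sample mean
sample_means = []
for _ in range(10_000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# Each sampling error (x-bar - µ) is nonzero, but over many samples
# the overestimates offset the underestimates.
avg_error = sum(m - mu for m in sample_means) / len(sample_means)
print(f"average sampling error ≈ {avg_error:.3f}")  # close to 0
```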

Bias – The difference between the expected value and the true parameter. 
              Bias = E(X-bar) - µ
An estimator is unbiased if E(X-bar) equals the parameter being estimated.  There can still be sampling error, but an unbiased estimator neither overstates nor understates the true parameter on average. (x-bar, s², and p are all unbiased estimators). Sampling error is random, but bias is systematic.  It’s analogous to a rifle with its sight misaligned: all of the shots will be off in the same direction. (A small simulation after these definitions illustrates the difference.)
Efficiency – refers to the variance of the estimator’s sampling distribution. (Not covered in class)
Consistency – A consistent estimator will converge towards the parameter being estimated as the sample size increases.
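As promised above, here is a sketch of bias versus unbiasedness. It estimates a known population variance of σ² = 100 two ways: dividing by n (a biased estimator) and by n − 1 (the unbiased one). The population parameters are assumptions chosen for illustration:

```python
import random

random.seed(2)
mu, sigma, n = 50.0, 10.0, 5  # assumed population; σ² = 100

biased, unbiased = [], []
for _ in range(50_000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    ss = sum((x - x_bar) ** 2 for x in sample)
    biased.append(ss / n)          # divisor n: systematically low
    unbiased.append(ss / (n - 1))  # divisor n - 1: centered on σ²

print(f"divisor n:     {sum(biased) / len(biased):.1f}")      # ≈ 80
print(f"divisor n - 1: {sum(unbiased) / len(unbiased):.1f}")  # ≈ 100
```

Both versions show random sampling error from run to run; only the divisor-n version misses in the same direction every time, which is exactly what bias means.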
8.3 – Sample Mean and the Central Limit Theorem
X-bar is a random variable whose value will change each time we take a different sample. As long as our samples are random samples, we should feel confident that the only error in our estimating process is sampling error.  The standard deviation of the sampling distribution of the sample mean is called the standard error of the mean.  It is calculated this way:
              σx-bar = σ/√n
Ex.  Suppose the avg. price, µ, of an MP3 player is $80.00 with a standard deviation, σ, of $10.00.  What will be the mean and standard error of x-bar from a sample of 20 MP3 players?
              µx-bar = $80.00 – because the expected value of the sample mean equals the population mean
              σx-bar = $10.00/√20 = $2.236
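The same arithmetic in Python, using the stated µ = $80.00, σ = $10.00, and n = 20:

```python
import math

mu, sigma, n = 80.00, 10.00, 20
se = sigma / math.sqrt(n)

print(f"µ(x-bar) = ${mu:.2f}")  # mean of the sampling distribution
print(f"σ(x-bar) = ${se:.3f}")  # standard error: 10/√20 ≈ 2.236
```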
We know that if the population is exactly normal, then the sample mean follows a normal distribution for any sample size.  Often, we won’t know much about the population at all, so for that, we have
The Central Limit Theorem for a Mean
If a random sample of size n is drawn from a population with a mean µ and a standard deviation σ, the distribution of the sample mean x-bar approaches a normal distribution with mean µ and standard deviation σx-bar = σ/√n as the sample size increases.

This allows us to approximate the shape of the sampling distribution of x-bar even if the shape of the population distribution is not known.  There are 3 essential elements of the Central Limit Theorem:
1)     If the population is exactly normal, the sample mean has exactly a normal distribution centered at µ with a standard error equal to σ/√n
2)     As sample size increases, the distribution of sample means narrows in on the population mean.
3)     By the Central Limit Theorem, if the sample size is large enough, the sample means will have approximately a normal distribution even if the population is not normal.
Additionally, if your population is symmetric, there is no need to ensure that your sample size is greater than 30.
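A simulation sketch of point 3, using an exponential population (strongly right-skewed, definitely not normal) with mean 1. As n grows, the skewness of the sample means shrinks toward 0, i.e., the distribution of x-bar becomes roughly symmetric and bell-shaped:

```python
import random

random.seed(3)

def mean_of_sample(n):
    # One sample mean from an Exponential(1) population (µ = 1)
    return sum(random.expovariate(1.0) for _ in range(n)) / n

for n in (2, 30):
    means = [mean_of_sample(n) for _ in range(20_000)]
    grand_mean = sum(means) / len(means)
    # Sample skewness of the distribution of x-bar
    m2 = sum((m - grand_mean) ** 2 for m in means) / len(means)
    m3 = sum((m - grand_mean) ** 3 for m in means) / len(means)
    skew = m3 / m2 ** 1.5
    print(f"n = {n:2d}: mean of x-bar ≈ {grand_mean:.3f}, skewness ≈ {skew:.2f}")
```

The exponential population itself has skewness 2; by n = 30 the sample means are already close to symmetric, even though the population is not.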
Applying the Central Limit Theorem
It permits us to define an interval within which the sample means are expected to fall.
              µ ± z(σ/√n)
Using familiar z-scores, you can predict the range of sample means for sample size n.
Ex. 90% interval = µ ± 1.645(σ/√n) or 99% interval = µ ± 2.576(σ/√n)
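Applying these formulas to the GMAT population from 8.1 (µ = 520.78, σ = 86.80), with an assumed sample size of n = 36:

```python
import math

mu, sigma, n = 520.78, 86.80, 36
se = sigma / math.sqrt(n)

for label, z in (("90%", 1.645), ("99%", 2.576)):
    lo, hi = mu - z * se, mu + z * se
    print(f"{label} interval for x-bar: {lo:.2f} to {hi:.2f}")
```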
Sample Size and the Standard Error
There is a mathematical reason why, as n quadruples, σx-bar is halved:
              n = 4              σx-bar = σ/√4 = σ/2
              n = 16             σx-bar = σ/√16 = σ/4
              n = 64             σx-bar = σ/√64 = σ/8
The distribution of sample means collapses toward the true population mean µ as n increases.
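Numerically, with the GMAT σ = 86.80 borrowed as an example (an assumption for illustration), each quadrupling of n halves the standard error, since √(4n) = 2√n:

```python
import math

sigma = 86.80  # σ from the GMAT example in 8.1
for n in (4, 16, 64):
    print(f"n = {n:2d}: σ/√n = {sigma / math.sqrt(n):.2f}")
# Prints 43.40, 21.70, 10.85 — each half the previous value
```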








