If you have any test reviews, homeworks, guides, anything school related that you think can be posted on this website, reach out to me at makingschooleasier@gmail.com
Symbols:
∑ = sum
Population - All of the items we are interested in. (Can be finite, such as passengers on a plane or infinite, such as all Cokes bottled in an ongoing process.) Population symbols are commonly Greek letters. (But not always)
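N = population size
µ = population mean
σ² = population variance
σ = population standard deviation
Sample - A subset of the population, chosen to represent it. Sample symbols are commonly Roman letters.
n = sample size
x̄ = sample mean
s² = sample variance
s = sample standard deviation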
The Three Main Measurements
Central Tendency:
Choosing a measure of central tendency depends on what you want to learn about the data. Does the data set include unusual (extreme) values? Does it describe compounded growth? The answers to these questions determine which measure of central tendency you'll use.
The Six Measures of Central Tendency
Statistic | Formula | Excel Command (just in case) |
Mean | x̄ = (∑ x_i) / n (sample mean = the sum of the values divided by the number of values) | =AVERAGE(Data) |
Median | The middle value in a sorted array of values (with an even number of values, the average of the two middle values) | =MEDIAN(Data) |
Mode | The most frequently occurring value | =MODE(Data) |
Midrange | (x_min + x_max) / 2 (the sum of the largest and smallest values divided by 2) | =0.5*(MIN(Data)+MAX(Data)) |
Geometric Mean | G = (x_1 · x_2 · … · x_n)^(1/n), the nth root of the product of all of the values. ("Square" roots are 2nd roots. "Cube" roots are 3rd roots. In this case, it's the nth root, where n is the number of values in the set.) | =GEOMEAN(Data) |
Trimmed Mean | Same as the mean, except omitting the highest and lowest k% of the data (e.g. 5%) | =TRIMMEAN(Data, Percent) |
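For anyone who would rather check these by hand outside of Excel, here is a minimal Python sketch of the six measures using the standard `statistics` module. The data set is made up for illustration, and `midrange` and `trimmed_mean` are hypothetical helper names (Python has no built-ins for them). The trimmed mean follows the definition in the table above, dropping a fraction k from each end.

```python
import statistics

def midrange(data):
    """Average of the smallest and largest values."""
    return (min(data) + max(data)) / 2

def trimmed_mean(data, k):
    """Mean after dropping the lowest and highest fraction k of the values.

    k is the fraction removed from EACH end (an assumption that follows
    the table's definition).
    """
    values = sorted(data)
    drop = int(len(values) * k)          # number of values to drop per end
    kept = values[drop:len(values) - drop]
    return statistics.mean(kept)

# Hypothetical sample with one extreme high value (100).
data = [2, 4, 4, 5, 7, 8, 100]

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    "midrange": midrange(data),
    "geometric_mean": statistics.geometric_mean(data),  # Python 3.8+
    "trimmed_mean": trimmed_mean(data, 0.15),
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Notice how the single extreme value pulls the mean and midrange far above the median and the trimmed mean. One caution if you compare against Excel: as far as I know, the Percent argument of TRIMMEAN is the total fraction trimmed (split between both ends), so the Excel equivalent of k = 0.15 per end would be 0.30.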
Pros and Cons of each method
Statistic | Pro | Con |
Mean | Familiar and it uses all the information in the sample | Influenced by extreme values |
Median | Robust when extreme values exist | Ignores extremes and can be affected by gaps in the data |
Mode | Useful for attribute data or discrete data with a small range | Could be more than one mode and it's not helpful for continuous data |
Midrange | Easy to understand and calculate | Influenced by extreme values and ignores most data values |
Geometric Mean | Useful for growth rates and mitigates high extremes | Less familiar. Requires that the data are all positive values. |
Trimmed Mean | Mitigates the effect of extreme values | Excludes some data values that could be relevant. |
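To see why the geometric mean is the one to use for growth rates, here is a small Python sketch. The growth factors are invented for illustration (+10%, +20%, then -10%). The idea is that compounding the geometric mean over the same number of periods reproduces the actual total growth, while compounding the ordinary mean overstates it.

```python
import math
import statistics

# Hypothetical yearly growth factors: +10%, +20%, -10%.
growth_factors = [1.10, 1.20, 0.90]
periods = len(growth_factors)

total_growth = math.prod(growth_factors)            # actual growth over all periods
geo = statistics.geometric_mean(growth_factors)     # average factor per period
arith = statistics.mean(growth_factors)             # ordinary mean, for comparison

print(f"total growth over the period: {total_growth:.4f}")
print(f"geometric mean compounded:    {geo ** periods:.4f}")
print(f"arithmetic mean compounded:   {arith ** periods:.4f}")
```

This also shows the "Con" in the table: the calculation only makes sense when every value is positive, which is why growth is expressed as factors (1.10) rather than raw percentage changes (+10, -10).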
Skew's Effect on the Mean and Median
In a symmetric distribution, the mean and the median are the same. In a skewed distribution, they'll differ.
In a left-skewed distribution, in which the outliers are extremely low values, the mean will be less than the median. (The extremely low outliers are pulling the mean downward.)
In a right-skewed distribution, in which the outliers are extremely high values, the mean will be greater than the median. (The extremely high outliers are pulling the mean upward.)
In all distributions, the mode will be in the modal class (from Chapter 3). That is, the mode(s) will be located at the peaks in a histogram (or bar chart), because those are the most frequently occurring values.
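The skew rules above can be checked with a quick Python sketch. All three data sets are made up, and `mean_and_median` is just a hypothetical helper for printing the two measures side by side.

```python
import statistics

def mean_and_median(data):
    """Return (mean, median) so the two can be compared side by side."""
    return statistics.mean(data), statistics.median(data)

# All three data sets are hypothetical.
symmetric = [10, 20, 30, 40, 50]
left_skewed = [5, 80, 85, 90, 95]       # one extremely low value
right_skewed = [30, 35, 40, 45, 200]    # one extremely high value

examples = [
    ("symmetric", symmetric),
    ("left-skewed", left_skewed),
    ("right-skewed", right_skewed),
]

for name, data in examples:
    mean, median = mean_and_median(data)
    print(f"{name}: mean = {mean}, median = {median}")
```

In the left-skewed set the low outlier drags the mean below the median, and in the right-skewed set the high outlier drags it above, while the median barely notices either one.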
Measures of Dispersion
Dispersion describes how varied and spread out the values in a distribution are. Measures of dispersion also help you judge how typical or unusual a particular value in the set is.
Five Measures of Dispersion of a Sample
Statistic | Formula | Excel Command |
Range (R) | R = x_max - x_min (the maximum value in the set minus the minimum value) | =MAX(Data)-MIN(Data) |
Sample Variance (s²) | s² = ∑(x_i - x̄)² / (n - 1), summed over i = 1 to n. First, subtract the known sample mean (x̄) from each individual value (x_i) and square the result. Then add up all of the squared differences. Divide that sum by n - 1. | =VAR(Data) |
Sample Standard Deviation (s) | s = √[ ∑(x_i - x̄)² / (n - 1) ], summed over i = 1 to n. Same steps as the sample variance, but now also take the square root of the result. | =STDEV(Data) |
Coefficient of Variation (CV) | CV = 100 × s / x̄ (100 times the sample standard deviation divided by the sample mean) | None |
Mean Absolute Deviation (MAD) | MAD = ∑|x_i - x̄| / n, summed over i = 1 to n (the average absolute distance of the values from the sample mean) | =AVEDEV(Data) |
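Here is a minimal Python sketch of the five dispersion measures, in case it helps to see the formulas as steps. The data set is hypothetical. `sample_range`, `coefficient_of_variation` and `mean_absolute_deviation` are helper names I made up, while `statistics.variance` and `statistics.stdev` are the standard library's sample (n - 1) versions.

```python
import statistics

def sample_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

def coefficient_of_variation(data):
    """100 times the sample standard deviation divided by the sample mean."""
    return 100 * statistics.stdev(data) / statistics.mean(data)

def mean_absolute_deviation(data):
    """Average absolute distance from the sample mean (divides by n)."""
    mean = statistics.mean(data)
    return sum(abs(x - mean) for x in data) / len(data)

# Hypothetical sample; its mean is 5.
data = [2, 4, 4, 5, 7, 8]

results = {
    "range": sample_range(data),
    "sample variance": statistics.variance(data),   # divides by n - 1
    "sample std dev": statistics.stdev(data),       # square root of the variance
    "CV (%)": coefficient_of_variation(data),
    "MAD": mean_absolute_deviation(data),
}

for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

Working it by hand for this sample: the squared deviations from the mean add up to 24, so the sample variance is 24 / 5 = 4.8, and the absolute deviations add up to 10, so the MAD is 10 / 6.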
Pros and Cons of Various Measures of Dispersion
Statistic | Pro | Con |
Range | Easy to calculate. Easy to interpret | Sensitive to extreme data values |
Variance | Plays a key role in mathematical statistics | Less intuitive meaning |
Standard Deviation | Most commonly used measure. Expressed in same units as the data. ($, grams, etc) | Less intuitive meaning |
Coefficient of Variation | Expresses relative variation in percent so you can compare data sets with different units of measurement | Requires nonnegative data |
Mean Absolute Deviation | Easy to understand. | Lacks "nice" theoretical properties |
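The point about the coefficient of variation being unit-free can be illustrated with a short Python sketch. The weights are invented, and `coefficient_of_variation` is a hypothetical helper. The same sample is expressed in kilograms and in grams: the standard deviation changes with the units, but the CV does not.

```python
import statistics

def coefficient_of_variation(data):
    """100 times the sample standard deviation divided by the sample mean."""
    return 100 * statistics.stdev(data) / statistics.mean(data)

# Hypothetical weights, first in kilograms, then the same values in grams.
weights_kg = [60, 65, 70, 75, 80]
weights_g = [w * 1000 for w in weights_kg]

sd_kg = statistics.stdev(weights_kg)
sd_g = statistics.stdev(weights_g)
cv_kg = coefficient_of_variation(weights_kg)
cv_g = coefficient_of_variation(weights_g)

print(f"std dev: {sd_kg:.3f} kg vs {sd_g:.3f} g")
print(f"CV:      {cv_kg:.3f}% vs {cv_g:.3f}%")
```

Because the units cancel in s / x̄, the CV lets you compare the relative spread of data sets measured in different units, which the standard deviation alone cannot do.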
There's more in this chapter that he listed. I'm just out of time to input it and get it to you all in a period that would be useful to you.
You can cover the following in the PowerPoint slides (online learning center):
Chebyshev's Theorem and the Empirical Rule (He said the Empirical Rule was more important. Both describe how much of the data falls within a given number of standard deviations of the mean, which helps you ID outliers.)
Defining a Standardized Variable - Z scores. How to calculate
Percentiles, Quartiles and Box-and-Whisker Plots (including how to make and interpret Box-and-Whisker plots). Fences, unusual data values and midranges