More Statistics



If you have any test reviews, homeworks, guides, anything school related that you think can be posted on this website, reach out to me at makingschooleasier@gmail.com  


Symbols:
 
∑ = sum
 
Population - All of the items we are interested in. (Can be finite, such as passengers on a plane, or infinite, such as all Cokes bottled in an ongoing process.) Population symbols are commonly Greek letters (but not always).
 

N = size
µ = mean
σ² = variance
σ = standard deviation

Sample - A subset of the population that we will actually be analyzing. Sample symbols are commonly Roman letters (but not always).


n = size
x̄  = mean
s² = variance
s = standard deviation

The Three Main Measurements
 
Central Tendency:
 
Knowing how and when to use a measure of central tendency is a matter of understanding what you're trying to learn about the data. Does the data come from a set of unusual circumstances? Is it about compounded growth? The answers to these questions determine which measure of central tendency you'll use.
 
The Six Measures of Central Tendency
 
Statistic | Formula | Excel Command (just in case)
Mean | $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ (sample mean = the sum of the values divided by the number of values) | =AVERAGE(Data)
Median | The middle value in a sorted array of values | =MEDIAN(Data)
Mode | The most frequently occurring value | =MODE(Data)
Midrange | $\frac{x_{min} + x_{max}}{2}$ (the sum of the smallest and largest values, divided by 2) | =0.5*(MIN(Data)+MAX(Data))
Geometric Mean | $\sqrt[n]{x_1 x_2 x_3 \cdots x_n}$ (the nth root of the product of all of the values, where n is the number of values in the set; a "square" root is the 2nd root and a "cube" root is the 3rd) | =GEOMEAN(Data)
Trimmed Mean | Same as the mean, except omitting the highest and lowest k% of the data (e.g. 5%) | =TRIMMEAN(Data, Percent)
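If you'd rather check these outside of Excel, here is a minimal Python sketch of the six measures using the standard statistics module (Python 3.8 or later). The data set is made up, and midrange and trimmed_mean are hypothetical helper names written for this example. The trimmed mean here assumes the number of values dropped from each end is rounded down, which may not match how your textbook or Excel handles partial values.

```python
import statistics


def midrange(data):
    """Average of the smallest and largest values."""
    return (min(data) + max(data)) / 2


def trimmed_mean(data, percent):
    """Mean after dropping the lowest and highest `percent`% of the values.

    Assumptions: `percent` is a whole number below 50, and the count
    dropped from each end is rounded down.
    """
    ordered = sorted(data)
    drop = int(len(ordered) * percent // 100)  # values dropped from EACH end
    kept = ordered[drop:len(ordered) - drop]
    return statistics.fmean(kept)


data = [2, 4, 4, 8, 16]  # made-up sample

mean = statistics.fmean(data)          # 34 / 5 = 6.8
median = statistics.median(data)       # middle of the sorted values: 4
modes = statistics.multimode(data)     # list of all modes: [4]
mid = midrange(data)                   # (2 + 16) / 2 = 9.0
geo = statistics.geometric_mean(data)  # 5th root of 4096, about 5.28
trimmed = trimmed_mean(data, 20)       # drops 2 and 16, averages [4, 4, 8]

print(mean, median, modes, mid, geo, trimmed)
```

Notice how the single large value (16) pulls the mean and midrange up, while the median, geometric mean and trimmed mean stay closer to the bulk of the data.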

 
 
Pros and Cons of each method
 
Statistic | Pro | Con
Mean | Familiar and it uses all the information in the sample | Influenced by extreme values
Median | Robust when extreme values exist | Ignores extremes and can be affected by gaps in the data
Mode | Useful for attribute data or discrete data with a small range | Could be more than one mode and it's not helpful for continuous data
Midrange | Easy to understand and calculate | Influenced by extreme values and ignores most data values
Geometric Mean | Useful for growth rates and mitigates high extremes | Less familiar. Requires that the data are all positive values.
Trimmed Mean | Mitigates the effect of extreme values | Excludes some data values that could be relevant.
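To see why the geometric mean is the one to reach for with growth rates, here is a small Python sketch. The three yearly growth factors are made up for illustration. Compounding the geometric mean over the three years reproduces the actual total growth, while compounding the ordinary mean overstates it.

```python
import statistics

# Hypothetical yearly growth factors: +10%, +20%, then -10%.
growth_factors = [1.10, 1.20, 0.90]

arithmetic = statistics.fmean(growth_factors)
geometric = statistics.geometric_mean(growth_factors)

# Total growth actually achieved over the three years.
total = 1.0
for factor in growth_factors:
    total *= factor  # 1.10 * 1.20 * 0.90, about 1.188

years = len(growth_factors)
geometric_total = geometric ** years    # matches the actual total
arithmetic_total = arithmetic ** years  # comes out larger than the actual total

print(total, geometric_total, arithmetic_total)
```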

 
Skew's Effect on the Mean and Median
 
In a symmetric distribution, the mean and the median are the same. In a skewed distribution, they'll differ.
 
In a left-skewed distribution, in which the outliers are extremely low values, the mean will be less than the median.  (The extremely low outliers are pulling the mean downward.)
 
In a right-skewed distribution, in which the outliers are extremely high values, the mean will be greater than the median. (The extremely high outliers are pulling the mean upward.)
 
In all distributions, the mode will be in the modal class (from Chapter 3). That is, the mode(s) will be located at the peaks of a histogram (or bar chart), because those are the most frequently occurring values.
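The mean versus median comparison above is easy to check with a few made-up data sets. This is only a sketch: the three lists are invented so that one is symmetric, one has a single extremely high value, and one has a single extremely low value. mean_and_median is a hypothetical helper name.

```python
import statistics


def mean_and_median(data):
    """Return (mean, median) for a list of numbers."""
    return statistics.fmean(data), statistics.median(data)


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 50]        # one extremely high value
left_skewed = [1, 47, 47, 48, 48, 48, 49, 49, 50]  # one extremely low value

for name, data in [
    ("symmetric", symmetric),
    ("right-skewed", right_skewed),
    ("left-skewed", left_skewed),
]:
    mean, median = mean_and_median(data)
    print(f"{name}: mean = {mean}, median = {median}")
```

In the right-skewed list the mean (8) lands above the median (3), and in the left-skewed list the mean (43) lands below the median (48).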
 
Measures of Dispersion
Dispersion describes how varied and spread out the values in a distribution are, and it tells you a lot about the data set. Measures of dispersion also tell us a lot about a particular value in the set.
 
Five Measures of Dispersion of a Sample
 
Statistic | Formula | Excel Command
Range (R) | $R = x_{max} - x_{min}$ (the maximum value in the set minus the minimum value) | =MAX(Data)-MIN(Data)
Sample Variance (s²) | $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$ (first, subtract the sample mean x̄ from each individual value and square the result; then add up all of the squared differences and divide that sum by n-1) | =VAR(Data)
Sample Standard Deviation (s) | $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$ (calculated just like the sample variance, but now also take the square root of the result) | =STDEV(Data)
Coefficient of Variation (CV) | $CV = 100 \times \frac{s}{\bar{x}}$ (100 times the sample standard deviation divided by the sample mean) | None
Mean Absolute Deviation (MAD) | $MAD = \frac{\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert}{n}$ (like the sample variance, but taking the absolute value of each difference instead of squaring it, and dividing by n; the absolute value makes all differences positive, so there won't be any cancellation of values when the sum is taken) | =AVEDEV(Data)
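Here is a short Python sketch of the five dispersion measures, using a made-up sample whose mean works out to exactly 5 so the arithmetic is easy to follow by hand. The functions statistics.variance and statistics.stdev divide by n-1, like the sample formulas above. The CV and MAD lines are written out by hand for this example, since the CV has no built-in command.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample, mean is 5

mean = statistics.fmean(data)

value_range = max(data) - min(data)   # 9 - 2 = 7
variance = statistics.variance(data)  # squared differences sum to 32; 32 / 7
std_dev = statistics.stdev(data)      # square root of the sample variance
cv = 100 * std_dev / mean             # standard deviation as a percent of the mean
mad = sum(abs(x - mean) for x in data) / len(data)  # 12 / 8 = 1.5

print(value_range, variance, std_dev, cv, mad)
```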

 
Pros and Cons of Various Measures of Dispersion
 
Statistic | Pro | Con
Range | Easy to calculate. Easy to interpret | Sensitive to extreme data values
Variance | Plays a key role in mathematical statistics | Less intuitive meaning
Standard Deviation | Most commonly used measure. Expressed in same units as the data ($, grams, etc.) | Less intuitive meaning
Coefficient of Variation | Expresses relative variation in percent so you can compare data sets with different units of measurement | Requires nonnegative data
Mean Absolute Deviation | Easy to understand | Lacks "nice" theoretical properties
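The point about comparing data sets with different units can be sketched in Python. The heights below are made up, and coefficient_of_variation is a hypothetical helper name. The same heights are expressed in centimeters and in inches: the standard deviations differ because the units differ, but the CV comes out the same (up to floating-point rounding).

```python
import statistics


def coefficient_of_variation(data):
    """100 times the sample standard deviation divided by the sample mean."""
    return 100 * statistics.stdev(data) / statistics.fmean(data)


heights_cm = [150, 160, 170, 180, 190]       # made-up heights
heights_in = [h / 2.54 for h in heights_cm]  # the same heights in inches

sd_cm = statistics.stdev(heights_cm)  # in centimeters
sd_in = statistics.stdev(heights_in)  # in inches, so a smaller number

cv_cm = coefficient_of_variation(heights_cm)  # percent, no units
cv_in = coefficient_of_variation(heights_in)  # same percent

print(sd_cm, sd_in, cv_cm, cv_in)
```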

 
There's more in this chapter that he listed.  I'm just out of time to input it and get it to you all in a period that would be useful to you.  
 
You can cover the following in the Power Point Slides (online learning center):
 
Chebyshev's Theorem and the Empirical Rule (He said the Empirical Rule was more important. Both describe how much of the data falls within a given number of standard deviations of the mean, which is how they help ID outliers.)
 
Defining a Standardized Variable - Z scores.  How to calculate
 
Percentiles, Quartiles and Box-and-Whisker Plots (including how to make and interpret Box-and-Whisker plots). Fences, unusual data values and midranges


