More Statistics

If you have any test reviews, homeworks, guides, anything school related that you think can be posted on this website, reach out to me at makingschooleasier@gmail.com

Symbols:

∑ = sum

Population - All of the items we are interested in. (Can be finite, such as the passengers on a plane, or infinite, such as all Cokes bottled in an ongoing process.) Population symbols are commonly Greek letters. (But not always)

N = size

µ = mean

σ² = variance

σ = standard deviation

Sample - A subset of the population that we will actually be analyzing. Sample symbols are commonly Roman letters. (But not always)

n = size

x̄ = mean

s² = variance

s = standard deviation

The Three Main Measurements

Central Tendency:

Knowing how and when to use a measure of central tendency comes down to what you're trying to learn about the data. Does the data come from a set of unusual circumstances? Is it about compounded growth? The answers to these questions determine which measure of central tendency you'll use.

The Six Measures of Central Tendency

Statistic	Formula	Excel Command (just in case)
Mean	$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (sample mean = the sum of the values divided by the number of values)	=AVERAGE(Data)
Median	The middle value in a sorted array of values	=MEDIAN(Data)
Mode	The most frequently occurring value	=MODE(Data)
Midrange	$\frac{x_{min} + x_{max}}{2}$ (the sum of the smallest and largest values, divided by 2)	=0.5*(MIN(Data)+MAX(Data))
Geometric Mean	$G = \sqrt[n]{x_1 x_2 \cdots x_n}$ (the nth root of the product of all of the values, where n is the number of values in the set; a "square" root is the 2nd root and a "cube" root is the 3rd)	=GEOMEAN(Data)
Trimmed Mean	Same as the mean, except omitting the highest and lowest k% of the data (e.g. 5%). In Excel, Percent is the total fraction to exclude, split evenly between the two ends, so trimming 5% from each end uses 0.10.	=TRIMMEAN(Data, Percent)
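If you'd rather check these outside of Excel, here is a minimal Python sketch of the six measures using the standard library's statistics module. The data list and the helper names midrange and trimmed_mean are made up for illustration; they are not from the textbook or the slides.

```python
import statistics

def midrange(data):
    """Average of the smallest and largest values."""
    return (min(data) + max(data)) / 2

def trimmed_mean(data, k_percent):
    """Mean after dropping the lowest and highest k% of the sorted values."""
    values = sorted(data)
    drop = int(len(values) * k_percent / 100)  # count removed from EACH end
    kept = values[drop:len(values) - drop]
    return statistics.mean(kept)

# Hypothetical sample, chosen only for illustration (40 is an extreme value)
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 40]

print("mean          ", statistics.mean(data))
print("median        ", statistics.median(data))
print("mode(s)       ", statistics.multimode(data))
print("midrange      ", midrange(data))
print("geometric mean", statistics.geometric_mean(data))
print("trimmed mean  ", trimmed_mean(data, 10))
```

With this sample, the single extreme value pulls the mean and the midrange well above the median, while trimming 10% from each end brings the mean back toward the middle of the data.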

Pros and Cons of each method

Statistic	Pro	Con
Mean	Familiar and it uses all the information in the sample	Influenced by extreme values
Median	Robust when extreme values exist	Ignores extremes and can be affected by gaps in the data
Mode	Useful for attribute data or discrete data with a small range	Could be more than one mode and it's not helpful for continuous data
Midrange	Easy to understand and calculate	Influenced by extreme values and ignores most data values
Geometric Mean	Useful for growth rates and mitigates high extremes	Less familiar. Requires that the data are all positive values.
Trimmed Mean	Mitigates the effect of extreme values	Excludes some data values that could be relevant.
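To see why the geometric mean is the one to use for growth rates, here is a small Python sketch. The three yearly growth factors are hypothetical numbers picked for illustration. The idea is that compounding the geometric mean over the same number of periods reproduces the actual total growth, while compounding the ordinary mean overstates it.

```python
import math
import statistics

# Hypothetical yearly growth factors: +10%, +20%, -10%
factors = [1.10, 1.20, 0.90]

total_growth = math.prod(factors)             # actual growth over all years
geo = statistics.geometric_mean(factors)      # average growth factor per year
arith = statistics.mean(factors)              # ordinary mean, for comparison

print("actual total growth       ", total_growth)
print("geometric mean compounded ", geo ** len(factors))
print("arithmetic mean compounded", arith ** len(factors))
```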

Skew's Effect on the Mean and Median

In a symmetric distribution, the mean and the median are the same. In a skewed distribution, they'll differ.

In a left-skewed distribution, in which the outliers are extremely low values, the mean will be less than the median. (The extremely low outliers are pulling the mean downward.)

In a right-skewed distribution, in which the outliers are extremely high values, the mean will be greater than the median. (The extremely high outliers are pulling the mean upward.)

In all distributions, the mode will be in the modal class (from Chapter 3). That is, the mode(s) will be located at the peaks in a histogram (or bar chart), because those are the most frequently occurring values.
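The pull of outliers on the mean can be checked with a quick Python sketch. The three small samples below are hypothetical and only meant to show the direction of the effect: each has the same median, but one extreme value drags the mean up or down.

```python
import statistics

# Hypothetical samples, chosen only for illustration
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 50]    # one extremely high value
left_skewed = [-45, 2, 3, 4, 5]    # one extremely low value

for name, sample in [("symmetric", symmetric),
                     ("right-skewed", right_skewed),
                     ("left-skewed", left_skewed)]:
    mean = statistics.mean(sample)
    median = statistics.median(sample)
    print(f"{name:13s} mean={mean:6.2f} median={median:6.2f}")
```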

Measures of Dispersion
Dispersion, or how spread out the values in a distribution are, will tell you a lot about the data set. Measures of dispersion also help show how far a particular value sits from the rest of the set.

Five Measures of Dispersion of a Sample

Statistic	Formula	Excel Command
Range (R)	$R = x_{max} - x_{min}$ (the maximum value in the set minus the minimum value)	=MAX(Data)-MIN(Data)
Sample Variance (s²)	$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$ (First, subtract the known sample mean x̄ from each individual value x_i and square the result. Then add up all of the squared differences. Divide that sum by n-1.)	=VAR(Data)
Sample Standard Deviation (s)	$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$ (Just like in sample variance: subtract the known sample mean x̄ from each individual value x_i, square the result, add up all of the squared differences, and divide that sum by n-1. But now, also take the square root of the result.)	=STDEV(Data)
Coefficient of Variation (CV)	$CV = 100 \times \frac{s}{\bar{x}}$ (100 times the sample standard deviation divided by the sample mean)	None (no built-in function, but =100*STDEV(Data)/AVERAGE(Data) computes it)
Mean Absolute Deviation (MAD)	$MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$ (Like sample variance, but taking the absolute value of each difference instead of squaring it, and dividing by n instead of n-1. The absolute value function makes all differences positive, so there won't be any cancellation of values when the sum is taken.)	=AVEDEV(Data)
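The step-by-step descriptions above can be sketched in Python as follows. The sample values and the variable names are hypothetical, picked so the arithmetic is easy to follow by hand; the statistics module's own variance and stdev functions use the same n-1 divisor and can be used to double-check the manual steps.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration
data = [4, 8, 6, 5, 7]

n = len(data)
mean = statistics.mean(data)                       # sample mean (x-bar)

data_range = max(data) - min(data)                 # R = xmax - xmin
squared_diffs = [(x - mean) ** 2 for x in data]    # (xi - x-bar)^2 for each value
variance = sum(squared_diffs) / (n - 1)            # s^2, divides by n-1
std_dev = math.sqrt(variance)                      # s, same units as the data
cv = 100 * std_dev / mean                          # CV, in percent
mad = sum(abs(x - mean) for x in data) / n         # MAD, divides by n

print("range    ", data_range)
print("variance ", variance)
print("std dev  ", std_dev)
print("CV (%)   ", cv)
print("MAD      ", mad)
```

For this sample the mean is 6, the squared differences add up to 10, and dividing by n-1 = 4 gives a sample variance of 2.5.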

Pros and Cons of Various Measures of Dispersion

Statistic	Pro	Con
Range	Easy to calculate. Easy to interpret	Sensitive to extreme data values
Variance	Plays a key role in mathematical statistics	Less intuitive meaning
Standard Deviation	Most commonly used measure. Expressed in same units as the data. ($, grams, etc)	Less intuitive meaning
Coefficient of Variation	Expresses relative variation in percent so you can compare data sets with different units of measurement	Requires nonnegative data
Mean Absolute Deviation	Easy to understand.	Lacks "nice" theoretical properties

There's more in this chapter that he listed. I'm just out of time to input it and get it to you all in a period that would be useful to you.

You can cover the following in the PowerPoint slides (online learning center):

Chebyshev's Theorem and the Empirical Rule (He said the Empirical Rule was more important. Both describe how much of the data falls within a given number of standard deviations of the mean, which helps you ID outliers.)

Defining a Standardized Variable - Z scores. How to calculate

Percentiles, Quartiles and Box-and-Whisker Plots (including how to make and interpret Box-and-Whisker plots). Fences, unusual data values and midranges

