Chapter 1 - Introduction
Definitions:
Data: The information you analyze statistically. It describes one or more characteristics of a sample, which are then inferred to be characteristics of a population. Ex. salary, GPA, colors, preference ratings, location, etc.
Data Collection: The information that has been collected from a sample through some research method.
Observation: The individual datum in a data collection. Ex. If studying GPAs (the data) of Hunter Students (the population), your particular GPA is a single observation in that data collection.
Statistics (plural) - The science of collecting, organizing, analyzing, interpreting and presenting data. The subject we're all taking right now.
Statistic (singular) - Specifically refers to a single measure, reported as a number, used to summarize a sample data collection.
Parameters - Measures that describe a population. They are usually unknown and are inferred from sample statistics.
Population - The larger group of people or items to be analyzed. This group is usually so large that studying all of it would be cost and time prohibitive. For example, say you wanted to know the number of children raised by Asian-American women who are currently middle-aged. There are far too many such women to study all of them feasibly. Instead, you work with a sample and infer the parameters of the population from the statistics computed from that sample.
Sample - A select group from a larger population. It can be selected deliberately or by convenience, using any one of a number of sampling methods. (Covered in Chapter 2)
Descriptive Statistics - Refers to the collection, organization, presentation and summary of data. In other words, the data as found in your research, organized in a readable manner.
Inferential Statistics - Refers to generalizing from a sample to a population, estimating unknown parameters, drawing conclusions and making decisions. Since populations are usually too large to study, it's the soundest way to project the likely case for populations. (Frequently done by several samples and repeated surveys.)
Post Hoc Fallacy - A faulty attribution of cause to an event just because it occurred earlier than the event it supposedly "caused". (If A precedes B, then A must have caused B.)
Empirical Data - data that is produced by either observation or experiment
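The definitions of population, sample, statistic and parameter above can be illustrated with a short Python sketch. The GPA values here are simulated and purely hypothetical (the mean of 3.0, spread of 0.5, population size of 10,000 and sample size of 100 are all assumptions made for illustration), so this only shows the roles each term plays, not real Hunter data.

```python
import random
import statistics

# ASSUMPTION: a made-up population of 10,000 GPAs, simulated for illustration only.
random.seed(42)  # fixed seed so the sketch is reproducible
population = [
    round(min(4.0, max(0.0, random.gauss(3.0, 0.5))), 2)
    for _ in range(10_000)
]

# Parameter: a measure of the whole population (usually unknown in practice).
population_mean = statistics.mean(population)

# Sample: a subset of the population; here a simple random sample of 100.
sample = random.sample(population, 100)

# Statistic: a measure computed from the sample, used to estimate the parameter.
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.3f}")
print(f"Sample mean (statistic):     {sample_mean:.3f}")
print(f"Estimation error:            {sample_mean - population_mean:+.3f}")
```

In real research you would only have the sample and its statistic. The population mean is computable here only because the whole population was simulated.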
What is Statistics?
"I like to think of statistics as the science of learning from data…" - John Kettenring, President ASA, 1997
Why Study Statistics?
Communication: The language of statistics is, by now, a standard expectation in the workplace. It's necessary to effectively communicate in any area of business at any substantive level.
Computer Skills: Whatever your computer skill level, it can be improved. Every time you create a spreadsheet for data analysis, write a report or make an oral presentation, you will use computer skills and learn new ones. You'll need to have confidence in software, preparation of charts and analysis of data.
Information Management: In large companies, it’s more common to be drowning in data than be starved for it. Statistical knowledge will help you know how to weed through the volumes and make it valuable.
Technical Literacy: Many of the best career opportunities are in growth industries propelled by advanced technology.
Career Advancement: In any case of selling anything (and even service industries sell information), knowledge that helps expand those sales will be integral to your career advancement. Statistical skill is that knowledge.
Quality Improvement: Every successful business undergoes constant quality monitoring, no matter what it sells (from airplanes to insurance to financial services). Operation of a business requires analytical skills in fields like statistics.
Uses of Statistics [in Business]:
Auditing
Ex. An accounting error occurs, but the entries and invoices are far too numerous to recalculate. A well-selected sample of the invoices could provide a good indication of the frequency and breadth of the error.
Marketing
Ex. Amazon.com hired a consultant to identify repeat customers and target their likely purchases. Out of millions of previous sales, a well selected sample can help provide this information.
Health Care
Ex. Standardization of assessments of patients and their length of care.
Quality Control
Ex. A manufacturer wishes to standardize the quality of its product. It can use samples of the product to determine things like quality and worker effectiveness.
Purchasing
Ex. A supplier of DVDs shows a defect rate of 4 out of 200. The historical rate of defect was .005%. Has the defect rate risen sharply? Statistical analysis of samples will indicate the answer.
Medicine
Ex. An experimental drug shows twice the effect of a placebo in testing. Could a difference this large be due to chance? A statistical analysis will tell you.
Forecasting
Ex. Home Depot carries 50,000 different products. How can consumer demand be predicted and ordering streamlined? Sampling will provide indications.
Product Warranty
Ex. A major automaker examines 4,300 warranty claims on a new hybrid in the first six months of usage. With what margin of error can these claims predict future costs? Sampling will indicate that.
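The Purchasing example above (4 defects out of 200 against a historical rate) can be sketched as a simple tail-probability calculation. This is only one possible approach, and it rests on assumptions that are not in the notes: defects are treated as independent with a constant rate (a binomial model), and the historical rate is taken exactly as written, .005%. The function name `prob_at_least` is made up for this sketch.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summing the upper tail directly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_items = 200      # items inspected
n_defects = 4      # defects found
# ".005%" as written in the notes, converted to a proportion.
# ASSUMPTION: change this value if the historical rate is stated differently.
historical_rate = 0.005 / 100

observed_rate = n_defects / n_items
# Chance of seeing 4 or more defects if the historical rate still held.
p_value = prob_at_least(n_defects, n_items, historical_rate)

print(f"Observed defect rate: {observed_rate:.2%}")
print(f"P(at least {n_defects} defects | historical rate): {p_value:.3g}")
```

A very small probability suggests the shipment is unlikely to be just a "bad batch" under the historical rate. It does not, by itself, say why the rate changed.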
Knowledge of statistics is far more about critical thinking than it is about being good at mathematics. Problem solving skills are absolutely critical in today's workforce.
Challenges in Statistics
Working with Imperfect Data: In some cases, new data may be too expensive to obtain, such as that of auto crash safety standards. The experienced analyst will know accepted statistical standards, clearly state any assumptions that he or she is forced to make, and honestly point out the limitations of the analysis. Sometimes, that may mean saying that no useful answer can come from the data at all.
Dealing with Practical Constraints
You will face constraints on the type and quantity of data you can collect. Sometimes tests cannot be performed, such as auto crash tests with human subjects. Some survey questions can't be asked because they are too sensitive. Some tests are too invasive to conduct on a wide scale. Not every survey respondent will tell the truth. There are always constraints on research.
Upholding Ethical Standards
Safeguards are in place both to protect professional integrity (and, therefore, the market value of the analyst) and to decrease the chances of ethical breaches. All analysts must:
i) know and follow accepted procedures;
ii) maintain data integrity [ex. do not alter the data to suit your purposes];
iii) carry out accurate calculations;
iv) report procedures faithfully [to avoid corner cutting, which can be costly to the public];
v) protect confidential information (among other reasons, to ensure future participation from the public);
vi) cite sources; and
vii) acknowledge sources of financial support (to allow the reader to assess whether there is bias in the report and what that bias could be).
Using Consultants
When should a consultant be hired? This may be one of the most important decisions a manager can make. Consider it when your staff lacks a certain skill, when an unbiased view cannot be found within your organization, or when clashing personalities are involved. There are many challenges that call for an outside expert.
Writing and Presenting Reports
Writing a report is nearly as important as collecting and analyzing the data that supports it. It is essential to answer the assigned question succinctly. Describe what you did and what conclusions you reached, listing the most important results first.
Avoid Jargon: It's common sense to avoid slang in a business report, but this guideline applies to jargon inside your workplace as well. For example, few will know what an SSE is, unless you define it the first time it appears in your report.
Make it Attractive: Reports should contain a title page, a descriptive title, date and author names. Footers should be used to distinguish revised drafts. Use wide margins so that readers can make notes. Use appropriate fonts, such as Times Roman, Garamond and Arial. Call attention to your points through the use of subheadings, bullets, boldface, italics, color, etc., but use effects sparingly.
Watch Spelling and Grammar: Use software and, if possible, a proofreader.
Organizing a Technical Report: The following is a typical form.
- Executive Summary (1 page max)
- Introduction (1-3 paragraphs)
    - Statement of the problem
    - Data sources and definitions
    - Methods utilized
- Body of the Report (as long as necessary)
    - Discussion, explanations, interpretations
    - Tables and graphs, as needed
- Conclusions (1-3 paragraphs)
    - Statement of findings (in order of importance)
    - Limitations (if necessary)
    - Future research suggestions
- Bibliography and Sources
- Appendices (if needed)
Tables and Graphs: Tables should be embedded in the narrative (and not on a separate page) near the paragraph in which they are cited. Each should be clearly titled and labeled. Graphs should also be embedded in the narrative near the paragraph in which they are discussed with clear labeling.
[I skipped the parts about preparing for an oral report. In 20 years in the workforce, I've had to give fewer than 5. In contrast, I've written countless reports. For purposes of the test, though, you may want to review the text here. ]
Conclusions from Small Samples: "My aunt smoked her whole life and lived to 90. Smoking doesn't hurt you." You can see the problem here.
Conclusions from Non-Random Samples: "Rock stars die young. Look at Jimi Hendrix, Janis Joplin, Kurt Cobain and Amy Winehouse." Those happen to be musicians/singers who did die young. What about the thousands who haven't? We're grouping these people by the outcome (dying young) instead of by a sampling characteristic (being a rock star).
Conclusions from Rare Events: Unlikely events will happen if you take a large enough sample. "Mary from my office won the lottery. Her system must have worked." Millions of people play the lottery. Someone will eventually win it.
Poor Survey Methods: Ex. A professor asks his students if they've studied a certain method, but he does so in public. Some will be hesitant to respond because they fear being asked to explain it or that their classmates will think they're showing off. An anonymous survey would produce a more accurate result.
Assuming a Causal Link: The post hoc fallacy. (If A precedes B, then A must have caused B.) The divorce rate fell in Mississippi in 2005 after Hurricane Katrina. Did the disaster cause couples to stay together? A closer look shows that the divorce rate had been falling for the two years prior. So, Katrina had little to do with it. Correlation does not prove causation.
Generalizations about Individuals: "Men are taller than women". Yes, but only in a statistical sense. It cannot be extended to state that a specific man will absolutely be taller than a specific woman.
Unconscious Bias: Researchers can unconsciously allow bias to color their handling of data.
Significance vs. Importance: Statistically significant effects may lack practical importance. A recently published study showed that, out of 500,000 Austrian military recruits, those born in the spring were .6 cm taller than those born in the fall. Would parents really plan around that difference? Not likely. Likewise, a slight product improvement will not likely drive more demand if consumers are already happy with the product. (Unless you're Apple...oops, JOKING!!!)
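To see why a .6 cm difference can be statistically significant yet unimportant, here is a rough Python sketch of a two-sample z calculation. Only the .6 cm difference and the 500,000 total come from the notes. The standard deviation of 7 cm and the group size of 125,000 per season are hypothetical values chosen for illustration, not figures from the study.

```python
from math import erfc, sqrt

diff_cm = 0.6          # difference in mean height, from the notes
sd_cm = 7.0            # ASSUMPTION: hypothetical standard deviation of height
n_per_group = 125_000  # ASSUMPTION: about a quarter of the 500,000 recruits per season

# Standard error of the difference between two independent group means.
standard_error = sd_cm * sqrt(2 / n_per_group)

# z statistic: how many standard errors the observed difference is from zero.
z = diff_cm / standard_error

# Two-sided p-value under a normal approximation.
p_value = erfc(abs(z) / sqrt(2))

# Effect size (difference in SD units): a gauge of practical importance.
effect_size = diff_cm / sd_cm

print(f"z statistic: {z:.1f}")
print(f"p-value:     {p_value:.3g}")
print(f"effect size: {effect_size:.3f} standard deviations")
```

Under these assumptions the z statistic is far beyond the usual 1.96 cutoff, while the effect size is under a tenth of a standard deviation. With samples this large, almost any difference will test as "significant", which is exactly why significance alone says little about importance.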