Chapter 3 – Part 2
3.4 Excel Charts
[Will cover this as soon as I can get my Add-In to work. Analysis ToolPak Add-in is suggested here. But I know it's not necessary in order to generate charts in Excel. I just have to remember how.]
3.5 Line Charts - Used to display a time series, to spot trends or to compare time periods. Can be used to display several variables at once. If two variables are displayed, the right and left scales will differ. Excel's two-scale line chart will let you compare variables that differ in magnitude or are measured in different units. (See Figure 3.16, p. 77). Too many variables, however, will render the chart unreadable.
No vertical grid lines, but can use horizontal grid lines to establish the Y variables at a certain data marker. If using horizontal grid lines, it's suggested that they be faint, so as to not clutter the background of the chart.
Making and Excel Line Chart
Step 1: Highlight the data you want to display in a line chart. Select the Insert ribbon. Click on the Line icon and choose a line chart style. Do not highlight the X-axis (horizontal) labels. You can add these later.
Step 2: If you wish to add a fitted trend, right-click on the data series on the line chart and choose Add Trendline. By default, it will show up as a linear trend.
Step 3: To customize your graph, click on it. The Chart Tools ribbon will become active. To edit the Design, choose that ribbon. To edit the Layout, either choose that ribbon or right-click on the feature (Title, Axis Titles, Gridlines) you wish to edit.
When to Use Log Scales - For time series data that might be expected to grow at a compounded rate. (GDP, national debt, future income)
Tips for Effective Line Charts -
- Line charts are for time series data. Never for cross-sectional data.
- The numerical values are on the Y-axis (vertical) while the time units are shown on the X-axis (horizontal), reading left-to-right as time progresses. Business audiences expect this rule to be followed.
- Except for log scales, use a zero origin on the Y-axis (vertical) unless more detail is needed. The zero-origin rule is mandatory in corporate annual reports or an investor stock prospectus.
- To avoid graph clutter, numerical labels are generally omitted. Use the gridlines to help a reader read data values.
- Data markers (triangles, squares, circles) are helpful and should be used when there are few data values. However, when there are too many data values, markers may make the chart look cluttered and obscure the legibility.
- If the lines on the graph are too thick, the reader can't ascertain graph values.
3.6 Bar Charts - Probably the most common type of data display used in business. Each bar represents an attribute or category. The length of each bar represents a frequency or value for that category. Unlike the histogram, each bar is separated by a slight gap to improve legibility. (Excel allows control of gap width.)
Vertical bar charts are often called "column charts" and are most commonly used. Horizontal bar charts are used when the axis labels are long or there are many categories.
3-D column charts can be visually appealing, but they confuse in terms of bar height. (Read from the back of the bar or the front?) Novelty charts, such as the Pyramid Chart are often used for general readership like in USA Today, but they're avoided in business because they distort the bar height.
Pareto Charts - A special type of column chart that displays categorical data in descending order of frequency, so that the most common categories will appear first (left hand side of the chart). Typically, the first few categories will account for the majority of observations. (The "80/20 Rule").
Commonly used in quality management to display the frequency of defects or errors of different types. Figure 3.21 (p. 83) shows that out of complaints collected from concession stand customers, the tope three categories (Cold food, Too much time in line, No vegetarian choice) make up 76 percent of the total complaints. Sorting the categories in descending order helps managers focus on the vital few causes of problems instead of the trivial many. (Because the bulk of observations will occur in a few categories.)
Stacked Bar Chart
The bar height is the sum of several subtotals. Areas can be of different colors to allow for comparison and to show patterns in the subgroups. Can be most effective for a few groups of data. Use numerical labels if exact data is important.
- The numerical value of interest should go on the Y-axis (vertical) while the categories should go on the X-axis (horizontal);
- If the quantity displayed is a time series, the category labels (e.g. years) are displayed on the x-axis with time increasing from left to right;
- The height or length of each bar should be proportional to the quantity displayed. The zero origin is a must for annual reports or a stock prospectus. However, nonzero origins may be justified to reveal sufficient detail.;
- Put numerical values at the top if each bar except when labels would impair legibility (e.g. lots of bars) or when visual simplicity is needed (e.g for a general audience).
They show n pairs of observations as dots or some other symbol on an X-Y graph. Such is why scatterplots are best for displaying bivariate data in (x,y) pairings. They are used to investigate a relationship between two variables. (We would want to know if there is some association between the two and, if so, what kind?)
By examining scatter plots, we can see if there is some association, but still be cautious of engaging a post hoc fallacy (fr. Chapter 1). That is, we have to be cautious to not read in a causal relationship between the two. For example, in Figure 3.24, there is a relationship between high birth rates and high infant mortality shown per nation. (The lower the birth rate, the lower the infant mortality and vice versa.) What isn't shown is a third variable that could be impacting both (e.g. GDP per capita.)
Figure 3.25 on p. 87 shows a few Prototype Scatter Plot Patterns. They can show a Strong Positive association (dots seem to form a line drawing upward from left to right). They can show a Strong Negative association (dots seem to form a line drawing downward from left to right.) They can also show weak positives or negatives or non-linear patterns or no pattern at all.