HelpForecasting with Trend Lines using Microsoft Excel™

Objectives
By the end of this lesson, you should be able to:
1. Identify time-series data for quantitative forecasting.

2. Enter time-series data into Excel and display it as a scatter plot.

3. Use Excel to add a linear regression line, and use a regression equation to predict future values.

4. Use Excel to add non-linear trendlines using exponential and second-order polynomial functions.

5. Note that data is sometimes transformed or modified prior to forecasting.

6. Discuss an ethical approach to the decisions involved in choosing forecasting methodologies.


Introduction

The term trends implies a change over time. One type of forecasting is quantitative, and involves analyzing time-series data, and then predicting what the future might be.

For example, sales at an ice cream stand at the town park in June of each of the last five years has been good, but in July it was about 20% more than in June. If this year, the stand took in $10,000 in June (a new record), how much would you  predict it will take in in July?  Well, if we were correct in our assumption based on the historical data, we'd estimate the July figure would be 20% higher, or $12,000.

Microsoft Excel offers some built-in tools for forecasting. One of these allows you to add a trendline to existing data points on a chart. This allows the user to interpolate (i.e., to find a data point between existing points) or to extrapolate (i.e., to find a data point past either end of the current data, either by forecasting foreword, or "backcasting" to an earlier period.)

However, as with templates, the developers of these tools have made some decisions for the user, and not all users would agree with those decisions. If the limitations imposed by Microsoft Excel's features for forecasting are inappropriate for a particular forecasting task, the reader is instead encouraged to use direct numerical manipulation using proven analytical techniques as described in any of several texts on forecasting (such as Makridakis, Wheelwright & Hyndman, 1998).

Before You Begin

This page assumes the user has Microsoft Excel™ 2010 or 2007 with the Analysis ToolPak Add-In from Microsoft installed.

Getting the Data

Let's look at some data concerning compact fluorescent lamps (CFLs) using the following as a source document:

US Department of Energy. (2009). CFL Market Profile - March 2009. Washington, DC: Author. Retrieved April 7, 2009 from http://www.energystar.gov/ia/products/downloads/CFL_Market_Profile.pdf

The analysis in this report was performed by D & R International, LTD (http://www.drintl.com/.)

On Page 2, there is a bar chart (or bar graph) that lists the number of shipments of CFLs by year up to 2007, and then predicts the number of shipments in 2008, 2009, and 2010, based on that data.


Figure 1. Bar chart from US DOE, 2009, p.2.

Let's use the data in this chart, and the power of Microsoft Excel, to make a similar prediction. Ideally, you would have the actual data values, but in this case, an estimate was made based on the above graphic and the following was entered into an Excel spreadsheet.

  Year Incand. CFLs
  2000 1686 23
  2001 1659 71
  2002 1681 54
  2003 1668 69
  2004 1649 91
  2005 1641 104
  2006 1560 188
  2007 1325 400
Est: 2008 1319 328
Est: 2009 1238 360
Est: 2010 1148 395
Table 1. Raw data.

Let's only look at the historical data for 2000 to 2007, not at the estimates or predictions for 2008 to 2010. We can recreate the bar chart shown in the source document by selecting the historical data in Excel and creating a bar chart:


Figure 2. Raw data in a bar chart to match original

But instead, let's create a scatter plot of the values (since Excel's trendline equation feature can produce errors with bar charts or line graphs.)


Figure 3. Raw data in a scatter plot.

Adding a Linear Trendline and Regression Equation

Now, remember, we are just concerned with the CFL data, and we want to be able to predict future years. In order to add a trendline, click on one of the icons representing a data point for CFLs, and then right-click and select "Add Trendline." You'll see the following dialog box.


Figure 4. Trendline options dialog box.

In this example, we will assume that the number of CFLs shipped per year increases at a steady or linear rate. For now, in the Trendline Options area, select the following

  • Trend / Regression type: Linear

  • Forecast - Forward 3 periods

  • Display Equation on chart

After moving the equation we have:


Figure 5. Raw data with a linear trendline and regression equation.

The equation is a linear regression equation. That means that is it the equation of a straight line that best fits the points on the chart. The method Excel uses to determine this equations involves finding the line that produces the least value for the sum of the squares of the vertical differences between data points and the line. Like all lines, it has an equation in the form:

y = mx + b

where:

  • y is the number to be calculated, the dependent variable, or in this case, the number of millions of CFLs shipped per year;

  • m is the slope of the line, which equals the change in the y value divided by the change in the x value;

  • x is the given data point or the dependent variable, in this case, it is the year; and

  • b is the y-axis intercept of the line.

The equation, in this case, is:

y = 40.429 x - 80874

That means that for the year 2010, the predicted value is

y = (40.429 * 2010) + 80874

y388 million CFLs shipped

We can substituted other values for x, such as the year 2020, and since we now have an equation, we can predict that there will be 793 million CFLs shipped in the year 2020. Of course, this is making a lot of assumptions that we shouldn't make. In particular, we are assuming that the trend is linear, and that it will continue far into the future.

Alternate Method. You can find out the equation directly from the tabled data, if you like. Select two cells like G5 and G6 and then start typing in the formula: =LINEST(range) for the range, select all of the known y values, then type the closing parenthesis, but don't hit the Enter key. Instead, hit Control-Shift-Enter. You'll see the slope and the intercept appear in these two cells.

Non-Linear Trendlines

Many trends are not linear. For example, human population on the planet was fairly linear, but then it shot up, as illustrated by the red line in the following figure:

Figure 7.  Non-linear trend of "Long-term World Population Growth." This graphic is from United Nations, 1999, p. 7.

There are several non-linear predictive equations. We'll look at two, exponential equations and polynomial equations, but you are advised to explore others.

Exponential

Let's take that same historical CFL shipment data we used above and apply some non-linear trendlines. Here is an exponential trendline. It uses an equation that has the x value (the year) as an exponent. I clicked on the new equation an selected "format trendline label" to display the equation in scientific notation with six decimal points, since the default does not give me enough precision for predicting.


Figure 8. Raw data with exponential trendline.

As we can see, there trendline is curved, not quite as much as is indicated by the relatively high 2007 datum point, but it is still curved up.

The predictive equation is:

y = 1.598767 E -279 e ^ 3.226616 E -01 x

Recall that the capital E means "Times ten to the power of" and that the lower case e is a constant approximately equal to 2.71828. In Excel, I can then type in the following formula in any cell:

=1.598767E-279*EXP(0.3226616*2010)

and by replacing "2010" with the year, get a prediction for that year. The value for 2010 is 733 million CFLs, and the value for 2012 is 1.398 billion CFLs.

Polynomial

The predictive equation can be a polynomial. We saw that the linear regression equation was

y = mx + b

A second-order, or quadratic, polynomial equation adds an x2 term, resulting in:

y = ax2 + bx + c

The graph of a quadratic equation of this form is typically a parabola. Here is the same data with a second order polynomial trendline:


Figure 9. Second order polynomial trend line with equation.

It is possible to increase the order, adding an x3, x4, or x5 term, if there is reason to believe such a curve will be more accurate.

Modifying the Data

Sometimes, we suspect the data should be modified. In our example, notice how high the value of 400 was for 2007. An analyst might have reason to believe that this point was an outlier, and due to some special circumstances, like a one-time marketing blitz, the high value of this datum is throwing off the future prediction. Let's alter the data, reducing that point to 300.

  Year Incand. CFLs
  2000 1686 23
  2001 1659 71
  2002 1681 54
  2003 1668 69
  2004 1649 91
  2005 1641 104
  2006 1560 188
  2007 1325 300
Est: 2008 1319 328
Est: 2009 1238 360
Est: 2010 1148 395
Table 2. Revised data.

Using the revised data and second-order polynomial forecasting, we get:


Figure 10. The value of 2007 was changed from 400 to 300 in the belief that this was an abnormal value.

Notice how Figure 10 is relatively close to the initial prediction shown in the original US DOE source document.

There are many ways to transform and adjust data, and in each instance the analyst should have a defensible line of reasoning that justifies the transformation.

Exercising Care

As with many forms of statistical analysis, trendline extrapolation can be subject to deliberate attempts to make the data suggest the analyst's bias. This is inappropriate. Where there are alternative projections, it is best to present them with explanations of each. For example, the following illustration shows several different paths world population might take given different conditions explained by the authors.


Figure 11. "World population according to five projection scenarios, 1950 - 2150" from United Nations, 1998, p. 4.

Other Quantitative Methods

As seen on Excel's Trendline Options dialog box, there are other types of trendlines that can be added, including a logarithmic, power, and moving average trendline. The Analysis ToolPak Add-in for Excel also has several forecasting tools. To access them, click Data Analysis in the Data tab. You will see moving average, regression, and exponential smoothing there, all of which can be used to forecast.

But don't stop there, Excel, like some other programs for numerical manipulation, allows the user to directly control the formulas used to derive values. We do not have to settle for the default settings used in the Add Trendline feature of charts, but we can instead perform the necessary calculations on the data directly.

For information on the methods in this lesson, and others, such as the Box-Jenkins method, dynamic regression, multiple regression, please consult a text on forecasting, such as the one by Makridakis, Wheelwright, & Hyndman (1998).


References

Makridakis, S., Wheelwright, S., & Hyndman, R. (1998). Forecasting: Methods and Applications. 3rd ed. New York: Wiley & Sons.

United Nations. (1998). Long-range World Population Projections: Based on the 1998 Revision. Executive Summary: Author. Retrieved April 7, 2009 from http://www.un.org/esa/population/publications/longrange/longrangeExecSum.pdf

United Nations. (1999). The World at Six Billion. New York: Author. Retrieved April 7, 2009 from http://www.un.org/esa/population/publications/sixbillion/sixbillion.htm

United States Department of Energy. (2009). CFL Market Profile - March 2009. Washington, DC: Author. Retrieved April 7, 2009 from http://www.energystar.gov/ia/products/downloads/CFL_Market_Profile.pdf



"Forecasting with Trend Lines using Microsoft Excel™"
All information is subject to change without notification.
© Jim Flowers
Department of Technology, Ball State University