HelpForecasting Trends

Objectives
By the end of this lesson, you should be able to:
1. Define forecasting, backcasting, interpolation, time-series data, and trend. Distinguish between qualitative and quantitative forecasting.

2. Indentify the graph of the following types of trends and whether they are non-seasonal, with additive seasonality, or with multiplicative seasonality: constant; linear; exponential; damped; and polynomial.

3. Use formulas in Microsoft Excel™ to forecast using the following strategies:

  • Na´ve Forecasts
  • Na´ve Trends
  • Moving Averages
  • Weighted Moving Averages
  • Single Exponential Smoothing
  • Double Exponential Smoothing

4. Use Excel's™ chart feature to forecast.

5. Explain how forecasting can be used where a particular type of future data is needed, as seen in an example related to the release of mercury from unrecycled compact fluorescent lamps. Forecast and provide mathematical forecasting models (formulas).

6. Discuss the evaluation of forecasts and forecasting assumptions, note the care that must be taken in forecasting, and define the coefficient of determination

7. Locate additional resources related to forecasting.


Forecasting

Yogi Bera is credited with saying, "It's tough to make predictions, especially about the future."

The humor here is that forecasting implies the application of a variety of tools to predict future situations, or results. "Fore" implies a forward predication, though it is possible to use data from the 1920s to forecast for the 1930s, comparing our forecasts to the actual data from the 1930s. If we move backward, attempting to predict a previous condition, the term backcasting can be used, though the word, "predict," is misleading here, and should instead be "suggest." If we know existing time series data points, but there are some holes, we can estimate the values for those missing points between known points; that process is called interpolation.

One way to classify types of forecasts are based on the type of data. quantitative forecasting is based on numeric data from several different time periods of the past that can be assumed to provide a pattern that allows us to predict the future; whereas qualitative forecasting relies more heavily on non-numeric data, the judgments of specialists, and their variety of knowledge, according to Makridakis, Wheelwright, & Hyndman (1998). They suggest the following general steps in quantitative forecasting:

  1. Defining the problem
  2. Acquiring relevant data and judgments of key individuals
  3. Initial graphing and analysis
  4. Selecting and adapting a model to fit the data
  5. Applying that model to forecast future data, and evaluating that model

Time-Series Data and Trends

As noted, some data is particular helpful in quantitative forecasting. This is time-series data, where we know the numerical value for several different points in time. Typically, time is plotted along the horizontal x-axis of a graph, and the variable measured or to be predicted is plotted along the vertical y-axis. The term trend implies a pattern of change over time, and if we suspect a trend exists, we may be able to extrapolate that trend to future time periods, forecasting data for those periods.

For example, sales at an ice cream stand at the town park in June of each of the last five years has been good, but in July it was about 20% more than in June. If this year, the stand took in $10,000 in June (a new record), how much would you  predict it will take in in July?  Well, if we were correct in our assumption based on the historical data, we'd estimate the July figure would be 20% higher, or $12,000.

Sometimes, the data does not suggest a trend, or the trend is so weak that it might as well not exist, as in the following:


Figure 1. Random time-series data not suggesting a strong trend.

Other times, however, there does seem to be a pattern to time-series data, and in these cases, we may be able to use trend extrapolation to forecast future data. First, let's look at some different classifications of trends.

Some Common Types of Trends

Trends are often shown graphically (as line graphs) with the level of a dependent variable on the y-axis and the time period on the x-axis. There are different types of trends, including the following:

  • constant
  • linear
  • exponential
  • damped
  • polynomial
The following graphs each contain 100 points of fictitious data connected with a blue line, and the trend superimposed with a black line. They are based on Figure 3-1 on Page 112 from Levin, Rubin, & Stinson (1986).

Constant trends are those where there is no net increase or decrease.


Figure 2. A constant trend that does not rise or fall.

However, there may be seasonality, or a periodic fluctuation (as there would be in a graph of the temperature over a 1-year period, with daytime temperatures higher than nighttime):


Figure 3. A constant trend that fluctuates regularly.

The above graph shows the same data with periodic additions and subtractions. The blow graph shows this data, but the periods are based on a multiple of the data in the x-axis. Often, it is instead a multiplicative factor of the y-axis that is used.


Figure 4. A constant trend that fluctuates regularly, and the degree of that fluctuate is a factor of the y-value, or in this case, the x-value.


Linear trends show a steady, straight-line increase or decrease. So the trendline may go up or down, and the angle may be steep or shallow.


Figure 5. A linear trend showing a steady increase.


Figure 6. A linear trend with additive seasonality.


Figure 7. A linear trend where the periodic fluctuation is a factor of the y-value.


Exponential trends are those where the data rises or falls not at a steady rate, but at an increasing rate. The x-value (plotted horizontally) is an exponent of the trendline formula to derive the y-value.


Figure 8. An upward exponential trend, where the x value is an exponent in the equation to derive y.


Figure 9. An upward exponential trend with additive seasonality.


Figure 10. An upward exponential trend with multiplicative seasonality.


Damped trends are those that approach a horizontal asymptote:


Figure 11. A damped trend.


Polynomial trends are those best modeled by a polynomial equation. They may be second-order (quadratic) equations of the form y = ax2 + bx + c, resulting in a parabolic shape:


Figure 12. A second-order polynomial trend.


Figure 13. Another second-order polynomial trend.

Polynomial trendlines may also be third order (y = ax3 + bx2 + c) or higher:


Figure 14. A third-order polynomial trend.


Figure 15. A third-order polynomial trend with additive seasonality.


Figure 16. A third-order polynomial trend where seasonality is a factor of the y value.


Figure 17. Another third-order polynomial trend.
 


Quantitative Forecasting using Trend Extrapolation

There are several tools available for using trend extrapolation to first plot a trendline to historical time-series data, and then extend this to future periods for the purpose of forecasting or predicting values for those periods. Some might be tempted to visually extend a trendline to future periods with a pencil on a printed graph, but algebraic techniques are more precise, more varied, and more powerful.

There are three general approaches to use algebraic techniques for trendline extrapolation. The first of these involves applying formulas to calculate a future period. The second technique is to make use of specialized functions within a spreadsheet program or another data analysis program. The third technique is to use a spreadsheet or data analysis program to construct a graph with a trendline, and to automatically extend that trendline to future periods.

Using Formulas

Based on  Chapter 3 of Levin, Rubin, & Stinson, (1986), I have created a tutorial (with practice) on using formulas in Microsoft Excel™ to perform forecasts using the following forecasting techniques:

  • Na´ve Forecasts
  • Na´ve Trends
  • Moving Averages
  • Weighted Moving Averages
  • Single Exponential Smoothing
  • Double Exponential Smoothing
Please download the file from one of the following and follow the instructions (required download), checking your answers.

Notice that it is not that difficult to type in the formulas that are needed to do the forecasting. In some instances, such as Single Exponential Smoothing, there is a built-in feature in the Microsoft Excel Analysis ToolPak™ Add-In.

Using Excel's Charts for Trend Extrapolation

Microsoft Excel™ offers some built-in tools for forecasting. One of these allows you to add a trendline to existing data points on a chart. However, as with templates, the developers of these tools have made some decisions for the user, and not all users would agree with those decisions. If the limitations imposed by Microsoft Excel's features for forecasting are inappropriate for a particular forecasting task, the reader is instead encouraged to use direct numerical manipulation using proven analytical techniques as described in any of several texts on forecasting (such as Makridakis, Wheelwright & Hyndman, 1998).

Before You Begin

This remainder of this page is based on the assumption that the reader has Microsoft Excel™ 2007 with the Analysis ToolPak Add-In from Microsoft installed.

Getting the Data

Let's look at some data concerning compact fluorescent lamps (CFLs) using the following as a source document:

US Department of Energy. (2009). CFL Market Profile - March 2009. Washington, DC: Author. Retrieved April 7, 2009 from http://www.energystar.gov/ia/products/downloads/CFL_Market_Profile.pdf

The analysis in this report was performed by D & R International, LTD (http://www.drintl.com/.)

On Page 2, there is a bar chart (or bar graph) that lists the number of shipments of CFLs by year up to 2007, and then predicts the number of shipments in 2008, 2009, and 2010, based on that data.


Figure 18. Bar chart from US DOE, 2009, p.2.

Let's use the data in this chart, and the power of Microsoft Excel™, to make a similar prediction. Ideally, you would have the actual data values, but in this case, an estimate was made based on the above graphic and the following was entered into an Excel spreadsheet.

  Year Incand. CFLs
  2000 1686 23
  2001 1659 71
  2002 1681 54
  2003 1668 69
  2004 1649 91
  2005 1641 104
  2006 1560 188
  2007 1325 400
Est: 2008 1319 328
Est: 2009 1238 360
Est: 2010 1148 395
Table 1. Raw data.

Let's only look at the historical data for 2000 to 2007, not at the estimates or predictions for 2008 to 2010. We can recreate the bar chart shown in the source document by selecting the historical data in Excel and creating a bar chart:


Figure 19. Raw data in a bar chart to match original

But instead, let's create a scatter plot of the values (since Excel's trendline equation feature can produce errors with bar charts or line graphs.)


Figure 20. Raw data in a scatter plot.

Adding a Linear Trendline and Regression Equation

Now, remember, we are just concerned with the CFL data, and we want to be able to predict future years. In order to add a trendline, click on one of the icons representing a data point for CFLs, and then right-click and select "Add Trendline." You'll see the following dialog box.


Figure 21. Trendline options dialog box.

In this example, we will assume that the number of CFLs shipped per year increases at a steady or linear rate. For now, in the Trendline Options area, select the following

  • Trend / Regression type: Linear

  • Forecast - Forward 3 periods

  • Display Equation on chart

After moving the equation we have:


Figure 22. Raw data with a linear trendline and regression equation.

The equation is a linear regression equation. That means that is it the equation of a straight line that best fits the points on the chart. The method Excel uses to determine this equations involves finding the line that produces the least value for the sum of the squares of the vertical differences between data points and the line. Like all lines, it has an equation in the form:

y = mx + b

where:

  • y is the number to be calculated, the dependent variable, or in this case, the number of millions of CFLs shipped per year;

  • m is the slope of the line, which equals the change in the y value divided by the change in the x value;

  • x is the given data point or the dependent variable, in this case, it is the year; and

  • b is the y-axis intercept of the line.

The equation, in this case, is:

y = 40.429 x - 80874

That means that for the year 2010, the predicted value is

y = (40.429 * 2010) + 80874

y388 million CFLs shipped

We can substitute other values for x, such as the year 2020, and since we now have an equation, we can predict that there will be 793 million CFLs shipped in the year 2020. Of course, this is making a lot of assumptions that we shouldn't make. In particular, we are assuming that the trend is linear, and that it will continue far into the future.

Alternate Method. You can find out the equation directly from the tabled data, if you like. Select two cells like G5 and G6 and then start typing in the formula: =LINEST(range) for the range, select all of the known y values, then type the closing parenthesis, but don't hit the Enter key. Instead, hit Control-Shift-Enter. You'll see the slope and the intercept appear in these two cells.

Non-Linear Trendlines

Many trends are not linear. For example, human population on the planet was fairly linear, but then it shot up, as illustrated by the red line in the following figure:

Figure 23.  Non-linear trend of "Long-term World Population Growth." This graphic is from United Nations, 1999, p. 7.

There are several non-linear predictive equations. We'll look at two, exponential equations and polynomial equations, but you are advised to explore others.

Exponential Let's take that same historical CFL shipment data we used above and apply some non-linear trendlines. Here is an exponential trendline. It uses an equation that has the x value (the year) as an exponent. I clicked on the new equation an selected "format trendline label" to display the equation in scientific notation with six decimal points, since the default does not give me enough precision for predicting.


Figure 24. Raw data with exponential trendline.

As we can see, there trendline is curved, not quite as much as is indicated by the relatively high 2007 datum point, but it is still curved up.

The predictive equation is:

y = 1.598767 E -279 e ^ 3.226616 E -01 x

Recall that the capital E means "Times ten to the power of" and that the lower case e is a constant approximately equal to 2.71828. In Excel, I can then type in the following formula in any cell:

=1.598767E-279*EXP(0.3226616*2010)

and by replacing "2010" with the year, get a prediction for that year. The value for 2010 is 733 million CFLs, and the value for 2012 is 1.398 billion CFLs.

Polynomial The predictive equation can be a polynomial. We saw that the linear regression equation was

y = mx + b

A second-order, or quadratic, polynomial equation adds an x2 term, resulting in:

y = ax2 + bx + c

The graph of a quadratic equation of this form is typically a parabola. Here is the same data with a second order polynomial trendline:


Figure 25. Second order polynomial trend line with equation.

It is possible to increase the order, adding an x3, x4, or x5 term, if there is reason to believe such a curve will be more accurate.

Modifying the Data

Sometimes, we suspect the data should be modified. In our example, notice how high the value of 400 was for 2007. An analyst might have reason to believe that this point was an outlier, and due to some special circumstances, like a one-time marketing blitz, the high value of this datum is throwing off the future prediction. Let's alter the data, reducing that point to 300.

  Year Incand. CFLs
  2000 1686 23
  2001 1659 71
  2002 1681 54
  2003 1668 69
  2004 1649 91
  2005 1641 104
  2006 1560 188
  2007 1325 300
Est: 2008 1319 328
Est: 2009 1238 360
Est: 2010 1148 395
Table 2. Revised data.

Using the revised data and second-order polynomial forecasting, we get:


Figure 26. The value of 2007 was changed from 400 to 300 in the belief that this was an abnormal value.

Notice how Figure 10 is relatively close to the initial prediction shown in the original US DOE source document.

There are many ways to transform and adjust data, and in each instance the analyst should have a defensible line of reasoning that justifies the transformation.

Exercising Care

As with many forms of statistical analysis, trendline extrapolation can be subject to deliberate attempts to make the data suggest the analyst's bias. This is inappropriate. Where there are alternative projections, it is best to present them with explanations of each. For example, the following illustration shows several different paths world population might take given different conditions explained by the authors.


Figure 27. "World population according to five projection scenarios, 1950 - 2150" from United Nations, 1998, p. 4.


Other Quantitative Methods

As seen on Excel's Trendline Options dialog box, there are other types of trendlines that can be added, including a logarithmic, power, and moving average trendline. The Analysis ToolPak Add-in for Excel also has several forecasting tools. To access them, click Data Analysis in the Data tab. You will see moving average, regression, and exponential smoothing there, all of which can be used to forecast.

But don't stop there, Excel, like some other programs for numerical manipulation, allows the user to directly control the formulas used to derive values. We do not have to settle for the default settings used in the Add Trendline feature of charts, but we can instead perform the necessary calculations on the data directly.

For information on the methods in this lesson, and others, such as the Box-Jenkins method, dynamic regression, multiple regression, please consult a text on forecasting, such as the one by Makridakis, Wheelwright, & Hyndman (1998).


Forecasting Exercise / Example

There is another spreadsheet that contains an example of using Microsoft Excel's forecasting functions to predict future values. Using data on the number of compact fluorescent lamps (CFLs) shipped, and on the recycling rate of CFLs and their mercury content, this spreadsheet guides you through the process of predicting the cumulative environmental discharge of toxic mercury from CFLs that are not recycled (required download):.


Evaluating Trend Analysis & Assumptions

Coefficient of Determination

If you apply a trendline to a graph using Excel, you may notice that you can have both the trendline equation displayed, and the value for r2. r2 is called the coefficient of determination, and it is a measure of the proportion of the variability that is explained by the trendline equation. If r2 = .74, then 74% of the fluctuation of the data corresponds to the trendline, and 26% of the fluctuation deviates from what would have been predicted by the trendline. Thus, an intermediary step in the evaluation of models for trendline analysis would be to look at the value of the coefficient of determination, but do not let this be the only criterion.

For example, consider the following sample data

Time Data
1 20
2 80
3 50
4 70
5 100
6 110
7 190
8 300
9 ?
10 ?
Table 3. More raw data.

We could map a linear regress line to the historical data (Time = 1 to 8), but when we use Excel to map a 3rd order polynomial trendline, we find the value for r2 is much higher, at R2 = 0.9787:


Figure 28. Finding a trend that fits: 3rd order polynomial?

This means that the trendline curve accounts for 97.87% of the variation among the data points. That's pretty good. But we're not convinced it is optimal, so we keep clicking options in Excel, finding that when we specify a 6th order polynomial trendline, it accounts for 99.96% of the variation.


Figure 29. Finding a trend that fits: 6th order polynomial?

We are very impressed, and we decide to use this 6th order trendline to predict future periods. Imagine our surprise when we extend the trendline two periods to the future to result in the following:


Figure 30. That 6th order polynomial had a high coefficient of determination, but was a poor predictor.

That can't be right. So we resolve to go back to the 3rd-order polynomial and its predications:


Figure 31. A more reasonable prediction with the 3rd order polynomial trendline.

In all, one of the best ways to test predications is to, well, wait. Testing a predication against the data that starts to come in has no equal, but it does require waiting, and if we were willing to wait, there would have been no reason to make a prediction in the first place. Thus, it is wise to attempt to assess the strength of a predication prior to the emergence of new data.

Assumptions

Trend extrapolation is one aspect of the larger field of trend (or trendline) analysis. It attempts to extend known data points to regions beyond the timeframe of known datapoints, almost always in an attempt to predict future values with some degree of probability. However, the assumptions made are critical.

For example, let's look at "Biomass as feedstock for a Bioenergy and Bioproducts Industry: The Technical Feasibility of a Billion-Ton Annual Supply," which was published in April 2005 under the auspices of the US Department of Agriculture and the US Department of Energy (required visit):

www1.eere.energy.gov/biomass/pdfs/final_billionton_vision_report2.pdf

Please look at the first page of the Executive Summary, where the purpose is stated:

"The purpose of this report is to determine whether the land resources of the United States are capable of producing a sustainable supply of biomass sufficient to displace 30 percent or more of the country's present petroleum consumption.... The short answer to the question of whether that much biomass feedstock can be produced is yes."

That is worth repeating, "The short answer ... is yes." However, now look at the assumptions under which this answer was made.

"Important assumptions that were made include the
following:
  • yields of corn, wheat, and other small grains were increased by 50 percent;
  • the residue-to-grain ratio for soybeans was increased to 2:1;
  • harvest technology was capable of recovering 75 percent of annual crop residues (when removal is sustainable);
  • all cropland was managed with no-till methods;
  • 55 million acres of cropland, idle cropland, and cropland pasture were dedicated to the production of perennial bioenergy crops;
  • all manure in excess of that which can applied on-farm for soil improvement under anticipated EPA restrictions was used for biofuel; and
  • all other available residues were utilized." (Page 2 of Executive Summary)

Source:
www1.eere.energy.gov/biomass/pdfs/final_billionton_vision_report2.pdf

Are these assumptions reasonable? Are they likely? What would the probability be that all of these assumptions would be shown, over time, to be valid? Maybe a better approach would be for the authors to say, "In order to achieve Level X, we would have to do Activities Y, with a probability of Y leading to X determined to be ____ %. If we do less than Activities Y, such as Activities Y2, the results would not be at Level X, but would likely be at the lesser Level X2, with the same probability."


Additional Resources

There are many additional resources online and in libraries. For example, there is a neat JavaScript Learning Object on "Forecasting by Smoothing Techniques from Professor Hossein Arsham at:

http://home.ubalt.edu/ntsbarsh/Business-stat/otherapplets/ForecaSmo.htm

Try it out.


Examples in Technology Assessment

Nearly every typical technology assessment project makes use of some type of forecasting technique, and while trend extrapolation may not always be the technique chosen, it offers powerful tools.

Examples of Trend Analysis
in Technology Assessment

www.princeton.edu/~ota/disk2/1985/8507/850705.PDF

www.princeton.edu/~ota/disk2/1990/9045/904509.PDF

www.princeton.edu/~ota/disk3/1979/7915/7915.PDF
(Which has a historic trend analysis in Appendix 2.1)


References

Levin, R., Rubin, D., & Stinson, J. (1986). Chapter 3: Forecasting. in Quantitative approaches to management. NY: McGraw-Hill. (Available to those in the BSU community at
www.bsu.edu/libraries/protected/ereserves/FlowersJ/ITEDU510/082700-10ET.pdf

Makridakis, S., Wheelwright, S., & Hyndman, R. (1998). Forecasting: Methods and Applications.. 3rd ed. New York: Wiley & Sons.

United Nations. (1998). Long-range World Population Projections: Based on the 1998 Revision. Executive Summary: Author. Retrieved April 7, 2009 from http://www.un.org/esa/population/publications/longrange/longrangeExecSum.pdf

United Nations. (1999). The World at Six Billion. New York: Author. Retrieved April 7, 2009 from http://www.un.org/esa/population/publications/sixbillion/sixbillion.htm

United States Department of Energy. (2009). CFL Market Profile - March 2009. Washington, DC: Author. Retrieved April 7, 2009 from http://www.energystar.gov/ia/products/downloads/CFL_Market_Profile.pdf



"Forecasting with Trend Lines using Microsoft Excel™"
All information is subject to change without notification.
© Jim Flowers
Department of Technology, Ball State University