Forecasting
Yogi Bera is credited with saying, "It's tough to make
predictions, especially about the future." The
humor here is that forecasting implies the application of a
variety of tools to predict future situations, or results. "Fore" implies a forward
predication, though it is possible to use data from the 1920s to forecast for
the 1930s, comparing our forecasts to the actual data from the 1930s. If we move
backward, attempting to predict a previous condition, the term
backcasting
can be used, though the word, "predict," is misleading here, and should instead
be "suggest." If we know existing time series data points, but there are some
holes, we can estimate the values for those missing points between known points;
that process is called interpolation. One way to classify
types of forecasts are based on the type of data. quantitative
forecasting is based on numeric data from several different time periods of the
past that can be assumed to provide a pattern that allows us to predict the
future; whereas qualitative forecasting relies more heavily on
nonnumeric data, the judgments of specialists, and their variety of knowledge,
according to Makridakis, Wheelwright, & Hyndman (1998). They suggest the
following general steps in quantitative forecasting:
 Defining the problem
 Acquiring relevant data and judgments of key individuals
 Initial graphing and analysis
 Selecting and adapting a model to fit the data
 Applying that model to forecast future data, and evaluating that model
TimeSeries Data and Trends
As noted, some data is particular helpful in quantitative forecasting. This
is timeseries data, where we know the numerical value for several
different points in time. Typically, time is plotted along the horizontal xaxis
of a graph, and the variable measured or to be predicted is plotted along the
vertical yaxis. The term trend implies a
pattern of change over time, and if we suspect a trend exists, we may be able to
extrapolate that trend to future time periods, forecasting data for those
periods.
For example, sales at an ice cream stand
at the town park in June of each of the last five years has been good, but
in July it was about 20% more than in June. If this year, the stand took in
$10,000 in June (a new record), how much would you predict it will
take in in July? Well, if we were correct in our assumption based on
the historical data, we'd estimate the July figure would be 20% higher, or
$12,000.
Sometimes, the data does not suggest a trend, or the trend
is so weak that it might as well not exist, as in the following:
Figure 1. Random timeseries data not suggesting a strong
trend.
Other times, however, there does seem to be a pattern to
timeseries data, and in these cases, we may be able to use trend extrapolation
to forecast future data. First, let's look at some different classifications of
trends.
Some Common Types
of Trends
Trends are often shown graphically (as
line graphs) with the level of a dependent variable on the yaxis and the
time period on the xaxis. There are different types of trends, including the
following:

constant

linear

exponential

damped

polynomial
The following graphs each contain 100 points of fictitious
data connected with a blue line, and the trend superimposed with a black line.
They are based on Figure 31 on Page 112 from Levin, Rubin, & Stinson (1986).
Constant trends are those where there is no net
increase or decrease.
Figure 2. A constant trend that does not rise or fall.
However, there may be seasonality, or a periodic
fluctuation (as there would be in a graph of the temperature over a 1year
period, with daytime temperatures higher than nighttime):
Figure 3. A constant trend that fluctuates regularly.
The above graph shows the same data with
periodic additions and subtractions. The blow graph shows this data, but the
periods are based on a multiple of the data in the xaxis. Often, it
is instead a multiplicative factor of the yaxis that is used.
Figure 4. A constant trend that fluctuates regularly, and
the degree of that fluctuate is a factor of the yvalue, or in this case,
the xvalue.
Linear trends show a steady,
straightline increase or decrease. So the trendline may go up or down, and
the angle may be steep or shallow.
Figure 5. A linear trend showing a steady increase.
Figure 6. A linear trend with additive seasonality.
Figure 7. A linear trend
where the periodic fluctuation is a factor of the yvalue.
Exponential trends are those where the data rises or
falls not at a steady rate, but at an increasing rate. The xvalue
(plotted horizontally) is an exponent of the trendline formula to derive the
yvalue.
Figure 8. An upward exponential trend, where the x value
is an exponent in the equation to derive y.
Figure 9. An upward exponential trend with additive
seasonality.
Figure 10. An upward exponential trend with multiplicative
seasonality.
Damped trends are those that approach a horizontal
asymptote:
Figure 11. A damped trend.
Polynomial trends are those best
modeled by a polynomial equation. They may be secondorder (quadratic)
equations of the form y = ax^{2} + bx + c,
resulting in a parabolic shape:
Figure 12. A secondorder polynomial trend.
Figure 13. Another secondorder polynomial trend.
Polynomial trendlines may also be third order (y
= ax^{3} + bx^{2} + c) or higher:
Figure 14. A thirdorder polynomial trend.
Figure 15. A thirdorder polynomial trend with additive
seasonality.
Figure 16. A thirdorder polynomial trend where
seasonality is a factor of the y value.
Figure 17. Another thirdorder polynomial trend.
Quantitative Forecasting using Trend
Extrapolation
There are several tools available for using trend
extrapolation to first plot a trendline to historical timeseries data, and then
extend this to future periods for the purpose of forecasting or predicting
values for those periods. Some might be tempted to visually extend a trendline
to future periods with a pencil on a printed graph, but algebraic techniques are
more precise, more varied, and more powerful. There
are three general approaches to use algebraic techniques for trendline
extrapolation. The first of these involves applying formulas to calculate a
future period. The second technique is to make use of specialized functions
within a spreadsheet program or another data analysis program. The third
technique is to use a spreadsheet or data analysis program to construct a graph
with a trendline, and to automatically extend that trendline to future periods.
Using Formulas
Based on Chapter 3 of Levin, Rubin, & Stinson,
(1986), I have created a tutorial (with practice) on using formulas in Microsoft
Excel™ to perform forecasts using the following forecasting techniques:

Naïve Forecasts

Naïve Trends

Moving Averages

Weighted Moving Averages

Single Exponential Smoothing

Double Exponential Smoothing
Please download the file from one of the following and follow
the instructions (required download), checking your answers.
Notice that it is not that difficult to
type in the formulas that are needed to do the forecasting. In some
instances, such as Single Exponential Smoothing, there is a builtin feature
in the Microsoft Excel Analysis ToolPak™ AddIn.
Using Excel's
Charts
for Trend Extrapolation
Microsoft Excel™ offers some builtin
tools for forecasting. One of these allows you to add a trendline to
existing data points on a chart. However, as with templates, the
developers of these tools have made some decisions for the user, and not all
users would agree with those decisions. If the limitations imposed by
Microsoft Excel's features for forecasting are inappropriate for a
particular forecasting task, the reader is instead encouraged to use direct
numerical manipulation using proven analytical techniques as described in
any of several texts on forecasting (such as Makridakis, Wheelwright & Hyndman,
1998).
Before You Begin
This remainder of this page is based on
the assumption that the reader has Microsoft
Excel™ 2007 with the Analysis ToolPak AddIn from Microsoft installed.
Getting the Data
Let's look at some data concerning
compact fluorescent lamps (CFLs) using the following as a source document:
US Department of Energy. (2009).
CFL Market Profile  March 2009. Washington, DC: Author. Retrieved
April 7, 2009 from
http://www.energystar.gov/ia/products/downloads/CFL_Market_Profile.pdf
The analysis in this report was performed
by D & R International, LTD (http://www.drintl.com/.)
On Page 2, there is a bar chart (or bar
graph) that
lists the number of shipments of CFLs by year up to 2007, and then predicts
the number of shipments in 2008, 2009, and 2010, based on that data.
Figure 18. Bar chart from US DOE, 2009, p.2.
Let's use the data in this chart, and the
power of Microsoft Excel™, to make a similar prediction. Ideally, you would
have the actual data values, but in this case, an estimate was made based on
the above graphic and the following was entered into an Excel spreadsheet.

Year 
Incand. 
CFLs 

2000 
1686 
23 

2001 
1659 
71 

2002 
1681 
54 

2003 
1668 
69 

2004 
1649 
91 

2005 
1641 
104 

2006 
1560 
188 

2007 
1325 
400 
Est: 
2008 
1319 
328 
Est: 
2009 
1238 
360 
Est: 
2010 
1148 
395 
Table 1. Raw data.
Let's only look at the historical data for 2000 to 2007,
not at the estimates or predictions for 2008 to 2010. We can recreate
the bar chart shown in the source document by selecting the historical
data in Excel and creating a bar chart:
Figure
19. Raw data in a bar chart to match original
But instead, let's create a scatter plot of the values
(since Excel's trendline equation feature can produce errors with bar
charts or line graphs.)
Figure
20. Raw data in a scatter plot.
Adding a
Linear Trendline and
Regression Equation
Now, remember, we are just concerned with the CFL data,
and we want to be able to predict future years. In order to add a
trendline, click on one of the icons representing a data point for CFLs,
and then rightclick and select "Add Trendline." You'll see the
following dialog box.
Figure 21. Trendline options dialog box.
In this example, we will assume that the number of CFLs
shipped per year increases at a steady or linear rate. For now, in the Trendline Options area, select the
following

Trend / Regression type: Linear

Forecast  Forward 3 periods

Display
Equation on chart
After moving the equation we have:
Figure 22. Raw data with a linear trendline and regression equation.
The equation is a linear regression equation.
That means that is it the equation of a straight line that best fits the
points on the chart. The method Excel uses to determine this equations involves finding
the line that produces the least value for the sum of the squares of the
vertical differences between data points and the line. Like all lines,
it has an equation in the form:

y is the number to be
calculated, the dependent
variable, or in this case, the number of millions of CFLs shipped per
year;

m is the slope of the line, which equals the change in
the y value divided by the change in the x value;

x is the given data point or the dependent
variable, in this case, it is the year; and

b is the yaxis intercept of the line.
The equation, in this case, is:
That means that for the year 2010, the predicted value
is
y = (40.429 * 2010) + 80874
y = 388 million CFLs shipped
We can substitute other values for x, such as the year
2020, and since we now have an equation, we can predict that there will
be 793 million CFLs shipped in the year 2020. Of course, this is making
a lot of assumptions that we shouldn't make. In particular, we are
assuming that the trend is linear, and that it will continue far into
the future.
Alternate
Method. You can find out the equation directly from the
tabled data, if you like. Select two cells like G5 and G6 and then start
typing in the formula: =LINEST(range) for the range, select all
of the known y values, then type the closing parenthesis, but don't hit
the Enter key. Instead, hit ControlShiftEnter. You'll see the slope
and the intercept appear in these two cells.
Many trends are not linear. For example, human
population on the planet was fairly linear, but then it shot up, as
illustrated by the red line in the following figure:
There are several nonlinear predictive
equations. We'll look at two, exponential equations and polynomial
equations, but you are advised to explore others.
Exponential
Let's take that same historical CFL shipment data we used
above and apply some nonlinear trendlines. Here is an exponential trendline.
It uses an equation that has the x value (the year) as an exponent. I
clicked on the new equation an selected "format trendline label" to display
the equation in scientific notation with six decimal points, since the
default does not give me enough precision for predicting.
Figure 24. Raw data with exponential trendline.
As we can see, there trendline is curved, not quite as much
as is indicated by the relatively high 2007 datum point, but it is still
curved up.
The predictive equation is:
y = 1.598767 E 279 e ^ 3.226616 E 01 x
Recall that the capital E means "Times ten to the power of"
and that the lower case e is a constant approximately equal to 2.71828. In
Excel, I can then type in the following formula in any cell:
=1.598767E279*EXP(0.3226616*2010)
and by replacing "2010" with the year, get a prediction for
that year. The value for 2010 is 733 million CFLs, and the value for 2012 is
1.398 billion CFLs.
Polynomial
The predictive equation can be a polynomial. We saw that the
linear regression equation was
y = mx + b
A secondorder, or quadratic, polynomial equation adds an x^{2}
term, resulting in:
y = ax^{2} + bx + c
The graph of a quadratic equation of this form is typically
a parabola. Here is the same data with a second order polynomial trendline:
Figure 25. Second order polynomial trend line with equation.
It is possible to increase the order, adding an x^{3},
x^{4}, or x^{5} term, if there is reason to believe such a
curve will be more accurate.
Modifying the Data
Sometimes, we suspect the data should be modified. In our
example, notice how high the value of 400 was for 2007. An analyst might
have reason to believe that this point was an outlier, and due to some
special circumstances, like a onetime marketing blitz, the high value of
this datum is throwing off the future prediction. Let's alter the data,
reducing that point to 300.

Year 
Incand. 
CFLs 

2000 
1686 
23 

2001 
1659 
71 

2002 
1681 
54 

2003 
1668 
69 

2004 
1649 
91 

2005 
1641 
104 

2006 
1560 
188 

2007 
1325 
300 
Est: 
2008 
1319 
328 
Est: 
2009 
1238 
360 
Est: 
2010 
1148 
395 
Table 2. Revised data.
Using the revised data and secondorder polynomial
forecasting, we get:
Figure 26. The value of 2007 was changed from 400 to 300 in
the belief that this was an abnormal value.
Notice how Figure 10 is relatively close to the initial
prediction shown in the original US DOE source document.
There are many ways to transform and
adjust data, and in each instance the analyst should have a defensible line
of reasoning that justifies the transformation.
Exercising
Care
As with many forms of statistical
analysis, trendline extrapolation can be subject to deliberate attempts to
make the data suggest the analyst's bias. This is inappropriate. Where there
are alternative projections, it is best to present them with explanations of
each. For example, the following illustration shows several different paths
world population might take given different conditions explained by the
authors.
Figure 27. "World population according to five projection
scenarios, 1950  2150" from United Nations,
1998, p. 4.
Other Quantitative Methods
As seen on Excel's Trendline Options
dialog box, there are other types of trendlines that can be added, including
a logarithmic, power, and moving average trendline. The Analysis ToolPak
Addin for Excel also has several forecasting tools. To access them, click
Data Analysis in the Data tab. You will see moving average, regression, and
exponential smoothing there, all of which can be used to forecast.
But don't stop there, Excel, like some
other programs for numerical manipulation, allows the user to directly
control the formulas used to derive values. We do not have to settle for the
default settings used in the Add Trendline feature of charts, but we can
instead perform the necessary calculations on the data directly.
For information on the methods in this
lesson, and others, such as the BoxJenkins method, dynamic regression,
multiple regression, please consult a text on forecasting, such as the one
by Makridakis, Wheelwright, & Hyndman
(1998).
Forecasting Exercise / Example
There is another spreadsheet that contains an example of using Microsoft
Excel's forecasting functions to predict future values. Using data on the number
of compact fluorescent lamps (CFLs) shipped, and on the recycling rate of CFLs
and their mercury content, this spreadsheet guides you through the process of
predicting the cumulative environmental discharge of toxic mercury from CFLs
that are not recycled (required download):.
Evaluating Trend Analysis &
Assumptions
Coefficient of Determination
If you apply a trendline to a graph using Excel, you
may notice that you can have both the trendline equation displayed, and the
value for r^{2}. r^{2} is called the
coefficient of determination, and it is a measure of the proportion
of the variability that is explained by the trendline equation. If r^{2}
= .74, then 74% of the fluctuation of the data corresponds to the trendline,
and 26% of the fluctuation deviates from what would have been predicted by
the trendline. Thus, an intermediary step in the evaluation of models for
trendline analysis would be to look at the value of the coefficient of
determination, but do not let this be the only criterion.
For example, consider the following sample data
Time 
Data 
1 
20 
2 
80 
3 
50 
4 
70 
5 
100 
6 
110 
7 
190 
8 
300 
9 
? 
10 
? 
Table 3. More raw data.
We could map a linear regress line to the historical data (Time = 1 to
8), but when we use Excel to map a 3rd order polynomial trendline, we find
the value for r2 is much higher, at R2 = 0.9787:
Figure 28. Finding a trend that fits: 3rd order polynomial?
This means that the trendline curve accounts for 97.87% of the variation
among the data points. That's pretty good. But we're not convinced it is
optimal, so we keep clicking options in Excel, finding that when we specify
a 6th order polynomial trendline, it accounts for 99.96% of the variation.
Figure 29. Finding a trend that fits: 6th order polynomial?
We are very impressed, and we decide to use this 6th order trendline to
predict future periods. Imagine our surprise when we extend the trendline
two periods to the future to result in the following:
Figure 30. That 6th order polynomial had a high coefficient of determination,
but was a poor predictor.
That can't be right. So we resolve to go back to the 3rdorder polynomial
and its predications:
Figure 31. A more reasonable prediction with the 3rd order
polynomial trendline.
In all, one of the best ways to test predications is
to, well, wait. Testing a predication against the data that starts to come
in has no equal, but it does require waiting, and if we were willing to
wait, there would have been no reason to make a prediction in the first
place. Thus, it is wise to attempt to assess the strength of a predication
prior to the emergence of new data.
Assumptions Trend extrapolation is one aspect of
the larger field of trend (or trendline) analysis. It attempts to extend
known data points to regions beyond the timeframe of known datapoints,
almost always in an attempt to predict future values with some degree of
probability. However, the assumptions made are critical.
For example, let's look at "Biomass as feedstock for a Bioenergy and
Bioproducts Industry: The Technical Feasibility of a BillionTon Annual
Supply," which was published in April 2005 under the auspices of the US
Department of Agriculture and the US Department of Energy (required visit):
www1.eere.energy.gov/biomass/pdfs/final_billionton_vision_report2.pdf
Please look at the first page of the Executive Summary,
where the purpose is stated: "The purpose of
this report is to determine whether the land resources of the United States
are capable of producing a sustainable supply of biomass sufficient to
displace 30 percent or more of the country's present petroleum
consumption.... The short answer to the question of whether that much
biomass feedstock can be produced is yes."
That
is worth repeating, "The short answer ... is yes." However, now look at the
assumptions under which this answer was made.
"Important assumptions that were
made include the
following:
 yields of corn, wheat, and other
small grains were increased by 50 percent;
 the residuetograin ratio for
soybeans was increased to 2:1;
 harvest technology was capable
of recovering 75 percent of annual crop residues (when removal
is sustainable);
 all cropland was managed with
notill methods;
 55 million acres of cropland,
idle cropland, and cropland pasture were dedicated to the
production of perennial bioenergy crops;
 all manure in excess of that
which can applied onfarm for soil improvement under anticipated
EPA restrictions was used for biofuel; and
 all other available residues
were utilized." (Page 2 of Executive Summary)
Source:
www1.eere.energy.gov/biomass/pdfs/final_billionton_vision_report2.pdf 
Are these assumptions reasonable? Are they likely?
What would the probability be that all of these assumptions would be shown,
over time, to be valid? Maybe a better approach would be for the authors to
say, "In order to achieve Level X, we would have to do Activities Y,
with a
probability of Y leading to X determined to be ____ %. If we do less than Activities Y, such
as Activities Y_{2}, the results would not be at Level X, but would
likely be at the lesser Level X_{2,} with the same probability."
Additional Resources
There are many additional resources online and in
libraries. For example, there is a neat JavaScript Learning Object on
"Forecasting by Smoothing Techniques from Professor Hossein Arsham at:
http://home.ubalt.edu/ntsbarsh/Businessstat/otherapplets/ForecaSmo.htm
Try it out.
Examples in Technology Assessment
Nearly every typical technology assessment
project makes use of some type of forecasting technique, and while trend
extrapolation may not always be the technique chosen, it offers powerful
tools.
References
Levin, R.,
Rubin, D., & Stinson, J. (1986). Chapter 3: Forecasting. in Quantitative
approaches to management. NY: McGrawHill. (Available to those in the
BSU community at
www.bsu.edu/libraries/protected/ereserves/FlowersJ/TEDU510/08270010ET.pdf
Makridakis,
S., Wheelwright, S., & Hyndman, R. (1998). Forecasting: Methods and
Applications.. 3rd ed.
New York: Wiley & Sons.
United
Nations. (1998). Longrange World Population Projections: Based on the
1998 Revision. Executive Summary: Author. Retrieved April 7, 2009 from
http://www.un.org/esa/population/publications/longrange/longrangeExecSum.pdf
United
Nations. (1999). The World at Six Billion. New York: Author.
Retrieved April 7, 2009 from
http://www.un.org/esa/population/publications/sixbillion/sixbillion.htm
United
States Department of Energy. (2009).
CFL Market Profile  March 2009. Washington, DC: Author. Retrieved
April 7, 2009 from
http://www.energystar.gov/ia/products/downloads/CFL_Market_Profile.pdf
