Two
scenarios may warrant transformation of variables to standardized form, where Y_{i}
and X_{ik} are converted to

Y_{i}’ = (1/√(n-1)) · (Y_{i} - Ȳ)/s_{Y}

and X_{ik}’ = (1/√(n-1)) · (X_{ik} - X̄_{k})/s_{k}

where s_{Y} = √( Σ(Y_{i} - Ȳ)²/(n-1) ) and s_{k} = √( Σ(X_{ik} - X̄_{k})²/(n-1) )

Recall that (Y_{i} - Ȳ)/s_{Y} and (X_{ik} - X̄_{k})/s_{k} have zero mean and
unit variance.
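As a numerical sanity check, the standardization and the correlation transformation above can be verified with simulated data (a sketch; the data values and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
Y = rng.normal(100, 15, n)          # illustrative response values

# Standardized variable: subtract the mean, divide by s_Y (n-1 divisor)
Z = (Y - Y.mean()) / Y.std(ddof=1)
print(Z.mean(), Z.var(ddof=1))      # approximately 0 and 1

# The correlation transformation adds the factor 1/sqrt(n-1),
# so the transformed variable has sum of squares equal to 1
Yp = Z / np.sqrt(n - 1)
print(np.sum(Yp**2))                # approximately 1
```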

The two scenarios are

1. __Minimizing rounding errors
in computations__.

The primary source of these
errors is the computation of the inverse of the matrix [**X’X**]. If the determinant of this matrix is near
zero (caused by severe multicollinearity), then the individual elements of the
inverse will be extremely large in magnitude.

Rounding errors may also
occur when the X variables are grossly different in magnitude.

Transforming the
response and the predictor variables converts the [**X’X**] matrix into a matrix of correlation
coefficients, with all elements between -1 and +1.

2. __Lack of comparability in
regression coefficients__.

Suppose we have a model with
two predictors

X_{1} - trees per
hectare in the range of 0 – 5000, and

X_{2} - tree
diameter with a range of 0 – 50 cm.

The regression coefficients
b_{1} and b_{2} are likely to have very different magnitudes,
with the result that an increase of one unit in X_{1} will have an
entirely different effect on the response than a unit change in X_{2}.

The regression model with the transformed variables,
Y’ and X_{k}’, is called the standardized regression model

Y_{i}’ = b_{1}’X_{i1}’
+ b_{2}’X_{i2}’ + … + b_{p-1}’X_{i,p-1}’ + e_{i}

Note
that there is no intercept term in this model. Why?

__Matrix
of transformed variables__

The [**X’X**] matrix of the transformed predictor variables is the matrix of pairwise
correlation coefficients among the predictors, and the [**X’Y**] vector is the vector of
correlations between each predictor and Y:

**r**_{XX} = [ 1, r_{12}, …, r_{1,p-1} ; r_{21}, 1, …, r_{2,p-1} ; … ; r_{p-1,1}, r_{p-1,2}, …, 1 ]

and **r**_{XY} = [ r_{Y1}, r_{Y2}, …, r_{Y,p-1} ]’

and
the vector of estimated regression coefficients is

**b**’ = [**x’x**]^{-1}[**x’y**]
= [**r**_{XX}]^{-1}[**r**_{XY}]

It can
be shown that the individual regression coefficient b_{k} of the untransformed
model can be recovered from the standardized regression coefficient b_{k}’

b_{k}
= (s_{Y}/s_{k}) b_{k}’

and b_{0} = Ȳ - b_{1}X̄_{1} - b_{2}X̄_{2} - … - b_{p-1}X̄_{p-1}
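This equivalence can be checked numerically. The sketch below fits the standardized model by solving the correlation-matrix system and back-transforms to the original scale (simulated data; the coefficients and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 2)) * [1000.0, 10.0] + [2500.0, 25.0]  # predictors on very different scales
Y = 3 + 0.004 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=n)

# Ordinary least squares on the raw variables (with intercept)
A = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(A, Y, rcond=None)[0]            # b0, b1, b2

# Standardized regression: solve r_XX b' = r_XY
R = np.corrcoef(np.column_stack([X, Y]), rowvar=False)
r_XX, r_XY = R[:2, :2], R[:2, 2]
b_prime = np.linalg.solve(r_XX, r_XY)

# Back-transform: b_k = (s_Y / s_k) b_k'  and  b0 = Ybar - sum(b_k * Xbar_k)
s_k = X.std(axis=0, ddof=1)
b_back = (Y.std(ddof=1) / s_k) * b_prime
b0_back = Y.mean() - b_back @ X.mean(axis=0)

print(np.allclose(b[1:], b_back) and np.isclose(b[0], b0_back))  # True
```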

The general form of a polynomial model with one
predictor variable is

Y = b_{0}
+ b_{1}X + b_{2}X^{2} + … + b_{p-1}X^{p-1}
+ e

Even though the model has the form of a multiple linear regression,
there are major differences between a polynomial model and a first-order
multiple regression model:

1. The response function is a curve in
two-dimensional space, possibly with several peaks and valleys, a far cry from the
p-dimensional response surface of a first-order multiple regression model.

2. There is strong
multicollinearity, as all predictors are different powers of a single variable.

Polynomial models increase in complexity with
more than one predictor variable.

__Why use polynomial models?__

1. The true curvilinear
response function is indeed polynomial.

2. The curvilinear response
function is unknown (or complex) but a polynomial function is a good
approximation.

Perhaps the primary reason for using
polynomials is that very often the true form of the functional relationship between
X and Y is unknown.

The main danger in using polynomials lies in
extrapolation, particularly when a monotonically increasing response is modeled
by a polynomial.

Polynomial regression models may contain one
or more variables, and each predictor variable may be present in various
powers.

We will consider here only models with one
and two predictor variables raised to the first and second power.

The second-order model with one predictor variable is

Y_{i} = b_{0}’
+ b_{1}’X_{i} + b_{2}’X_{i}^{2} +
e_{i}

To minimize computational
difficulties caused by multicollinearity,
X_{i} is replaced by the deviation x_{i}, such that

x_{i} = X_{i} - X̄

and the model is transformed into

Y_{i} = b_{0}
+ b_{1}x_{i} + b_{2}x_{i}^{2} + e_{i}

A slightly different notation is used in
polynomial regression to reflect the pattern of the exponents

Y_{i} = b_{0}
+ b_{1}x_{i} + b_{11}x_{i}^{2} + e_{i}

The original coefficients (b_{0}’,
b_{1}’ and b_{2}’) can be recovered from
b_{0}, b_{1}
and b_{11} using the transformation

b_{0}’ = b_{0}
- b_{1}X̄ + b_{11}X̄^{2}

b_{1}’ = b_{1}
- 2b_{11}X̄

b_{2}’ = b_{11}
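The recovery can be verified numerically. The sketch below fits the same quadratic once with a centered and once with an uncentered predictor (simulated data; the true coefficients and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(5, 30, 60)
Y = 10 + 5 * X - 0.1 * X**2 + rng.normal(size=60)

Xbar = X.mean()
x = X - Xbar                          # centered predictor

# Fit the centered model Y = b0 + b1*x + b11*x^2 + e
b11, b1, b0 = np.polyfit(x, Y, 2)     # polyfit returns the highest power first

# Recover the coefficients of the uncentered model
b0p = b0 - b1 * Xbar + b11 * Xbar**2
b1p = b1 - 2 * b11 * Xbar
b2p = b11

# Fitting directly on the uncentered X gives the same coefficients
b2d, b1d, b0d = np.polyfit(X, Y, 2)
print(np.allclose([b0p, b1p, b2p], [b0d, b1d, b2d]))  # True
```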

__Example__: Data from Douglas fir
trees is used to illustrate the use of a quadratic regression model to predict
tree height from its diameter.

Second order polynomial regression of H on (D
- D̄)

The
regression equation is: Ŷ = 115.053 + 6.071*x - 0.180*x²

where Y = H, and x = (D - D̄).

For
our data D̄ = 14.5458

and
the height prediction model in terms of D is

Ŷ = 115.053 + 6.071*(D – 14.5458) - 0.180*(D –
14.5458)²

when
simplified becomes

Ŷ = -11.339 + 11.307*D - 0.180*D²
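The simplification is just a polynomial expansion, and it can be checked numerically with the fitted coefficients (a sketch):

```python
# Expand b0 + b1*(D - Dbar) + b11*(D - Dbar)^2 into powers of D
Dbar = 14.5458
b0, b1, b11 = 115.053, 6.071, -0.180

c0 = b0 - b1 * Dbar + b11 * Dbar**2   # constant term
c1 = b1 - 2 * b11 * Dbar              # coefficient of D
c2 = b11                              # coefficient of D^2
print(round(c0, 3), round(c1, 3), round(c2, 3))   # -11.339 11.307 -0.18
```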

Fig 8.1: Scatter-plot
of diameter (D) and height (H) of 49 Douglas fir trees. The curve predicts
height using quadratic regression of H on D.

The second-order polynomial model with two predictor variables is

Y = b_{0}
+ b_{1}x_{1} + b_{2}x_{2} + b_{11}x_{1}^{2}
+ b_{22}x_{2}^{2} + b_{12}x_{1}x_{2}
+ e

The response
surface in this case is three dimensional, the actual form governed by the
parameter values.

Additional
variables and higher-order terms will make the polynomial model larger and more
complex to interpret.

The basic
rule of regression, to keep the model simple, also applies to polynomials.

__Implementation
of polynomial regression models__

Polynomial
regression model fitting presents no new problems and all earlier results on
fitting apply.

When
fitting a polynomial model, start with a second or third order model and then
perform tests to see if higher order terms can be dropped.
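One way to carry out such a test is an extra-sum-of-squares F-test comparing the higher-order fit with the reduced one. A sketch with simulated data (the data and seed are illustrative, and `scipy` is assumed to be available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 80
X = rng.uniform(0, 10, n)
Y = 2 + 1.5 * X - 0.2 * X**2 + rng.normal(size=n)   # truly quadratic data

def sse(order):
    """Residual sum of squares of a polynomial fit of the given order."""
    resid = Y - np.polyval(np.polyfit(X, Y, order), X)
    return np.sum(resid**2)

# Extra-sum-of-squares F-test of H0: the cubic term can be dropped
# (full model = cubic, reduced model = quadratic; 1 extra parameter)
sse_red, sse_full = sse(2), sse(3)
df_full = n - 4                        # n minus the number of coefficients in the full model
F = (sse_red - sse_full) / (sse_full / df_full)
p = stats.f.sf(F, 1, df_full)
print(f"F = {F:.3f}, p = {p:.3f}")     # a large p supports dropping the cubic term
```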

__Some
further comments on polynomial regression__

1.
Multicollinearity is unavoidable.

2.
Extrapolation may lead to more serious errors than in general linear
models.

3.
Though a quadratic term often provides a close approximation to a nonlinear
form, it has only limited flexibility.

4.
Keep the model to as low an order as possible, for simplicity of
interpretation.