## Regression analysis

The first step in regression analysis is to plot the dependent variable against each independent variable on a graph, referred to as a scatterplot (as shown in Figures 10.3 to 10.6). This gives some idea of the general kind of relationship that exists in the data. Once it has been established that a linear relationship exists, it is appropriate to derive values for the equation that fits a straight line to two variables: Y = a + bX. Regression analysis fits the line through the data points that minimises the sum of the squared vertical deviations of the points from the line. In order to position the line, two values have to be determined: a, the intercept, and b, the regression coefficient.
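For reference, minimising the sum of squared deviations has a closed-form solution. The standard least-squares estimates are sketched below; the symbols $\bar{X}$ and $\bar{Y}$ (sample means) and $n$ (number of observations) are notation added for this sketch and are not taken from the text.

```latex
\[
b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
\]
```

The second formula implies that the fitted line always passes through the point of means $(\bar{X}, \bar{Y})$.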

Suppose we have the following data about houses for sale:

• area in square feet covered by building

• number of bathrooms/toilets.

FIGURE 10.3

Scatterplot - price versus area

FIGURE 10.4

Scatterplot - price versus acres

Which of the various factors in Table 10.8 that describe the house are most closely associated with the price?

Overall, there appears to be some degree of positive association between each of the variables and the price. The clustering of data points along a straight line (indicating the goodness of fit) is perhaps best exhibited between area and price, although acres and price also appear to be well associated. We calculate the regression equation for the relationship between one independent variable and one dependent variable by means of the following equation:

$$Y = a + bX$$

where b is the regression coefficient, the amount of change that takes place in Y for each unit of change in X.

a is the intercept, or the value of Y when X is zero. It is the point where the regression line crosses the Y axis.
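To make the roles of a and b concrete, here is a minimal sketch that fits the line to the area and price columns of Table 10.8 using the usual least-squares formulas. The function name `fit_line` and the variable names are illustrative choices, not part of the original analysis, which was run in Minitab.

```python
# Area (sq ft) and price (pounds) for the 20 houses in Table 10.8.
AREA = [2990, 1560, 1990, 1320, 1990, 1980, 2300, 1150, 1400, 1700,
        1230, 1990, 1550, 2700, 1400, 1850, 1750, 1000, 2950, 1900]
PRICE = [168500, 122000, 136000, 129000, 136000, 154000, 129000, 118000,
         144000, 143000, 120000, 155000, 90000, 154000, 125000, 145000,
         146500, 118000, 210000, 138000]


def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    b = s_xy / s_xx            # change in Y per unit change in X
    a = y_mean - b * x_mean    # value of Y when X is zero
    return a, b


a, b = fit_line(AREA, PRICE)
print(f"Price = {a:.0f} + {b:.3f} area")
```

Run on these 20 observations, the sketch should give an intercept of about 79312 and a slope of about 32.555, in line with the coefficients in the Minitab output for price on area.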

FIGURE 10.5

Scatterplot - price versus rooms

FIGURE 10.6

Scatterplot - price versus bathrooms

TABLE 10.8

Houses for sale

| Price | Area | Acres | Rooms | Baths |
|---|---|---|---|---|
| £168,500 | 2990 | 0.70 | 9 | 2 |
| £122,000 | 1560 | 0.24 | 8 | 2 |
| £136,000 | 1990 | 0.65 | 9 | 2 |
| £129,000 | 1320 | 0.68 | 7 | 1 |
| £136,000 | 1990 | 0.70 | 8 | 2 |
| £154,000 | 1980 | 0.45 | 8 | 2 |
| £129,000 | 2300 | 0.34 | 7 | 2 |
| £118,000 | 1150 | 0.32 | 8 | 1 |
| £144,000 | 1400 | 2.00 | 7 | 3 |
| £143,000 | 1700 | 0.60 | 8 | 2 |
| £120,000 | 1230 | 0.50 | 6 | 1 |
| £155,000 | 1990 | 1.00 | 7 | 2 |
| £90,000 | 1550 | 0.40 | 7 | 1 |
| £154,000 | 2700 | 0.60 | 9 | 3 |
| £125,000 | 1400 | 0.55 | 7 | 1 |
| £145,000 | 1850 | 0.38 | 7 | 2 |
| £146,500 | 1750 | 0.72 | 8 | 2 |
| £118,000 | 1000 | 0.40 | 8 | 1 |
| £210,000 | 2950 | 1.20 | 9 | 3 |
| £138,000 | 1900 | 0.57 | 8 | |

Using Minitab 8.0 for Windows and SPSSPC, the following bivariate regressions are obtained on these data (note that figures may show the effects of rounding off):

where

• Coef = regression coefficient

• Stdev = standard deviation

• t = t-ratio (statistic)

• p = probability of occurrence

• s = standard error of the estimate

• Stdev Fit = fit standard deviation

• St Resid = standard residual

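1 The regression equation is:

Price = 79312 + 32.555 area

| Predictor | Coef | Stdev | t-ratio | p |
|---|---|---|---|---|
| Constant | 79312 | 12691 | 6.25 | 0.000 |
| Area | 32.555 | 6.627 | 4.91 | 0.000 |

Analysis of variance

| Source | DF | SS | MS | F | p |
|---|---|---|---|---|---|
| Regression | 1 | 6356522504 | 6356522504 | 24.13 | 0.000 |
| Error | 18 | 4741927496 | 263440416 | | |
| Total | 19 | 11098450000 | | | |

Unusual observations

| Obs | Area | Price | Fit | Stdev Fit | Residual | St Resid |
|---|---|---|---|---|---|---|
| 13 | 1550 | 90000 | 129772 | 4091 | -39772 | -2.53R |
| 19 | 2950 | 210000 | 175349 | 8233 | 34651 | 2.48R |

R denotes an observation with a large standard residual.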
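The entries of the analysis of variance table are linked by simple arithmetic, which the following sketch retraces for the price-on-area regression. It takes the sums of squares and degrees of freedom from the output above and derives the mean squares, the F-ratio, R-sq and s. The R-sq and s values it prints are computed here and are not part of the reported output for this regression; the variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from the analysis of variance
# for price on area.
ss_regression = 6356522504
ss_error = 4741927496
df_regression = 1
df_error = 18

ss_total = ss_regression + ss_error
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

f_ratio = ms_regression / ms_error      # F statistic
r_squared = ss_regression / ss_total    # share of variation explained
s = math.sqrt(ms_error)                 # standard error of the estimate

print(f"F = {f_ratio:.2f}, R-sq = {r_squared:.1%}, s = {s:.0f}")
```

The same arithmetic applied to the sums of squares of the price-on-rooms regression should give values close to the s = 21068 and R-sq = 28.0% reported for that model.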

2 The regression equation is:

Price = 119815 + 29593 acres

| Predictor | Coef | Stdev | t-ratio | p |
|---|---|---|---|---|
| Constant | 119815 | 9623 | 12.45 | 0.000 |
| Acres | 29593 | 12767 | 2.32 | 0.032 |

Analysis of variance

| Source | DF | SS | MS | F | p |
|---|---|---|---|---|---|
| Regression | 1 | 2551202835 | 2551202835 | 5.37 | 0.032 |
| Error | 18 | 8547247165 | 474847065 | | |
| Total | 19 | 11098450000 | | | |

Unusual observations

| Obs | Acres | Price | Fit | Stdev Fit | Residual | St Resid |
|---|---|---|---|---|---|---|
| 9 | 2.00 | 144000 | 179000 | 17911 | -35000 | -2.82RX |
| 19 | 1.20 | 210000 | 155326 | 8547 | 54674 | 2.73R |

R denotes an observation with a large standard residual. X denotes an observation whose X value gives it large influence.
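The X flag relates to how far an observation's X value lies from the mean of X. One common way to quantify this is the leverage, sketched below for the acres column of Table 10.8. The cutoff of 3p/n (with p = 2 fitted coefficients) is a widely used rule of thumb; whether Minitab 8.0 applied exactly this threshold is an assumption, and the function name `leverages` is illustrative.

```python
# Plot size in acres for the 20 houses in Table 10.8.
ACRES = [0.70, 0.24, 0.65, 0.68, 0.70, 0.45, 0.34, 0.32, 2.00, 0.60,
         0.50, 1.00, 0.40, 0.60, 0.55, 0.38, 0.72, 0.40, 1.20, 0.57]


def leverages(x):
    """Leverage of each observation in a one-predictor regression."""
    n = len(x)
    x_mean = sum(x) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    return [1 / n + (xi - x_mean) ** 2 / s_xx for xi in x]


h = leverages(ACRES)
# Assumed cutoff: 3p/n with p = 2 fitted coefficients (a and b).
cutoff = 3 * 2 / len(ACRES)
for obs, h_i in enumerate(h, start=1):
    if h_i > cutoff:
        print(f"Obs {obs}: leverage {h_i:.3f} exceeds {cutoff:.2f}")
```

Under this assumed cutoff only observation 9, the two-acre plot, is flagged, which agrees with the X marker in the output.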

3 The regression equation is:

Price = 22518 + 15036 rooms

| Predictor | Coef | Stdev | t-ratio | p |
|---|---|---|---|---|
| Constant | 22518 | 44284 | 0.51 | 0.617 |
| Rooms | 15036 | 5682 | 2.65 | 0.016 |

s = 21068, R-sq = 28.0%, R-sq(adj) = 24.0%

Analysis of variance

| Source | DF | SS | MS | F | p |
|---|---|---|---|---|---|
| Regression | 1 | 3108768182 | 3108768182 | 7.00 | 0.016 |
| Error | 18 | 7989681818 | 443871212 | | |
| Total | 19 | 11098450000 | | | |

Unusual observations

| Obs | Rooms | Price | Fit | Stdev Fit | Residual | St Resid |
|---|---|---|---|---|---|---|
| 19 | 9.00 | 210000 | 157845 | 8523 | 52155 | 2.71R |

R denotes an observation with a large standard residual.

4 The regression equation is:

Price = 94793 + 24587 baths

| Predictor | Coef | Stdev | t-ratio | p |
|---|---|---|---|---|
| Constant | 94793 | 11123 | 8.52 | 0.000 |
| Baths | 24587 | 5782 | 4.25 | 0.000 |

Unusual observations

| Obs | Baths | Price | Fit | Stdev Fit | Residual | St Resid |
|---|---|---|---|---|---|---|
| 19 | 3.00 | 210000 | 168554 | 7970 | 41446 | 2.65R |

R denotes an observation with a large standard residual.

As the scatterplots suggested, the strongest relationship is that between area and price: of the four predictors, area has the largest t-ratio (4.91).
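One way to compare the four bivariate regressions on a common scale is to convert each slope's t-ratio into an R-sq value. In a regression with a single predictor, F equals the square of t, and R-sq equals F / (F + error degrees of freedom). The sketch below applies that identity to the t-ratios reported in the output; the names `T_RATIOS` and `r_squared_from_t` are illustrative.

```python
# t-ratios of the slope coefficients reported for the four regressions.
T_RATIOS = {"area": 4.91, "acres": 2.32, "rooms": 2.65, "baths": 4.25}
DF_ERROR = 18  # n - 2 for 20 houses and one predictor


def r_squared_from_t(t, df_error):
    """R-squared of a one-predictor regression from the slope's t-ratio."""
    return t ** 2 / (t ** 2 + df_error)


# Rank predictors from strongest to weakest association with price.
ranking = sorted(T_RATIOS, key=lambda name: abs(T_RATIOS[name]), reverse=True)
for name in ranking:
    r2 = r_squared_from_t(T_RATIOS[name], DF_ERROR)
    print(f"{name:<6} R-sq = {r2:.1%}")
```

On these figures area ranks first and baths second, with rooms and acres well behind. Because the t-ratios are rounded, the R-sq values are approximate.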

Knowledge of an independent variable, coupled with the appropriate equation, enables an estimate to be made of the dependent variable. Thus the area covered by a house can be used to estimate its likely price.
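As an illustration of such an estimate, the sketch below plugs an area into the price-on-area equation using the coefficients from the Minitab output. Refusing areas outside the range observed in Table 10.8 is a design choice made for this example, since the fitted line says nothing reliable about houses far smaller or larger than those sampled; the function name `estimate_price` is illustrative.

```python
# Coefficients of the price-on-area regression reported in the output.
INTERCEPT = 79312      # a, in pounds
SLOPE = 32.555         # b, in pounds per square foot

# Smallest and largest areas in Table 10.8; outside this range the
# line would be an extrapolation.
AREA_MIN, AREA_MAX = 1000, 2990


def estimate_price(area_sq_ft):
    """Estimate price from area using Price = a + b * area."""
    if not AREA_MIN <= area_sq_ft <= AREA_MAX:
        raise ValueError("area lies outside the range of the sample data")
    return INTERCEPT + SLOPE * area_sq_ft


print(f"Estimated price for 2000 sq ft: {estimate_price(2000):,.0f} pounds")
```

For a house of 2000 square feet the estimate is 79312 + 32.555 × 2000, or about 144,422 pounds. This is a point estimate only; the large residuals flagged for observations 13 and 19 show how far individual prices can lie from the line.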
