Regression analysis
The first step in regression analysis is to plot the dependent variable against each independent variable on a graph, known as a scatterplot (as shown in Figures 10.3 to 10.6). This gives some idea of the general kind of relationship that exists between the variables. Once it has been established that the relationship is approximately linear, it is appropriate to derive values for the equation of a straight line relating the two variables: Y = a + b(X). Regression analysis fits the line through the data points that minimises the sum of the squared vertical deviations of the points from the line. To position the line, two values have to be determined: a, the intercept, and b, the regression coefficient.
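As a rough illustration of how a and b might be obtained, the sketch below applies the standard closed-form least-squares formulas for a single independent variable. The area and price figures are invented for this example and are not taken from the chapter's data; the function names fit_line and sum_squared_deviations are likewise only illustrative.

```python
# Hypothetical data, invented for illustration only:
# floor area in square feet (X) and asking price (Y).
areas = [1000, 1500, 2000, 2500, 3000]
prices = [100000, 120000, 150000, 170000, 200000]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b(X)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = sum of cross-deviations divided by the sum of squared X deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The least-squares line passes through the point of means
    a = mean_y - b * mean_x
    return a, b


def sum_squared_deviations(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = fit_line(areas, prices)
print(f"intercept a = {a:.2f}, regression coefficient b = {b:.2f}")

# Any other line gives a sum of squared deviations at least as large.
print(sum_squared_deviations(areas, prices, a, b))
print(sum_squared_deviations(areas, prices, a, b + 1))
```

With these made-up figures the fitted line has an intercept of 48,000 and a regression coefficient of 50, and nudging the slope away from that value increases the sum of squared deviations.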
Suppose we have the following data about houses for sale:
• area in square feet covered by building
• number of bathrooms/toilets.
FIGURE 10.3 Scatterplot: price versus area

FIGURE 10.4 Scatterplot: price versus acres

Which of the various factors in Table 10.8 that describe the house are most closely associated with the price? Overall, we see that there appears to be some degree of positive association between each one of the variables and the price. The clustering of data points along a straight line (indicating the goodness of fit) is perhaps best exhibited between area and price, although acres and price also appear to be well associated.

We calculate the regression equation for the relationship between one independent variable and one dependent variable by means of the following equation:

Y = a + b(X)

where b is the amount of change that takes place in Y for each unit of change in X, and a is the intercept, or the value of Y when X is zero. It is the point where the regression line crosses the Y axis.

FIGURE 10.5 Scatterplot: price versus rooms
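To make the roles of a and b concrete, here is a minimal sketch that evaluates the line Y = a + b(X) for a house price predicted from floor area. The values of a and b are hypothetical, chosen only for illustration rather than estimated from the chapter's data, and predict_price is an illustrative name.

```python
# Hypothetical coefficients, chosen only for illustration
# (not estimated from the chapter's data).
a = 20000  # intercept: predicted price when the area is zero
b = 60     # regression coefficient: change in price per extra square foot


def predict_price(area):
    """Predicted price from the line Y = a + b(X)."""
    return a + b * area


# At X = 0 the line crosses the Y axis, so the prediction equals a.
print(predict_price(0))

# Prediction for a house of 2,000 square feet.
print(predict_price(2000))

# One extra unit of X changes the prediction by exactly b.
print(predict_price(2001) - predict_price(2000))
```

Under these assumed values, a 2,000 square foot house is predicted at 140,000, and each additional square foot adds 60 to the predicted price.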
