
Dummy Variable Regression

15) Consider the Midcity Pricing Structure case we discussed. We would like to study how the selling price of properties varies by neighborhood (recall that there were three neighborhoods). A one-way ANOVA of price by neighborhood yields the following output:

ANOVA Summary
Total Sample Size 128
Grand Mean 130427.34
Pooled Std Dev 17863.20
Pooled Variance 319093956.08
Number of Samples 3
Confidence Level 95.00%

ANOVA Sample Stats    Price (1)        Price (2)        Price (3)
Sample Size           44               45               39
Sample Mean           110154.55        125231.11        159294.87
Sample Std Dev        15973.88         17866.05         19781.73
Sample Variance       255164862.58     319195828.28     391316815.11
Pooling Weight        0.3440           0.3520           0.3040
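As a quick consistency check, the summary figures (grand mean, pooling weights, pooled variance) can be reproduced from the per-neighborhood sample statistics. The following is a minimal sketch. The variable names are my own, and it assumes the pooling weights are (n_j - 1)/(N - k), which matches the printed values 43/125, 44/125 and 38/125.

```python
# Per-neighborhood statistics copied from the ANOVA Sample Stats table
# (list index 0, 1, 2 corresponds to Price (1), Price (2), Price (3)).
sizes = [44, 45, 39]
means = [110154.55, 125231.11, 159294.87]
variances = [255164862.58, 319195828.28, 391316815.11]

n_total = sum(sizes)  # total sample size N
k = len(sizes)        # number of groups

# Grand mean: the size-weighted average of the group means.
grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

# Assumed pooling weights: each group's share of the N - k
# within-group degrees of freedom.
weights = [(n - 1) / (n_total - k) for n in sizes]

# Pooled variance: the weighted average of the group variances.
pooled_variance = sum(w * v for w, v in zip(weights, variances))
pooled_sd = pooled_variance ** 0.5

print(f"grand mean      = {grand_mean:.2f}")
print(f"pooling weights = {[round(w, 4) for w in weights]}")
print(f"pooled variance = {pooled_variance:.2f}")
print(f"pooled std dev  = {pooled_sd:.2f}")
```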

One-Way ANOVA Table   Sum of Squares    Degrees of Freedom   Mean Squares      F-Ratio   p-Value
Between Variation     51798469787.16    2                    25899234893.58    81.16     < 0.0001
Within Variation      39886744509.71    125                  319093956.08
Total Variation       91685214296.88    127
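The one-way ANOVA table can be rebuilt from the sample statistics alone, which makes explicit where each entry comes from. The following is a minimal sketch using the standard one-way ANOVA formulas. The variable names are illustrative, and the results differ from the printed table only by the rounding of the reported means and variances.

```python
# Per-neighborhood statistics copied from the ANOVA Sample Stats table.
sizes = [44, 45, 39]
means = [110154.55, 125231.11, 159294.87]
variances = [255164862.58, 319195828.28, 391316815.11]

n_total = sum(sizes)
k = len(sizes)
grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

# Between-groups SS: spread of the group means around the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
# Within-groups SS: sum of (n_j - 1) * s_j^2 over the groups.
ss_within = sum((n - 1) * v for n, v in zip(sizes, variances))
ss_total = ss_between + ss_within

df_between = k - 1        # 3 groups -> 2
df_within = n_total - k   # 128 - 3 -> 125

ms_between = ss_between / df_between
ms_within = ss_within / df_within  # equals the pooled variance
f_ratio = ms_between / ms_within

print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

The p-value requires the upper tail of the F distribution (for example, `scipy.stats.f.sf(f_ratio, 2, 125)` in SciPy). It is left out here so that the sketch depends only on the standard library.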

a) State explicitly, both in notation and in words, the hypothesis being tested here, the corresponding conclusion, and the reason for that conclusion.
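One standard way to write this kind of test in notation is sketched below. The symbol μ_j is introduced here rather than taken from the output, and it denotes the population mean selling price in neighborhood j. The significance level α = 0.05 is assumed from the 95% confidence level in the summary.

```latex
\begin{aligned}
H_0 &: \mu_1 = \mu_2 = \mu_3 \\
H_a &: \text{not all of } \mu_1, \mu_2, \mu_3 \text{ are equal} \\
F &= \frac{MS_{\text{between}}}{MS_{\text{within}}}
   = \frac{25899234893.58}{319093956.08} \approx 81.16,
   \qquad df = (2,\ 125) \\
p &< 0.0001 < \alpha = 0.05 \;\Rightarrow\; \text{reject } H_0
\end{aligned}
```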

Now consider a regression model with Neighborhood as the independent variable and Price as the dependent variable. Because Neighborhood is categorical, it must be encoded with dummy variables. With three neighborhoods and Neighborhood 2 as the reference level, two dummies are needed, and the regression equation takes the form

Price = b0 + b1 * Neighborhood1 + b2 * Neighborhood3,

where b0, b1 and b2 are the regression coefficients; Neighborhood1 is 1 for a property in Neighborhood 1 and 0 otherwise; and, similarly, Neighborhood3 is 1 for a property in Neighborhood 3 and 0 otherwise.
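The encoding described above can be written as a small function. The following is a minimal sketch, and the function name `encode_neighborhood` is hypothetical. It returns the pair (Neighborhood1, Neighborhood3), so the reference level, Neighborhood 2, maps to (0, 0).

```python
def encode_neighborhood(neighborhood: int) -> tuple[int, int]:
    """Return (Neighborhood1, Neighborhood3) for a property, using
    Neighborhood 2 as the reference level (both dummies 0)."""
    if neighborhood not in (1, 2, 3):
        raise ValueError(f"unknown neighborhood: {neighborhood}")
    neighborhood1 = 1 if neighborhood == 1 else 0
    neighborhood3 = 1 if neighborhood == 3 else 0
    return neighborhood1, neighborhood3


for hood in (1, 2, 3):
    print(hood, encode_neighborhood(hood))
```

No dummy is needed for Neighborhood 2. Adding one would make the three dummies sum to 1 for every row, which makes them perfectly collinear with the intercept.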

b) What will the estimates of the regression coefficients be, based on the ANOVA output shown above? Clearly explain the meaning of each coefficient. (Please note that if you answer this question by simply running a new regression model, you are not answering the question the way I expect you to!)
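When the only predictors are the dummies for one categorical variable, least squares fits each group's sample mean exactly. The coefficients therefore follow directly from the group means in the ANOVA output. The following is a minimal sketch of that arithmetic. The names `mean_price` and `predicted_price` are illustrative.

```python
# Group sample means from the ANOVA Sample Stats table.
mean_price = {1: 110154.55, 2: 125231.11, 3: 159294.87}
reference = 2  # Neighborhood 2 is the reference level

# Intercept: predicted price when both dummies are 0 (Neighborhood 2).
b0 = mean_price[reference]
# Each slope: difference between that neighborhood's mean and the reference mean.
b1 = mean_price[1] - mean_price[reference]  # Neighborhood 1 vs. 2
b2 = mean_price[3] - mean_price[reference]  # Neighborhood 3 vs. 2


def predicted_price(neighborhood1: int, neighborhood3: int) -> float:
    """Fitted price for the given dummy values."""
    return b0 + b1 * neighborhood1 + b2 * neighborhood3


print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

As a sanity check, `predicted_price(1, 0)` and `predicted_price(0, 1)` return the Neighborhood 1 and Neighborhood 3 sample means.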

c) What will the R-squared value of the regression model be, again based on the ANOVA output shown above?
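Because the dummy-variable regression reproduces the group means, its explained sum of squares equals the ANOVA between-groups sum of squares, and its total sum of squares equals the ANOVA total. The following is a minimal sketch of the resulting computation. It also recovers the ANOVA F-ratio from R-squared as a cross-check.

```python
# Sums of squares copied from the One-Way ANOVA Table.
ss_between = 51798469787.16
ss_within = 39886744509.71
ss_total = 91685214296.88

n_total, k = 128, 3  # sample size and number of neighborhoods

# R-squared: share of total price variation explained by neighborhood.
r_squared = ss_between / ss_total
# Equivalent form: 1 minus the unexplained share.
r_squared_alt = 1 - ss_within / ss_total

# The regression's overall F statistic, written in terms of R-squared,
# should match the ANOVA F-ratio.
f_from_r2 = (r_squared / (k - 1)) / ((1 - r_squared) / (n_total - k))

print(f"R-squared = {r_squared:.4f}")
print(f"F from R-squared = {f_from_r2:.2f}")
```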

Solution Summary

Step-by-step method for computing a regression model with dummy variables.