Correlation and Multiple Regression Analysis


A consumer data analyst collected the following data on the screen size and price of popular LCD televisions sold recently at a large retailer:

Manufacturer Screen Price ($)
Sharp 46 1473.00
Samsung 52 2300.00
Samsung 46 1790.00
Sony 40 1250.00
Sharp 42 1546.50
Samsung 46 1922.50
Samsung 40 1372.00
Sharp 37 1149.50
Sharp 46 2000.00
Sony 40 1444.50
Sony 52 2615.00
Samsung 32 747.50
Sharp 37 1314.50
Sharp 32 853.50
Sharp 52 2778.00
Samsung 40 1749.50
Sharp 32 1035.00
Samsung 52 2950.00
Sony 40 1908.50
Sony 52 3103.00
Sony 46 2606.00
Sony 46 2861.00
Sony 52 3434.00

a. Does there appear to be a linear relationship between the screen size and the price?
b. Which variable is the 'dependent' variable?
c. Using statistical software, determine the regression equation. Interpret the value of the slope in the regression equation.
d. Include the manufacturer in a multiple linear regression analysis using a 'dummy' variable. Does it appear that some manufacturers can command a premium price? Hint: You will need to use a set of indicator variables.
e. Test each of the individual coefficients to see if they are significant.
f. Make a plot of the residuals and comment on whether they appear to follow a normal distribution.
g. Plot the residuals versus the fitted values. Do they seem to have the same amount of variation?
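
As a rough illustration of parts (c) and (d), the sketch below fits both models to the table above by ordinary least squares using only the Python standard library. The helper names (`solve`, `ols`, `r_squared`) and the choice of Sharp as the baseline manufacturer are assumptions made for this example, not part of the original problem; any statistical package should give the same coefficients.

```python
# Data transcribed from the table above: (manufacturer, screen size, price in $).
tvs = [
    ("Sharp", 46, 1473.00), ("Samsung", 52, 2300.00), ("Samsung", 46, 1790.00),
    ("Sony", 40, 1250.00), ("Sharp", 42, 1546.50), ("Samsung", 46, 1922.50),
    ("Samsung", 40, 1372.00), ("Sharp", 37, 1149.50), ("Sharp", 46, 2000.00),
    ("Sony", 40, 1444.50), ("Sony", 52, 2615.00), ("Samsung", 32, 747.50),
    ("Sharp", 37, 1314.50), ("Sharp", 32, 853.50), ("Sharp", 52, 2778.00),
    ("Samsung", 40, 1749.50), ("Sharp", 32, 1035.00), ("Samsung", 52, 2950.00),
    ("Sony", 40, 1908.50), ("Sony", 52, 3103.00), ("Sony", 46, 2606.00),
    ("Sony", 46, 2861.00), ("Sony", 52, 3434.00),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(X, y, beta):
    """Coefficient of determination, 1 - SSE/SST."""
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst


price = [p for _, _, p in tvs]

# Part (c): price = b0 + b1 * screen
X_simple = [[1.0, float(s)] for _, s, _ in tvs]
b_simple = ols(X_simple, price)

# Part (d): Sharp is the (assumed) baseline; indicators for Samsung and Sony.
X_dummy = [[1.0, float(s), float(m == "Samsung"), float(m == "Sony")]
           for m, s, _ in tvs]
b_dummy = ols(X_dummy, price)

print("simple model  [intercept, screen]:", b_simple)
print("dummy model   [intercept, screen, Samsung, Sony]:", b_dummy)
print("R^2 simple:", r_squared(X_simple, price, b_simple))
print("R^2 dummy: ", r_squared(X_dummy, price, b_dummy))
```

In the dummy-variable model, the Samsung and Sony coefficients estimate how much more (or less) those brands cost than a Sharp set of the same screen size. Only two indicators are used for three manufacturers; adding a third would make the design matrix collinear with the intercept. The sketch does not compute standard errors or p-values, so the significance tests in part (e) still call for statistical software.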


Refer to the Baseball 2008 data, which report information on the 30 Major League Baseball teams for the 2008 season. Let the number of games won be the dependent variable and the following variables be independent variables: team batting average, number of stolen bases, number of errors committed, team ERA, number of home runs, and whether the team's home field is natural grass or artificial turf.

Go to: Datasets. Unzip the file and look for the Baseball file.

a. Write out the regression equation. Discuss each of the variables. For example: Are you surprised that the regression coefficient for ERA is negative? How many wins does playing on natural grass for a home field add to or subtract from the total wins for the season?
b. Determine the value of R². Interpret it.
c. Develop a correlation matrix. Which independent variables have strong or weak correlations with the dependent variable? Do you see any problems with multicollinearity?
d. Conduct a global test on the set of independent variables. Interpret.
e. Conduct a test of hypothesis on each of the independent variables. Would you consider deleting any of the variables? If so, which ones?
f. Rerun the analysis until only significant net regression coefficients remain in the analysis. Identify these variables.
g. Develop a histogram of the residuals from the final regression equation developed in part (f). Is it reasonable to conclude that the normality assumption has been met?
h. Plot the residuals against the fitted values of Y from the final regression equation developed in part (f). Plot the residuals on the vertical axis and the fitted values on the horizontal axis.
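
For parts (b) and (d), the global test statistic and adjusted R² can be computed directly from R², the number of observations n, and the number of independent variables k. The sketch below is a minimal illustration with hypothetical function names; the Baseball data are not reproduced here, so the R² passed in is a placeholder and not a result from that data set.

```python
def global_f(r2, n, k):
    """Global F statistic: F = (R^2 / k) / ((1 - R^2) / (n - k - 1)).

    Returns the statistic with its numerator and denominator degrees of freedom.
    """
    df_num = k
    df_den = n - k - 1
    f_stat = (r2 / df_num) / ((1.0 - r2) / df_den)
    return f_stat, df_num, df_den


def adjusted_r2(r2, n, k):
    """Adjusted R^2, which penalizes the fit for each added variable."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# n = 30 teams and k = 6 independent variables come from the problem statement.
# The R^2 value is a PLACEHOLDER for illustration only, not a computed result.
r2_placeholder = 0.5
f_stat, df_num, df_den = global_f(r2_placeholder, n=30, k=6)
print("degrees of freedom:", df_num, df_den)  # 6 and 23
print("F (placeholder R^2):", f_stat)
print("adjusted R^2 (placeholder R^2):", adjusted_r2(r2_placeholder, n=30, k=6))
```

With six independent variables and 30 teams, the global test compares the computed F against the critical value of the F distribution with 6 and 23 degrees of freedom. Each time a variable is dropped in part (f), k decreases by one and the denominator degrees of freedom increase by one.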

Solution Summary

The solution consists of a detailed multiple regression analysis in Excel and gives students a clear view of the underlying statistical concepts.