Refer to the Baseball 2008 data, which report information on the 30 Major League Baseball teams for the 2008 season. Let the number of games won be the dependent variable and the following be the independent variables: team batting average, number of stolen bases, number of errors committed, team ERA, number of home runs, and whether the team's home field is natural grass or artificial turf.
A. Write the regression equation. Discuss each of the variables. For example, are you surprised that the regression coefficient for ERA is negative? How many wins does playing on natural grass as the home field add to or subtract from the total wins for the season?
B. Determine the R^2 value.
C. Develop the correlation matrix. Which independent variables have strong or weak correlations with the dependent variable? Do you see any problems with multicollinearity?
D. Conduct a global test of the set of independent variables. Interpret.
E. Conduct a test of the hypothesis on each of the independent variables. Would you consider deleting any of the variables?
F. Rerun the analysis until only significant net regression coefficients remain in the analysis. Identify these.
G. Develop a histogram of the residuals from the final regression developed in part F. Is it reasonable to conclude that the normality assumption has been met?
H. Plot the residuals against the fitted values of Y, with the residuals on the vertical axis and the fitted values on the horizontal axis.
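For part D, the global test statistic can be computed from R^2 alone once the sample size and the number of independent variables are known. The sketch below assumes n = 30 teams and k = 6 independent variables, as stated in the problem. The function names and the R^2 value in the usage lines are illustrative placeholders, not results from the Baseball 2008 data.

```python
def global_f_statistic(r_squared, n, k):
    """Return (F, df_numerator, df_denominator) for the global test
    H0: all k regression coefficients are zero.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    df_num = k
    df_den = n - k - 1
    if df_num <= 0 or df_den <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    f_stat = (r_squared / df_num) / ((1.0 - r_squared) / df_den)
    return f_stat, df_num, df_den


def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2, which penalizes R^2 for the number of predictors."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


if __name__ == "__main__":
    # n and k come from the problem statement; r2 is a made-up placeholder.
    n, k = 30, 6
    r2 = 0.5  # hypothetical value, NOT the answer to part B
    f_stat, df1, df2 = global_f_statistic(r2, n, k)
    print(f"F = {f_stat:.3f} with ({df1}, {df2}) degrees of freedom")
    print(f"adjusted R^2 = {adjusted_r_squared(r2, n, k):.3f}")
```

The computed F is then compared with the critical F value for (6, 23) degrees of freedom at the chosen significance level. If F exceeds the critical value, the null hypothesis that all coefficients are zero is rejected.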
The solution provides a step-by-step method for performing a regression analysis and a correlation hypothesis test in Excel. All the steps of hypothesis testing (formulation of the null and alternative hypotheses, selection of the significance level, choice of the appropriate test statistic, decision rule, calculation of the test statistic, and conclusion) are explained, and the regression analysis is shown in detail.
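As an illustration of the correlation hypothesis test, the standard test of H0: rho = 0 uses t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The following is a minimal sketch of that calculation, not the solution's Excel procedure. The function name and the value of r in the usage lines are hypothetical; only n = 30 comes from the problem statement.

```python
import math


def correlation_t_statistic(r, n):
    """Return (t, df) for testing H0: population correlation is zero.

    t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2.
    """
    if n <= 2:
        raise ValueError("need n > 2")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    df = n - 2
    t_stat = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t_stat, df


if __name__ == "__main__":
    # r is a made-up placeholder, not a value from the correlation matrix.
    t_stat, df = correlation_t_statistic(r=0.5, n=30)
    print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The decision rule compares |t| with the two-tailed critical t value for n - 2 degrees of freedom at the selected significance level.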