# ANOVA, Regression Analysis and Correlation Hypothesis Test

Question 1

[Refer to the file Q1.xls for the data]

a) Test the null hypothesis that the six samples of word counts for males (columns 1, 3, 5, 7, 9, 11) are from populations with the same mean. Print the results and write a brief summary of your calculations.

b) Test the null hypothesis that the six samples of word counts for females (columns 2, 4, 6, 8, 10, 12) are from populations with the same mean. Print the results and write a brief summary of your conclusions.

c) If we want to compare the number of words spoken by men to the number of words spoken by women, does it make sense to combine the six columns of word counts for males and combine the six columns of word counts for females, then compare the two samples? Why or why not?
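
For parts (a) and (b), the one-way ANOVA test statistic is the ratio of the between-sample mean square to the within-sample mean square. The following is a minimal Python sketch of that calculation, assuming each column has already been read from Q1.xls into a list. The function name `one_way_anova` and the three short samples are made up for illustration; they are not the data in the file, which has six samples per test.

```python
from statistics import mean


def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(samples)                           # number of samples (groups)
    n_total = sum(len(s) for s in samples)     # total number of observations
    grand_mean = sum(sum(s) for s in samples) / n_total

    # Variation of the sample means around the grand mean
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of the observations around their own sample mean
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder values only (NOT taken from Q1.xls)
groups = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
]

f_stat, df_between, df_within = one_way_anova(groups)
print(f"F = {f_stat:.4f} with ({df_between}, {df_within}) degrees of freedom")
```

The resulting F is compared with the critical value of the F distribution with (k - 1, N - k) degrees of freedom at the chosen significance level. If SciPy is available, `scipy.stats.f_oneway` reports the same statistic together with a p-value.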

Question 2

[Refer to the file Q2.xls for the data]

a) Using the paired data consisting of the proportions of wins and the numbers of runs scored, find the linear correlation coefficient r and determine whether there is sufficient evidence to support a claim of a linear correlation between those two variables. Then find the regression equation with the response variable y representing the proportions of wins and the predictor variable x representing the numbers of runs scored.

b) Using the paired data consisting of the proportions of wins and the numbers of runs allowed, find the linear correlation coefficient r and determine whether there is sufficient evidence to support a claim of a linear correlation between those two variables. Then find the regression equation with the response variable y representing the proportions of wins and the predictor variable x representing the numbers of runs allowed.

c) Use the paired data consisting of the proportions of wins and these differences: (runs scored) − (runs allowed). Find the linear correlation coefficient r and determine whether there is sufficient evidence to support a claim of a linear correlation between those two variables. Then find the regression equation with the response variable y representing the proportions of wins and the predictor variable x representing the differences (runs scored) − (runs allowed).

d) Compare the preceding results. Which appears to be more effective for winning baseball games: a strong defense or a strong offense? Explain.

e) Find the regression equation with the response variable y representing the winning percentage and the two predictor variables of runs scored and runs allowed. Does that equation appear to be useful for predicting a team's proportion of wins based on the number of runs scored and the number of runs allowed? Explain.

f) Using the paired data consisting of the numbers of runs scored and the numbers of runs allowed, find the linear correlation coefficient r and determine whether there is sufficient evidence to support a claim of a linear correlation between those two variables. What does the result suggest about the offensive strengths and the defensive strengths of the different teams?
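
The calculations asked for in Question 2 (the correlation coefficient r, its test statistic, and the one- and two-predictor regression equations) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the method used in the accompanying Excel sheet. All function names are hypothetical, and the short lists are placeholder values, not the data in Q2.xls.

```python
from math import sqrt


def cross_dev(u, v):
    """Sum of products of deviations from the means of u and v."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


def pearson_r(x, y):
    """Linear correlation coefficient r for paired data."""
    return cross_dev(x, y) / sqrt(cross_dev(x, x) * cross_dev(y, y))


def t_for_r(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


def fit_one_predictor(x, y):
    """Least-squares line y = intercept + slope * x."""
    slope = cross_dev(x, y) / cross_dev(x, x)
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return intercept, slope


def fit_two_predictors(x1, x2, y):
    """Least-squares plane y = b0 + b1 * x1 + b2 * x2 via the normal equations."""
    s11, s22, s12 = cross_dev(x1, x1), cross_dev(x2, x2), cross_dev(x1, x2)
    s1y, s2y = cross_dev(x1, y), cross_dev(x2, y)
    det = s11 * s22 - s12 ** 2   # zero if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    n = len(y)
    b0 = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n
    return b0, b1, b2


# Placeholder paired data (NOT taken from Q2.xls)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
t = t_for_r(r, len(x))
intercept, slope = fit_one_predictor(x, y)
print(f"r = {r:.4f}, t = {t:.4f}, y-hat = {intercept:.4f} + {slope:.4f} x")

# Placeholder data for the two-predictor case, built as y2 = 1 + 2*x1 - x2
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y2 = [1, 4, 3, 6, 5]

b0, b1, b2 = fit_two_predictors(x1, x2, y2)
print(f"y-hat = {b0:.4f} + {b1:.4f} x1 + {b2:.4f} x2")
```

The t value is compared with the critical value of the t distribution with n - 2 degrees of freedom to decide whether the correlation is significant. For the actual data, library routines such as `scipy.stats.pearsonr` and `scipy.stats.linregress` return r together with a p-value.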

#### Solution Summary

The solution provides a step-by-step method for performing ANOVA, regression analysis, and a correlation hypothesis test. All the steps of hypothesis testing (formulation of the null and alternative hypotheses, selection of the significance level, choice of the appropriate test statistic, decision rule, calculation of the test statistic, and conclusion) are explained in detail. A separate Excel sheet showing the ANOVA and regression analysis is also included.