# Chi-square test and ANOVA

The data included for this assignment is the Major League Baseball data set. These data were collected on 23 June 2004 and represent the top 50 hitters in Major League Baseball. Use Excel to analyze the data.

See attached Excel sheet of hitters and pitchers.

1. Chi-square analysis - Do the top 50 hitters come predominantly from one or several divisions? The Yankees and Red Sox have the highest payrolls in baseball, so we would assume that the top hitters would be from the American League East, where both of these teams play. MLB is divided into two leagues, National and American, with three regions within each league: West, Central, and East.

- Use the data provided to set up a 2 x 3 table with League as the rows and Region as the columns. Each cell of the table should hold the count of top 50 players who come from that league and region. Test the hypothesis that the hitters do not come evenly from the 6 league-region groups.

- State the critical value that you will test your Chi observed against, then interpret your Chi observed and your findings.
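
The comparison of Chi observed against a critical value can be sketched in Python as a cross-check on the Excel work. This is a minimal sketch, assuming the test is a goodness-of-fit test against equal expected counts in all six cells (so 5 degrees of freedom) and a 0.05 significance level. The counts below are hypothetical placeholders, not the assignment data, and `chi_square_uniform` is an illustrative helper name.

```python
# Hypothetical counts, NOT the real assignment data: replace them with the
# tallies from the hitters worksheet. Rows are leagues, columns are regions.
observed = {
    "American": {"West": 9, "Central": 7, "East": 12},
    "National": {"West": 8, "Central": 6, "East": 8},
}

def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)  # same expected count in every cell
    return sum((o - expected) ** 2 / expected for o in counts)

# Flatten the 2 x 3 table into its six cells.
cells = [n for row in observed.values() for n in row.values()]

chi_obs = chi_square_uniform(cells)
df = len(cells) - 1                # 6 categories -> 5 degrees of freedom
CRITICAL_05_DF5 = 11.070           # chi-square critical value, alpha = 0.05, df = 5
reject = chi_obs > CRITICAL_05_DF5

print(f"Chi observed = {chi_obs:.2f}, df = {df}, reject H0: {reject}")
```

If Chi observed exceeds the critical value, the hypothesis that hitters come evenly from the six groups is rejected. With the placeholder counts it is not rejected, which says nothing about the real data.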

The second set of data contains pitcher information. The top 50 pitchers in both leagues were selected on 5/6/05. Is there a difference in how pitchers develop ERAs? The Earned Run Average (ERA) is a metric used to gauge how well pitchers are doing. It is the average number of earned runs a pitcher allows per nine innings pitched, where earned runs exclude runs that score because of fielding errors. Divide the ERA numbers into the following groups: less than 2, 2-3, 3-4, 4-5, 5-6. Then count the number of pitchers in each group. Is there evidence to suggest that the ERAs are not evenly distributed between 0 and 6? Is this a significant difference? What does this mean practically?
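
The grouping and counting step can be sketched as follows. This is a rough sketch under stated assumptions: expected counts are taken as equal across the five groups (4 degrees of freedom), a value that sits exactly on a boundary goes into the upper group, and any ERA of 6 or more would land in the last group. The ERA values are hypothetical placeholders, and `bin_eras` is an illustrative helper name.

```python
from bisect import bisect_right

EDGES = [2.0, 3.0, 4.0, 5.0]                  # boundaries between the five groups
LABELS = ["<2", "2-3", "3-4", "4-5", "5-6"]

def bin_eras(eras):
    """Count how many ERA values fall into each of the five groups."""
    counts = [0] * len(LABELS)
    for era in eras:
        # bisect_right returns how many boundaries are <= era, i.e. the group index
        counts[bisect_right(EDGES, era)] += 1
    return counts

# Hypothetical ERA values, NOT the assignment data. With only ten values the
# expected counts are too small for a real test; the worksheet has 50 pitchers.
eras = [1.85, 2.10, 2.75, 3.05, 3.40, 3.60, 3.95, 4.20, 4.80, 5.30]

counts = bin_eras(eras)
expected = len(eras) / len(LABELS)            # equal expected count per group (assumption)
chi_obs = sum((o - expected) ** 2 / expected for o in counts)

CRITICAL_05_DF4 = 9.488                       # chi-square critical value, alpha = 0.05, df = 4
reject = chi_obs > CRITICAL_05_DF4

print(dict(zip(LABELS, counts)), f"Chi observed = {chi_obs:.2f}", reject)
```

Because the first group spans 0 to 2 while the others span one run each, a strictly uniform distribution on 0 to 6 would imply a larger expected count for the first group. Which set of expected counts the assignment intends is a judgment call worth stating in the write-up.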

2. ANOVA - If you are a top 50 hitter, this usually indicates that you get up to bat often. With the designated hitter rule in the American League, you would expect hitters to bat even more often than their National League counterparts, where the pitcher hits as well. Test the hypothesis that the number of at-bats is different for the six divisions. Break up the data into six columns and then run the one-way ANOVA on these data.

Start by sorting the at-bat data by League first and then by Region, which will give you six groups for the data analysis. Put these data into six columns before running the data analysis. Note that the sample sizes for the six groups must match the counts that you used for the chi-square test above.
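
The one-way ANOVA computation on six columns can be sketched in Python to check the F statistic that Excel reports. This is a minimal sketch: the at-bat values are hypothetical placeholders with unequal group sizes, and `one_way_anova` is an illustrative helper name, not part of the assignment.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical at-bat counts, NOT the assignment data: one list per
# league-region group, mirroring the six columns in the worksheet.
at_bats = {
    ("American", "West"): [250, 262, 241],
    ("American", "Central"): [238, 255],
    ("American", "East"): [270, 259, 266, 248],
    ("National", "West"): [231, 244, 239],
    ("National", "Central"): [228, 236],
    ("National", "East"): [242, 251, 233],
}

f_stat, df_between, df_within = one_way_anova(list(at_bats.values()))
print(f"F = {f_stat:.3f} with ({df_between}, {df_within}) degrees of freedom")
```

The resulting F is compared with the F critical value for the same degrees of freedom, which Excel's Anova: Single Factor output lists alongside the P-value.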

- Is the analysis significant? What does this mean?

- If the test is significant, run the Tukey analysis and interpret.

Use the Pitcher worksheet for the second analysis. Since it is still early in the year and there has been bad weather on the East Coast, do a statistical analysis to see which of the three regions (West, Central, or East) has the lowest ERA. Sort the pitching data by region first.

- Is the analysis significant? What does this mean?

- If the test is significant, run the Tukey analysis and interpret.
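
The Tukey follow-up for the three regions can be sketched as below. This is a sketch under assumptions: it uses the Tukey-Kramer form of the HSD margin so that unequal group sizes are handled, the studentized range critical value `q_crit` is looked up in a table rather than computed, and the ERA values are hypothetical placeholders. `tukey_kramer` is an illustrative helper name.

```python
from itertools import combinations
from math import sqrt

def tukey_kramer(groups, q_crit):
    """Pairwise comparisons: (a, b, mean difference, HSD margin, significant)."""
    n_total = sum(len(v) for v in groups.values())
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    # Mean square within groups, the same error term used in the ANOVA
    ss_within = sum((x - means[k]) ** 2 for k, v in groups.items() for x in v)
    ms_within = ss_within / (n_total - len(groups))

    results = []
    for a, b in combinations(groups, 2):
        margin = q_crit * sqrt(ms_within / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        diff = means[a] - means[b]
        results.append((a, b, diff, margin, abs(diff) > margin))
    return results

# Hypothetical ERA values, NOT the assignment data.
era_by_region = {
    "West": [3.1, 3.4, 2.9, 3.6, 3.0],
    "Central": [4.2, 4.5, 3.9, 4.4, 4.0],
    "East": [3.3, 3.5, 3.2, 3.8, 3.2],
}

# Table value of the studentized range for alpha = 0.05, 3 groups, 12 error df.
# For the real data, look up q for 3 groups and the actual within-group df.
Q_CRIT = 3.77

comparisons = tukey_kramer(era_by_region, Q_CRIT)
for a, b, diff, margin, significant in comparisons:
    print(f"{a} vs {b}: diff = {diff:+.2f}, HSD = {margin:.2f}, significant = {significant}")
```

A pair of regions differs significantly when the absolute difference in mean ERA exceeds the HSD margin. The region with the lowest mean can only be called the lowest if it differs significantly from the others.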

https://brainmass.com/statistics/chi-squared-test/chi-square-test-and-anova-90888

#### Solution Summary

The solution gives a step-by-step procedure for carrying out the chi-square test and the ANOVA. Tukey HSD is also calculated for the ANOVA. The null hypothesis, alternative hypothesis, critical value, p-value, and test statistic are given with interpretations.