# Data Analysis problems

** Please see the attached file for the complete problem description **

The data included for this assignment is the Major League Baseball (baseball.xls) data set.

These hitter data were collected on 6-23-2004 and represent the top 50 hitters in Major League Baseball on that date.

The data for the pitchers represent the top 50 pitchers in both leagues when selected on 5-6-2005.

Perform the following analyses on the data, interpret your findings and summarize what each analysis means.

1. Chi square analysis - Do the top 50 hitters come predominantly from one or several divisions? The Yankees and Red Sox have the highest payrolls in baseball, so we would assume that the top hitters would be from the American League East, where both of these teams reside.

Arrange the data in a 2x3 table, with the two leagues, National and American, as rows, and the three divisions, West, Central, and East, as columns.

Each cell of the table should hold the count of top 50 hitters who play in that league and division. Test the hypothesis that the hitters do not come evenly from the 6 regions.

State the critical value against which you will test your observed chi-square statistic, then interpret the observed value and your findings.
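As a rough illustration of analysis 1, the sketch below computes the chi-square statistic for a 2x3 league-by-division table in plain Python. The counts are hypothetical placeholders, not values from baseball.xls, and the function name `chi_square_independence` is invented for this example. It assumes α = 0.05 and the contingency-table reading of the question (expected count = row total × column total / grand total, df = 2). If the intended null hypothesis is instead that all six cells are equally likely, the expected count would be 50/6 per cell with df = 5.

```python
# Hypothetical counts (NOT from baseball.xls). Columns: West, Central, East.
observed = {
    "American": [7, 6, 12],
    "National": [9, 8, 8],
}

# Standard chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 5: 11.070}


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(rows):
        for j, obs in enumerate(row):
            # Expected count if league and division were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected

    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = chi_square_independence(observed)
print(f"chi-square = {stat:.3f}, df = {df}, critical = {CRITICAL_05[df]}")
print("reject H0" if stat > CRITICAL_05[df] else "fail to reject H0")
```

An observed statistic larger than the critical value would suggest that the top hitters are not spread across the divisions in the way the null hypothesis predicts.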

2. Goodness of Fit Analysis - Is there a difference in how pitchers develop ERAs? Earned Run Average is a metric that is used to gauge how well pitchers are doing.

ERA is the average number of earned runs a pitcher allows per nine innings pitched; earned runs are runs scored against the pitcher that did not result from fielding errors.

Divide the ERA numbers into the following groups: less than 2, 2-3, 3-4, 4-5, 5-6, so that you have a single row with 5 columns. It is not necessary to break this down by league or division. The numbers in each of the cells should represent the count in each of the categories.

Is there evidence to suggest that the ERAs are not evenly distributed between 0 and 6? Is this a significant difference? What does this mean practically?
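The binning and goodness-of-fit calculation could be sketched as follows. The ERA values are made up for illustration and are not from the Pitcher worksheet; the helper names `bin_eras` and `chi_square_gof` are hypothetical. The sketch assumes each bin excludes its upper edge (an ERA of exactly 3.00 falls in 3-4) and, by default, that the null hypothesis expects equal counts in all five bins. Because the first bin is twice as wide as the others, a strictly uniform spread over 0 to 6 would instead use the proportions 2/6, 1/6, 1/6, 1/6, 1/6, which can be passed in explicitly.

```python
def bin_eras(eras):
    """Count ERAs in the bins <2, 2-3, 3-4, 4-5, 5-6 (upper edge excluded)."""
    counts = [0] * 5
    for era in eras:
        if not 0 <= era < 6:
            raise ValueError(f"ERA {era} is outside the 0-6 range")
        # ERAs below 2 go to bin 0; otherwise the integer part picks the bin.
        counts[0 if era < 2 else int(era) - 1] += 1
    return counts


def chi_square_gof(counts, proportions=None):
    """Return (chi-square statistic, df) against the expected proportions."""
    n = sum(counts)
    if proportions is None:
        # Assumption: equal expected counts in every bin.
        proportions = [1 / len(counts)] * len(counts)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return stat, len(counts) - 1


# Hypothetical ERAs (NOT from the data set); the real analysis uses all 50 pitchers.
sample_eras = [1.85, 2.10, 2.45, 2.97, 3.05, 3.30, 3.62, 3.88, 4.15, 5.20]
counts = bin_eras(sample_eras)
stat, df = chi_square_gof(counts)
print(counts, round(stat, 3), df)
```

With five bins, df = 4, and the critical value at α = 0.05 is 9.488. The ten-value sample here is too small for the chi-square approximation to be reliable; with 50 pitchers the equal-count expectation is 10 per bin.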

3. ANOVA 1 - If you are a top 50 hitter, this usually indicates that you get up to bat often. With the designated hitter rule in the American League, you would expect hitters to bat even more often than their National League counterparts, where the pitcher hits as well. Test the hypothesis that the number of at-bats is different for the six divisions. Break up the data into the six different groups (ALE, ALC, ALW, NLE, NLC, NLW) and then run the one-way ANOVA on these data.

Start by sorting the at-bat data by League first and then Region second, which will give you six groups for the data analysis. Put these data into six columns before running the data analysis. Note that the sample sizes for each of the six groups should match the counts that you provided for the chi square test above.

· Is the analysis significant? What does this mean?

· If the test is significant, run the Tukey analysis and interpret.
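For reference, the one-way ANOVA F statistic for the six groups can be computed by hand as in the sketch below. The at-bat numbers are invented placeholders, not values from baseball.xls, and `one_way_anova` is a hypothetical helper name. The groups may have unequal sizes, as they will in the real data.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical at-bat counts per division (NOT from the data set).
at_bats = {
    "ALE": [265, 280, 271],
    "ALC": [250, 262, 258, 249],
    "ALW": [270, 255, 268],
    "NLE": [240, 251, 247],
    "NLC": [259, 244, 250, 256],
    "NLW": [262, 248, 253],
}
f_stat, df_b, df_w = one_way_anova(list(at_bats.values()))
print(f"F = {f_stat:.3f} with ({df_b}, {df_w}) degrees of freedom")
```

With all 50 hitters in six non-empty groups, the degrees of freedom are (5, 44); the observed F is compared with the F critical value for those degrees of freedom at the chosen significance level.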

4. ANOVA 2 - Use the Pitcher worksheet for the second analysis. Since it was still early in the year when the data were collected and there had been bad weather on the East Coast, do a statistical analysis to see which of the three regions (West, Central, or East) has the lowest ERA. Sort the pitching data by region before you do the analysis; there is no need to break these out by League.

· Is the analysis significant? What does this mean?

· If the test is significant, run the Tukey analysis and interpret.
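If the ANOVA on the three regions is significant, the Tukey follow-up compares each pair of regional means. The sketch below uses the Tukey-Kramer form, which allows unequal group sizes. The ERA values are hypothetical and not from the Pitcher worksheet, and `tukey_kramer` is an invented helper name. The studentized range critical value `q_crit` is not computed here: it must be looked up in a q table for the number of groups and the within-group degrees of freedom, and the value in the example is only an illustrative placeholder.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_kramer(groups, q_crit):
    """Compare every pair of group means against the Tukey-Kramer margin.

    Returns a list of (name_a, name_b, mean difference, margin, significant).
    """
    n_total = sum(len(v) for v in groups.values())
    df_within = n_total - len(groups)
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    ms_within = ss_within / df_within  # mean square error from the ANOVA

    results = []
    for (a, xa), (b, xb) in combinations(groups.items(), 2):
        diff = mean(xa) - mean(xb)
        margin = q_crit * sqrt(ms_within / 2 * (1 / len(xa) + 1 / len(xb)))
        results.append((a, b, diff, margin, abs(diff) > margin))
    return results


# Hypothetical ERAs by region (NOT from the data set).
eras = {
    "West": [2.95, 3.40, 3.10, 2.80],
    "Central": [3.60, 3.25, 3.90, 3.45],
    "East": [4.10, 3.75, 4.30, 3.95],
}
# Placeholder q value for illustration only; look up the real one
# for 3 groups and the actual within-group degrees of freedom.
for a, b, diff, margin, sig in tukey_kramer(eras, q_crit=3.95):
    print(f"{a} vs {b}: diff = {diff:.3f}, margin = {margin:.3f}, significant = {sig}")
```

A pair is flagged as different when the absolute difference in mean ERA exceeds the margin; the region with the lowest mean can only be called "lowest" with confidence if it differs significantly from the other two.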

Thanks

© BrainMass Inc. brainmass.com October 9, 2019, 10:53 pm https://brainmass.com/statistics/analysis-of-variance/data-analysis-problems-235533

#### Solution Summary

This solution provides a step-by-step method for statistical data analysis. Formulas for the calculations and interpretations of the results are also included.