Data Analysis problems

** Please see the attached file for the complete problem description **

The data included for this assignment is the Major League Baseball (baseball.xls) data set.
These hitter data were collected on 6-23-2004 and represent the top 50 hitters in Major League Baseball on that date.
The data for the pitchers represent the top 50 pitchers in both leagues when selected on 5-6-2005.
Perform the following analyses on the data, interpret your findings and summarize what each analysis means.

1. Chi square analysis - Do the top 50 hitters come predominantly from one or several divisions? The Yankees and Red Sox have the highest payrolls in baseball, so we would assume that the top hitters would be from the American League East, where both of these teams reside.
Arrange the data in a 2x3 table, with the two leagues, National and American, as rows, and the three divisions, West, Central, and East, as columns.
Each cell in the table should hold the count of top 50 hitters who play in that league and division. Test the hypothesis that the hitters do not come evenly from the 6 regions.
State the critical value against which you will test your observed chi-square statistic, then interpret the observed statistic and your findings.
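
As a rough illustration of task 1, the sketch below builds the 2x3 table and computes the chi-square statistic against equal expected counts in all six cells. It assumes a significance level of 0.05, which the assignment does not specify, and therefore uses the tabulated critical value 11.070 for 5 degrees of freedom (six cells minus one). The function names and the player list are hypothetical; the counts are placeholders, not values from baseball.xls.

```python
from collections import Counter

# Tabulated chi-square critical value for alpha = 0.05 (assumed) and df = 5.
CRITICAL_05_DF5 = 11.070

LEAGUES = ("American", "National")
DIVISIONS = ("West", "Central", "East")


def division_table(players):
    """Count players per (league, division) cell of the 2x3 table."""
    counts = Counter(players)
    return [[counts[(lg, dv)] for dv in DIVISIONS] for lg in LEAGUES]


def chi_square_even(observed):
    """Chi-square statistic against equal expected counts in every cell."""
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Placeholder data: one (league, division) pair per hitter.
# These are NOT the real counts from the data set.
players = (
    [("American", "West")] * 10
    + [("American", "Central")] * 10
    + [("American", "East")] * 10
    + [("National", "West")] * 10
    + [("National", "Central")] * 5
    + [("National", "East")] * 5
)

table = division_table(players)
cells = [count for row in table for count in row]
stat = chi_square_even(cells)

print(table)
print(f"chi-square observed = {stat:.3f}, critical = {CRITICAL_05_DF5}")
if stat > CRITICAL_05_DF5:
    print("Reject the hypothesis of an even spread across the six regions.")
else:
    print("No evidence against an even spread across the six regions.")
```

If the observed statistic exceeds the critical value, the counts differ from an even split by more than chance alone would explain at the chosen significance level.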

2. Goodness of Fit Analysis - Is there a difference in how pitchers develop ERAs? Earned Run Average is a metric that is used to gauge how well pitchers are doing.
ERA is the average number of earned runs a pitcher allows per nine innings pitched; earned runs are runs scored against that pitcher, not counting those that result from fielding errors.
Divide the ERA numbers into the following groups: less than 2, 2-3, 3-4, 4-5, 5-6, so that you have a single row with 5 columns. It is not necessary to break this down by league
or division. The numbers in each of the cells should represent the count in each of the categories.
Is there evidence to suggest that the ERAs are not evenly distributed between 0 and 6? Is this a significant difference? What does this mean practically?
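
One way to sketch task 2 is shown below. It bins the ERA values into the five groups and runs a chi-square goodness-of-fit test with 4 degrees of freedom. Two readings of "evenly distributed" are possible, and the choice is an assumption: equal counts in each of the five categories (the default here), or a uniform spread over 0 to 6, in which case the "less than 2" group is twice as wide and its expected share would be 2/6 rather than 1/5. The function names and ERA values are hypothetical placeholders, not data from the Pitcher worksheet.

```python
import math


def bin_eras(eras):
    """Count ERAs in the bins <2, 2-3, 3-4, 4-5, 5-6.

    Assumption: a value exactly on a boundary goes to the upper bin,
    and anything at or above 6 is clamped into the last bin.
    """
    counts = [0] * 5
    for era in eras:
        index = min(max(int(era) - 1, 0), 4)
        counts[index] += 1
    return counts


def chi_square_gof(observed, proportions=None):
    """Chi-square statistic against the given expected proportions.

    With proportions=None, every category gets the same expected count.
    """
    total = sum(observed)
    if proportions is None:
        proportions = [1 / len(observed)] * len(observed)
    return sum(
        (o - total * p) ** 2 / (total * p)
        for o, p in zip(observed, proportions)
    )


def p_value_df4(stat):
    """Upper-tail chi-square probability for df = 4 (closed form)."""
    return math.exp(-stat / 2) * (1 + stat / 2)


# Placeholder ERA values, NOT taken from the real data set.
eras = [1.8, 2.1, 2.4, 2.7, 2.9, 3.0, 3.2, 3.3, 3.6, 3.8, 4.1, 4.4, 5.2]

counts = bin_eras(eras)
stat = chi_square_gof(counts)
print(counts)
print(f"chi-square observed = {stat:.3f}, p-value = {p_value_df4(stat):.4f}")

# Alternative reading: uniform over 0-6, so the first bin is twice as wide.
uniform_0_6 = [2 / 6, 1 / 6, 1 / 6, 1 / 6, 1 / 6]
print(f"chi-square (uniform 0-6) = {chi_square_gof(counts, uniform_0_6):.3f}")
```

A small p-value would indicate that the ERAs cluster in some groups rather than spreading evenly; the practical meaning depends on which groups are over-represented.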

3. ANOVA 1 - If you are a top 50 hitter, this usually indicates that you get up to bat often. With the designated hitter rule in the American League, you would expect hitters
to bat even more often than their National League counterparts, where the pitcher hits as well. Test the hypothesis that the number of at-bats is different
for the six divisions. Break up the data into the six different groups (ALE, ALC, ALW, NLE, NLC, NLW) and then run the one-way ANOVA on these data.

Start by sorting the at-bat data by the League first and then Region second, which will give you six groups for the data analysis. Put these data into six columns
before running the data analysis. Note that the sample sizes for each of the six groups should match the data that you provided for the chi square test above.
·         Is the analysis significant? What does this mean?
·         If the test is significant, run the Tukey analysis and interpret.
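
The grouping and one-way ANOVA steps for task 3 might look like the sketch below. It groups at-bats by league and division, then computes the F statistic from the between-group and within-group mean squares. With 50 hitters in six non-empty groups, the degrees of freedom would be 5 and 44. The function names and the rows are hypothetical placeholders, not values from baseball.xls, and the resulting F still has to be compared against an F table or software output at the chosen significance level.

```python
from collections import defaultdict


def group_at_bats(rows):
    """Group at-bat counts by (league, division)."""
    groups = defaultdict(list)
    for league, division, at_bats in rows:
        groups[(league, division)].append(at_bats)
    return dict(groups)


def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for sample in samples for x in sample]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(sample) / len(sample) for sample in samples]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(sample) * (mean - grand_mean) ** 2
        for sample, mean in zip(samples, means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for sample, mean in zip(samples, means)
        for x in sample
    )

    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder rows: (league, division, at-bats). NOT the real data.
rows = [
    ("AL", "East", 280), ("AL", "East", 265),
    ("AL", "Central", 270), ("AL", "Central", 255),
    ("AL", "West", 290), ("AL", "West", 275),
    ("NL", "East", 250), ("NL", "East", 262),
    ("NL", "Central", 245), ("NL", "Central", 258),
    ("NL", "West", 268), ("NL", "West", 252),
]

groups = group_at_bats(rows)
f_stat, df_b, df_w = one_way_anova(list(groups.values()))
print({key: len(values) for key, values in groups.items()})
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

The group sizes printed first can be checked against the 2x3 table of counts from the chi-square task, as the assignment suggests.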

4. ANOVA 2 - Use the Pitcher worksheet for the second analysis. Since it was still early in the year when the data were collected and there had been bad weather on the East
Coast, do a statistical analysis to see which of the three regions (West, Central, or East) has the lowest ERA. Sort the pitching data by region before you do the analysis; there is no need to break these out by League.
·         Is the analysis significant? What does this mean?
·         If the test is significant, run the Tukey analysis and interpret.
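
For the follow-up comparison in task 4, a minimal sketch of the pairwise step is given below. It uses the Tukey-Kramer form of the studentized range statistic, which allows unequal group sizes; this is an assumption, since the assignment only says "the Tukey analysis". Each q value would then be compared with a studentized range table entry for 3 groups and the within-group degrees of freedom (47 if all 50 pitchers are used). The function name and ERA values are hypothetical placeholders, not data from the Pitcher worksheet.

```python
import math
from itertools import combinations


def tukey_kramer_q(groups):
    """Return ({name: mean}, {(a, b): q}) for a dict of name -> values."""
    n_total = sum(len(values) for values in groups.values())
    means = {name: sum(values) / len(values) for name, values in groups.items()}

    # Pooled within-group mean square, the same error term as in the ANOVA.
    ss_within = sum(
        (x - means[name]) ** 2
        for name, values in groups.items()
        for x in values
    )
    ms_within = ss_within / (n_total - len(groups))

    q_stats = {}
    for a, b in combinations(groups, 2):
        std_error = math.sqrt(
            ms_within / 2 * (1 / len(groups[a]) + 1 / len(groups[b]))
        )
        q_stats[(a, b)] = abs(means[a] - means[b]) / std_error
    return means, q_stats


# Placeholder ERA values by region, NOT the real data.
era_by_region = {
    "West": [2.9, 3.4, 3.1, 3.8],
    "Central": [3.2, 3.6, 2.8, 3.9, 3.3],
    "East": [3.5, 4.1, 3.0, 3.7],
}

means, q_stats = tukey_kramer_q(era_by_region)
for region, mean in means.items():
    print(f"{region}: mean ERA = {mean:.3f}")
for (a, b), q in q_stats.items():
    print(f"{a} vs {b}: q = {q:.3f}")
```

Only pairs whose q exceeds the tabulated critical value would count as significantly different; the region with the lowest mean ERA is the best only in that qualified sense.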

Thanks

Solution Summary

This solution provides a step-by-step method for statistical data analysis. Formulas for the calculations and interpretations of the results are also included.
