# Data Analysis: Chi-Square and Regression

(5a) Collect observed and expected data. Perform a chi-square goodness-of-fit test (chi-square test for differences in more than two proportions).

Test plan:

Hypothesis statements:

Population description:

Description of how the random sample was drawn:

Sample size:

Expected and observed values:

Type of test:

Level of significance (alpha):

Rejection rule:

Test statistic:

Summary of results:

(5b) Collect observed data in the form of a contingency table. Perform a chi-square independence test.

Test plan:

Hypothesis statements:

Population description:

Description of how the random sample was drawn:

Sample size:

Expected and observed values:

Type of test:

Level of significance (alpha):

Rejection rule:

Test statistic:

Summary of results:
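The expected counts and test statistic for an independence test can be sketched as follows. This is a minimal illustration, not the assignment's solution: the contingency table `observed` is hypothetical placeholder data, and all variable names are assumptions made for the example. Each expected count is (row total × column total) / grand total, and the degrees of freedom are (rows − 1)(columns − 1).

```python
# Hypothetical 2x3 contingency table (placeholder data, not from the assignment).
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Chi-square statistic: sum over all cells of (O - E)^2 / E.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"chi-square = {chi_square:.4f}, df = {degrees_of_freedom}")
```

The statistic is then compared with the chi-square critical value for the chosen level of significance and these degrees of freedom.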

(6) Collect bivariate data. Perform a linear regression and correlation study, including creating a scatter diagram and a residual plot, determining the regression line, the coefficient of determination, and the correlation coefficient, testing for the significance of the linear relationship, and interpreting your results. Be sure to state your dependent variable and one independent variable and why you think there might be a relationship.

Test plan:

Hypothesis statements:

Population description:

Description of how the random sample was drawn:

Sample size:

Independent and dependent values:

Scatterplot:

Regression analysis:

Correlation:

Testing significance of linear relationship:

Summary of results:
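The regression quantities listed above can be computed by hand, as in the sketch below. The `x` and `y` values are hypothetical placeholder data, not data from the assignment, and the variable names are assumptions for illustration. The sketch covers the least-squares line, the correlation coefficient r, the coefficient of determination r², the residuals (for a residual plot), and the t statistic for testing whether the slope differs from zero, which has n − 2 degrees of freedom.

```python
import math

# Hypothetical bivariate data (placeholder values, not from the assignment).
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # independent variable
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # dependent variable

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares regression line: y_hat = intercept + slope * x.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Correlation coefficient and coefficient of determination.
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

# Residuals (observed minus fitted), which would be plotted against x.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# t statistic for H0: no linear relationship (slope = 0), with n - 2 df.
t_statistic = r * math.sqrt((n - 2) / (1 - r_squared))

print(f"y_hat = {intercept:.3f} + {slope:.3f} x")
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, t = {t_statistic:.4f}, df = {n - 2}")
```

The t statistic is compared with the critical value of the t distribution at the chosen level of significance to decide whether the linear relationship is significant.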

#### Solution Preview

Deliverable 5: Chi-Squared Tests

Deliverable 5a: Collect observed and expected data. Perform a chi-square goodness-of-fit test (chi-square test for differences in more than two proportions). In addition to the other requirements listed above include a table with your observed and expected data in your narrative. Most of the other data can be moved into the appendix.

Test Plan: We would like to see if the value of exports in the year 2003 varies with type of industry. Previously, we randomly selected 20 different industries (see Deliverable 3). These different industries fall into 10 categories:

food, animals, timber and paper, metal, stone, clothing and textiles, plastic, electrical supplies, dental and medical, and miscellaneous

Hypothesis statements

Narrative hypothesis

Null hypothesis (H0): The value of exports in the year 2003 does not vary by type of industry.

Alternative hypothesis (Ha): The value of exports in the year 2003 does vary by type of industry; i.e. the value of exports is not equally distributed across the various categories.

Population Description: The population used here is total United States exports in 2000 and 2003 separated by industry (from http://ita.doc.gov/td/industry/otea/usfth/aggregate/H03T41.html). The variable analyzed here is the value (in millions of dollars) of exports in the different industries.

GDP data has always been of interest to me because my family was in the Department of Agriculture when I was a kid. I find exports especially interesting because exports have been decreasing in our country. We import much more than we export, and when I took economics, I saw how much we bring into our country and the growing problem the United States faces with its deficit. When I graduate from UMUC, I would like to pursue a job in the field of exporting or importing, either in the government sector or in the civilian sector of the business world.

Description of how the random sample was drawn: There are 453 industries for which we have data. I took a random sample in order to get a sample of size 20. The sample was selected using a Bernoulli random variable, which returns 1 or 0 according to a given probability; here the probability used is 20/453. The items for which the Bernoulli variable equals 1 are selected. They are given in a separate sheet.
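The Bernoulli selection described above can be sketched as follows. This is an illustration under stated assumptions, not the spreadsheet procedure actually used: the function name `bernoulli_sample`, the `seed` parameter, and the use of row indices 0 to 452 to stand for the 453 industries are all hypothetical. Because each item is included independently with probability 20/453, the number of selected items is itself random, with an expected value of 20, so a given draw may not contain exactly 20 items.

```python
import random


def bernoulli_sample(population_size, target_size, seed=None):
    """Return the indices selected by independent Bernoulli trials.

    Each index is kept with probability target_size / population_size,
    so the sample size is random with expected value target_size.
    """
    rng = random.Random(seed)
    p = target_size / population_size
    # A trial "succeeds" (Bernoulli variable = 1) when the uniform draw is below p.
    return [i for i in range(population_size) if rng.random() < p]


# Hypothetical usage: 453 industries, target sample size 20.
selected = bernoulli_sample(453, 20, seed=1)
print(len(selected), selected)
```

Passing a fixed `seed` makes the draw reproducible, which helps when the selected items need to be listed in an appendix.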

Sample size: The total sample size is n = 20 industries, divided into 10 categories.

Expected and Observed values: The expected values for each category were calculated by dividing the total export revenues in 2003 by 20 (to get the average value), then multiplying that by the number of products in a category (so, for example, we'd multiply the average value by 5 for the food category). The Observed and Expected values are in millions of dollars.

| Row # | Category | n | Observed (millions of $) | Expected (millions of $) | Expected % |
|---|---|---|---|---|---|
| 1 | food | 5 | 1486 | 4686.75 | 25.000% |
| 2 | animals | 1 | 536 | 937.35 | 5.000% |
| 3 | timber and paper | 3 | 4169 | 2812.05 | 15.000% |
| 4 | metal | 1 | 288 | 937.35 | 5.000% |
| 5 | stone | 1 | 72 | 937.35 | 5.000% |
| 6 | clothing and textiles | 2 | 1650 | 1874.70 | 10.000% |
| 7 | plastic | 2 | 3274 | 1874.70 | 10.000% |
| 8 | electrical supplies | 3 | 5108 | 2812.05 | 15.000% |
| 9 | dental and medical | 1 | 768 | 937.35 | 5.000% |
| 10 | miscellaneous | 1 | 1396 | 937.35 | 5.000% |
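The expected-value calculation described above, and the chi-square statistic that follows from it, can be reproduced with a short sketch. The observed values and product counts are copied from the table; the variable names are assumptions for illustration. The statistic is the sum of (Observed − Expected)² / Expected over the 10 categories, with 10 − 1 = 9 degrees of freedom. The sketch stops at the test statistic because the level of significance and rejection rule are not shown in this preview.

```python
# (number of sampled products, observed 2003 exports in millions of dollars),
# copied from the table above.
categories = {
    "food": (5, 1486),
    "animals": (1, 536),
    "timber and paper": (3, 4169),
    "metal": (1, 288),
    "stone": (1, 72),
    "clothing and textiles": (2, 1650),
    "plastic": (2, 3274),
    "electrical supplies": (3, 5108),
    "dental and medical": (1, 768),
    "miscellaneous": (1, 1396),
}

total_observed = sum(obs for _, obs in categories.values())
n_products = sum(n for n, _ in categories.values())

# Average export value per sampled product (total divided by 20).
average_value = total_observed / n_products

# Expected value per category: average value times products in the category.
expected = {name: n * average_value for name, (n, _) in categories.items()}

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_square = sum(
    (obs - expected[name]) ** 2 / expected[name]
    for name, (_, obs) in categories.items()
)
degrees_of_freedom = len(categories) - 1

for name, (n, obs) in categories.items():
    print(f"{name:<22} n={n}  observed={obs:>5}  expected={expected[name]:.2f}")
print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}")
```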

Type of test: We will use a chi-squared goodness-of-fit test.

Level of ...

#### Solution Summary

The solution gives step-by-step explanations of how to do a chi-square goodness of fit test, a chi-square test of independence, and a simple linear regression analysis. A Word document and an Excel spreadsheet are attached.