# Chi-square Test, Correlation & Regression Analysis

Chi-Square Tests and Nonparametric Tests/Simple Linear Regression

1. A sample of 500 shoppers was selected in a large metropolitan area to determine various information concerning consumer behavior. The results are summarized in the following contingency table:

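Enjoyment of shopping for clothing, by gender:

| Enjoy shopping for clothing | Male | Female | Total |
| --- | --- | --- | --- |
| Yes | 136 | 224 | 360 |
| No | 104 | 36 | 140 |
| Total | 240 | 260 | 500 |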

a. Is there evidence of a significant difference between the proportion of males and females who enjoy shopping for clothing at the 0.01 level of significance?

b. Determine the p-value in (a) and interpret its meaning.

c. What are your answers to (a) and (b) if 206 males enjoyed shopping for clothing and 34 did not?
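
One way to set up the calculation for part (a) is sketched below, assuming the usual chi-square test for a 2 x 2 contingency table without a continuity correction. The function name `chi_square_2x2` is illustrative and not from the problem set. With 1 degree of freedom the upper-tail p-value can be written with the complementary error function, so only the standard library is needed.

```python
import math

def chi_square_2x2(table):
    """Return (chi-square statistic, p-value) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under H0 (equal proportions in both columns).
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: Yes / No; columns: Male / Female (counts from the table in problem 1).
observed = [[136, 224], [104, 36]]
stat, p_value = chi_square_2x2(observed)
print(f"chi-square = {stat:.2f}, p-value = {p_value:.3g}")
print("Reject H0" if p_value < 0.01 else "Do not reject H0", "at the 0.01 level")
```

For part (c), the same function can be called with the male column changed to 206 and 34.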

2. Suppose a study is conducted in which 600 patients are randomly assigned, 200 each, to three organizational groupings: insurance companies, pharmacies, and medical researchers. Each patient is asked, "Would you object to the sharing of your medical records?" The results are recorded in the following table:

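Objection to sharing information, by organizational grouping:

| Object to sharing information | Insurance companies | Pharmacies | Medical researchers |
| --- | --- | --- | --- |
| Yes | 40 | 80 | 90 |
| No | 160 | 120 | 110 |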

a. Is there evidence of a difference in objection to sharing information among the organizational groupings? (Use a level of significance of 0.05)

b. Compute the p-value and interpret its meaning.

c. If appropriate, use the Marascuilo procedure and 0.05 level of significance to determine which groups are different.
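
The Marascuilo procedure in part (c) compares each pair of sample proportions against a critical range equal to the square root of the chi-square critical value times the standard error of the difference. A minimal sketch follows, assuming three groups (so 2 degrees of freedom, where the critical value has the closed form -2 ln(alpha)). The function name `marascuilo` and the group labels are illustrative.

```python
import math
from itertools import combinations

def marascuilo(groups, alpha=0.05):
    """groups maps a label to (count of interest, sample size).

    Returns a list of (label_a, label_b, abs_diff, critical_range, significant).
    """
    if len(groups) != 3:
        raise ValueError("this sketch only handles 3 groups (2 degrees of freedom)")
    # For 2 degrees of freedom, the upper-tail chi-square critical value
    # is -2 * ln(alpha), about 5.991 at alpha = 0.05.
    chi2_crit = -2.0 * math.log(alpha)
    results = []
    for (a, (xa, na)), (b, (xb, nb)) in combinations(groups.items(), 2):
        pa, pb = xa / na, xb / nb
        std_err = math.sqrt(pa * (1 - pa) / na + pb * (1 - pb) / nb)
        critical_range = math.sqrt(chi2_crit) * std_err
        diff = abs(pa - pb)
        results.append((a, b, diff, critical_range, diff > critical_range))
    return results

# "Yes" counts out of 200 patients per grouping (from the table in problem 2).
results = marascuilo({
    "Insurance": (40, 200),
    "Pharmacies": (80, 200),
    "Research": (90, 200),
})
for a, b, diff, crit, significant in results:
    verdict = "different" if significant else "not significantly different"
    print(f"{a} vs {b}: |diff| = {diff:.3f}, critical range = {crit:.3f} -> {verdict}")
```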

3. A study was conducted on preferred types of office communication by different age groups. Suppose the results were based on a survey of 500 respondents in each age group. The results are cross-classified in the following table:

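Type of communication preferred, by age group:

| Age group | Group meetings | Individuals | E-mails | Other | Total |
| --- | --- | --- | --- | --- | --- |
| Generation Y | 180 | 260 | 50 | 10 | 500 |
| Gen X | 210 | 190 | 65 | 35 | 500 |
| Boomer | 205 | 195 | 65 | 35 | 500 |
| Mature | 200 | 195 | 50 | 55 | 500 |
| Total | 795 | 840 | 230 | 135 | 2,000 |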

At the 0.05 level of significance, is there evidence of a relationship between age group and type of communication preferred?
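
For a table with more than two rows and columns, the test of independence uses (rows - 1) x (columns - 1) degrees of freedom, and it is worth checking the smallest expected frequency before relying on the chi-square approximation. The sketch below assumes that approach. The function name is illustrative, and the critical value 16.919 is the standard table entry for 9 degrees of freedom at the 0.05 level rather than something computed here.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, smallest expected frequency)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, min_expected

# Rows: Generation Y, Gen X, Boomer, Mature.
# Columns: group meetings, individuals, e-mails, other (problem 3).
observed = [
    [180, 260, 50, 10],
    [210, 190, 65, 35],
    [205, 195, 65, 35],
    [200, 195, 50, 55],
]
stat, df, min_expected = chi_square_independence(observed)

CHI2_CRIT = 16.919  # table value: upper-tail 0.05, 9 degrees of freedom
print(f"chi-square = {stat:.2f} on {df} df; smallest expected count = {min_expected:.2f}")
print("Reject H0" if stat > CHI2_CRIT else "Do not reject H0", "at the 0.05 level")
```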

4. The total number of customers in the store and the waiting times (in minutes) were recorded. The results are stored in the attachment titled Supermarket.

a. Construct a scatter plot.

b. Assuming a linear relationship, use the least-squares method to determine the regression coefficients b0 and b1.

c. Interpret the meaning of the slope, b1, in this problem.

d. Predict the waiting time when there are 20 customers in the store.
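
Parts (b) and (d) can be sketched as follows, using the textbook least-squares formulas b1 = SSXY / SSX and b0 = mean(Y) - b1 * mean(X). The Supermarket attachment is not reproduced here, so the numbers below are placeholder values chosen only to make the example run. The function name `least_squares` is illustrative.

```python
def least_squares(x, y):
    """Return (b0, b1) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx           # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

# Placeholder data, NOT the contents of the Supermarket attachment.
customers = [5, 10, 15, 20, 25]
waiting_minutes = [2.0, 3.5, 4.5, 6.5, 8.0]

b0, b1 = least_squares(customers, waiting_minutes)
predicted = b0 + b1 * 20  # predicted waiting time with 20 customers
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, prediction at X = 20: {predicted:.2f} minutes")
```

The slope b1 is read as the estimated change in mean waiting time for each additional customer in the store.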

5. An agent for a residential real estate company in a large city would like to be able to predict the monthly rental cost for apartments, based on the size of an apartment, as defined by square footage. The agent selects a sample of 25 apartments in a particular residential neighborhood and gathers the data in the attached spreadsheet titled Rent.

a. Construct a scatter plot.

b. Use the least-squares method to determine the regression coefficients b0 and b1.

c. Interpret the meaning of b0 and b1 in this problem.

d. Predict the monthly rent for an apartment that has 1,000 square feet.

e. Why would it not be appropriate to use the model to predict the monthly rent for apartments that have 500 square feet?

f. A couple is trying to decide between two apartments, one with 1,000 square feet for a monthly rent of $1,275 and the other with 1,200 square feet for a monthly rent of $1,425. Based on (a) through (d), which apartment is the better deal?

6. Magazine newsstand sales are stored in the attachment titled Circulation. For that data, the regression sum of squares (SSR) = 130,301.41 and the total sum of squares (SST) = 144,538.64.

a. Determine the coefficient of determination, r², and interpret its meaning.

b. Determine the standard error of the estimate.

c. How useful do you think this regression model is for predicting audited sales?
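
The quantities in problem 6 follow directly from the sums of squares: r² = SSR / SST, SSE = SST - SSR, and the standard error of the estimate is the square root of SSE / (n - 2). The sample size n is not stated in the problem text, so the sketch below leaves it as a parameter to be read from the Circulation attachment. The helper name is illustrative.

```python
import math

SSR = 130_301.41  # regression sum of squares (given)
SST = 144_538.64  # total sum of squares (given)

SSE = SST - SSR          # error (residual) sum of squares
r_squared = SSR / SST    # share of variation in Y explained by X

def standard_error_of_estimate(sse, n):
    """S_YX = sqrt(SSE / (n - 2)); n is the number of observations."""
    return math.sqrt(sse / (n - 2))

print(f"SSE = {SSE:,.2f}")
print(f"r^2 = {r_squared:.4f}")
# Call standard_error_of_estimate(SSE, n) once n is known from the data file.
```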

7. You are testing the null hypothesis that there is no linear relationship between two variables, X and Y. From your sample of n=10, you determine that r = 0.80.

a. What is the value of the t test statistic, tSTAT?

b. At the 0.05 level of significance, what are the critical values?

c. Based on your answers to (a) and (b), what statistical decision should you make?
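
For problem 7, the usual test statistic for H0: rho = 0 is t = r * sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. A short sketch under that assumption follows. The critical value 2.3060 is the standard two-tail table entry for 8 degrees of freedom at the 0.05 level, entered as a constant rather than computed.

```python
import math

def t_stat_for_correlation(r, n):
    """t statistic for testing H0: no linear relationship (rho = 0)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.80, 10
t_stat = t_stat_for_correlation(r, n)

T_CRIT = 2.3060  # table value: two-tail 0.05, n - 2 = 8 degrees of freedom
reject = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.4f}, critical values = +/-{T_CRIT}")
print("Reject H0" if reject else "Do not reject H0")
```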

8. You are testing the null hypothesis that there is no linear relationship between two variables, X and Y. From your sample of n = 20, you determine that SSR = 60 and SSE = 40.

a. What is the value of the F test statistic, FSTAT?

b. At the 0.05 level of significance, what is the critical value?

c. Based on your answers to (a) and (b), what statistical decision should you make?

d. Compute the correlation coefficient by first computing r² and assuming b1 is negative.

e. At the 0.05 level of significance, is there a significant correlation between X and Y?
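
The arithmetic in problem 8 can be laid out as below, assuming simple linear regression with one independent variable, so that MSR = SSR / 1 and MSE = SSE / (n - 2). The critical value 4.41 is the standard F table entry for 1 and 18 degrees of freedom at the 0.05 level. The last lines illustrate that in simple regression the squared t statistic for the correlation equals the F statistic.

```python
import math

n = 20
SSR = 60.0
SSE = 40.0

MSR = SSR / 1          # one independent variable
MSE = SSE / (n - 2)    # n - 2 = 18 error degrees of freedom
f_stat = MSR / MSE

F_CRIT = 4.41  # table value: upper-tail 0.05 with 1 and 18 degrees of freedom
print(f"F = {f_stat:.2f}; reject H0: {f_stat > F_CRIT}")

r_squared = SSR / (SSR + SSE)    # SST = SSR + SSE
r = -math.sqrt(r_squared)        # negative sign because b1 is assumed negative
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
print(f"r^2 = {r_squared:.2f}, r = {r:.4f}, t = {t_stat:.4f}, t^2 = {t_stat ** 2:.2f}")
```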

9. In problem (5) the data is stored in the spreadsheet titled Rent. Using the results of that problem,

a. At the 0.05 level of significance, is there evidence of a linear relationship between the size of the apartment and the monthly rent?

b. Construct a 95% confidence interval estimate of the population slope, β1.
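
Both parts of problem 9 rest on the standard error of the slope, S_b1 = S_YX / sqrt(SSX): the test statistic is t = b1 / S_b1 and the interval is b1 ± t_crit * S_b1. A hedged sketch follows. The Rent attachment is not reproduced here, so the data are placeholders, and the critical value must match the actual sample (for n = 25 the table value with 23 degrees of freedom is 2.0687; the placeholder sample of 5 uses 3.1824 for 3 degrees of freedom). The function name is illustrative.

```python
import math

def slope_inference(x, y, t_crit):
    """Return (b1, t statistic, lower limit, upper limit) for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ssxy / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
    s_b1 = s_yx / math.sqrt(ssx)      # standard error of the slope
    return b1, b1 / s_b1, b1 - t_crit * s_b1, b1 + t_crit * s_b1

# Placeholder data, NOT the contents of the Rent attachment.
size_sq_ft = [800, 950, 1100, 1250, 1400]
monthly_rent = [1000, 1150, 1250, 1450, 1550]

T_CRIT_DF3 = 3.1824  # table value: two-tail 0.05, 5 - 2 = 3 degrees of freedom
b1, t_stat, lower, upper = slope_inference(size_sq_ft, monthly_rent, T_CRIT_DF3)
print(f"b1 = {b1:.4f}, t = {t_stat:.2f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

If the interval excludes 0, the conclusion agrees with rejecting H0: β1 = 0 at the same significance level.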

10. The attached spreadsheet titled CoffeeDrink represents the calories and fat (in grams) of 16-ounce iced coffee drinks at Dunkin' Donuts and Starbucks.

a. Compute and interpret the coefficient of correlation, r.

b. At the 0.05 level of significance, is there a significant linear relationship between calories and fat?
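
The coefficient of correlation in part (a) can be computed from the sums of squares as r = SSXY / sqrt(SSX * SSY). The sketch below assumes that formula. The CoffeeDrink attachment is not reproduced here, so the calorie and fat values are placeholders only, and `pearson_r` is an illustrative name.

```python
import math

def pearson_r(x, y):
    """Sample coefficient of correlation between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    ssy = sum((yi - y_bar) ** 2 for yi in y)
    return ssxy / math.sqrt(ssx * ssy)

# Placeholder data, NOT the contents of the CoffeeDrink attachment.
calories = [240, 260, 350, 420, 510]
fat_grams = [8.0, 3.5, 22.0, 16.0, 22.0]

r = pearson_r(calories, fat_grams)
print(f"r = {r:.4f}")
```

For part (b), r and the sample size feed into the t statistic r * sqrt(n - 2) / sqrt(1 - r²), compared against the t critical value with n - 2 degrees of freedom.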

