# Chi square test and regression analysis

1. A researcher selects a random sample of college students and measures the number of hours they spend watching television per week and their grade point average. The results are as follows:

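| Subject # | TV hours (X) | GPA (Y) |
|---|---|---|
| 1 | 50 | 1.90 |
| 2 | 20 | 2.20 |
| 3 | 19 | 2.40 |
| 4 | 10 | 3.30 |
| 5 | 9 | 2.90 |
| 6 | 14 | 2.50 |
| 7 | 7 | 3.50 |
| 8 | 40 | 1.95 |
| 9 | 30 | 2.00 |
| 10 | 4 | 3.90 |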

a. Calculate the descriptive statistics for both variables.

b. Construct a scatter plot using TV Hours as the X-axis and GPA as the Y-axis. Include the regression line and the regression equation. What percent of the change in GPA is predicted by the change in the TV hours?

c. What is the predicted GPA if a person watches 15 hours of television per week?
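As a rough sketch of parts (b) and (c), the least-squares slope, intercept, and r² can be computed directly from the sums of squares. The helper name `fit_line` is an illustrative choice, not part of the original problem, and the scatter plot itself is left to a plotting tool.

```python
# TV hours (X) and GPA (Y) for the ten subjects in the table above.
tv_hours = [50, 20, 19, 10, 9, 14, 7, 40, 30, 4]
gpa = [1.90, 2.20, 2.40, 3.30, 2.90, 2.50, 3.50, 1.95, 2.00, 3.90]


def fit_line(x, y):
    """Return (intercept b0, slope b1, r squared) from ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return b0, b1, r_squared


b0, b1, r_squared = fit_line(tv_hours, gpa)
predicted_gpa = b0 + b1 * 15  # part (c): 15 hours of TV per week

print(f"GPA = {b0:.4f} + ({b1:.4f}) * hours")
print(f"r^2 = {r_squared:.4f}")
print(f"Predicted GPA at 15 hours: {predicted_gpa:.2f}")
```

Here r² is the share of the variation in GPA that the fitted line accounts for, which is the quantity part (b) asks about.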

2. A researcher is interested in whether social assertiveness might be influenced by training procedures. Two groups of college sophomores are randomly selected. Group A subjects are assigned to the "training sessions" and group B subjects do not receive the training. Both groups are given a "Social Assertiveness" test (higher scores indicate greater assertiveness). The results are the following:

Group A: 14, 11, 10, 10, 8, 6, 8, 9, 10, 11, 12

Group B: 12, 10, 10, 8, 7, 3, 8, 6, 5, 4, 6, 7, 9

a. Construct a 90% confidence interval for each group.

b. Perform the appropriate statistical test to determine if there is a statistical difference between the two groups at the 0.05 level.
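One possible approach, sketched below, uses t-based confidence intervals for each group and a pooled-variance independent-samples t test. The equal-variance assumption and the critical values (copied from a standard t table rather than computed) are assumptions of this sketch, not statements from the problem.

```python
import math
import statistics

group_a = [14, 11, 10, 10, 8, 6, 8, 9, 10, 11, 12]
group_b = [12, 10, 10, 8, 7, 3, 8, 6, 5, 4, 6, 7, 9]

# Critical values taken from a standard t table (assumed, not computed here).
T_90_DF10 = 1.812  # two-sided 90% interval, df = 10 (group A)
T_90_DF12 = 1.782  # two-sided 90% interval, df = 12 (group B)
T_05_DF22 = 2.074  # two-sided test at 0.05, df = 22


def confidence_interval(sample, t_crit):
    """Mean plus or minus t * s / sqrt(n)."""
    mean = statistics.mean(sample)
    margin = t_crit * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - margin, mean + margin


def pooled_t(sample_1, sample_2):
    """Independent-samples t statistic assuming equal population variances."""
    n1, n2 = len(sample_1), len(sample_2)
    pooled_var = ((n1 - 1) * statistics.variance(sample_1)
                  + (n2 - 1) * statistics.variance(sample_2)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(sample_1) - statistics.mean(sample_2)) / std_error


ci_a = confidence_interval(group_a, T_90_DF10)
ci_b = confidence_interval(group_b, T_90_DF12)
t_stat = pooled_t(group_a, group_b)
reject_null = abs(t_stat) > T_05_DF22

print(f"90% CI, group A: ({ci_a[0]:.2f}, {ci_a[1]:.2f})")
print(f"90% CI, group B: ({ci_b[0]:.2f}, {ci_b[1]:.2f})")
print(f"t = {t_stat:.3f}, reject H0 at 0.05: {reject_null}")
```

If the equal-variance assumption looks doubtful, Welch's unequal-variance t test would be the usual alternative.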

3. A researcher is interested in demonstrating that ingestion of the drug magnesium permoline (MgPe) increases retention of learned material. A group of 16 subjects is randomly selected from WSSU students. The subjects are then randomly assigned to one of four conditions: A receives a placebo, B receives 10 cc of MgPe, C receives 20 cc of MgPe, and D receives 30 cc of MgPe. All subjects are then given some material to read and four hours later are tested for retention (high scores indicate high retention).

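| A | B | C | D |
|---|---|---|---|
| 8 | 10 | 11 | 10 |
| 6 | 7 | 6 | 8 |
| 6 | 8 | 8 | 7 |
| 5 | 9 | 12 | 11 |
| 6 | 11 | 14 | 10 |
| 5 | 7 | 13 | 11 |
| 7 | 9 | 11 | 10 |
| 3 | 6 | 9 | 9 |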

a. Identify the independent and dependent variables.

b. Do the appropriate statistical test.

c. Do you accept or reject the null hypothesis?

d. Show what conditions are equal and what conditions are different.

e. What conclusions can you draw from this study?
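With one factor (dose) at four levels and separate subjects in each condition, a one-way ANOVA is a natural candidate for part (b). The sketch below computes the F statistic from the between-group and within-group sums of squares, using all eight scores listed in each column; the function name `one_way_anova` and the table-lookup critical value are assumptions of this example.

```python
conditions = {
    "A": [8, 6, 6, 5, 6, 5, 7, 3],
    "B": [10, 7, 8, 9, 11, 7, 9, 6],
    "C": [11, 6, 8, 12, 14, 13, 11, 9],
    "D": [10, 8, 7, 11, 10, 11, 10, 9],
}


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((s - m) ** 2 for g, m in zip(groups, means) for s in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova(list(conditions.values()))

# Critical value for F(3, 28) at alpha = 0.05, copied from an F table (assumed).
F_CRITICAL = 2.95
reject_null = f_stat > F_CRITICAL

print(f"F({df_between}, {df_within}) = {f_stat:.2f}, reject H0: {reject_null}")
```

A significant F only says that at least one condition mean differs; part (d) would still need a follow-up comparison such as Tukey's HSD, which is not shown here.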

4. One possible side effect of air pollution is genetic damage. A study designed to examine this problem exposed one group of mice to air near a steel mill and another group to air in a rural area. The study compared the number of Hm-2 gene mutations in each group. The data table is below.

| Mutation | Steel Mill Air | Rural Air |
|---|---|---|
| Yes | 30 | 25 |
| No | 70 | 125 |

(Columns give the location.)

a. State the null hypothesis.

b. Calculate the chi-square statistic.

c. Calculate the p-value from Table F.

d. At the 5% level, is there evidence to conclude that location and mutation are related?
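A minimal sketch of the chi-square test of independence for this table follows. It builds expected counts from the row and column totals, then uses the fact that for one degree of freedom the upper-tail probability equals `erfc(sqrt(chi2 / 2))`. This gives a computed p-value rather than the bracketed range a printed table such as Table F would give.

```python
import math

# Rows: mutation yes / no. Columns: steel mill air / rural air.
observed = [[30, 25],
            [70, 125]]


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    return chi2


chi2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 1 only: a chi-square(1) variable is a squared standard normal.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
```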

https://brainmass.com/statistics/regression-analysis/chi-square-test-and-regression-analysis-242321

#### Solution Summary

The solution provides a step-by-step method for calculating the chi-square statistic and fitting the regression model. Formulas for the calculations and interpretations of the results are also included.

Chi-square Test, Regression Analysis and Correlation

See file attached for proper format of tables.

12.5 A sample of 500 shoppers was selected in a large metropolitan area to determine various information concerning consumer behavior. Among the questions asked was, "Do you enjoy shopping for clothing?" The results are summarized in the following contingency table:

| Enjoy Shopping for Clothing | Male | Female | Total |
|---|---|---|---|
| Yes | 136 | 224 | 360 |
| No | 104 | 36 | 140 |
| Total | 240 | 260 | 500 |

(Columns give the shopper's gender.)

a. Is there evidence of a significant difference between the proportion of males and females who enjoy shopping for clothing at the 0.01 level of significance?

b. Determine the p-value in (a) and interpret its meaning.

c. What are your answers to (a) and (b) if 206 males enjoyed shopping for clothing and 34 did not?

d. Compare the results of (a) through (c).
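For a 2 x 2 table, the comparison can also be framed as a two-proportion Z test, whose squared statistic equals the chi-square statistic. The sketch below is one way to set that up; the function name `two_proportion_z` is illustrative, and the second call uses the alternative male counts (206 yes out of 240) mentioned in the problem.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z statistic and its two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


Z_CRITICAL_01 = 2.576  # two-sided critical value at alpha = 0.01

# Original table: 136 of 240 males and 224 of 260 females said yes.
z, p_value = two_proportion_z(136, 240, 224, 260)

# Alternative scenario: 206 of 240 males said yes.
z_alt, p_alt = two_proportion_z(206, 240, 224, 260)

print(f"original:    z = {z:.3f}, z^2 = {z * z:.2f}, p = {p_value:.3g}")
print(f"alternative: z = {z_alt:.3f}, p = {p_alt:.3f}")
```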

12.15 The health-care industry and consumer advocacy groups are at odds over the sharing of a patient's medical records without the patient's consent. The health-care industry believes that no consent should be necessary to openly share data among doctors, hospitals, pharmacies, and insurance companies. Suppose a study is conducted in which 600 patients are randomly assigned, 200 each, to three "organizational groupings": insurance companies, pharmacies, and medical researchers. Each patient is given material to read about the advantages of sharing medical records within the assigned "organizational grouping." Each patient is then asked, "Would you object to the sharing of your medical records with...?" The results are recorded in the following contingency table:

| Object to Sharing Information | Insurance | Pharmacies | Research |
|---|---|---|---|
| Yes | 40 | 80 | 90 |
| No | 160 | 120 | 110 |

(Columns give the organizational grouping.)

a. Is there evidence of a difference in objection to sharing information among the organizational groupings? (Use α = 0.05.)

b. Compute the p-value and interpret its meaning.

c. If appropriate, use the Marascuilo procedure and α = 0.05 to determine which groups are different.
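The Marascuilo procedure compares each pair of sample proportions against a critical range built from the chi-square critical value. The sketch below assumes the overall chi-square test has already rejected equality, and relies on the closed form for two degrees of freedom, where the upper-tail critical value is `-2 ln(alpha)`. Variable names are illustrative.

```python
import math
from itertools import combinations

objecting = {"Insurance": 40, "Pharmacies": 80, "Research": 90}
group_size = 200  # patients assigned to each grouping
alpha = 0.05

# df = (2 - 1) * (3 - 1) = 2; for df = 2 the chi-square tail is exp(-x / 2),
# so the critical value is -2 * ln(alpha), about 5.991.
chi2_critical = -2 * math.log(alpha)

proportions = {name: count / group_size for name, count in objecting.items()}

significant_pairs = []
for first, second in combinations(proportions, 2):
    p1, p2 = proportions[first], proportions[second]
    critical_range = math.sqrt(chi2_critical) * math.sqrt(
        p1 * (1 - p1) / group_size + p2 * (1 - p2) / group_size
    )
    difference = abs(p1 - p2)
    if difference > critical_range:
        significant_pairs.append((first, second))
    print(f"{first} vs {second}: |diff| = {difference:.3f}, "
          f"critical range = {critical_range:.3f}")

print("Pairs that differ:", significant_pairs)
```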

12.23 USA Today reported on preferred types of office communication by different age groups ("Taking Face to Face vs. Group Meetings," USA Today, October 13, 2003, p. A1). Suppose the results were based on a survey of 500 respondents in each age group. The results are cross-classified in the following table:

| Age Group | Group Meetings | Face-to-Face Meetings with Individuals | E-mails | Other | Total |
|---|---|---|---|---|---|
| Generation Y | 180 | 260 | 50 | 10 | 500 |
| Generation X | 210 | 190 | 65 | 35 | 500 |
| Boomer | 205 | 195 | 65 | 35 | 500 |
| Mature | 200 | 195 | 50 | 55 | 500 |
| Total | 795 | 840 | 230 | 135 | 2,000 |

At the 0.05 level of significance, is there evidence of a relationship between age group and type of communication preferred?

13.7 A critically important aspect of customer service in a supermarket is the waiting time at the checkout (defined as the time from when the customer enters the line until he or she is served). Data were collected during time periods in which a constant number of checkout counters were open. The total number of customers in the store and the waiting times (in minutes) were recorded. The results are stored in Supermarket.

a. Construct a scatter plot.

b. Assuming a linear relationship, use the least-squares method to determine the regression coefficients b0 and b1.

c. Interpret the meaning of the slope, b1, in this problem.

d. Predict the waiting time when there are 20 customers in the store.

13.9 An agent for a residential real estate company in a large city would like to be able to predict the monthly rental cost for apartments, based on the size of an apartment, as defined by square footage. The agent selects a sample of 25 apartments in a particular residential neighborhood and gathers the data below (stored in Rent).

| Apartment | Monthly Rent ($) | Size (Sq. Feet) |
|---|---|---|
| 1 | 950 | 850 |
| 2 | 1,600 | 1,450 |
| 3 | 1,200 | 1,085 |
| 4 | 1,500 | 1,232 |
| 5 | 950 | 718 |
| 6 | 1,700 | 1,485 |
| 7 | 1,650 | 1,136 |
| 8 | 935 | 726 |
| 9 | 875 | 700 |
| 10 | 1,150 | 956 |
| 11 | 1,400 | 1,100 |
| 12 | 1,650 | 1,285 |
| 13 | 2,300 | 1,985 |
| 14 | 1,800 | 1,369 |
| 15 | 1,400 | 1,175 |
| 16 | 1,450 | 1,225 |
| 17 | 1,100 | 1,245 |
| 18 | 1,700 | 1,259 |
| 19 | 1,200 | 1,150 |
| 20 | 1,150 | 896 |
| 21 | 1,600 | 1,361 |
| 22 | 1,650 | 1,040 |
| 23 | 1,200 | 755 |
| 24 | 800 | 1,000 |
| 25 | 1,750 | 1,200 |

a. Construct a scatter plot.

b. Use the least-squares method to determine the regression coefficients b0 and b1.

c. Interpret the meaning of b0 and b1 in this problem.

d. Predict the monthly rent for an apartment that has 1,000 square feet.

e. Why would it not be appropriate to use the model to predict the monthly rent for apartments that have 500 square feet?

f. Your friends Jim and Jennifer are considering signing a lease for an apartment in this residential neighborhood. They are trying to decide between two apartments, one with 1,000 square feet for a monthly rent of $1,275 and the other with 1,200 square feet for a monthly rent of $1,425. Based on (a) through (d), which apartment do you think is the better deal?
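The sketch below fits the rent-versus-size line, refuses to predict outside the sampled size range (the concern behind the 500 square foot question), and compares each candidate apartment's asking rent with the model's prediction. The range guard and the "residual" comparison are one reasonable way to frame the problem, not the only one; `predict_rent` is a hypothetical helper name.

```python
# Size (sq. feet) and monthly rent ($) for apartments 1 through 25.
size_sqft = [850, 1450, 1085, 1232, 718, 1485, 1136, 726, 700, 956, 1100, 1285,
             1985, 1369, 1175, 1225, 1245, 1259, 1150, 896, 1361, 1040, 755,
             1000, 1200]
rent = [950, 1600, 1200, 1500, 950, 1700, 1650, 935, 875, 1150, 1400, 1650,
        2300, 1800, 1400, 1450, 1100, 1700, 1200, 1150, 1600, 1650, 1200,
        800, 1750]

n = len(size_sqft)
mean_x = sum(size_sqft) / n
mean_y = sum(rent) / n
sxx = sum((x - mean_x) ** 2 for x in size_sqft)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(size_sqft, rent))
b1 = sxy / sxx               # estimated change in rent per extra square foot
b0 = mean_y - b1 * mean_x    # intercept


def predict_rent(sqft):
    """Predict rent, rejecting sizes outside the range used to fit the line."""
    if not min(size_sqft) <= sqft <= max(size_sqft):
        raise ValueError("size is outside the sampled range; "
                         "extrapolation is unreliable")
    return b0 + b1 * sqft


# Asking rent minus predicted rent: a lower value suggests a better deal
# relative to what the fitted line expects for that size.
offers = {"1,000 sq ft at $1,275": (1000, 1275),
          "1,200 sq ft at $1,425": (1200, 1425)}
residuals = {label: asking - predict_rent(sqft)
             for label, (sqft, asking) in offers.items()}

print(f"rent = {b0:.2f} + {b1:.4f} * size")
for label, residual in residuals.items():
    print(f"{label}: asking minus predicted = {residual:+.2f}")
```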

13.17 For those data, SSR = 130,301.41 and SST = 144,538.64.

a. Determine the coefficient of determination, r², and interpret its meaning.

b. Determine the standard error of the estimate.

c. How useful do you think this regression model is for predicting audited sales?

13.39 You are testing the null hypothesis that there is no linear relationship between two variables, X and Y. From your sample of n = 10, you determine that r = 0.80.

a. What is the value of the t test statistic, tSTAT?

b. At the α = 0.05 level of significance, what are the critical values?

c. Based on your answers to (a) and (b), what statistical decision should you make?
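As a worked sketch, the usual t statistic for testing a zero population correlation, with n − 2 = 8 degrees of freedom, is:

$$
t_{STAT} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.80}{\sqrt{\dfrac{1 - 0.64}{8}}} = \frac{0.80}{\sqrt{0.045}} \approx 3.77
$$

A standard t table gives two-sided critical values of about ±2.306 for 8 degrees of freedom at the 0.05 level, so the decision follows from comparing 3.77 with that bound.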

13.41 You are testing the null hypothesis that there is no linear relationship between two variables, X and Y. From your sample of n = 20, you determine that SSR = 60 and SSE = 40.

a. What is the value of FSTAT?

b. At the α = 0.05 level of significance, what is the critical value?

c. Based on your answers to (a) and (b), what statistical decision should you make?

d. Compute the correlation coefficient by first computing r² and assuming that b1 is negative.

e. At the 0.05 level of significance, is there a significant correlation between X and Y?
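The quantities in this problem follow from the ANOVA decomposition for simple linear regression, sketched below. The critical value is copied from an F table and is an assumption of the sketch; the sign of r is taken as negative because the problem says to assume a negative slope.

```python
import math

n = 20
ssr = 60.0  # regression sum of squares
sse = 40.0  # error sum of squares

df_regression = 1       # one predictor in simple linear regression
df_error = n - 2
f_stat = (ssr / df_regression) / (sse / df_error)

# Critical value for F(1, 18) at alpha = 0.05, copied from an F table (assumed).
F_CRITICAL = 4.41
reject_null = f_stat > F_CRITICAL

r_squared = ssr / (ssr + sse)   # SST = SSR + SSE
r = -math.sqrt(r_squared)       # negative because b1 is assumed negative

# Equivalent t statistic for the correlation; its square equals F.
t_stat = r * math.sqrt(df_error) / math.sqrt(1 - r_squared)

print(f"F = {f_stat:.2f}, reject H0: {reject_null}")
print(f"r^2 = {r_squared:.2f}, r = {r:.4f}, t = {t_stat:.3f}")
```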

13.47 In Problem 13.9, an agent for a real estate company wanted to predict the monthly rent for apartments, based on the size of the apartment. The data are stored in Rent. Using the results of that problem:

a. At the 0.05 level of significance, is there evidence of a linear relationship between the size of the apartment and the monthly rent?

b. Construct a 95% confidence interval estimate of the population slope.

13.51 The file CoffeeDrink represents the calories and fat (in grams) of 16-ounce iced coffee drinks at Dunkin' Donuts and Starbucks:

| Product | Calories | Fat |
|---|---|---|
| Dunkin' Donuts Iced Mocha Swirl latte (whole milk) | 240 | 8.0 |
| Starbucks Coffee Frappuccino blended coffee | 260 | 3.5 |
| Dunkin' Donuts Coffee Coolatta (cream) | 350 | 22.0 |
| Starbucks Iced Coffee Mocha Espresso (whole milk and whipped cream) | 350 | 20.0 |
| Starbucks Chocolate Brownie Frappuccino blended coffee (whipped cream) | 510 | 22.0 |
| Starbucks Chocolate Frappuccino Blended Creme (whipped cream) | 530 | 19.0 |

a. Compute and interpret the coefficient of correlation, r.

b. At the 0.05 level of significance, is there a significant linear relationship between calories and fat?
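A short sketch of both parts: Pearson's r from the sums of squares and cross products, followed by the t test for a zero population correlation. The critical value is taken from a t table for 4 degrees of freedom and is an assumption of the example; `pearson_r` is an illustrative helper.

```python
import math

calories = [240, 260, 350, 350, 510, 530]
fat = [8.0, 3.5, 22.0, 20.0, 22.0, 19.0]


def pearson_r(x, y):
    """Sample coefficient of correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(calories, fat)
df = len(calories) - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Two-sided critical value at alpha = 0.05 with df = 4, from a t table (assumed).
T_CRITICAL = 2.776
significant = abs(t_stat) > T_CRITICAL

print(f"r = {r:.4f}, t = {t_stat:.3f}, df = {df}, significant: {significant}")
```

With only six drinks the test has little power, so a fairly large sample correlation can still fall short of the critical value.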
