# ANOVA and Regression Analysis

1. ANOVA:

Please review the EXCEL DATA tab and DATA NOTES. Three columns from that dataset are of interest to us now: total admissions, region and control. Admissions is the number of patients admitted in a year, region is the geographical location of the hospital (see tabulation below) and control is the type of ownership of the hospital (see tabulation below).

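Region:

| Code | Region |
|---|---|
| 1 | South |
| 2 | Northeast |
| 3 | Midwest |
| 4 | Southwest |
| 5 | Rocky Mountain |
| 6 | California |
| 7 | Northwest |

Control:

| Code | Control |
|---|---|
| 1 | Government, non federal |
| 2 | Non-government, not for profit |
| 3 | For profit |
| 4 | Federal government |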

We speculate as to whether the number of patients a hospital can admit is related to either the region where the hospital is located or the type of ownership (control) under which it operates. To help answer this question, we run ANOVA analyses for both region and control. The results are displayed below.

On the basis of your knowledge of statistics and the ANOVA results, answer the questions that follow:

a. What are the assumptions that have to be met for ANOVA?

b. What is the difference between the "source of variation within groups" and the "source of variation between groups"?

c. What are the "F" and "F crit" (critical) values on the tables below? Where do these come from?

d. What is the "p value" and what does it mean?

e. Given the F, F critical and p values in the results for the Region ANOVA, is there enough evidence to support the hypothesis that the number of patients admitted varies from one region to another? Why?

f. Given the F, F critical and p values in the results for the Control ANOVA, is there enough evidence to support the hypothesis that the number of patients admitted varies according to the type of ownership? Why?

g. If you replied "yes" to either of the previous two questions, give the region and/or type of control (depending on which analyses were statistically significant) where more patients are admitted. Do you have an idea of why this might be so?

h. What is the value of alpha used for these ANOVA tests?

ANOVA for Region:

SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| 1 | 56 | 397774 | 7103.107143 | 39043165.55 |
| 2 | 30 | 240387 | 8012.9 | 59624886.58 |
| 3 | 60 | 326839 | 5447.316667 | 42528807.41 |
| 4 | 3 | 21546 | 7182 | 58229251 |
| 5 | 20 | 153981 | 7699.05 | 76628788.05 |
| 6 | 19 | 138379 | 7283.105263 | 18622621.21 |
| 7 | 12 | 87461 | 7288.416667 | 28544383.54 |

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 182761794.9 | 6 | 30460299.14 | 0.683006313 | 0.663559354 | 2.145801468 |
| Within Groups | 8607296329 | 193 | 44597390.3 | | | |
| Total | 8790058124 | 199 | | | | |

ANOVA for Control:

SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| 1 | 51 | 307019 | 6019.980392 | 52101766.06 |
| 2 | 86 | 765543 | 8901.662791 | 45022158.44 |
| 3 | 45 | 225017 | 5000.377778 | 34110005.1 |
| 4 | 18 | 68788 | 3821.555556 | 8302285.556 |

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 716107274.3 | 3 | 238702424.8 | 5.794644546 | 0.000815401 | 2.650676523 |
| Within Groups | 8073950849 | 196 | 41193626.78 | | | |
| Total | 8790058124 | 199 | | | | |
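To see where the SS, df, MS and F entries in the two ANOVA tables come from, the following is a minimal sketch that rebuilds them from the SUMMARY rows alone (count, average and variance per group). The function name `anova_from_summary` and the dictionary keys are illustrative choices, not part of the Excel output, and the F crit values are simply copied from the tables rather than computed.

```python
from math import fsum


def anova_from_summary(groups):
    """Rebuild one-way ANOVA quantities from (count, mean, variance) rows.

    `groups` is a list of tuples, one per group, as listed in the SUMMARY
    section of the Excel output. The name and return format are hypothetical.
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = fsum(n * mean for n, mean, _ in groups) / n_total

    # Between groups: how far each group mean sits from the grand mean.
    ss_between = fsum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within groups: spread of observations around their own group mean.
    ss_within = fsum((n - 1) * var for n, _, var in groups)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": ms_between / ms_within,
    }


# (count, average, variance) copied from the two SUMMARY sections.
region = [
    (56, 7103.107143, 39043165.55),
    (30, 8012.9, 59624886.58),
    (60, 5447.316667, 42528807.41),
    (3, 7182.0, 58229251.0),
    (20, 7699.05, 76628788.05),
    (19, 7283.105263, 18622621.21),
    (12, 7288.416667, 28544383.54),
]
control = [
    (51, 6019.980392, 52101766.06),
    (86, 8901.662791, 45022158.44),
    (45, 5000.377778, 34110005.1),
    (18, 3821.555556, 8302285.556),
]

# F crit values as reported in the ANOVA tables (not recomputed here).
F_CRIT = {"region": 2.145801468, "control": 2.650676523}

region_result = anova_from_summary(region)
control_result = anova_from_summary(control)

for name, result in (("region", region_result), ("control", control_result)):
    exceeds = result["F"] > F_CRIT[name]
    print(f"{name}: F = {result['F']:.4f}, exceeds F crit: {exceeds}")
```

The F crit and P-value columns come from the F distribution with the between-groups and within-groups degrees of freedom. If SciPy is available, `scipy.stats.f.ppf(1 - alpha, df_between, df_within)` and `scipy.stats.f.sf(F, df_between, df_within)` are one way to reproduce them for a chosen alpha.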

2. REGRESSION

Another interesting idea is to test whether some hospitals are inherently more efficient (or better run) than others. One way to test this is to look for a relationship between the total number of patients admitted and the total expenses of the hospital. A positive linear relationship would mean that the more patients admitted, the higher the expenses, and thus there are probably few differences between hospitals when it comes to expense efficiency. Below you can see the results of a linear regression of admissions (dependent variable) on total expenses (independent variable). Using the data in these tables, answer the following questions:

a. What is linear regression and how does it work?

b. What assumptions must be met to use linear regression?

c. What is the coefficient of determination and what does it tell you?

d. Given the results below, is there enough evidence to believe the linear relationship between expenses and admissions is close to a real description of the data? Why?

e. If you answered "yes" in the previous question, what would be the equation of the line that describes this relationship? If you answered "no", what kind of relationship do you think best describes the one between expenses and admissions?

f. How well does the regression equation fit the real data? How can you tell?

g. Given the linear relationship obtained from this analysis (whether you find it to be a good fit of the data or not), estimate the number of admissions a hospital should be able to handle if its expenses are $200,000.

| Regression Statistics | |
|---|---|
| Multiple R | 0.902489572 |
| R Square | 0.814487428 |
| Adjusted R Square | 0.813550496 |
| Standard Error | 2869.788901 |
| Observations | 200 |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 1110.431379 | 280.7739077 | 3.954895197 | 0.00010658 |
| Tot. Exp. | 0.085216268 | 0.002890243 | 29.48411622 | 2.3232E-74 |
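The regression output can be read as a fitted line, admissions = intercept + slope × total expenses. The sketch below shows how the t Stat, R Square and Adjusted R Square entries relate to one another and how the line is evaluated at a given expense level. The function names are illustrative, and it assumes the expense figure is entered in the same units as the Tot. Exp. column of the dataset, which the output itself does not state.

```python
# Values copied from the regression output above.
INTERCEPT = 1110.431379
SLOPE = 0.085216268          # coefficient on Tot. Exp.
SE_INTERCEPT = 280.7739077
SE_SLOPE = 0.002890243
MULTIPLE_R = 0.902489572
N_OBS = 200


def t_stat(coefficient, standard_error):
    """t Stat column: the coefficient divided by its standard error."""
    return coefficient / standard_error


def adjusted_r_square(r_square, n_obs, n_predictors=1):
    """Adjusted R Square: R Square penalised for the number of predictors."""
    return 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)


def predict_admissions(total_expenses):
    """Evaluate the fitted line at a given expense level.

    Assumes `total_expenses` uses the same units as the Tot. Exp. column.
    """
    return INTERCEPT + SLOPE * total_expenses


r_square = MULTIPLE_R ** 2   # with one predictor, R Square is Multiple R squared

print(f"t (intercept) = {t_stat(INTERCEPT, SE_INTERCEPT):.4f}")
print(f"t (slope)     = {t_stat(SLOPE, SE_SLOPE):.4f}")
print(f"R Square      = {r_square:.6f}")
print(f"Adj. R Square = {adjusted_r_square(r_square, N_OBS):.6f}")
print(f"Predicted admissions at 200000: {predict_admissions(200_000):.1f}")
```

Whether a prediction at this expense level is meaningful depends on whether 200,000 lies inside the range of expenses observed in the dataset; the sketch only evaluates the line and does not check that.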

#### Solution Summary

This solution comprises a detailed explanation and step-by-step calculations for the given problems. Supplemented with interactive Excel output and more than 300 words of text, it gives students a clear perspective on the underlying concepts.