# Hypothesis testing for student variables at Harrigan University

See the attached data file.

Harrigan University is a liberal arts university in the Midwest that attempts to attract the highest quality students, especially from its region of the country. It has gathered data on 178 applicants who were accepted by Harrigan. The data are in the file named Harrigan, which is posted on my website.

The variables are:

Accepted: whether the applicant accepts Harrigan's offer to enroll.

MainRival: whether the applicant enrolls at Harrigan's main rival university.

HSClubs: number of high school clubs in which the applicant served as an officer.

HSSports: number of varsity letters applicant has earned.

HSGPA: applicant's high school GPA.

HSPctile: applicant's percentile (in terms of GPA) in his or her graduating class.

HSSize: number of students in applicant's graduating class.

SAT: applicant's combined SAT score.

CombinedScore: a combined score for the applicant used by Harrigan to rank applicants.

The derivation of the combined score is a closely kept secret by Harrigan, but it is basically a weighted average of the various components of high school performance and SAT. Harrigan is concerned that it is not getting enough of the best students, and worse yet, it is concerned that many of the best students are going to Harrigan's main rival. Solve the following problems and then write an executive summary on whether Harrigan appears to have a legitimate claim.

1. Find the 95% confidence interval for the proportion of all applicants who accept Harrigan's invitation to enroll. Do the same for all applicants with a combined score less than or equal to the median combined score, and then for the applicants with a combined score greater than the median. Perform a hypothesis test to determine if there is a significant difference between these two proportions.

2. Find the 95% confidence interval for the proportion of all students with a combined score less than or equal to the median who chose Harrigan's rival over Harrigan. Do the same for those with a combined score greater than the median. Perform a hypothesis test to determine if there is a significant difference between the two.

3. Find the 95% confidence intervals for the mean combined score, the mean high school GPA, and the mean SAT score of all acceptable students who accept Harrigan's invitation to enroll. Do the same for all acceptable students who choose to enroll elsewhere. Then find the 95% confidence intervals for the differences between these means, where each difference is a mean for students enrolling at Harrigan minus the similar mean for students enrolling elsewhere.

4. Harrigan is interested in recruiting students who are involved in extracurricular activities. Does it appear to be doing so? Perform a hypothesis test to determine if at least half of those students that come to Harrigan have been officers of at least two clubs. Perform a similar test to determine if at least half of the students that come to Harrigan have at least four varsity letters in sports.

5. The combined score Harrigan calculates for each student gives some advantage to students who attended large high schools relative to those who attended small high schools. Is Harrigan correct in this assumption? (Split the data ...) As a result, Harrigan believes it is more successful in attracting students from large high schools than from small high schools. Is it correct?

6. If the GPA, SAT score, the number of clubs where the student serves as an officer, and the number of letters in sports is used to calculate the combined score, is there a difference in any of these parameters when comparing large schools to small schools? Can you draw any possible conclusions from your results that might cause a shift in the combined score for a specific group of students?

* This case was adapted from a case authored by Albright, Winston and Zapp, 'Data Analysis and Decision Making' 2nd ed., 2004

Case Deliverables:

Please submit an Executive Summary of the findings from your data analysis. Also, in the executive summary, provide Harrigan any advice that you can based on the data that you analyzed. Don't base your advice on your feelings, but purely on the suggestions and conclusions drawn from the data.

Underneath your executive summary, you should have a section for each of the six questions above. Cut and paste your MINITAB output into your text document. Clearly describe the confidence intervals and hypothesis tests that you perform.

For every confidence interval that you calculate, give your interpretation of the confidence interval. Also, for every hypothesis test that you perform, clearly state the null and alternative hypotheses, and interpret your findings. You can use the p-value approach or the critical value approach. All hypothesis tests should be performed using the 5% level of significance.
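To illustrate that the p-value approach and the critical value approach lead to the same decision at the 5% level, here is a minimal Python sketch for a two-sided z-test. It is not part of the MINITAB workflow; the function name `decide_two_sided` and the z value of 2.10 are hypothetical and are not taken from the Harrigan data. Tests on means with small samples would use a t distribution instead, which this sketch does not cover.

```python
from statistics import NormalDist

ALPHA = 0.05  # significance level required by the assignment


def decide_two_sided(z, alpha=ALPHA):
    """Return the p-value, the critical value, and both reject decisions."""
    std_normal = NormalDist()
    # p-value approach: probability of a statistic at least this extreme under H0
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # critical value approach: cutoff that leaves alpha/2 in each tail
    critical = std_normal.inv_cdf(1 - alpha / 2)
    reject_by_p = p_value < alpha
    reject_by_critical = abs(z) > critical
    return p_value, critical, reject_by_p, reject_by_critical


# Illustrative z statistic only (an assumption, not a Harrigan result).
p_value, critical, reject_by_p, reject_by_critical = decide_two_sided(2.10)
print(f"p-value = {p_value:.4f}, critical value = {critical:.3f}")
print("Reject H0 (p-value approach):", reject_by_p)
print("Reject H0 (critical value approach):", reject_by_critical)
```

Both decisions always agree, because |z| exceeds the critical value exactly when the two-sided p-value falls below the significance level.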

My intent is for you to do all of the analysis in MINITAB. You will have to sort the data numerous times based on different variables, so you will need to learn this function. Of course, you can do this in Excel and copy it over to MINITAB if you wish. Also, you will need to dissect the data (cut and paste) because MINITAB will not look at a partial column.

This is a team assignment, but do not discuss the solutions of this case with anyone other than your team partner. I expect each team member to contribute equally to the case analysis and write-up, but in the event that one team member does more than 50% of the work, please specify that on your write-up. I will make grade assignments accordingly.

Let me know if you have any questions.

Here are some helpful hints in arranging the data:

1. You will have to divide the data based on certain variables and then perform a 2-sample hypothesis test. The easiest way to do this is to first sort the data according to the variable of interest; in the first problem, that variable is the combined score. Sorting in Minitab is different from sorting in Excel. The commands are: Data > Sort > select the columns you want to sort > select the variable you want to sort on > select where to place the sorted data > OK. I usually put the sorted data back in the same columns of the same worksheet. Then divide the data by cutting and pasting half of it into another set of columns on the same worksheet. After that, you can easily run the 2-sample test by comparing the data in one column with the data in another.
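For readers who want to see the sort-and-split step outside of Minitab, here is a minimal Python sketch of the same idea. The records are hypothetical (CombinedScore, Accepted) pairs, not the Harrigan data, and `split_at_median` is an illustrative helper name.

```python
from statistics import median

# Hypothetical (CombinedScore, Accepted) pairs; NOT the Harrigan data.
records = [(310, 1), (402, 0), (356, 1), (371, 0), (298, 1), (389, 1)]


def split_at_median(rows, key_index=0):
    """Sort rows on one column and split them at that column's median."""
    ordered = sorted(rows, key=lambda r: r[key_index])
    med = median(r[key_index] for r in ordered)
    low = [r for r in ordered if r[key_index] <= med]   # at or below the median
    high = [r for r in ordered if r[key_index] > med]   # above the median
    return med, low, high


med, low, high = split_at_median(records)
print("median:", med)
print("at or below median:", low)
print("above median:", high)
```

Each group can then be summarized separately, for example by counting how many rows in `low` and in `high` have Accepted equal to 1, which gives the two sample proportions a 2-sample test compares.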

#### Solution Preview

See attached file.

I. Find the 95% confidence interval for the proportion of all applicants who accept Harrigan's invitation to enroll.

X = number of applicants who accept Harrigan's offer (Accepted = Yes) = 103

N = total sample size = 178

p = X/N = 103/178 = 0.5787

95% CI = p ± 1.96 × SE, where SE = √(p(1 − p)/N).

Output

| Variable | X | N | Sample p | 95% CI |
|---|---|---|---|---|
| Accepted | 103 | 178 | 0.578652 | (0.502502, 0.652137) |

Hence, we are 95% confident that the true proportion of applicants who accept Harrigan's offer to enroll lies between 0.50 and 0.65.
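As a check on the arithmetic, the formula p ± 1.96 × SE can be evaluated directly. The Python sketch below is not part of the MINITAB workflow, and `wald_ci` is a hypothetical helper name. Its result differs slightly from the interval in the output above, which suggests that output was produced with an exact method rather than the normal approximation; that reading is an inference from the numbers, not something the output states.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(x, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    se = sqrt(p * (1 - p) / n)                     # standard error of p
    return p - z * se, p + z * se


# 103 of the 178 applicants accepted Harrigan's offer (counts from the output).
low, high = wald_ci(103, 178)
print(f"p = {103 / 178:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

Either interval rounds to roughly 0.50 to 0.65, so the interpretation is unchanged.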

II. Do the same for all applicants with a combined score less than or equal to the median of combined score.

x = 1 if an applicant's combined score is less than or equal to the median score (356); X = number of such applicants = 91

Combined score median = 356 (middle value)

p = X/N = 91/178 = 0.5112

N = total sample size = 178

95% CI = p ± 1.96 × SE

Output

| Variable | X | N | Sample p | 95% CI |
|---|---|---|---|---|
| CombinedScore | 91 | 178 | 0.511236 | (0.435345, 0.586747) |

So 95% CI = (0.435345, 0.586747)

Hence, we are 95% confident that the true proportion of Harrigan applicants with a combined score less than or equal to the median combined score lies between 0.44 and 0.59.

III. And then for the applicants with a combined score greater than the combined score median.

x = 1 if an applicant's combined score is greater than the median score (356); X = number of such applicants = 87

N = total sample size = 178

Combined score median = 356 (middle value)

p = X/N = 87/178 = 0.4888

95% CI = p ± 1.96 × SE

Output

| Variable | X | N | Sample p | 95% CI |
|---|---|---|---|---|
| combinedscore | 87 | 178 | 0.488764 | (0.415330, 0.562198) |

So 95% CI = (0.415330, 0.562198)

Hence, we are 95% confident that the true proportion of Harrigan applicants with a combined score greater than the median combined score lies between 0.42 and 0.56.

IV. Perform a hypothesis test to determine if there is a significant difference between these two proportions.

H0: p1 - p2 = 0 (There is no difference between the two proportions)

H1: p1 - p2 ≠ 0 (There is a significant difference between these two proportions)

p1 = population proportion of all applicants with a combined score greater than the median combined score.

p2 = population proportion of all applicants with a combined score less than or equal to the median combined score.

| Sample | X | N | Sample p |
|---|---|---|---|
| 1 | 87 | 178 | 0.488764 |
| 2 | 91 | 178 | 0.511236 |

Difference = p(1) - p(2)

Estimate for difference: -0.0224719

95% CI for difference: (-0.126324, 0.0813797)

Test statistic: Z = -0.42

P-value = 0.671

Do not reject H0 since p-value = 0.671 > 0.05. Thus, there is no significant difference between the two proportions.
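The test statistic and p-value can be reproduced from the two sample counts. This Python sketch is not part of the MINITAB workflow; it uses the unpooled standard error, an assumption that happens to match the reported Z and p-value, and `two_proportion_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for p1 - p2 = 0 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return diff, z, p_value


# Sample 1: 87 of 178; sample 2: 91 of 178 (counts from the output).
diff, z, p_value = two_proportion_z(87, 178, 91, 178)
print(f"difference = {diff:.7f}, Z = {z:.2f}, p-value = {p_value:.3f}")
```

Because the p-value is well above 0.05, the sketch leads to the same decision: do not reject H0.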

2. Find the 95% confidence interval for the proportion of all students with a combined score less than or equal to the median who chose Harrigan's rival over Harrigan.

Median combined score of those who chose Harrigan's rival over Harrigan = 376.5

x = 1 if an applicant's combined score is less than or equal to 376.5; X = number of such applicants = 136

p = X/N = 136/178 = 0.7641

N = total sample size = 178

95% CI = p ± 1.96 × SE

Output

| Variable | X | N | Sample p | 95% CI |
|---|---|---|---|---|
| New Comb | 136 | 178 | 0.764045 | (0.694736, 0.824343) |

So 95% CI = (0.694736, 0.824343)

We are 95% confident that the true proportion of applicants with a combined score less than or equal to the median combined score of those who chose Harrigan's rival (376.5) lies between 0.69 and 0.82.

a) Do the same for those with a combined score greater than the median.

Median combined score of those who chose Harrigan's rival over Harrigan = 376.5

x = 1 if an applicant's combined score is greater than 376.5; X = number of such applicants = 42

p = X/N = 42/178 = 0.2360

N = total sample size = 178

95% CI = p ± 1.96 × SE

Output

| Variable | X | N | Sample p | 95% CI |
|---|---|---|---|---|
| Combmore376.5 | 42 | 178 | 0.235955 | (0.173580, 0.298330) |

So 95% CI = (0.173580, 0.298330)

We are 95% confident that the true proportion of applicants with a combined score greater than the median combined score of those who chose Harrigan's rival (376.5) lies between 0.17 and 0.30.

b) Perform a hypothesis test to determine if there is a significant difference between the two.

Hypotheses

H0: p1 - p2 = 0

H1: p1 - p2 ≠ 0

p1 = population proportion of all applicants with a combined score less than or equal to the median combined score (376.5) of those who chose Harrigan's rival over Harrigan

p2 = population proportion of all applicants with a combined score greater than the median ...

#### Solution Summary

This solution goes step by step through each of the questions, including all the steps necessary to complete a hypothesis test. It is provided in an attached Word document.