# ANOVA and Regression Analysis Project

See the attachment for the missing data tables.

Assume that you are working on a team that has been commissioned by a large school district to collect and analyze data related to a recent curriculum experiment designed to improve student scores on statewide standardized tests. The schools in this district are predominantly large, urban schools. The district wants to know how successful the experiment was and whether the new curriculum should be adopted district-wide. Three years ago, the school district rolled out the experimental curriculum to 100 of the 400 elementary schools in the district; those 100 schools were selected via a simple random sample. Your budget is not large enough to collect data on the full population of 400 schools, so you can afford to collect a sample of only 80 schools. Unless a question states otherwise, conduct all analyses at the 95% confidence level (α = 0.05).

2. Having convinced the board members that a stratified sample is the most appropriate design, you collect data from the 80 schools. The spreadsheet provided contains the collected data in the tab labelled "Data." The change in a school's average test score between three years ago and today is recorded as "chgtestscores." Positive values of chgtestscores indicate that test scores at the school have increased compared to 3 years ago, while negative values indicate that the school is now performing worse on these tests. In addition to this variable, the data set contains three more variables. The first is "curriculum," which takes the value "old" or "new," where "new" denotes the experimental curriculum. Next is "income($000)," which represents the average annual income (in thousands of dollars) of the households of students at each school. Finally, "school" is simply the ID number of the elementary school. The first step in this analysis is to generate some descriptive measures.
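This section does not say which variable defines the strata or how the 80 schools are split among them. As an assumption, the Python sketch below stratifies by curriculum and uses proportional allocation, drawing each stratum's share with `random.sample`; the helper `stratified_sample` and the school ID ranges are hypothetical.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Draw a proportionally allocated stratified random sample.

    strata:  dict mapping stratum name -> list of unit IDs (hypothetical IDs here).
    total_n: overall sample size to split across the strata.
    """
    rng = random.Random(seed)
    population = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Proportional allocation: the stratum's share of the population times total_n.
        # Rounding can make the strata sizes miss total_n by one when shares are fractional.
        n_h = round(total_n * len(units) / population)
        # Simple random sample without replacement within the stratum.
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical IDs: schools 1-100 received the new curriculum, 101-400 kept the old one.
strata = {
    "new": list(range(1, 101)),
    "old": list(range(101, 401)),
}
chosen = stratified_sample(strata, total_n=80, seed=42)
print({name: len(ids) for name, ids in chosen.items()})
```

With 100 new and 300 old schools, proportional allocation gives 20 new and 60 old schools. Equal allocation (40 and 40) is a common alternative when the main goal is comparing the two groups, because it usually makes the estimated difference in means more precise.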

For each of the following points, create the chart and/or graph that best displays the data:

a) Show the breakdown of your sample by curriculum type (old vs. new/experimental)

b) Show the distribution of the change in test scores across all schools.

c) Show the distribution of income across all schools.
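For parts a to c, a bar (or pie) chart suits the categorical curriculum variable, and a histogram suits the continuous chgtestscores and income variables. The dependency-free Python sketch below computes the counts those charts are built from and prints them as text; the values are made-up placeholders, not the project data. In practice a plotting library such as matplotlib (`pyplot.bar`, `pyplot.hist`) would draw the actual figures.

```python
import math
from collections import Counter

def category_counts(labels):
    """Count observations per category (the input for a bar or pie chart)."""
    return dict(Counter(labels))

def histogram_counts(values, bin_width):
    """Count values per bin; each bin covers [edge, edge + bin_width)."""
    bins = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))

def text_chart(counts, fmt=str):
    """Render a dict of counts as rows of '#' marks."""
    return "\n".join(f"{fmt(key):>10} | {'#' * n} ({n})" for key, n in counts.items())

# Placeholder values for illustration only; replace with the "Data" tab columns.
curriculum = ["old", "old", "old", "old", "old", "old", "new", "new", "new", "new"]
chgtestscores = [-3, 1, 4, -1, 0, 7, 12, 5, 9, -2]

# Part a: breakdown by curriculum type.
print(text_chart(category_counts(curriculum)))
# Part b: distribution of the change in test scores, in bins 5 points wide.
print(text_chart(histogram_counts(chgtestscores, 5), fmt=lambda e: f"[{e}, {e + 5})"))
```

The same `histogram_counts` call on the income column (with a wider bin, for example 10 thousand dollars) gives the counts for part c.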

Additionally, you want to generate some tables of summary statistics.

d) Create one table that calculates summary measures of the change in test scores and income variables across all 80 schools.

e) Create a second table that calculates summary measures of the change in test scores and income variables broken out by curriculum type.
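Parts d and e call for the same summary measures computed once over all schools and once per curriculum group. A minimal Python sketch using only the standard-library `statistics` module is shown below; the rows are invented placeholders, and the key `income` stands in for the spreadsheet's `income($000)` column.

```python
import statistics

def summarize(values):
    """Common descriptive measures for one numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }

def summarize_by_group(rows, group_key, value_key):
    """Apply summarize() separately within each level of group_key."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {group: summarize(vals) for group, vals in groups.items()}

# Placeholder rows for illustration only (income in thousands of dollars).
rows = [
    {"curriculum": "old", "chgtestscores": -3, "income": 25},
    {"curriculum": "old", "chgtestscores": 1, "income": 40},
    {"curriculum": "old", "chgtestscores": -1, "income": 55},
    {"curriculum": "new", "chgtestscores": 7, "income": 30},
    {"curriculum": "new", "chgtestscores": 5, "income": 60},
    {"curriculum": "new", "chgtestscores": 9, "income": 90},
]

overall = {key: summarize([r[key] for r in rows]) for key in ("chgtestscores", "income")}  # part d
by_curriculum = summarize_by_group(rows, "curriculum", "chgtestscores")                     # part e
print(overall)
print(by_curriculum)
```

Calling `summarize_by_group(rows, "curriculum", "income")` produces the income half of the part e table.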

Based on the graphs and tables created in parts a-e:

f) What preliminary conclusions can you draw regarding the effectiveness of the experimental curriculum?

3. One of the criticisms levied against the old curriculum is that it was outdated. It was so outdated, the board members argue, that it was causing standardized test scores to fall. You decide to test this hypothesis.

a) State the null and alternative hypotheses (H0 and H1)

b) Test the hypothesis that the mean change in test scores at schools using the old curriculum was less than 0.

c) Calculate the p-value associated with your test statistic from b.

d) Interpret your results.
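The board's claim corresponds to a left-tailed one-sample t-test on the old-curriculum schools: $H_0: \mu_{old} \ge 0$ against $H_1: \mu_{old} < 0$, with $t = \bar{x} / (s/\sqrt{n})$ on $n - 1$ degrees of freedom, rejecting $H_0$ when the p-value is below 0.05. The Python standard library has no t distribution, so the sketch below integrates the Student's t density numerically with Simpson's rule to get the left-tail p-value; the sample values are placeholders. If SciPy is available, `scipy.stats.ttest_1samp(values, 0, alternative="less")` should give the same statistic and p-value.

```python
import math
import statistics

def t_pdf(x, df):
    """Student's t density, using log-gamma so large df does not overflow."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x). The density is symmetric, so integrate from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t_less(values, mu0=0.0):
    """Left-tailed test of H0: mu >= mu0 against H1: mu < mu0; returns (t, p)."""
    n = len(values)
    t_stat = (statistics.mean(values) - mu0) / (statistics.stdev(values) / math.sqrt(n))
    return t_stat, t_cdf(t_stat, n - 1)  # left-tail area is the one-sided p-value

# Placeholder changes in test scores for old-curriculum schools (not the project data).
old_changes = [-3, 1, -1, -4, 0, -2, -5, 2]
t_stat, p_value = one_sample_t_less(old_changes)
print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.4f}")
```

For these placeholder values the p-value falls between 0.05 and 0.10, so $H_0$ would not be rejected at α = 0.05; the real conclusion depends on the old-curriculum rows in the "Data" tab.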

4. Because the school board's primary concern is whether the experimental curriculum led to better standardized test scores, your next step is to conduct a simple analysis comparing the change in test scores at schools with the old curriculum against the change at schools with the new/experimental curriculum.
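The one-way ANOVA compares variation between the curriculum groups with variation within them. Writing $k$ for the number of groups, $N$ for the total number of schools, $n_g$ and $\bar{y}_g$ for the size and mean change of group $g$, and $\bar{y}$ for the grand mean, the test statistic is:

$$
F = \frac{MS_B}{MS_W}
  = \frac{\sum_{g=1}^{k} n_g\,(\bar{y}_g - \bar{y})^2 \,/\, (k - 1)}
         {\sum_{g=1}^{k} \sum_{i=1}^{n_g} (y_{gi} - \bar{y}_g)^2 \,/\, (N - k)}
$$

Here $k = 2$ and $N = 80$, so under $H_0$ (equal mean changes) $F$ follows an $F(1, 78)$ distribution. With only two groups, this $F$ equals the square of the pooled-variance two-sample t statistic, so the two tests reach the same conclusion.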

a) Conduct an ANOVA to evaluate whether or not there is a significant difference in test scores between schools with the old curriculum and schools with the new curriculum.

b) Summarize and interpret the results of your test.
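A dependency-free Python sketch of the one-way ANOVA F statistic is below; it returns F with its between-group and within-group degrees of freedom, and the two groups are placeholder values. The p-value is the upper tail of the F distribution, for example `scipy.stats.f.sf(f_stat, df_between, df_within)` if SciPy is available; `scipy.stats.f_oneway` performs the whole test in one call.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: dict mapping group label -> list of observations.
    """
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = statistics.mean(all_values)
    k = len(groups)
    n_total = len(all_values)
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(v) * (statistics.mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(sum((x - statistics.mean(v)) ** 2 for x in v) for v in groups.values())
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Placeholder changes in test scores by curriculum (not the project data).
groups = {"old": [-3, 1, -1], "new": [7, 5, 9]}
f_stat, df_b, df_w = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With the real data, a p-value below 0.05 would indicate a significant difference in mean change between the two curricula; the sign of that difference comes from the group means in the part e table.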

5. The board member who originally wanted you to include only low-income households in the survey is still concerned about the particular effect of the experimental curriculum on schools in low-income neighborhoods. To answer this question, you need to run a multiple regression model.
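One way to write the model in this question, using $\text{New}$ for the curriculum dummy (1 for the experimental curriculum, 0 for the old one), $\text{Income}$ for income($000), and $\hat{y}$ for the predicted chgtestscores, is shown below together with what it implies for each curriculum:

$$
\begin{aligned}
\hat{y} &= b_0 + b_1\,\text{New} + b_2\,\text{Income} + b_3\,(\text{New} \times \text{Income}) \\
\hat{y}_{\text{old}} &= b_0 + b_2\,\text{Income} \\
\hat{y}_{\text{new}} &= (b_0 + b_1) + (b_2 + b_3)\,\text{Income} \\
\hat{y}_{\text{new}} - \hat{y}_{\text{old}} &= b_1 + b_3\,\text{Income}
\end{aligned}
$$

So $b_3$ measures how the estimated effect of the new curriculum changes per additional $1,000 of household income, and because income is recorded in thousands, an income of $15,000 enters as $\text{Income} = 15$. If $b_3 < 0$, the estimated benefit is largest at low incomes and reaches zero at $\text{Income} = -b_1 / b_3$ (when that value is positive).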

a) Create a dummy variable for the experimental curriculum and an interaction variable that interacts the experimental dummy and the income variable.

b) Estimate a multiple regression model that includes the curriculum dummy, income, and interaction variable as independent variables.

c) Calculate predicted values for the chgtestscores variable for both the new and old curriculum for income levels of $15,000, $30,000, $60,000, and $120,000.

d) Summarize and interpret the results of this model. What do you tell the board member about the effect of the new curriculum across different income levels?
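The Python sketch below strings parts a to c together: it builds the dummy and interaction columns, fits ordinary least squares by solving the normal equations with Gaussian elimination, and prints predicted changes for both curricula at incomes of 15, 30, 60, and 120 (thousand dollars). To stay self-contained, it checks itself on synthetic, noise-free data generated from the hypothetical coefficients `TRUE`, which the fit should recover. It does not compute standard errors or p-values; a statistics package such as statsmodels reports those.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def design_row(curriculum, income):
    """Regressors for one school: intercept, curriculum dummy, income, interaction."""
    new = 1.0 if curriculum == "new" else 0.0  # dummy for the experimental curriculum
    return [1.0, new, income, new * income]

def fit_ols(rows):
    """OLS coefficients [b0, b1, b2, b3] from the normal equations X'X b = X'y."""
    X = [design_row(r["curriculum"], r["income"]) for r in rows]
    y = [r["chgtestscores"] for r in rows]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coefs, curriculum, income):
    """Predicted chgtestscores for one curriculum and income (in $000)."""
    return sum(c * x for c, x in zip(coefs, design_row(curriculum, income)))

# Hypothetical coefficients used only to build synthetic, noise-free data.
TRUE = [1.0, 8.0, 0.01, -0.05]
rows = [
    {"curriculum": cur, "income": inc, "chgtestscores": predict(TRUE, cur, inc)}
    for cur in ("old", "new")
    for inc in (20, 35, 50, 80, 110)
]
coefs = fit_ols(rows)

for income in (15, 30, 60, 120):  # thousands of dollars, matching income($000)
    old = predict(coefs, "old", income)
    new = predict(coefs, "new", income)
    print(f"income ${income}k: old {old:.2f}, new {new:.2f}, new - old {new - old:.2f}")
```

On the real "Data" tab, replace `rows` with the 80 schools; the `new - old` column is the estimated curriculum effect at each income level, which is the quantity part d asks you to explain to the board member.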

6. Shortly after you publish your findings in a report, you receive a call from MENSA, which wants to open a series of schools for the gifted across the country; children with IQs above 160 would be offered places at these schools completely free of charge. MENSA is intrigued by your report and wants to know whether, based on your findings, it might expect good results if it used the experimental curriculum at its new chain of schools. What do you tell them and why?

7. The variable the school board is most interested in understanding and explaining is the change in school-wide standardized test scores. You were also given variables that indicate the curriculum type and the income of the households of students at each school. If this were a real research project, what other data would you collect to use as control variables, and why? In other words, which other variables might plausibly affect the change in test scores aside from income and curriculum type? Give at least 4 examples and, in each case, explain why you would include them.

#### Solution Summary

A step-by-step method for regression analysis and ANOVA is discussed here. The solution, attached as Word and Excel files, explains the regression coefficients, the coefficient of determination, the scatter diagram, and the significance of the regression model.