See attached data file.
Assume that you are working on a team that has been commissioned by a large school district to collect and analyze data related to a recent curriculum experiment designed to improve student scores on statewide standardized tests. The schools in this district are predominantly large, urban schools. The district wants to know how successful the experiment was and whether the new curriculum should be adopted district-wide. Three years ago, the district rolled out the experimental curriculum to 100 of its 400 elementary schools; those 100 schools were selected via a simple random sample. Your budget is not large enough to collect data on all 400 schools, so you can afford to sample only 80. Unless a question states otherwise, conduct all analyses at the 95% confidence level (α = .05).
1. You think that the best sampling strategy is stratified sampling. You would like to list the characteristics of schools in this district and then randomly select 80 schools that roughly match the demographic characteristics of the entire population of schools in the district: 40 from schools that had the experimental curriculum and 40 from schools that kept the old curriculum.
a) First of all, how should the sample size have been determined? What information was needed to determine it?
b) Unfortunately, people from the school board have made the following statements regarding sampling. One well-meaning school board member has argued that "If it ain't broken, there's no reason to fix it! We should sample 80 random schools with the old curriculum; as long as the students in those schools are performing acceptably, there is no reason to change curricula." A second school board member has argued that "since you are looking for improvement, what you should do is simply choose the 80 schools that had the biggest improvement from three years ago to now and look at how many of those used the new vs. the old curriculum." Finally, the School District Superintendent has told you that "as you know, I'm currently running for reelection, and it is important that this research turn out a certain way. I have a list of schools that I personally handpicked for your sample that I think really represent what is going on here."
Describe what is methodologically wrong with each of the three suggestions you received from the board members and the superintendent. Focus on issues such as sampling error, the biases each sampling method would introduce, how you would expect those biases to show up in the data and the analysis, and how these issues would affect your ability to answer the district's original question of whether to roll out the new curriculum to all schools in the district.
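For part 1(a), sample size is usually set before data collection from: the significance level (α = .05 here), the desired power, an estimate of the standard deviation of chgtestscores (for example from prior years or a pilot), and the smallest improvement the district would consider meaningful, all subject to the budget cap of 80 schools. The sketch below uses the standard normal-approximation formula for comparing two means with equal group sizes. The function name n_per_group, the 80% power default, and the example inputs (an SD of 10 points and a meaningful difference of 5 points) are illustrative assumptions, not values from the assignment, and the formula ignores any precision gained from stratification.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sigma: float, delta: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Schools needed in EACH curriculum group to detect a mean difference `delta`.

    Normal-approximation formula for a two-sided, two-sample comparison of
    means with equal group sizes and a common standard deviation `sigma`:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) * sigma / delta) ** 2
    """
    std_normal = NormalDist()                    # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_power = std_normal.inv_cdf(power)          # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return ceil(n)                               # round up to whole schools


# Hypothetical inputs (NOT from the assignment data): SD of chgtestscores = 10
# points, smallest meaningful improvement = 5 points.
print(n_per_group(sigma=10, delta=5))
```

With these made-up inputs the formula asks for 63 schools per group, more than the 40 per group the budget allows; exposing that kind of trade-off is the point of the calculation.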
2. Having convinced the board members that a stratified sample is the most appropriate, you collect data from the 80 schools. The spreadsheet provided contains the collected data in the tab labeled "Data." The difference between the average school test score today and the average school test score three years ago is recorded as "chgtestscores." Positive values of chgtestscores indicate that a school's test scores have increased compared to three years ago, while negative values indicate that the school is now performing worse on these tests. In addition to this variable, the data set contains three more variables. The first is "curriculum," which takes the value "old" or "new," where "new" denotes the experimental curriculum. Next is "income ($000)," the average annual income (in thousands of dollars) of the households of students at each school. Finally, "school" is simply the ID number of the elementary school. The first step of this analysis is to generate some descriptive measures.
For each of the following points, create the chart and/or graph that best displays the data:
a) Show the breakdown of your sample by curriculum type (old vs. new/experimental)
b) Show the distribution of the change in test scores across all schools.
c) Show the distribution of income across all schools.
Additionally, you want to generate some tables of summary statistics.
d) Create one table that calculates summary measures of the change in test scores and income variables across all 80 schools.
e) Create a second table that calculates summary measures of the change in test scores and income variables broken out by curriculum type.
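A minimal sketch of the summary tables in parts (d) and (e), using only Python's standard library. It assumes the "Data" tab has been exported to a CSV file whose headers match the variable names described above (school, curriculum, income ($000), chgtestscores); the helper names summarize and summary_tables are hypothetical, and the four-row toy CSV is made up purely to show the call pattern.

```python
import csv
import io
from statistics import mean, median, stdev


def summarize(values):
    """Summary measures for one numeric variable (needs at least 2 values)."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample SD, n - 1 denominator
        "min": min(values),
        "max": max(values),
    }


def summary_tables(rows, variables=("chgtestscores", "income ($000)")):
    """Return (overall, by_curriculum) summary tables for the given variables."""
    rows = list(rows)
    overall = {v: summarize([float(r[v]) for r in rows]) for v in variables}
    by_curriculum = {}
    for group in sorted({r["curriculum"] for r in rows}):
        subset = [r for r in rows if r["curriculum"] == group]
        by_curriculum[group] = {v: summarize([float(r[v]) for r in subset]) for v in variables}
    return overall, by_curriculum


# Made-up toy rows, only to show the call pattern. With the real data, use
# open("data.csv", newline="") instead (the file name is hypothetical).
toy_csv = io.StringIO(
    "school,curriculum,income ($000),chgtestscores\n"
    "1,old,30,-2\n"
    "2,old,45,1\n"
    "3,new,28,4\n"
    "4,new,60,6\n"
)
overall, by_curriculum = summary_tables(csv.DictReader(toy_csv))
for group, table in by_curriculum.items():
    print(group, {k: round(v, 2) for k, v in table["chgtestscores"].items()})
```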
Based on the graphs and tables created in parts a-e:
f) What preliminary conclusions can you draw regarding the effectiveness of the experimental curriculum?
3. One of the criticisms levied against the old curriculum is that it was outdated; so outdated, the board members argue, that it was causing standardized test scores to fall. You decide to test this hypothesis.
a) State the null and alternative hypotheses (H0 and H1).
b) Test the hypothesis that the mean change in test scores (chgtestscores) at schools using the old curriculum was less than 0.
c) Calculate the p-value associated with your test statistic from b.
d) Interpret your results
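For question 3 the test is one-sample and left-tailed: H0: μ_old ≥ 0 versus H1: μ_old < 0, with t = (x̄ − 0)/(s/√n) and n − 1 degrees of freedom. Because Python's standard library has no t distribution, this sketch computes the left-tail p-value by integrating the t density numerically with Simpson's rule; the function names are illustrative, not from any library. In Excel, T.DIST(t, df, TRUE) should return the same left-tail area. The input would be the chgtestscores values for the old-curriculum schools only.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t): 0.5 plus the signed integral of the density from 0 to t (Simpson's rule)."""
    h = t / steps  # `steps` must be even; h is negative when t < 0
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def one_sample_t_left(values, mu0: float = 0.0):
    """Left-tailed test of H0: mu >= mu0 against H1: mu < mu0."""
    n = len(values)
    t = (mean(values) - mu0) / (stdev(values) / math.sqrt(n))
    return t, n - 1, t_cdf(t, n - 1)  # statistic, df, left-tail p-value


# Usage: pass the chgtestscores values of the old-curriculum schools, e.g.
#   t, df, p = one_sample_t_left(old_scores)  # `old_scores` is a hypothetical name
# and reject H0 at alpha = .05 if p < .05.
```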
4. Because the school board's primary concern is whether the experimental curriculum led to better standardized test scores, your next step is to conduct a simple analysis comparing the change in test scores at schools with the old curriculum to the change at schools with the new/experimental curriculum.
a) Conduct an ANOVA to evaluate whether there is a significant difference in the change in test scores between schools with the old curriculum and schools with the new curriculum.
b) Summarize and interpret the results of your test.
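With only two groups, a one-way ANOVA is equivalent to a pooled two-sample t-test (F = t² with 1 and N − 2 degrees of freedom, where N is the total number of schools), so this sketch computes F from the between- and within-group sums of squares and obtains the p-value through that identity. The function names are hypothetical, the p-value helper is valid only for two groups, and the toy groups are made up. In Excel, the Analysis ToolPak's "Anova: Single Factor" tool produces the same F table.

```python
import math
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t, via Simpson's rule on the density from 0 to t."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 + total * h / 3


def anova_p_two_groups(f_stat, df_within):
    """p-value of F(1, df_within), using F = t**2: p = 2 * P(T > sqrt(F))."""
    return 2 * (1 - t_cdf(math.sqrt(f_stat), df_within))


# Made-up toy groups; replace with chgtestscores for old and new schools.
f_stat, df_b, df_w = one_way_anova([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, p = {anova_p_two_groups(f_stat, df_w):.4f}")
```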
5. The board member who originally wanted you to include only low-income households in the survey is still concerned about the particular effect of the experimental curriculum on schools in low-income neighborhoods. To answer this question, you need to run a multiple regression model.
a) Create a dummy variable for the experimental curriculum and an interaction variable that interacts the experimental dummy and the income variable.
b) Estimate a multiple regression model that includes the curriculum dummy, income, and interaction variable as independent variables.
c) Calculate predicted values for the chgtestscores variable for both the new and old curriculum for income levels of $15,000, $30,000, $60,000, and $120,000.
d) Summarize and interpret the results of this model. What do you tell the board member about the effect of the new curriculum across different income levels?
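A sketch of the question 5 model, chgtestscores = b0 + b1·new + b2·income + b3·(new × income), where new is 1 for the experimental curriculum and 0 otherwise. It fits ordinary least squares through the normal equations with a small Gaussian-elimination solver so that it runs on the standard library alone; the helper names and the noise-free toy data (generated from made-up coefficients) are assumptions for illustration. Because income is recorded in thousands of dollars, the income levels in part (c) enter as 15, 30, 60, and 120.

```python
def design_row(curriculum, income):
    """Regressors for one school: intercept, dummy, income ($000), interaction."""
    new = 1.0 if curriculum == "new" else 0.0  # dummy: 1 = experimental curriculum
    return [1.0, new, float(income), new * income]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(b, curriculum, income):
    return sum(bi * xi for bi, xi in zip(b, design_row(curriculum, income)))


# Toy check with made-up, noise-free data generated from known coefficients.
true_b = [2.0, -3.0, 0.1, 0.05]
data = [(c, inc) for c in ("old", "new") for inc in (20, 40, 80, 100)]
X = [design_row(c, inc) for c, inc in data]
y = [sum(b * x for b, x in zip(true_b, row)) for row in X]
b_hat = ols(X, y)
for inc in (15, 30, 60, 120):  # income is in $000
    print(inc, round(predict(b_hat, "old", inc), 2), round(predict(b_hat, "new", inc), 2))
```

In this model the estimated effect of the new curriculum at a given income level is b1 + b3·income, so the sign and size of b3 indicate whether that effect grows or shrinks as neighborhood income rises.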
6. As part of your job as Alumni President, you are interested in finding information about graduates and jobs. You are planning a study to estimate the average annual salary of all graduates who still work for the company they originally joined upon graduation. A telephone survey is planned. To design the sample, you conduct a quick convenience survey of only 30 graduates you know and find that 4 out of 10 are still with the first company they joined after graduation, that their average salary is $65,000, and that 29 of the 30 respondents (about 95%) reported a salary between $55,000 and $75,000.
a) What size random sample should you take if you want to estimate the average annual salary in the population at 90 percent precision?
b) A fellow researcher tells you that the response rate for your telephone survey is likely to be 90%. How many alumni, then, should you select from the sampling frame and call?
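A sketch of the calculations in parts (a) and (b). If about 95% of salaries fall between $55,000 and $75,000, that range spans roughly ±2 standard deviations, giving σ ≈ $20,000 / 4 = $5,000, and the required sample is n = (zσ/E)². The assignment does not pin down what "90 percent precision" means (a 90% confidence level with some tolerable error E, or an error within 10% of the mean), so the sketch takes both as inputs; the E = $1,000 used below is purely hypothetical. The number of alumni to call is then n divided by the expected response rate, rounded up.

```python
from math import ceil
from statistics import NormalDist


def sigma_from_range(low, high, n_sd=4):
    """Rough SD when about 95% of values fall in [low, high] (about +/- 2 SD)."""
    return (high - low) / n_sd


def n_for_mean(sigma, tolerable_error, confidence):
    """Sample size to estimate a mean within +/- tolerable_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return ceil((z * sigma / tolerable_error) ** 2)


def contacts_needed(n_completes, response_rate):
    """Alumni to draw from the frame so that about n_completes respond."""
    return ceil(n_completes / response_rate)


sigma = sigma_from_range(55_000, 75_000)
n = n_for_mean(sigma, tolerable_error=1_000, confidence=0.90)  # E = $1,000 is hypothetical
print(sigma, n, contacts_needed(n, 0.90))
```

With these assumed inputs the sketch needs 68 completed interviews and 76 calls; a different reading of "90 percent precision" changes E and therefore n.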
7. The director of training at a company that manufactures electronic equipment is interested in determining whether different training methods affect the productivity of assembly-line employees. An employee tasked with testing this randomly assigns 42 recently hired employees to two groups of 21: the first receives a computer-assisted, individual-based training program and the other receives a team-based training program. Upon completion of the training, each employee is evaluated on the time (in seconds) it takes to assemble a part. The results are as follows:
21.3 20.7 21.8 24.4 18.7 19.3
14.1 16.1 16.8 15.6 18.0 21.7
14.7 16.5 16.2 30.7 23.7 12.3
16.4 18.5 16.7 16.0 13.8 18.0
19.3 16.8 17.7 20.8 17.1 28.2
19.8 19.3 16.0 20.8 24.7 17.4
17.7 17.4 16.8 20.1 15.2 23.2
The director is an avid proponent of computer-assisted training and asserts, even before the study is conducted, that the team-based training program will produce assembly times that are not only different from, but higher than, those of the computer-assisted training program.
a) What would the researcher conclude from these data? (Assume a maximum Type I error rate of 5% and state any assumptions you have to make.)
b) What is the probability of a Type I error in this situation? What does it mean?
c) Write a few sentences describing the findings, which will be presented to the director. What should you advise him or her?
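A sketch of a one-tailed, pooled-variance two-sample t-test for question 7, with H0: μ_team ≤ μ_computer and H1: μ_team > μ_computer, assuming independent, approximately normal assembly times with equal variances in the two groups. Important: the table above does not label which values belong to which training group. The code assumes, hypothetically, that the first three columns are the computer-assisted group and the last three are the team-based group (21 values each); regroup the lists if the original layout differs. In Excel, T.TEST(array1, array2, 1, 2) returns a one-tailed p-value for the equal-variance test (check that the sample means differ in the hypothesized direction).

```python
import math
from statistics import mean, variance

# Assembly times (seconds) as printed in the assignment, row by row.
ROWS = """\
21.3 20.7 21.8 24.4 18.7 19.3
14.1 16.1 16.8 15.6 18.0 21.7
14.7 16.5 16.2 30.7 23.7 12.3
16.4 18.5 16.7 16.0 13.8 18.0
19.3 16.8 17.7 20.8 17.1 28.2
19.8 19.3 16.0 20.8 24.7 17.4
17.7 17.4 16.8 20.1 15.2 23.2
"""

rows = [[float(v) for v in line.split()] for line in ROWS.splitlines() if line.strip()]
# ASSUMPTION (not stated in the source): columns 1-3 = computer-assisted,
# columns 4-6 = team-based.
computer = [v for r in rows for v in r[:3]]
team = [v for r in rows for v in r[3:]]


def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t, via Simpson's rule on the density from 0 to t."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 + total * h / 3


def pooled_t(a, b):
    """Pooled two-sample t-test; one-tailed H1: mean(b) > mean(a)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df  # pooled variance
    t = (mean(b) - mean(a)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, 1 - t_cdf(t, df)  # upper-tail p-value


t, df, p = pooled_t(computer, team)
print(f"mean computer = {mean(computer):.3f}, mean team = {mean(team):.3f}")
print(f"t = {t:.3f}, df = {df}, one-tailed p = {p:.4f}")
```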
8. Data in the file CPI-U reflect the annual values of the Consumer Price Index (CPI) in the U.S. over the period 1965 through 2005. This index measures the average change over time in the prices of a fixed "market basket" of goods and services purchased by all urban consumers: urban wage earners (i.e., clerical, professional, managerial, and technical workers, self-employed individuals, and short-term workers), unemployed individuals, and retirees.
a) Show a plot of the data.
b) Describe the movement in this time series over the period.
c) What is the "prediction" model? Use it to predict the CPI for 2011.
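The assignment does not define the "prediction" model, so this sketch assumes a least-squares time trend, CPI = a + b·year, with an optional log-linear variant (ln CPI = a + b·year) that may suit a price index growing at a roughly constant percentage rate. The function names and the toy series are hypothetical; to use it, pass the years and CPI values from the CPI-U file. Predicting 2011 from data ending in 2005 is an extrapolation six years beyond the sample, so the forecast should be treated with caution.

```python
import math
from statistics import mean


def fit_trend(years, values, log_scale=False):
    """Least-squares trend: value = a + b * year (or ln(value) if log_scale)."""
    ys = [math.log(v) for v in values] if log_scale else [float(v) for v in values]
    xbar, ybar = mean(years), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(years, ys))
    sxx = sum((x - xbar) ** 2 for x in years)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def forecast(a, b, year, log_scale=False):
    """Trend value for `year`, back-transformed if the fit was on ln(value)."""
    yhat = a + b * year
    return math.exp(yhat) if log_scale else yhat


# Toy series (made up): exact linear and exact 5%-growth data for 2000-2004.
years = [2000, 2001, 2002, 2003, 2004]
linear = [10 + 2 * (yr - 2000) for yr in years]
growth = [100 * 1.05 ** (yr - 2000) for yr in years]
a_lin, b_lin = fit_trend(years, linear)
a_log, b_log = fit_trend(years, growth, log_scale=True)
print(forecast(a_lin, b_lin, 2011), forecast(a_log, b_log, 2011, log_scale=True))
```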