# Hypothesis Testing, Normality check & Regression Analysis

Please use the attached templates.

1. (a) Suppose you want to estimate the mean percentage of gain in per share value for growth-type mutual funds over a specific 2-year period. Ten mutual funds are randomly selected from the population of all the commonly listed funds. The percentage gain figures are shown below (negative values indicate losses).

11.2 4.8 -2.6 16.8 -1.9

10.1 9.6 14.9 10.1 11.2

Find a 90% confidence interval for the mean percentage gain for the population of funds. Assume that the population of percentage gains for growth-type mutual funds can be adequately approximated by a normal distribution.
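One way to compute the interval in part (a) is the t-based formula, mean ± t · s / √n, with n − 1 = 9 degrees of freedom. The sketch below uses only the Python standard library. The critical value 1.833 is the tabulated upper 5% point of Student's t with 9 degrees of freedom and is hard-coded as an assumption rather than computed.

```python
import math
import statistics

# Percentage gains for the ten sampled funds (from the problem statement).
gains = [11.2, 4.8, -2.6, 16.8, -1.9, 10.1, 9.6, 14.9, 10.1, 11.2]

# Assumption: t(0.05, df = 9) read from a standard t table.
T_CRIT_90_DF9 = 1.833

n = len(gains)
mean = statistics.mean(gains)
s = statistics.stdev(gains)  # sample standard deviation (n - 1 divisor)
margin = T_CRIT_90_DF9 * s / math.sqrt(n)
ci_low, ci_high = mean - margin, mean + margin

print(f"mean = {mean:.2f}, s = {s:.3f}")
print(f"90% CI for the mean gain: ({ci_low:.2f}, {ci_high:.2f})")
```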

(b) Now go back and use a goodness of fit test to check the assumption made in part (a) that the data can be assumed to come from a normal distribution.
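The problem does not name a specific goodness-of-fit test. One option for a sample this small is the Lilliefors variant of the Kolmogorov-Smirnov test, which measures the largest gap between the empirical CDF and a normal CDF whose mean and standard deviation are estimated from the data. The sketch below computes only that distance; the helper name `ks_distance_normal` is hypothetical, and the resulting D still has to be compared with a Lilliefors critical value for n = 10 from a published table.

```python
import statistics

gains = [11.2, 4.8, -2.6, 16.8, -1.9, 10.1, 9.6, 14.9, 10.1, 11.2]

def ks_distance_normal(sample):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    xs = sorted(sample)
    n = len(xs)
    fitted = statistics.NormalDist(statistics.mean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Check the gap just after and just before the step at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

d_stat = ks_distance_normal(gains)
print(f"Lilliefors distance D = {d_stat:.3f}")
```

Because the parameters are estimated from the same data, the ordinary Kolmogorov-Smirnov table is too lenient here, which is why a Lilliefors table is the relevant reference.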

2. (a) Average total daily sales at a small food store are known to be $451.75. The store manager recently made some changes in displays of goods, order within aisles, and other changes, and she now wants to know whether average sales volume has increased. A random sample of 14 days shows Xbar = $502.95 and s = $64.00. You can assume normality. Using alpha = 4%, is the sampling result significant? Explain.
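Part (a) can be treated as an upper-tailed one-sample t test of H0: μ = 451.75 against Ha: μ > 451.75 with 13 degrees of freedom. The sketch below is one possible implementation using only the standard library. The helpers `t_pdf` and `t_upper_tail` are hypothetical names, and the p-value is approximated by Simpson's rule rather than taken from a statistics package.

```python
import math

MU_0 = 451.75   # historical mean daily sales ($)
X_BAR = 502.95  # sample mean ($)
S = 64.00       # sample standard deviation ($)
N = 14          # number of sampled days
ALPHA = 0.04

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

t_stat = (X_BAR - MU_0) / (S / math.sqrt(N))
p_value = t_upper_tail(t_stat, N - 1)
significant = p_value < ALPHA

print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4f}, significant: {significant}")
```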

(b) A simple random sample of 350 households from a large community was selected to estimate the mean residential electricity usage per household during January of last year. Another simple random sample of 420 households was selected, independently of the first, to estimate mean residential electricity usage during January of this year. The sample results (expressed in kilowatt hours) were:

| Year | Sample size | Sample mean (kWh) | Sample SD (kWh) |
| --- | --- | --- | --- |
| Last year | 350 | 1272 | 259 |
| This year | 420 | 1335 | 265 |

You believe that mean electricity usage in January of this year has increased compared with January of last year. Carry out the appropriate hypothesis test at the 5% significance level.
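With samples of 350 and 420 households, a large-sample z test for the difference of two independent means is a reasonable choice. The sketch below tests H0: μ(this year) − μ(last year) = 0 against the upper-tailed alternative, using the sample standard deviations in place of the unknown population values (an assumption that is standard for samples this large).

```python
import math
from statistics import NormalDist

# Summary statistics from the problem statement (kWh).
n_last, mean_last, sd_last = 350, 1272.0, 259.0
n_this, mean_this, sd_this = 420, 1335.0, 265.0
ALPHA = 0.05

# Standard error of the difference between two independent sample means.
se = math.sqrt(sd_last**2 / n_last + sd_this**2 / n_this)
z = (mean_this - mean_last) / se
p_value = 1.0 - NormalDist().cdf(z)  # upper-tailed test

print(f"z = {z:.3f}, one-sided p = {p_value:.5f}, reject H0: {p_value < ALPHA}")
```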

3. (a) Suppose the USGA wants to compare the mean distances associated with five different brands of golf balls when struck with a driver. A completely randomized design is employed, with Iron Byron, the USGA's robotic golfer, using a driver to hit a random sample of 10 balls of each brand in a random sequence. The distance is recorded for each hit, and the results are shown below, organized by brand.

Golf Ball Test

| Brand A | Brand B | Brand C | Brand D | Brand E |
| --- | --- | --- | --- | --- |
| 251.2 | 263.2 | 269.7 | 251.6 | 248.1 |
| 245.1 | 262.9 | 263.2 | 248.6 | 247.5 |
| 248.0 | 265.0 | 277.5 | 249.4 | 242.2 |
| 251.1 | 254.5 | 267.4 | 242.0 | 245.1 |
| 260.5 | 264.3 | 270.5 | 246.5 | 247.6 |
| 250.0 | 257.0 | 265.5 | 251.3 | 260.1 |
| 253.9 | 262.8 | 270.7 | 261.8 | 247.7 |
| 244.6 | 264.4 | 272.9 | 249.0 | 245.0 |
| 254.6 | 260.6 | 275.6 | 247.1 | 241.6 |
| 248.8 | 255.9 | 266.5 | 245.9 | 249.4 |

Carry out a complete analysis using ANOVA and a 5% significance level.
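The core of the one-way ANOVA is the F ratio of the between-brand mean square to the within-brand mean square. The sketch below computes it from the table with the standard library only. The critical value 2.58 is an approximate tabulated F(0.05; 4, 45) and is an assumption; a complete analysis would also check the ANOVA assumptions and follow up with pairwise comparisons, which are not shown.

```python
import statistics

# Driving distances by brand (columns of the Golf Ball Test table).
distances = {
    "A": [251.2, 245.1, 248.0, 251.1, 260.5, 250.0, 253.9, 244.6, 254.6, 248.8],
    "B": [263.2, 262.9, 265.0, 254.5, 264.3, 257.0, 262.8, 264.4, 260.6, 255.9],
    "C": [269.7, 263.2, 277.5, 267.4, 270.5, 265.5, 270.7, 272.9, 275.6, 266.5],
    "D": [251.6, 248.6, 249.4, 242.0, 246.5, 251.3, 261.8, 249.0, 247.1, 245.9],
    "E": [248.1, 247.5, 242.2, 245.1, 247.6, 260.1, 247.7, 245.0, 241.6, 249.4],
}

# Assumption: approximate F(0.05; 4, 45) from an F table.
F_CRIT = 2.58

k = len(distances)
n_total = sum(len(v) for v in distances.values())
grand_mean = sum(sum(v) for v in distances.values()) / n_total
means = {brand: statistics.mean(v) for brand, v in distances.items()}

# Between-brand and within-brand sums of squares.
ss_between = sum(len(v) * (means[b] - grand_mean) ** 2 for b, v in distances.items())
ss_within = sum((x - means[b]) ** 2 for b, v in distances.items() for x in v)

df_between, df_within = k - 1, n_total - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

print({b: round(m, 2) for b, m in means.items()})
print(f"F({df_between}, {df_within}) = {f_stat:.2f}, reject H0: {f_stat > F_CRIT}")
```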

(b) The Bradford Electric Illuminating Company is studying the relationship between kilowatt hours (thousands) used and the number of rooms in a private single-family residence. A random sample of 10 homes yielded the following data.

| Number of Rooms | Kilowatt-Hours (thousands) |
| --- | --- |
| 12 | 9 |
| 9 | 7 |
| 14 | 10 |
| 6 | 5 |
| 10 | 8 |
| 8 | 6 |
| 10 | 8 |
| 10 | 10 |
| 5 | 4 |
| 7 | 7 |

Carry out a complete analysis using simple linear regression and a 5% significance level. In addition, use your results to estimate the number of kilowatt-hours, in thousands, for a six-room house, and provide a 95% confidence interval for your estimate.
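A minimal sketch of the least-squares fit for the rooms data, together with the slope t statistic and a 95% confidence interval for the mean usage of six-room houses. It assumes the requested interval is for the mean response (a prediction interval for a single house would be wider), and it hard-codes the tabulated t(0.025, 8) = 2.306.

```python
import math

rooms = [12, 9, 14, 6, 10, 8, 10, 10, 5, 7]
kwh = [9, 7, 10, 5, 8, 6, 8, 10, 4, 7]  # thousands of kilowatt-hours

# Assumption: t(0.025, df = 8) read from a standard t table.
T_CRIT = 2.306

n = len(rooms)
x_bar = sum(rooms) / n
y_bar = sum(kwh) / n
sxx = sum((x - x_bar) ** 2 for x in rooms)
syy = sum((y - y_bar) ** 2 for y in kwh)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(rooms, kwh))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual standard error and coefficient of determination.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(rooms, kwh))
s = math.sqrt(sse / (n - 2))
r_squared = 1 - sse / syy
t_slope = slope / (s / math.sqrt(sxx))  # tests H0: slope = 0

# Point estimate and 95% CI for the mean response at six rooms.
x0 = 6
y_hat = intercept + slope * x0
half_width = T_CRIT * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)

print(f"y = {intercept:.3f} + {slope:.3f} x, r^2 = {r_squared:.3f}, t = {t_slope:.2f}")
print(f"estimate at x = 6: {y_hat:.3f} ± {half_width:.3f}")
```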

4. The article "Characterization of Highway Runoff in Austin, Texas, Area" (J. of Envir. Engr., 1998: 131-137) gave a scatter plot, along with the simple regression line, of X = rainfall volume (m³) and Y = runoff volume (m³) for a particular location. The following data is from the article.

X: 5 12 14 17 23 30 40 47 55 67 72 81 96 112 127

Y: 4 10 13 15 15 25 27 46 38 46 53 70 82 99 100

Carry out a "complete" analysis using simple linear regression and a 5% significance level. In addition, use your results to estimate the runoff volume for a rainfall volume of 65 m³, and provide a 95% confidence interval for your estimate. In your analysis, remember to discuss the pertinent results obtained.
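For reference, the standard interval formulas at a chosen rainfall volume x0 = 65 are written out below. The problem asks for "a 95% C.I." without saying whether it is for the mean runoff or for a single future observation, so both forms are shown. Here n = 15, so the t quantile has 13 degrees of freedom (tabulated t(0.025, 13) = 2.160); the symbols s and S_xx are notation introduced for this sketch.

```latex
\begin{align*}
\hat{y}_0 &= b_0 + b_1 x_0, \qquad
s = \sqrt{\frac{\mathrm{SSE}}{n-2}}, \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \\[4pt]
\text{CI for } E[Y \mid x_0]:\quad
  &\hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s
  \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\[4pt]
\text{PI for a new } Y \text{ at } x_0:\quad
  &\hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s
  \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
\end{align*}
```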

5. An experiment was designed by a market researcher to study the effects of two types of promotional expenditures on sales of a line of food products sold in supermarkets. Sixteen locations were selected at random for the test. The researcher collected data on three factors. The data is given below, where y = sales volume ($ ten thousands), x1 = media expenditures ($ thousands), and x2 = point-of-sale expenditures ($ thousands).

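| y | x1 | x2 |
| --- | --- | --- |
| 8.74 | 2 | 2 |
| 10.53 | 2 | 3 |
| 10.99 | 2 | 4 |
| 11.97 | 2 | 5 |
| 12.74 | 3 | 2 |
| 12.83 | 3 | 3 |
| 14.69 | 3 | 4 |
| 15.30 | 3 | 5 |
| 16.11 | 4 | 2 |
| 16.31 | 4 | 3 |
| 16.46 | 4 | 4 |
| 17.69 | 4 | 5 |
| 19.65 | 5 | 2 |
| 18.86 | 5 | 3 |
| 19.93 | 5 | 4 |
| 20.51 | 5 | 5 |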

The following model is proposed:

Y = b0 + b1x1 + b2x2 + e

Carry out a "complete" multiple regression analysis of this model. In addition, predict y for x1 = 3 and x2 = 3, and give 95% intervals for both Y|X (a prediction interval) and E[Y|X] (a confidence interval).
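A minimal sketch of the fit for the proposed model, solving the normal equations (X'X)b = X'y with a small Gauss-Jordan routine so that no external library is needed. The helper names `solve` and `dot` are hypothetical, and the critical value 2.160 is the tabulated t(0.025, 13), hard-coded as an assumption. It reports the coefficients, the prediction at x1 = 3, x2 = 3, and both requested intervals; tests on the individual coefficients and the overall F test are left out.

```python
import math

y = [8.74, 10.53, 10.99, 11.97, 12.74, 12.83, 14.69, 15.30,
     16.11, 16.31, 16.46, 17.69, 19.65, 18.86, 19.93, 20.51]
x1 = [2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5]
x2 = [2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5, 2, 3, 4, 5]

T_CRIT = 2.160  # assumption: t(0.025, df = 16 - 3 = 13) from a t table

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

X = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept
p = 3
xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
beta = solve(xtx, xty)  # [b0, b1, b2]

sse = sum((yi - dot(r, beta)) ** 2 for r, yi in zip(X, y))
s = math.sqrt(sse / (len(y) - p))  # residual standard error

x0 = [1.0, 3.0, 3.0]
y_hat = dot(x0, beta)
h = dot(x0, solve(xtx, x0))  # x0' (X'X)^-1 x0
ci_half = T_CRIT * s * math.sqrt(h)      # interval for E[Y|X]
pi_half = T_CRIT * s * math.sqrt(1 + h)  # interval for a new Y|X

print("b0, b1, b2 =", [round(b, 4) for b in beta])
print(f"y_hat = {y_hat:.3f}, CI ± {ci_half:.3f}, PI ± {pi_half:.3f}")
```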

6. (a) The article "Nonbloated Burned Clay Aggregate Concrete" (J. Materials, 1972: 555-563) reports the following data on 7-day flexural strength of nonbloated burned clay aggregate concrete samples (psi):

227 299 291 276 316 318 323 338 361 374 373 385 401 403 432 427 442 411 456 448 466 472 470 492 496 508 517 548 570 726 746 766 786 806 816 831

Test at a significance level of 5% to decide whether flexural strength can be considered normally distributed. Use the appropriate goodness-of-fit test.
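The problem leaves the choice of test open. One possibility with 36 observations is a chi-square goodness-of-fit test with equal-probability classes under a normal distribution fitted to the sample. The sketch below assumes six classes (expected count 6 each), which leaves 6 − 1 − 2 = 3 degrees of freedom after estimating the mean and standard deviation; the critical value 7.815 is the tabulated chi-square(0.05, 3). The number of classes is a judgment call, and a different choice can change the statistic.

```python
import bisect
import statistics

strength = [227, 299, 291, 276, 316, 318, 323, 338, 361, 374, 373, 385,
            401, 403, 432, 427, 442, 411, 456, 448, 466, 472, 470, 492,
            496, 508, 517, 548, 570, 726, 746, 766, 786, 806, 816, 831]

K = 6              # assumption: number of equal-probability classes
CHI2_CRIT = 7.815  # assumption: chi-square(0.05, df = 3) from a table

fitted = statistics.NormalDist(statistics.mean(strength),
                               statistics.stdev(strength))

# Class boundaries are the fitted normal quantiles at 1/K, 2/K, ..., (K-1)/K.
cuts = [fitted.inv_cdf(i / K) for i in range(1, K)]

observed = [0] * K
for x in strength:
    observed[bisect.bisect_right(cuts, x)] += 1

expected = len(strength) / K
chi2 = sum((o - expected) ** 2 / expected for o in observed)
reject_normality = chi2 > CHI2_CRIT

print("observed counts:", observed)
print(f"chi-square = {chi2:.3f}, reject normality: {reject_normality}")
```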

(b) The article "Measuring the Exposure of Infants to Tobacco Smoke" (N. Engl. J. Med., 1964: 1075-1078) reports on a study in which various measurements were taken both from a random sample of infants who had been exposed to household smoke and from a sample of unexposed infants. The accompanying data consists of observations on urinary concentration of cotinine, a major metabolite of nicotine. Does the data suggest that the true average cotinine level is higher in exposed infants than in unexposed infants by more than 25? Carry out a test at a significance level of 5%.

Unexposed : 8 11 12 14 20 43 111

Exposed: 35 56 83 92 128 150 176 208

You can't assume normality, so use the appropriate nonparametric test.
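One nonparametric option for two independent samples is the Wilcoxon rank-sum (Mann-Whitney) test. To test a shift of more than 25, the sketch below subtracts 25 from each exposed value and then asks whether the unexposed observations still rank unusually low. Because the samples are small and there are no ties after the shift, the p-value can be computed exactly by enumerating every way of assigning 7 of the 15 ranks to the unexposed group. The variable names are illustrative.

```python
from itertools import combinations

unexposed = [8, 11, 12, 14, 20, 43, 111]
exposed = [35, 56, 83, 92, 128, 150, 176, 208]
SHIFT = 25    # hypothesized difference under H0
ALPHA = 0.05

# Under H0 the shifted exposed values and the unexposed values are exchangeable.
shifted = [x - SHIFT for x in exposed]
pooled = sorted(unexposed + shifted)
if len(set(pooled)) != len(pooled):
    raise ValueError("ties present; this exact enumeration assumes none")

rank = {value: i for i, value in enumerate(pooled, start=1)}
w_unexposed = sum(rank[v] for v in unexposed)

# Exact null distribution of the unexposed rank sum.
m, total = len(unexposed), len(pooled)
all_sums = [sum(c) for c in combinations(range(1, total + 1), m)]

# Ha: exposed exceeds unexposed by more than 25, so small rank sums are extreme.
p_value = sum(1 for s in all_sums if s <= w_unexposed) / len(all_sums)

print(f"W (unexposed) = {w_unexposed}, exact one-sided p = {p_value:.4f}")
print("reject H0:", p_value < ALPHA)
```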

© BrainMass Inc. brainmass.com October 25, 2018, 2:02 am https://brainmass.com/statistics/regression-analysis/hypothesis-testing-normality-check-regression-analysis-284057

#### Solution Summary

The solution provides a step-by-step method for calculating the test statistics, ANOVA, and regression equations, and for checking the normality assumption. Formulas for the calculations and interpretations of the results are included. An interactive Excel sheet is also included; the user can edit the inputs and obtain complete results for a new set of data.

Steps of analysis of CIA data using regression and correlation analysis.

Refer to the CIA data, which reports demographic and economic information on 46 countries. Let unemployment be the dependent variable and percent of the population over 65, life expectancy, and literacy be the independent variables.

a. Determine the regression equation using a software package. Write out the regression equation.

b. What is the value of the coefficient of determination?

c. Check the independent variables for multicollinearity.

d. Conduct a global test on the set of independent variables.

e. Test each of the independent variables to determine if they differ from zero.

f. Would you delete any of the independent variables? If so, rerun the regression analysis and report the new equation.

g. Make a histogram of the residuals from your final regression equation. Is it reasonable to conclude that the residuals follow a normal distribution?

h. Plot the residuals against the fitted values. Are there any problems?

See attached file for full problem description.
