Q1 The bad debt ratio for a financial institution is defined to be the dollar value of loans defaulted divided by the total dollar value of all loans made. Suppose a random sample of seven Ohio banks is selected and that the bad debt ratios (written as percentages) for these banks are 7 percent, 4 percent, 6 percent, 7 percent, 5 percent, 4 percent, and 9 percent. Assuming the bad debt ratios are approximately normally distributed, the MINITAB output of a 95 percent confidence interval for the mean bad debt ratio of all Ohio banks is as follows:
Variable   N   Mean    StDev   SE Mean   95.0% CI
d-ratio    7   6.000   1.826   0.690     (4.311, 7.689)
a Using the sample mean and standard deviation on the MINITAB output, verify the calculation of the 95 percent confidence interval.
b Calculate a 99 percent confidence interval for the mean bad debt ratio of all Ohio banks.
c Banking officials claim the mean bad debt ratio for all banks in the Midwest region is 3.5 percent and that the mean bad debt ratio for Ohio banks is higher. Using the 95 percent confidence interval, can we be 95 percent confident that this claim is true? Using the 99 percent confidence interval, can we be 99 percent confident that this claim is true? Explain.
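The interval in part (a) follows the usual t-based formula, mean plus or minus t times s divided by the square root of n. A minimal Python sketch of that calculation is given here; the critical values 2.447 and 3.707 are assumed to be read from a t table with 6 degrees of freedom, and the function name is illustrative only.

```python
import math
import statistics

def t_confidence_interval(data, t_crit):
    """Return (lower, upper) for mean +/- t_crit * s / sqrt(n)."""
    n = len(data)
    mean = statistics.mean(data)
    # statistics.stdev uses the n - 1 divisor (sample standard deviation)
    se = statistics.stdev(data) / math.sqrt(n)
    margin = t_crit * se
    return mean - margin, mean + margin

ratios = [7, 4, 6, 7, 5, 4, 9]   # bad debt ratios in percent, from Q1

# Assumed t-table values for n - 1 = 6 degrees of freedom
T_025_DF6 = 2.447   # two-sided 95 percent
T_005_DF6 = 3.707   # two-sided 99 percent

ci95 = t_confidence_interval(ratios, T_025_DF6)
ci99 = t_confidence_interval(ratios, T_005_DF6)
print("95% CI:", ci95)
print("99% CI:", ci99)
```

The 95 percent result can be compared directly with the MINITAB interval above; the 99 percent interval is wider because its critical value is larger.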
Q2 A production supervisor at a major chemical company wishes to determine whether a new catalyst, catalyst XA-100, increases the mean hourly yield of a chemical process beyond the current mean hourly yield, which is known to be roughly equal to, but no more than, 750 pounds per hour. To test the new catalyst, five trial runs using catalyst XA-100 are made. The resulting yields for the trial runs (in pounds per hour) are 801, 814, 784, 836, and 820. Assuming that all factors affecting yields of the process have been held as constant as possible during the test runs, it is reasonable to regard the five yields obtained using the new catalyst as a random sample from the population of all possible yields that would be obtained by using the new catalyst. Furthermore, we will assume that this population is approximately normally distributed.
a Using the Excel descriptive statistics output given below, find a 95 percent confidence interval for the mean of all possible yields obtained using catalyst XA-100.
b Based on the confidence interval, can we be 95 percent confident that the mean yield using catalyst XA-100 exceeds 750 pounds per hour? Explain.
Standard Error 8.786353
Standard Deviation 19.64688
Sample Variance 386
Confidence Level(95.0%) 24.39488
Q3 Part X: For each of the following situations, indicate whether an error has occurred and, if so, indicate what kind of error (Type I or Type II) has occurred.
a We do not reject H0 and H0 is true.
b We reject H0 and H0 is true.
c We do not reject H0 and H0 is false.
d We reject H0 and H0 is false.
Part Y: What is the level of significance alpha? Specifically, state what an alpha value of 0.05 means and how it is related to Type I error.
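The four situations in Part X form a two-by-two decision table: the decision (reject or do not reject H0) crossed with the true state of H0. A small Python sketch of that table is given here; the function name is hypothetical. The level of significance alpha is the probability of landing in the Type I cell when H0 is in fact true.

```python
def classify_decision(reject_h0: bool, h0_true: bool) -> str:
    """Label the outcome of a hypothesis test decision."""
    if reject_h0 and h0_true:
        return "Type I error"      # rejected a true null hypothesis
    if not reject_h0 and not h0_true:
        return "Type II error"     # failed to reject a false null hypothesis
    return "no error"

for reject in (False, True):
    for truth in (True, False):
        print(f"reject H0={reject}, H0 true={truth}: {classify_decision(reject, truth)}")
```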
Q4 Consolidated Power, a large electric power utility, has just built a modern nuclear power plant. This plant discharges waste water that is allowed to flow into the Atlantic Ocean. The Environmental Protection Agency (EPA) has ordered that the waste water may not be excessively warm so that thermal pollution of the marine environment near the plant can be avoided. Because of this order, the waste water is allowed to cool in specially constructed ponds and is then released into the ocean. This cooling system works properly if the mean temperature of waste water discharged is 60°F or cooler. Consolidated Power is required to monitor the temperature of the waste water. A sample of 100 temperature readings will be obtained each day, and if the sample results cast a substantial amount of doubt on the hypothesis that the cooling system is working properly (the mean temperature of waste water discharged is 60°F or cooler), then the plant must be shut down and appropriate actions must be taken to correct the problem.
a Consolidated Power wishes to set up a hypothesis test so that the power plant will be shut down when the null hypothesis is rejected. Set up the null and alternative hypotheses that should be used.
b In the context of this situation, interpret making a Type I error; interpret making a Type II error.
c Suppose Consolidated Power decides to use a level of significance alpha = 0.05, and suppose a random sample of 100 temperature readings is obtained. For each of the following sample results, determine whether the power plant should be shut down and the cooling system repaired:
1. Sample Mean = 60.482 and Sample Standard Deviation = 2
2. Sample Mean = 60.262 and Sample Standard Deviation = 2
3. Sample Mean = 60.618 and Sample Standard Deviation = 2
You should show the 5 step STOH for each sample result.
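For part (c), the test statistic and rejection rule can be sketched in Python. This sketch assumes the hypotheses are H0: mu <= 60 versus Ha: mu > 60 and, because n = 100 is large, uses a z test with the sample standard deviation standing in for sigma. The function name is illustrative.

```python
import math
from statistics import NormalDist

def upper_tail_z_test(sample_mean, sample_sd, n, mu0, alpha):
    """Return (z, critical value, p-value, reject?) for Ha: mu > mu0."""
    z = (sample_mean - mu0) / (sample_sd / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha)   # about 1.645 for alpha = 0.05
    p_value = 1 - NormalDist().cdf(z)
    return z, z_crit, p_value, z > z_crit

results = []
for xbar in (60.482, 60.262, 60.618):
    z, z_crit, p, reject = upper_tail_z_test(xbar, 2, 100, 60, 0.05)
    results.append((z, reject))
    print(f"mean={xbar}: z={z:.2f}, critical={z_crit:.3f}, p={p:.4f}, shut down={reject}")
```

Rejecting H0 corresponds to shutting the plant down, so only sample means far enough above 60 to push z past the critical value trigger a shutdown.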
Q5. Advertising research indicates that when a television program is involving (such as the 2002 Super Bowl between the St. Louis Rams and New England Patriots, which was very exciting), individuals exposed to commercials tend to have difficulty recalling the names of the products advertised. Therefore, in order for companies to make the best use of their advertising dollars, it is important to show their most original and memorable commercials during involving programs. In an article in the Journal of Advertising Research, Soldow and Principe (1981) studied the effect of program content on the response to commercials. Program content, the factor studied, has three levels (more involving programs, less involving programs, and no program, that is, commercials only), which are the treatments. To compare these treatments, Soldow and Principe employed a completely randomized experimental design. For each program content level, 29 subjects were randomly selected and exposed to commercials in that program content level as follows:
(1) 29 randomly selected subjects were exposed to commercials shown in more involving programs,
(2) 29 randomly selected subjects were exposed to commercials shown in less involving programs, and,
(3) 29 randomly selected subjects watched commercials only (note: this is called the control group).
Then a brand recall score (measured on a continuous scale) was obtained for each subject. The 29 brand recall scores for each program content level are assumed to be a sample randomly selected from the population of all brand recall scores for that program content level. The mean brand recall scores for these three groups were as follows:
Furthermore, a one-way ANOVA of the data shows that SST = 21.40 and SSE = 85.56.
a. Identify the value of n, the total number of observations, and k, the number of treatments.
b. Calculate MST using MST = SST/(k-1)
c. Calculate MSE using MSE = SSE/(n-k)
d. Calculate F = MST/MSE.
e. Define the null and alternate hypotheses using the treatment means M1, M2, and M3 to represent each group. Then test for statistically significant differences between these treatment means. Set alpha =.05. Use the F-table to obtain the critical value of F. You should show the 5 steps in the STOH.
f. If you found a difference due to the treatments, between which groups do you think this difference is most likely? Note you do not have to perform tests to provide this answer.
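The arithmetic in parts (b) through (d) can be sketched in Python, using k - 1 treatment degrees of freedom and n - k error degrees of freedom. The critical value of about 3.11 for F with (2, 84) degrees of freedom at alpha = 0.05 is an approximate table reading and should be treated as an assumption; the function name is illustrative.

```python
def one_way_anova_f(sst, sse, n, k):
    """Return (MST, MSE, F) from the treatment and error sums of squares."""
    mst = sst / (k - 1)    # treatment mean square
    mse = sse / (n - k)    # error mean square
    return mst, mse, mst / mse

k = 3          # program content levels
n = 3 * 29     # 29 subjects per level
mst, mse, f_stat = one_way_anova_f(21.40, 85.56, n, k)

F_CRIT = 3.11  # ASSUMED approximate table value for F(0.05; 2, 84)
print(f"MST={mst:.2f}, MSE={mse:.4f}, F={f_stat:.2f}, reject H0={f_stat > F_CRIT}")
```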
Q6 An accountant wishes to predict direct labor cost (y) on the basis of the batch size (x) of a product produced in a job shop. Using labor cost and batch size data for 12 production runs, the following Excel Output of a Simple Linear Regression Analysis of the Direct Labor Cost Data was obtained. The scatter plot of this data is also shown.
Multiple R           0.99963578
R Square             0.999271693
Adjusted R Square    0.999198862
Standard Error       8.641541

             df    SS           MS          F             Significance F
Regression   1     1024593^f    1024593     13720.47^k    5.04436E-17^m
Residual     10    746.7624^g   74.67624
Total        11    1025340^h

               Coefficients   Standard Error   t Stat        P-value
Intercept      18^a           4.67658          3.953211^c    0.00271^e
BatchSize(X)   10^b           0.08662          117.13^d      5.04436E-17^e
For your aid, the different values in the ANOVA table are explained below using the superscript notation:
a: b0, b: b1, c: t for testing H0: b0 = 0, d: t for testing H0: b1 = 0,
e: p-values for t statistics, f: Explained variation, g: SSE = Unexplained variation,
h: Total variation, k: F(model) statistic, m: p-value for F(model)
Answer the following questions based on the information provided above:
a. Write the regression equation for the LaborCost (y) and BatchSize (x). Note that your equation has to identify the point estimates for b0 and b1 in the equation:
y = b0 + b1x
b Identify the t statistic and the p-value for this t statistic for testing the significance of the slope of the regression line. Using this, determine whether the null hypothesis
H0: b1 = 0 can be rejected?
c What do you conclude about the relationship between LaborCost (y) and BatchSize (x)? Use the different test statistics provided in the data to support your case.
d. Interpret the meanings of b0 and b1. Does the interpretation of b0 make practical sense for this case? Think carefully about what the value of x will be when y = b0.
e Estimate the value of LaborCost for a batch size of 10. Use your regression equation and show all your steps.
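As a rough check on the Excel output, the following Python sketch plugs the reported coefficients into y = b0 + b1x and recomputes R Square and the F(model) statistic from the sums of squares. The variable and function names are illustrative, and the coefficients are used exactly as printed in the output above.

```python
B0 = 18.0   # intercept as reported in the output
B1 = 10.0   # slope as reported in the output

def predict_labor_cost(batch_size):
    """Point estimate of labor cost from the fitted line y = b0 + b1 * x."""
    return B0 + B1 * batch_size

ssr = 1024593.0    # explained variation
sse = 746.7624     # unexplained variation
df_residual = 10

r_squared = ssr / (ssr + sse)               # explained / total variation
f_model = (ssr / 1) / (sse / df_residual)   # MSR / MSE with 1 regression df

print(predict_labor_cost(10), round(r_squared, 6), round(f_model, 2))
```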
Q7 Use the following data for the given situation: International Machinery, Inc., produces a tractor and wishes to use quarterly tractor sales data observed in the last four years to predict quarterly tractor sales next year.
All the data for answering the problems (a) through (c) has been provided to you. You do not have to compute any data for parts (a) through (c).
a. What type of seasonal variation do you see in the sales data? Is there no seasonal variation, constant seasonal variation, increasing seasonal variation, or decreasing seasonal variation? State your reasons. Find and identify the four seasonal factors for quarters 1, 2, 3, and 4.
b. What type of trend is indicated by the plot of the deseasonalized data?
c. What is the equation of the estimated trend that has been calculated using the deseasonalized data?
d. Compute a point forecast of tractor sales (based on trend and seasonal factors) for each of the quarters next year. You should show all your steps for each quarter forecast. (Hint: Note that you will use the equation from ( c ). This will provide you with the deseasonalized data. You then have to adjust it for the seasonal factor applicable for the quarter.)
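The hint in part (d) can be sketched in Python as follows. The sketch assumes a multiplicative model, in which the trend value for a period is multiplied by that quarter's seasonal factor. The trend coefficients and seasonal factors below are hypothetical placeholders, not the values from the problem's data, and the function name is illustrative.

```python
def seasonal_forecast(t, b0, b1, seasonal_factor):
    """Multiplicative forecast: linear trend value at period t times the seasonal factor."""
    trend = b0 + b1 * t
    return trend * seasonal_factor

# HYPOTHETICAL placeholder values, NOT taken from the problem's data
B0, B1 = 100.0, 2.0
SEASONAL_FACTORS = {1: 1.10, 2: 1.30, 3: 0.90, 4: 0.70}

# Four years of quarterly data cover periods 1 to 16, so next year is periods 17 to 20
forecasts = {
    q: seasonal_forecast(16 + q, B0, B1, SEASONAL_FACTORS[q])
    for q in (1, 2, 3, 4)
}
for q, value in forecasts.items():
    print(f"Quarter {q} (t={16 + q}): {value:.1f}")
```

With the actual trend equation from part (c) and the seasonal factors from part (a), the same two steps apply to each quarter.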
Regression (20 Problems) : Multiple Regression Model Building, Averages and Exponential Smoothing, Hypothesis Testing and ANOVA
1. A real estate builder wishes to determine how house size (House) is
influenced by family income (Income), family size (Size), and
education of the head of household (School). House size is measured
in hundreds of square feet, income is measured in thousands of
dollars, and education is measured in years. The builder randomly
selected 50 families and ran the multiple regression. The business
literature involving human capital shows that education influences
an individual's annual income. Combined, these may influence
family size. With this in mind, what should the real estate builder
be particularly concerned with when analyzing the multiple
regression model?
a. Randomness of error terms
c. Normality of residuals
d. Missing observations
2. A microeconomist wants to determine how corporate sales are
influenced by capital and wage spending by companies. She
proceeds to randomly select 26 large corporations and record
information in millions of dollars. A statistical analyst discovers
that capital spending by corporations has a significant inverse
relationship with wage spending. What should the microeconomist
who developed this multiple regression model be particularly
concerned with?
a. Randomness of error terms
c. Normality of residuals
d. Missing observations
3. The Variance Inflationary Factor (VIF) measures the
a. correlation of the X variables with the Y variable.
b. contribution of each X variable with the Y variable after all
other X variables are included in the model.
c. correlation of the X variables with each other.
d. standard deviation of the slope.
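For reference, the VIF for an explanatory variable X_j is commonly computed as 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing X_j on all the other X variables. A minimal Python sketch follows; the function name and the sample R_j^2 values are illustrative.

```python
def variance_inflation_factor(r_squared_j):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing X_j on the other X variables."""
    if not 0 <= r_squared_j < 1:
        raise ValueError("R_j^2 must be in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)

# Illustrative values: an uncorrelated predictor and two increasingly collinear ones
for r2 in (0.0, 0.8, 0.9):
    print(r2, round(variance_inflation_factor(r2), 2))
```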
4. In multiple regression, the __________ procedure permits variables
to enter and leave the model at different stages of its development.
a. forward selection
b. residual analysis
c. backward elimination
d. stepwise regression
5. Which of the following is not used to find a "best" model?
a. adjusted r^2
b. Mallows' Cp
c. odds ratio
d. all of the above
6. The logarithm transformation can be used
a. to overcome violations of the autocorrelation assumption.
b. to test for possible violations of the autocorrelation
assumption.
c. to change a linear independent variable into a nonlinear
d. to change a nonlinear model into a linear model.
7. The Cp statistic is used
a. to determine if there is a problem of collinearity.
b. if the variances of the error terms are all the same in a
c. to choose the best model.
d. to determine if there is an irregular component in a time
series.
8. Which of the following is used to determine observations that have
an influential effect on the fitted model?
a. Cook's distance statistic
b. Durbin-Watson statistic
c. variance inflationary factor
d. the Cp statistic
9. An auditor for a county government would like to develop a model to
predict the county taxes based on the age of single-family houses. A
random sample of 19 single-family houses has been selected, with
the results as shown below (and also in the data file TAXES on your
Taxes Age of House
Assuming a quadratic relationship between the age of the house and
the county taxes, which of the following is the best prediction of the
average county taxes for a 20-year old house?
10. An econometrician is interested in evaluating the relation of
demand for building materials to mortgage rates in Los Angeles and
San Francisco. He believes that the appropriate model is
Y = 10 + 5X1 + 8X2
Where X1 = mortgage rate in %
X2 = 1 if San Francisco, 0 if LA
Y = demand in $100 per capita
Referring to the information above, holding constant the effect of
city, each additional increase of 1% in the mortgage rate would lead
to an estimated increase of ________ in the mean demand.
11. Referring to the information in #10 above, the fitted model for
predicting demand in Los Angeles is ________.
a. 10 + 5X1
b. 10 + 13X1
c. 15 + 8X2
d. 18 + 5X2
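The role of the dummy variable X2 in the model from #10 can be seen by evaluating Y = 10 + 5X1 + 8X2 for each city. The Python sketch below does this; the function name and the 6 percent mortgage rate are illustrative choices.

```python
def demand(mortgage_rate, san_francisco):
    """Y = 10 + 5*X1 + 8*X2, where X2 = 1 for San Francisco and 0 for Los Angeles."""
    x2 = 1 if san_francisco else 0
    return 10 + 5 * mortgage_rate + 8 * x2

rate = 6   # illustrative mortgage rate in percent
la = demand(rate, san_francisco=False)
sf = demand(rate, san_francisco=True)
slope = demand(rate + 1, False) - demand(rate, False)   # effect of a 1% rate increase
print(la, sf, sf - la, slope)
```

Setting X2 = 0 leaves the intercept at 10, while setting X2 = 1 shifts it by 8; the coefficient on X1 is the same in both cities.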
12. Table 3.1
In Hawaii, condemnation proceedings are underway to enable
private citizens to own the property that their homes are built on.
Until recently, only estates were permitted to own land, and
homeowners leased the land from the estate. In order to comply
with the new law, a large Hawaiian estate wants to use regression
analysis to estimate the fair market value of the land. Each of the
following 3 models was fit to data collected for n = 20 properties, 10
of which are located near a cove.
Model 1: Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + β4 X1^2 + β5 X1^2 X2 + ε
where Y = Sale price of property in thousands of dollars
X1 = Size of property in thousands of square feet
X2 = 1 if property located near cove, 0 if not
Using the data collected for the 20 properties, the following partial
output obtained from Microsoft Excel is shown:
Multiple R        0.985
R Square          0.970
Standard Error    9.5

             Df    SS       MS      F       Signif F
Regression   5     28324    5664    62.2    0.0001
Residual     14    1279     91
Total        19    29063

              Coeff     StdError   t Stat   p-value
Intercept     -32.1     35.7       -0.90    0.3834
Size          12.2      5.9        2.05     0.0594
Cove          -104.3    53.5       -1.95    0.0715
Size*Cove     17.0      8.5        1.99     0.0661
SizeSq        -0.3      0.2        -1.28    0.2204
SizeSq*Cove   -0.3      0.3        -1.13    0.2749
Referring to Table 3.1, given a quadratic relationship between sale
price (Y) and property size (X1), what null hypothesis would you test
to determine whether the curves differ from cove and non-cove
a. H0 : β2 = β3 = β5 = 0
b. H0 : β3 = β5 = 0
c. H0 : β4 = β5 = 0
d. H0 : β2 = 0
13. Referring to Table 3.1, is the overall model statistically adequate
at a 0.05 level of significance for predicting sale price (Y)?
a. No, since some of the t-tests for the individual variables are
b. No, since the standard deviation of the model is fairly large.
c. Yes, since none of the β-estimates are equal to 0.
d. Yes, since the p-value for the test is smaller than 0.05.
14. The method of moving averages is used
a. to plot a series.
b. to exponentiate a series.
c. to smooth a series.
d. in regression analysis.
15. When using the exponentially weighted moving average for
purposes of forecasting rather than smoothing,
a. the previous smoothed value becomes the forecast.
b. the current smoothed value becomes the forecast.
c. the next smoothed value becomes the forecast.
d. None of the above.
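Exponential smoothing is usually defined by the recursion E_1 = Y_1 and E_i = W*Y_i + (1 - W)*E_(i-1), and the forecast for the next period is read off the most recent smoothed value. A minimal Python sketch follows; the series and the smoothing coefficient W = 0.5 are hypothetical illustration values, and the function name is illustrative.

```python
def exponential_smoothing(series, w):
    """Return smoothed values with E_1 = Y_1 and E_i = w*Y_i + (1 - w)*E_(i-1)."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed

series = [10, 12, 11, 15]    # HYPOTHETICAL data for illustration only
smoothed = exponential_smoothing(series, w=0.5)
forecast_next = smoothed[-1]  # forecast for period 5 is the smoothed value of period 4
print(smoothed, forecast_next)
```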
16. In selecting an appropriate forecasting model, the following
approaches are suggested:
a. Perform a residual analysis.
b. Measure the size of the forecasting error.
c. Use the principle of parsimony.
d. All of the above.
17. To assess the adequacy of a forecasting model, one measure that is
often used is
a. quadratic trend analysis.
b. the MAD.
c. exponential smoothing.
d. moving averages.
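The MAD (mean absolute deviation) averages the absolute differences between actual and forecast values. A short Python sketch is given here; the two series are hypothetical illustration values and the function name is illustrative.

```python
def mean_absolute_deviation(actual, forecast):
    """Average of |actual - forecast| over all periods."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    errors = [abs(a - f) for a, f in zip(actual, forecast)]
    return sum(errors) / len(errors)

actual = [10, 12, 11, 15]     # HYPOTHETICAL observed values
forecast = [11, 11, 12, 13]   # HYPOTHETICAL forecasts from some model
mad = mean_absolute_deviation(actual, forecast)
print(mad)
```

When two candidate models are fit to the same series, the one with the smaller MAD tracks the data more closely by this measure.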
18. A model that can be used to make predictions about long-term
future values of a time series is
a. linear trend.
b. quadratic trend.
c. exponential trend.
d. All of the above.
19. You need to decide whether you should invest in a particular stock.
You would like to invest if the price is likely to rise in the long run.
You have data on the daily average price of this stock over the past
12 months. Your best action is to
a. compute moving averages.
b. perform exponential smoothing.
c. estimate a least square trend model.
d. compute the MAD statistic.
20. Which of the following statements about moving averages is not
true?
a. It can be used to smooth a series.
b. It gives equal weight to all values in the computation.
c. It is simpler than the method of exponential smoothing.
d. It gives greater weight to more recent data.
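A simple (unweighted) moving average of length L replaces each run of L consecutive values with their mean, so every value in the window carries the same weight. The Python sketch below computes a three-term moving average; the series is hypothetical and the function name is illustrative.

```python
def moving_average(series, window=3):
    """Return the simple moving averages of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

series = [20, 26, 23, 29, 32, 35]   # HYPOTHETICAL monthly counts for illustration
ma3 = moving_average(series, window=3)
print(ma3)   # a 6-value series yields 4 three-term averages
```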
21. The following table contains the number of complaints received in a
department store for the first 6 months of last year.
Table 3.2
Referring to the Table 3.2 above, if a three-term moving average is
used to smooth this series, what would be the second calculated