# Simple Linear Regression, Multiple Regression Model, and CI

1. A simple linear regression model relating investment (y) by companies to bank lending interest rate (x) is stated as

$y = \beta_0 + \beta_1 x + \varepsilon$, where $\varepsilon$ is the error term.

a. What are the intercept and slope for the relationship between investment and interest rate stated above? What sign would you expect for the slope of the relationship between investment and interest rate? Please explain your reasoning.

b. What is the role of the error term in the simple linear regression model like the one stated above? Please list as many examples as you can of the factors that you think belong in the error term of the simple linear regression model stated above briefly justifying each example.

c. Please explain how you would estimate the relationship between investment and interest rate specified above. [note: you do not need to write any formulas as an answer to this question. Just explain what you will need and the reasoning behind the method you would use to estimate the relationship].

d. For what purpose could you use the estimated simple linear regression model for investment and interest rate? Why could the simple linear regression model stated above be inadequate for the purpose?

2. John Cooper, a real estate appraiser in a small town in Pennsylvania, has estimated the following simple linear regression model relating home prices (y) in the town to the square footage (x) of the homes.

$$\hat{y} = \underset{(620)}{85{,}473} + \underset{(5.22)}{52.65}\,x$$

n=36 SST=865,500 SSR=545,328 SSE=320,272

where the numbers in parentheses are the standard errors.

SST = total sum of squares; SSR = regression (explained) sum of squares;

SSE = error (residual) sum of squares.

a. Does the relationship between home prices and square footage reported above make sense? Please interpret the slope of the simple linear regression model estimated by John.

b. At 1% level of significance, test whether square footage has a significant effect on home price. Please clearly show all your steps and comment on whether your decision to reject or not to reject the null hypothesis makes sense.

c. Calculate and interpret the coefficient of determination (R²) for home price and square footage. Does the magnitude of R² make sense for the linear regression reported above?

d. Calculate the correlation coefficient (r) and conduct a hypothesis test involving a null hypothesis which says there is no correlation between home price and square footage (r = 0) against an alternative hypothesis which says there is correlation between home price and square footage (r ≠ 0). Conduct the test at the 1% level of significance.

e. Did you arrive at the same conclusion in b and d? Given the results of your hypothesis tests and the magnitude of R², do you think it is reasonable to use John's model for predicting home prices in the town?

f. Use the model to predict the selling price of a house with a square footage of 2000.

g. A house with 2000 square feet recently sold for $220,000. Explain why there is a difference between the price predicted by the model for such a house and the actual price at which the house was sold.

h. If you were to estimate a multiple regression model for home prices, what other variables might you include in the model? Please briefly justify why you think each of the variables you would include influences home prices.

3. Accountants at Zodok Company believed that several traveling executives submit unusually high travel vouchers when they return from business trips. The accountants took a sample of 35 vouchers submitted from the past year and estimated the following multiple linear regression model relating submitted travel costs (y) to the number of days on road (x1), distance traveled (x2), age of the executive (x3) and gender of the executive (x4).

$$\hat{y} = \underset{(150.1)}{450.3} + \underset{(32.1)}{122.4}\,x_1 + \underset{(0.06)}{0.42}\,x_2 - \underset{(0.6)}{1.2}\,x_3 - \underset{(10)}{15}\,x_4$$

N=35 SST=122,344 SSR=88,087.68 SSE=34,256

where

$x_4 = 1$ for a female executive

$x_4 = 0$ for a male executive

The numbers in parentheses are the standard errors.

SST = total sum of squares; SSR = regression (explained) sum of squares;

SSE = error (residual) sum of squares.

a. If you were a member of the team who estimated this model, what additional independent variables would you include in the model? Which variable(s) would you exclude? Please briefly justify your answers.

b. Please interpret the estimated coefficient for each of the independent variables. Does the sign of the estimated coefficient make sense for each independent variable?

c. At 1% level of significance, test whether each of the four estimated coefficients is statistically significant. Please clearly show all your steps and comment on whether your decision to reject or not to reject the null hypothesis makes sense in each case.

d. At the 1% level of significance, please test whether the four independent variables have a jointly significant effect on travel costs, clearly showing all your steps.

e. Calculate and interpret the coefficient of multiple determination (R²). Does the magnitude of R² make sense for the multiple linear regression model reported above?

f. Given the magnitude of R² and your decisions about the significance tests, do you think the model reported above is strong enough to be used for predicting the expected travel costs for executives of Zodok Company?

g. If a 50-year-old male executive submitted vouchers in the amount of $1,100 for a 3-day trip to a city 500 miles away, what would be the difference between the submitted amount and the amount the model would predict for this trip?
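A minimal Python sketch of the arithmetic behind parts c, d, e and g, using only the figures stated in the problem. The critical values are not computed here; they are assumed from standard tables: about 2.750 for a two-tailed t test at 1% with n − k − 1 = 30 d.f., and about 4.02 for F(4, 30) at 1%. With these inputs the |t| ratios for days (≈3.81) and distance (7.0) exceed 2.750 while age (2.0) and gender (1.5) do not, and F ≈ 19.3 exceeds 4.02.

```python
# Reported estimates for the Zodok travel-cost model (from the problem text).
intercept = 450.3
slopes = [122.4, 0.42, -1.2, -15.0]      # x1 days, x2 miles, x3 age, x4 female
slope_ses = [32.1, 0.06, 0.6, 10.0]      # standard errors of the slopes
n, k = 35, 4                             # sample size, number of slopes
sst, ssr, sse = 122_344, 88_087.68, 34_256

df_resid = n - k - 1                     # 30 residual degrees of freedom

# (c) t ratio for each slope: t = b / se(b)
t_ratios = [b / se for b, se in zip(slopes, slope_ses)]

# (d) overall F statistic: (SSR / k) / (SSE / (n - k - 1))
f_stat = (ssr / k) / (sse / df_resid)

# (e) coefficient of multiple determination
r_squared = ssr / sst

# (g) predicted cost for a 50-year-old male (x4 = 0), 3 days, 500 miles
def predict_cost(days, miles, age, female):
    xs = (days, miles, age, female)
    return intercept + sum(b * x for b, x in zip(slopes, xs))

predicted = predict_cost(3, 500, 50, 0)
difference = 1100 - predicted            # submitted amount minus predicted amount

print([round(t, 2) for t in t_ratios])   # t ratios for x1..x4
print(round(f_stat, 2), round(r_squared, 2))
print(round(predicted, 2), round(difference, 2))
```

The predicted cost is $967.50, so the submitted voucher exceeds the model's prediction by $132.50.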

4. A cola-dispensing machine is set to dispense a mean of 2.02 liters into a container labeled 2 liters. Actual quantities dispensed vary and the amounts are normally distributed with a standard deviation of 0.015 liters.

a. What is the probability a container will have less than 2 liters?

b. What is the probability that a container will have more than 2.04 liters?

c. What is the probability that a container will have between 2 and 2.04 liters?
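The three probabilities follow directly from the normal model stated in the problem. A short sketch using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

# Fill volume X ~ Normal(mean = 2.02 L, sd = 0.015 L), as stated in the problem.
fill = NormalDist(mu=2.02, sigma=0.015)

p_less_2 = fill.cdf(2.00)                      # (a) P(X < 2), z = -1.33
p_more_2_04 = 1 - fill.cdf(2.04)               # (b) P(X > 2.04), z = +1.33
p_between = fill.cdf(2.04) - fill.cdf(2.00)    # (c) P(2 < X < 2.04)

print(round(p_less_2, 4), round(p_more_2_04, 4), round(p_between, 4))
```

Because 2.00 and 2.04 are equally far from the mean, parts a and b give the same probability (about 0.091), and part c is about 0.818.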

5. Management of a refrigerator assembly plant is considering adopting a bonus system to increase production. Past records indicate that, on the average, 4000 units are assembled within a week. The distribution of the weekly production is approximately normally distributed with a standard deviation of 60 units.

a. If the bonus is paid on the upper 5 percent of production, what is the cutoff level of production above which the bonus will be paid?

b. What is the probability that at least 4300 refrigerators may be assembled in a given week?

c. What is the probability that less than 3900 units may be assembled in a given week?
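A minimal sketch of the three computations, treating weekly production as Normal(4000, 60) as the problem states (the continuous normal is used as an approximation for a count of units):

```python
from statistics import NormalDist

weekly = NormalDist(mu=4000, sigma=60)

cutoff = weekly.inv_cdf(0.95)            # (a) 95th percentile: 4000 + 1.645 * 60
p_at_least_4300 = 1 - weekly.cdf(4300)   # (b) z = 5, an extremely rare week
p_below_3900 = weekly.cdf(3900)          # (c) z = -1.67

print(round(cutoff, 2), p_at_least_4300, round(p_below_3900, 4))
```

The cutoff is roughly 4,099 units; part b is on the order of 3 × 10⁻⁷, and part c is about 0.048.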

6. Think about any variable for which you would like to know the mean value in the population but you don't have any data at the population level. Assuming that you have enough budget to collect data only from a sample of 25, please outline and describe the steps you would follow to calculate the 90% confidence interval for the population mean you are interested in.

7. A pharmaceutical company wanted to estimate the population mean of monthly sales for its salespeople. Forty salespeople were randomly selected. Their mean monthly sales were $10,000, and the population standard deviation is $1,000. Construct and interpret a 90% confidence interval for the mean sales of all the salespeople.

8. Mileage tests were conducted on a randomly selected sample of 100 newly developed automobile tires. The results showed that the average tread life for the sample was 50,000 miles with a sample standard deviation of 3,500 miles.

a. What is the best estimate of the average tread life in miles for the entire population of these tires?

b. Please construct and interpret a 90% confidence interval for the mean tread life of the tires.

9. A random sample of 16 ATM transactions at the Last National Bank of Flatrock revealed a mean transaction time of 2.8 minutes with a sample standard deviation of 1.2 minutes. Please construct and interpret the 90% confidence interval for the true (population) mean transaction time.
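Problems 7 to 9 all use the same formula, mean ± critical value × sd / √n; only the critical value changes. A minimal sketch follows. The z value is computed, but the t value for problem 9 (1.753 for 15 d.f., 0.05 in each tail) is assumed from a standard t table because the Python standard library has no t distribution. Using z for problem 8 is a large-sample approximation; a t(99) value of about 1.660 would widen the interval slightly.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, crit):
    """Confidence interval mean ± crit * sd / sqrt(n)."""
    margin = crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

z_90 = NormalDist().inv_cdf(0.95)        # two-sided 90% z value, about 1.645

# Problem 7: population sd known, so the z interval applies.
ci7 = mean_ci(10_000, 1_000, 40, z_90)

# Problem 8: sample sd with n = 100; z used as a large-sample approximation.
# The point estimate (part a) is the sample mean, 50,000 miles.
ci8 = mean_ci(50_000, 3_500, 100, z_90)

# Problem 9: sample sd with n = 16, so t with 15 d.f.;
# 1.753 is an assumed t-table value, not computed here.
ci9 = mean_ci(2.8, 1.2, 16, 1.753)

for name, (low, high) in [("7", ci7), ("8", ci8), ("9", ci9)]:
    print(name, round(low, 3), round(high, 3))
```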

10. A manufacturer of stereo equipment introduces new models in the fall. Retail dealers are surveyed immediately after the Christmas selling season regarding their stock on hand of each piece of equipment. It has been found that unless 40% of the new equipment ordered by the retailers in the fall has been sold by Christmas, immediate production cutbacks are needed. The manufacturer has found that contacting all of the dealers after Christmas by mail is frustrating, as many of them never respond. This year 80 dealers were selected at random and telephoned regarding a new receiver. It was found that 38% of those receivers had been sold. Construct a 99% confidence interval for the proportion of this receiver sold by all the dealers. Based on your result, is it likely that production cutbacks will be needed?

11. A random sample of 160 commercial customers of PayMor Lumber revealed that 32 had paid their accounts within a month of billing. Please construct and interpret the 90% confidence interval for the population proportion of customers who pay within a month.
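Problems 10 and 11 both call for a confidence interval for a proportion. A minimal sketch using the usual normal-approximation (Wald) interval, p̂ ± z·√(p̂(1 − p̂)/n), is shown below. For problem 10, 0.40 lies inside the resulting interval (about 0.24 to 0.52), so the sample alone does not convincingly show that fewer than 40% were sold.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, level):
    """Normal-approximation (Wald) interval p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. level 0.99 -> z ≈ 2.576
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

ci10 = proportion_ci(0.38, 80, 0.99)       # Problem 10
ci11 = proportion_ci(32 / 160, 160, 0.90)  # Problem 11: p_hat = 0.20

print([round(v, 4) for v in ci10], [round(v, 4) for v in ci11])
```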

12. Think about a claim often made by a company or government agency or any other institution.

a. Describe the steps you would follow to test the validity of the claim.

b. What are the Type I and Type II errors you could commit while conducting a hypothesis test about the claim?

c. If you choose a 1% level of significance for your hypothesis test, please explain the relationship between this level of significance and a Type I error.

13. During an emissions testing period, a group of 10 vehicles using an 85% ethanol-gasoline mixture showed mean CO2 emissions of 240 pounds per 100 miles, with a sample standard deviation of 20 pounds. Another group of 14 vehicles using regular gasoline showed mean CO2 emissions of 252 pounds per 100 miles, with a sample standard deviation of 15 pounds. Assuming unequal population variances, please test whether the mean emission rates are the same for cars using the ethanol-gasoline mixture and regular gasoline. Conduct the test at the 1% level of significance and explain your decision.
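With unequal population variances, the standard choice is Welch's t test. A minimal sketch of the test statistic and the Welch-Satterthwaite degrees of freedom is below. The two-tailed 1% critical value is assumed from a t table: rounding the degrees of freedom (about 15.9) down to 15 gives about 2.947, and |t| ≈ 1.60 is well below it.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2          # squared standard errors
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Ethanol mixture (group 1) versus regular gasoline (group 2).
t, df = welch_t(240, 20, 10, 252, 15, 14)
print(round(t, 4), round(df, 2))
```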

14. The average cost of tuition, room and board at small private liberal arts colleges is reported to be $8,500 per term, but a financial administrator believes that the average cost is higher. A survey of 36 small liberal arts colleges showed that the average cost per term is $8,745 with a sample standard deviation of $1,200. At 5% level of significance is the financial administrator's belief supported by the evidence?

15. The national average gross annual income of certified welders is $30,000. The shipbuilding association believes that its welders on average earn at least as much as the national average. A survey of 25 welders in the shipbuilding industry revealed an average annual income of $32,000 with a sample standard deviation of $2,000. At the 1% level of significance, is the shipbuilding association right? Show all your steps and explain your decision.
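Problems 14 and 15 are one-sample t tests, t = (x̄ − μ₀)/(s/√n) with n − 1 d.f. A minimal sketch of the statistics follows; the critical values are assumed from a t table. For problem 14 (H₀: μ ≤ 8,500, H₁: μ > 8,500), t = 1.225 is below t(0.05, 35) ≈ 1.690. For problem 15, under the common convention that a claim containing equality goes in H₀ (μ ≥ 30,000 versus H₁: μ < 30,000), t = 5.0 is nowhere near the lower-tail critical value of about −2.492 for 24 d.f.

```python
import math

def one_sample_t(sample_mean, mu0, sd, n):
    """t = (sample_mean - mu0) / (sd / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))

t14 = one_sample_t(8_745, 8_500, 1_200, 36)    # Problem 14, 35 d.f.
t15 = one_sample_t(32_000, 30_000, 2_000, 25)  # Problem 15, 24 d.f.
print(round(t14, 3), round(t15, 3))
```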

#### Solution Preview


1. A simple linear regression model relating investment (y) by companies to bank lending interest rate (x) is stated as

$y = \beta_0 + \beta_1 x + \varepsilon$, where $\varepsilon$ is the error term.

Answers

a. What are the intercept and slope for the relationship between investment and interest rate stated above? What sign would you expect for the slope of the relationship between investment and interest rate? Please explain your reasoning.

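Intercept = $\beta_0$: the expected level of investment when the interest rate is zero.

Slope = $\beta_1$: the change in expected investment for a one-unit (for example, one-percentage-point) increase in the interest rate.

A negative sign is expected for the slope. A higher lending rate raises the cost of borrowing to finance new projects, so fewer projects are profitable and investment normally falls as the interest rate rises.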

b. What is the role of the error term in the simple linear regression model like the one stated above? Please list as many examples as you can of the factors that you think belong in the error term of the simple linear regression model stated above briefly justifying each example.

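The error term $\varepsilon$ captures everything that affects investment other than the interest rate, together with measurement error and purely random variation. It is the only random component in the model and thus the only source of randomness in y: it is the difference between the actual observed value of y and its expected value $\beta_0 + \beta_1 x$. (The residuals from a fitted line are the sample estimates of these unobserved errors.)

Examples of factors in the error term include expected future sales (firms invest more when they expect demand to grow), business confidence and uncertainty (uncertainty leads firms to postpone investment), current profits and cash flow (internal funds reduce the need to borrow), corporate taxes and investment incentives (they change the after-tax return on new capital), the price of capital goods and technological change (cheaper or more productive equipment encourages investment), and measurement error in recorded investment.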

c. Please explain how you would estimate the relationship between investment and interest rate specified above. [Note: you do not need to write any formulas as an answer to this question. Just explain what you will need and the reasoning behind the method you would use to estimate the relationship].

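To estimate the relationship, I would need a sample of paired observations on investment and the bank lending interest rate, for example yearly or quarterly data for an economy, or data across many firms. I would then use ordinary least squares (OLS), which chooses the intercept and slope of the line that minimizes the sum of squared vertical distances between the observed investment values and the values on the line. OLS is used because, under the standard regression assumptions, it gives unbiased estimates of $\beta_0$ and $\beta_1$ with the smallest variance among linear unbiased estimators.

Once the line is estimated, the slope ($\beta_1$) can be tested with a t test at a chosen level of significance to check whether the interest rate has a significant effect on investment.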
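The OLS estimates have a simple closed form: the slope is Sxy/Sxx and the intercept makes the line pass through the point of means. A minimal sketch, where `ols_fit` is a hypothetical helper name and the toy numbers are for illustration only (they are not investment data):

```python
def ols_fit(x, y):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope: co-movement of x and y relative to spread of x
    b0 = y_bar - b1 * x_bar        # intercept: line passes through (x_bar, y_bar)
    return b0, b1

# Toy numbers for illustration only.
x_toy = [1, 2, 3, 4, 5]
y_toy = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(x_toy, y_toy)
print(round(b0, 4), round(b1, 4))
```

In practice x would hold the interest-rate observations and y the matching investment observations.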

d. For what purpose could you use the estimated simple linear regression model for investment and interest rate? Why could the simple linear regression model stated above be inadequate for the purpose?

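The estimated model can be used to predict investment at a given interest rate (within the range of interest rates observed in the sample) and to estimate how strongly investment responds to a change in the interest rate.

The simple model may be inadequate for these purposes because investment depends on many factors besides the interest rate, such as expected sales, profits, and business confidence. Leaving them in the error term makes predictions imprecise, and if any of them is correlated with the interest rate, the estimated slope will be biased. The model is also inadequate if the true relationship between investment and the interest rate is not linear.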

2. John Cooper, a real estate appraiser in a small town in Pennsylvania, has estimated the following simple linear regression model relating home prices (y) in the town to the square footage (x) of the homes.

$$\hat{y} = \underset{(620)}{85{,}473} + \underset{(5.22)}{52.65}\,x$$

n=36 SST=865,500 SSR=545,328 SSE=320,272

where the numbers in parentheses are the standard errors.

SST = total sum of squares; SSR = regression (explained) sum of squares;

SSE = error (residual) sum of squares.

Answers

a. Does the relationship between home prices and square footage reported above make sense? Please interpret the slope of the simple linear regression model estimated by John.

The slope is positive, so there is a positive relationship between home price and square footage: larger homes sell for more. This makes sense, because a larger home provides more living space and generally costs more to build.

Interpretation of slope:

Each additional square foot is associated with an increase of about $52.65 in the predicted home price, on average.

b. At 1% level of significance, test whether square footage has a significant effect on home price. Please clearly show all your steps and comment on whether your decision to reject or not to reject the null hypothesis makes sense.

The significance of the slope coefficient on square footage is tested with a t test.

The null hypothesis tested is

H0: $\beta_1 = 0$

The alternative hypothesis is

H1: $\beta_1 \neq 0$

Test statistic: $t = \dfrac{b_1}{s_{b_1}} = \dfrac{52.65}{5.22} = 10.0862$

Decision rule: Reject the null hypothesis if the absolute value of the test statistic is greater than the critical value of t with n − 2 = 34 d.f. at the 0.01 significance level.

Critical value = ±2.728394364

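Conclusion: Reject the null hypothesis, since the absolute value of the test statistic is greater than the critical value. The sample provides enough evidence to conclude that square footage has a significant effect on home price. This decision makes sense: size is one of the main determinants of a home's value, so we would expect it to have a significant effect.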
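The slope test above reduces to one division and one comparison. A minimal sketch using the reported slope, its standard error, and the critical value quoted in the text:

```python
b1, se_b1 = 52.65, 5.22            # slope and its standard error
t_crit = 2.728394364               # two-tailed 1% critical value, 34 d.f. (from the text)

t_stat = b1 / se_b1
reject_h0 = abs(t_stat) > t_crit
print(round(t_stat, 4), reject_h0)
```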

c. Calculate and interpret the coefficient of determination (R²) for home price and square footage. Does the magnitude of R² make sense for the linear regression reported above?

Coefficient of determination: $R^2 = \dfrac{SSR}{SST} = \dfrac{545{,}328}{865{,}500} = 0.63$

Interpretation:

About 63% of the total variation in home price is explained by the linear relationship between square footage and home price (as described by the regression equation).

The magnitude of R² makes sense. Square footage is a major driver of home prices, so it should explain a substantial share of their variation, while the remaining 37% reflects other factors, such as location, lot size, age, and condition, that the model leaves out.
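R² can be computed either as SSR/SST or as 1 − SSE/SST. A minimal sketch with the reported sums of squares; the two versions differ slightly (0.6301 vs. 0.6300) because the reported SSR + SSE = 865,600 is 100 more than the reported SST = 865,500, but both round to 0.63.

```python
# Sums of squares reported for John's model.
sst, ssr, sse = 865_500, 545_328, 320_272

r_squared = ssr / sst              # share of variation explained
r_squared_alt = 1 - sse / sst      # same quantity via the residual sum of squares

print(round(r_squared, 4), round(r_squared_alt, 4))
```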

d. Calculate the correlation coefficient (r) and conduct a hypothesis test involving a null hypothesis which says there is no correlation between home price and square footage (r = 0) against an alternative hypothesis which says there is correlation between home price and square footage (r ≠ 0). Conduct the test at the 1% level of significance.

The significance of the correlation coefficient is tested using Student's t test.

The null hypothesis tested is

H0: There is no significant correlation between home price and square footage. (ρ = 0)

The alternative hypothesis is

H1: There is significant correlation between home price and square footage. (ρ ≠ 0)

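The test statistic is $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, where $r = +\sqrt{R^2} = \sqrt{545{,}328/865{,}500} = 0.79377125$ (positive because the slope is positive) and n = 36.

Therefore, $t = \dfrac{0.79377125\sqrt{34}}{\sqrt{1-0.79377125^2}} = 7.609860389$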

Decision rule: Reject the null hypothesis if the absolute value of calculated t is greater than the critical value of t with 34 d.f. at the 0.01 significance level.

Critical value = ±2.728394364

Conclusion: Reject the null hypothesis, since the absolute value of calculated t is greater than the critical value. The sample provides enough evidence to support the claim that there is correlation between home price and square footage.

Details

| Item | Value |
|---|---|
| Test | Two-tailed |
| Sample size | 36 |
| Sample correlation | 0.79377125 |
| Significance level | 0.01 |
| d.f. | 34 |
| t statistic | 7.609860389 |
| Critical value | 2.728394364 |
| p-value | 7.66406E-09 |
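The correlation test can be reproduced from the sums of squares alone. A minimal sketch that recomputes r and the t statistic in the table above, using the critical value quoted in the text:

```python
import math

sst, ssr, n = 865_500, 545_328, 36
t_crit = 2.728394364               # two-tailed 1% critical value, 34 d.f. (from the text)

# r is the positive square root of R^2 because the fitted slope is positive.
r = math.sqrt(ssr / sst)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject_h0 = abs(t_stat) > t_crit

print(round(r, 5), round(t_stat, 4), reject_h0)
```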

e. Did you arrive at the same conclusion in b and d? Given the results of your hypothesis tests and the magnitude of R², do you think it is reasonable to use John's model for predicting home prices in the town?

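Yes. Both tests reject the null hypothesis at the 1% level, so the conclusions in b and d are the same. (In simple linear regression the two t statistics are algebraically identical; they differ here, 10.09 versus 7.61, only because the reported standard error and sums of squares are not exactly consistent with each other. The conclusion is unaffected.)

Square footage has a significant effect on home price, and about 63% of the total variation in home price is explained by the linear relationship with square footage. Hence it is reasonable to use John's model for predicting home prices in the town, keeping in mind that the 37% of unexplained variation means individual predictions will only be approximate.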

f. Use the model to predict the selling price of a house with a square footage of 2000.

Predicted selling price = 85,473 + 52.65 × 2,000 = $190,773
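The prediction is a single evaluation of the fitted line. A minimal sketch, where `predict_price` is a hypothetical helper name; it also computes the residual for the $220,000 sale discussed in part g:

```python
def predict_price(sqft):
    """Predicted home price from John's fitted line."""
    return 85_473 + 52.65 * sqft

predicted = predict_price(2_000)
residual = 220_000 - predicted     # actual sale price minus predicted price (part g)
print(round(predicted, 2), round(residual, 2))
```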

g. A house with 2000 square feet recently sold for $220,000. Explain why there is a difference between the price predicted by the model for such a house and the actual price at which the house was sold.

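The model gives the average (expected) price of all 2,000-square-foot homes in the town. An individual house sells at a price that also depends on factors the model leaves out, such as location, lot size, age, condition, and the negotiation between buyer and seller; these are captured by the error term. For this house the residual is $220,000 − $190,773 = $29,227.

In addition, predictions near the mean of x are more accurate than predictions farther away. As x moves farther from the mean, the standard error of prediction increases and the accuracy of prediction decreases.

The standard error for predicting an individual y at $x = x_0$ is

$$s_{\text{pred}} = s_e\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}, \qquad s_e = \sqrt{\frac{SSE}{n-2}}$$

If 2,000 square feet is far from the average square footage in John's sample, the prediction for this house will be less accurate.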
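John's raw data (and hence x̄ and Σ(xᵢ − x̄)²) are not given, so the prediction standard error cannot be computed for his model here. A minimal sketch of the formula on toy data instead, with `prediction_se` as a hypothetical helper name; it shows the standard error growing as x₀ moves away from the mean:

```python
import math

def prediction_se(x, y, x0):
    """Standard error for predicting an individual y at x0 in simple regression:
    s_e * sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx), with s_e = sqrt(SSE / (n - 2))."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    return s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

x_toy = [1, 2, 3, 4, 5]            # toy data, not John's sample
y_toy = [2, 4, 5, 4, 5]
print(prediction_se(x_toy, y_toy, 3), prediction_se(x_toy, y_toy, 5))
```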

h. If you were to estimate a multiple regression model for home prices what other variables might you include in the model? Please briefly justify why you think each of ...

#### Solution Summary

Simple linear regression, multiple regression models, and confidence intervals (CI) are examined.