# Introduction to Linear Regression and Correlation Analysis

1. The following information, taken from the 1998 annual report of Baldor Electric Company, shows Net Sales and Working Capital (in thousands of dollars) for 1988 through 1998.

[see attachment]

a. Plot the variables Net Sales (y) and Working Capital (x) in scatter-plot format. What type of relationship appears to exist between Working Capital and Net Sales? Indicate whether a regression model or a correlation model would be more appropriate. Give a statistical reason for your answer.

b. Compute the correlation coefficient between Working Capital and Net Sales. What does the correlation coefficient measure?

c. Test to determine if when Net Sales declines, Working Capital will also decline. (Hint: Think what this indicates for the value of the population correlation coefficient.) Clearly state your null and alternative hypotheses. Conduct your test at a significance level of 0.05. Be sure to state a conclusion for your test.

2. One of the editors of a major automobile publication has collected data on 30 of the best-selling cars in the United States. See attached file Automobiles. The editor is particularly interested in the relationship between highway mileage and curb weight of the vehicles.

a. Develop a scatter plot for these data. Discuss what the plot implies about the relationship between the two variables. Assume that you wish to predict highway mileage by using vehicle curb weight.

b. Compute the correlation coefficient for the two variables and test to determine whether there is a linear relationship between the curb weight and the highway mileage of automobiles.

c. 1. Compute the linear regression equation based on the sample data. 2. Cadillac's 1999 Sedan DeVille weighs approximately 4,012 pounds. Provide an estimate of the average highway mileage you would expect to obtain from this model.

3. Referring again to the automobile magazine editor discussed in exercise 2, the editor now wants to examine the relationship between the price of the vehicle and the horsepower of the engine.

a. 1. Develop a scatter plot for these data. 2. Discuss what the plot implies about the relationship between the two variables. Use the price as the dependent (y) variable.

b. Compute the correlation coefficient for the two variables.

c. Compute the linear regression equation based on the sample data.

d. Toyota's 1999 Camry four-cylinder model generates 133 horsepower. Provide an estimate of the price of the 1999 Camry. Toyota's suggested retail price for the Camry LE 4A model was $20,278. Calculate the appropriate residual for this model of Camry.

e. 1. Compute the R-squared value and discuss what this value means. 2. At a significance level of 0.01, can you conclude that engine horsepower is a good predictor of the price of an automobile?

4. A 1998 article in Fortune magazine titled "The 100 Best Companies to Work for in America" (January 12, 1998) contained data on the 100 companies. See attached file Best Companies.

a. Compute the linear regression equation based on the sample data if the revenue of each company is to be used to predict the number of hours of training per year per employee.

b. Would you feel comfortable using the revenue of one of the 100 companies to determine the number of hours of training per year per employee with a simple regression model? Conduct a statistical procedure to answer this question.

c. Synovus Financial has 8,827 employees. Predict the number of hours of training per year per employee for Synovus.

d. Referring to part c, develop and interpret a 90% prediction interval for the average training hours per employee for the companies with 8,827 employees.

e. Referring to part d, what is a 90% prediction interval for average training hours per employee for companies with 40,000 employees? Compare this interval with the one computed in part d and discuss why the widths of the two are different.

f. Referring to parts d and e, at what number of employees would the width of the 90% prediction interval for average training hours be minimized?

g. Referring to parts d and e, develop and interpret a 90% prediction interval for the actual training hours per employee for Synovus.

#### Solution Preview

See the attached Word document and Excel spreadsheets. I did the work using Excel and SPSS. Let me know if you have any other questions.


1. The following information, taken from the 1998 annual report of Baldor Electric Company, shows Net Sales and Working Capital (in thousands of dollars) for 1988 through 1998.

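| Year | Net Sales ($ thousands) | Working Capital ($ thousands) |
|------|-------------------------|-------------------------------|
| 1988 | 243463 | 67168 |
| 1989 | 281462 | 69788 |
| 1990 | 294030 | 75306 |
| 1991 | 286495 | 84740 |
| 1992 | 318930 | 97343 |
| 1993 | 356595 | 108601 |
| 1994 | 418152 | 118550 |
| 1995 | 473103 | 145069 |
| 1996 | 502875 | 146975 |
| 1997 | 557940 | 141268 |
| 1998 | 589406 | 176126 |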

a. Plot the variables Net Sales (y) and Working Capital (x) in scatter-plot format. What type of relationship appears to exist between Working Capital and Net Sales? Indicate whether a regression model or a correlation model would be more appropriate. Give a statistical reason for your answer.

I made the scatterplot in Excel and pasted it below:

It looks as if there is a positive linear relationship between the two variables: Net Sales rises as Working Capital rises. I think we should do a regression analysis on these data to look at the relationship between the two variables. (I will do some of the rest of these analyses in SPSS because I think that program is easier to work with. You can use Excel or any other statistical software program, and you should get the same answers. The SPSS output will be in black-and-white, while the Excel output is in color.)

When you fit a regression model, you make the following assumptions:

* the variables are related linearly (if they're not you can often transform them so they are)

* the residuals (observed minus predicted values) are distributed normally

* the residuals are independent

* the residuals have a constant variance
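To make the residuals in these assumptions concrete, here is a minimal sketch that fits the least-squares line to the Baldor data above and lists one residual per year. It uses plain Python rather than the Excel/SPSS workflow used in this solution, and the names `fit_line`, `predicted`, and `residuals` are illustrative choices, not part of the original work.

```python
# Baldor Electric data from the table above (thousands of dollars).
working_capital = [67168, 69788, 75306, 84740, 97343, 108601,
                   118550, 145069, 146975, 141268, 176126]
net_sales = [243463, 281462, 294030, 286495, 318930, 356595,
             418152, 473103, 502875, 557940, 589406]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0, b1 = fit_line(working_capital, net_sales)

# Residual = observed Net Sales minus the value predicted by the fitted line.
predicted = [b0 + b1 * x for x in working_capital]
residuals = [y - y_hat for y, y_hat in zip(net_sales, predicted)]

print(f"slope = {b1:.3f}, intercept = {b0:.1f}")
for year, e in zip(range(1988, 1999), residuals):
    print(year, round(e))
```

Plotting these residuals against Working Capital, or against year, is a common informal check of the constant-variance and independence assumptions. With only 11 observations, any such check is rough.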

b. Compute the correlation coefficient between Working Capital and Net Sales. What does the correlation coefficient measure?

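The correlation coefficient (r) measures the strength and direction of the linear relationship between two variables and can range from -1 to 1. Values near -1 or 1 indicate a strong linear relationship, and values near 0 indicate a weak one. Here is the SPSS output from calculating r: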

The correlation coefficient is r = 0.971, which indicates a strong positive correlation between the two variables.
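If you don't have SPSS handy, the same coefficient can be reproduced from the definition of Pearson's r. The sketch below is one way to do that in plain Python; the function name `pearson_r` is just an illustrative choice.

```python
import math

# Baldor Electric data from the table above (thousands of dollars).
working_capital = [67168, 69788, 75306, 84740, 97343, 108601,
                   118550, 145069, 146975, 141268, 176126]
net_sales = [243463, 281462, 294030, 286495, 318930, 356595,
             418152, 473103, 502875, 557940, 589406]


def pearson_r(x, y):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(working_capital, net_sales)
print(f"r = {r:.3f}")
```

Because r is unit-free, it would come out the same if the figures were entered in dollars instead of thousands of dollars.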

c. Test to determine if when Net Sales declines, Working Capital will also decline. (Hint: Think what this indicates for the value of the population correlation coefficient.) Clearly state your null and alternative hypotheses. Conduct your test at a significance level of 0.05. Be sure to state a conclusion for your test.

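If Working Capital declines when Net Sales declines, the two variables move in the same direction, so we're testing to see if there is a positive correlation in the population. The hypotheses are stated in terms of the population correlation coefficient ρ, not the sample value r. The null hypothesis is that the correlation is negative or 0 (H0: ρ ≤ 0). The alternative hypothesis is that the correlation is positive (Ha: ρ > 0).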

The SPSS output shows that the correlation is significant at the 0.01 level, which means that it's significant at the 0.05 level as well. If you wanted to test this by hand, you would have to convert the correlation coefficient to a t-score, then compare that ...
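As a rough sketch of the by-hand route, assuming the usual t statistic for a correlation, t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom, the test could look like this. The critical value 1.833 is an assumption taken from a standard one-tailed t table (α = 0.05, 9 degrees of freedom), not from the SPSS output, and the variable names are illustrative.

```python
import math

# Baldor Electric data from the table above (thousands of dollars).
working_capital = [67168, 69788, 75306, 84740, 97343, 108601,
                   118550, 145069, 146975, 141268, 176126]
net_sales = [243463, 281462, 294030, 286495, 318930, 356595,
             418152, 473103, 502875, 557940, 589406]


def pearson_r(x, y):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


n = len(net_sales)
r = pearson_r(working_capital, net_sales)

# Convert r to a t-score with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumption: one-tailed critical value for alpha = 0.05 and df = 9,
# read from a standard t table.
T_CRITICAL = 1.833

# Upper-tailed test of H0: rho <= 0 against Ha: rho > 0.
reject_null = t_stat > T_CRITICAL
print(f"t = {t_stat:.2f}, df = {n - 2}, reject H0: {reject_null}")
```

Under these assumptions, rejecting the null supports a positive population correlation, which agrees with the SPSS result.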

#### Solution Summary

This problem set has four multi-part questions on regression and correlation analyses.