# Statistics - Linear Regression Discussion Questions

1. Under what conditions would you use correlation and/or regression analysis? Include comments on the type of data needed and a work-related suggestion for their use.

2. During the years 1790 to 1820, the correlation between the number of churches built in New England and the barrels of rum imported into the region was a perfect 1.0. What does this tell you: that church building causes rum drinking, that rum drinking causes church building, or something else? If something else, what?

3. Political science question (from "How to Think About Statistics", 5th edition, by John L. Phillips, Jr., W. H. Freeman and Company, New York, 1996): Researchers have frequently asked whether there is any relationship between the amount of domestic conflict within a given country (X) and the amount of foreign conflict that the country initiates (Y). Assume you have constructed a conflict scale and collected data for 50 countries on the values of both X and Y. What statistic will answer your question?

Multiple Regression:

1. I used multiple regression analysis to examine a company's pay practices with respect to possible sex-based salary discrimination. How would you do this? What steps would you take? What variables would you want to consider?

2. In the mid-1960s, the Department of Education had a study performed on the educational achievements of students. The researcher entered the variables into the equation according to a time-based theory of the impact of variables. This means he entered the variables in the order a person would encounter them in real time. So, the first two variables entered were race and sex. The study concluded that the educational system discriminated among students on the basis of race. As a student of statistics, what comments might you make about the order in which the study entered its variables?

3. One issue that many companies are now facing is in determining what the best production practices are for their products. This often involves examining not only the quality and specs of incoming raw materials but also the process variables such as how long to heat something, what temperature to use, etc. If you were charged with maximizing the effectiveness of a manufacturing process, how might you go about the task? (Assume you have all of the needed measurements on the different variables involved in the process.)

#### Solution Preview


1. Under what conditions would you use correlation and/or regression analysis? Include comments on the type of data needed and a work-related suggestion for their use.

Correlation and regression analysis are both used to investigate the relationship between two numeric variables (x and y). You assume that the variables are linearly related (if they are not, you can often transform them so that they are) and that the residuals (observed minus predicted values) are normally distributed.

You use correlation analysis to measure the strength and direction of the linear association between the two variables and to test its statistical significance. The closer the correlation coefficient is to 1 (or -1), the stronger the association. A correlation coefficient of 0 means there is no linear association: a change in one variable is not accompanied by a consistent linear change in the other. You use regression analysis to describe the relationship between the two variables by means of an equation that has predictive value. The residuals are calculated from the regression line.
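As a rough illustration of the two analyses described above, the following sketch computes a Pearson correlation coefficient, fits a least-squares line, and derives the residuals. The data (years of experience and salary) are made up for the example, and the helper names `pearson_r` and `fit_line` are hypothetical, not taken from any particular library.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: years of experience (x) and salary in $1000s (y).
experience = [1, 2, 3, 4, 5]
salary = [42, 45, 50, 52, 56]

# Correlation: how strong is the linear association?
r = pearson_r(experience, salary)

# Regression: an equation that can be used for prediction.
slope, intercept = fit_line(experience, salary)
predicted = [intercept + slope * x for x in experience]

# Residuals are measured from the fitted line (observed minus predicted).
residuals = [y - y_hat for y, y_hat in zip(salary, predicted)]

print(f"r = {r:.3f}")
print(f"salary = {intercept:.1f} + {slope:.1f} * experience")
print("residuals:", residuals)
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on the fit.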

Note that variables can be strongly related and still have a correlation coefficient of 0. This can happen when the relationship between them is not linear. Also, a strong correlation does not imply a causal relationship between the variables.
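A small sketch of the first point, using made-up data: here y is completely determined by x (y = x squared), yet the correlation coefficient is 0 because the relationship is not linear. The helper name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data: x is symmetric around 0 and y depends on x exactly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# The positive and negative halves cancel, so no linear trend remains.
r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

A scatter plot would show the U-shaped pattern immediately, which is one reason to plot the data before relying on the coefficient alone.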

Examples of variables used in correlation and regression analysis are in the following questions.

2. During the years 1790 to 1820, the correlation between the number of churches built in New England and the barrels of rum imported into the region was a perfect 1.0. What does this tell you: that church building causes rum drinking, that rum drinking causes church building, or something else? If something else, what?

As mentioned in the previous answer, a correlation of 1 means that there is a perfect positive linear association between the two variables. In this case, as the number of churches increased, ...

#### Solution Summary

The solution consists of answers to six questions involving regression analysis. Topics discussed include: linear regression, correlation, causation, and multiple regression.