# Statistics - Linear Regression Discussion Questions

1. Under what conditions would you use correlation and/or regression analysis? Include comments on the type of data needed and a work-related suggestion for their use.

2. During the years 1790 to 1820, the correlation between the number of churches built in New England and the barrels of rum imported into the region was a perfect 1.0. What does this tell you - that church building causes rum drinking, that rum drinking causes church building, or something else? If something else, what?

3. Political science question (from "How to Think About Statistics", 5th edition, by John L. Phillips, Jr., W. H. Freeman and Company, New York, 1996): Researchers have frequently asked whether there is any relationship between the amount of domestic conflict within a given country (X) and the amount of foreign conflict that the country initiates (Y). Assume you have constructed a conflict scale and collected data for 50 countries on the values of both X and Y. What statistic will answer your question?

Multiple Regression:

1. I used multiple regression analysis to examine a company's pay practices with respect to possible sex-based salary discrimination. How would you do this? What steps would you take? What variables would you want to consider?

2. In the mid-1960s, the Department of Education had a study performed on the educational achievements of students. The researcher entered the variables into the equation according to a time-based theory of the impact of variables. This means he entered the variables in the order a person would encounter them in real time. So, the first two variables entered were race and sex. The study concluded that the educational system discriminated among students on the basis of race. As a student of statistics, what comments might you make about the order in which the study entered its variables?

3. One issue that many companies are now facing is in determining what the best production practices are for their products. This often involves examining not only the quality and specs of incoming raw materials but also the process variables such as how long to heat something, what temperature to use, etc. If you were charged with maximizing the effectiveness of a manufacturing process, how might you go about the task? (Assume you have all of the needed measurements on the different variables involved in the process.)

© BrainMass Inc. brainmass.com October 16, 2018, 6:53 pm https://brainmass.com/statistics/regression-analysis/statistics-linear-regression-discussion-questions-98102

#### Solution Preview

1. Under what conditions would you use correlation and/or regression analysis? Include comments on the type of data needed and a work-related suggestion for their use.

Correlation and regression analysis are both used to investigate the relationship between two quantitative variables (x and y). Both assume that the variables are linearly related (if they are not, you can often transform them so that they are). Significance tests in regression additionally assume that the residuals (observed minus predicted values) are normally distributed.

You use correlation analysis to measure the strength of the linear association between the two variables and to test its statistical significance. The closer the correlation coefficient is to 1 (or -1), the stronger the association. If the correlation coefficient is 0, the two variables have no linear association: a change in one is not accompanied by a consistent linear change in the other. This is a weaker statement than saying the variables are independent. You use regression analysis to describe the relationship between the two variables by means of an equation that has predictive value. The residuals are calculated from the regression line.
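As a rough illustration of the two uses described above, the following sketch computes a Pearson correlation coefficient and a least-squares regression line in plain Python. The data set (hours of training versus units produced) and the function names `pearson_r` and `fit_line` are invented for this example. Residuals are taken here as observed minus fitted values, which is the usual convention.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical work-related data: hours of training vs. units produced.
hours = [1, 2, 3, 4, 5]
units = [12, 15, 19, 22, 27]

r = pearson_r(hours, units)
slope, intercept = fit_line(hours, units)

# Residuals: observed value minus the value predicted by the fitted line.
residuals = [obs - (intercept + slope * h) for h, obs in zip(hours, units)]

print(f"r = {r:.3f}")
print(f"units = {intercept:.2f} + {slope:.2f} * hours")
print("residuals:", [round(e, 2) for e in residuals])
```

The correlation coefficient summarizes how strong the linear association is, while the fitted equation is what you would use to predict units produced for a new value of hours.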

Note that variables can be strongly related yet have a correlation coefficient of 0. This can happen when the relationship between them is not linear. Also, a strong correlation does not imply a causal relationship between the variables.
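A small sketch of the first point, using made-up data: here y is completely determined by x (y = x squared), yet the Pearson correlation comes out as zero because the relationship is not linear. The helper name `pearson_r` is an assumption of this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# x values symmetric around zero; y depends on x exactly, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

Plotting the data before computing a correlation is a simple way to catch this kind of curved relationship.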

Examples of variables used in correlation and regression analysis are in the following questions.

2. During the years 1790 to 1820, the correlation between the number of churches built in New England and the barrels of rum imported into the region was a perfect 1.0. What does this tell you - that church building causes rum drinking, that rum drinking causes church building, or something else? If something else, what?

As mentioned in the answer to the previous question, a correlation of 1 means that there is a perfect positive linear association between the two variables. In this case, as the number of churches increased, ...

#### Solution Summary

The solution consists of answers to six questions involving regression analysis. Topics discussed include: linear regression, correlation, causation, and multiple regression.

Correlation and causality, parametric and non-parametric data, simple and multiple linear regression

1. Does correlation equal causality? Why or why not? What is the difference between strong positive and strong negative correlation?

2. How would you determine whether a correlation observed in a sample is real in the population? Is there value in a strong negative correlation? If so, what? What does zero correlation tell you?

3. What does r tell you? What does r² tell you? What does 1 - r² tell you? What does it mean?

Label each of the following situations "P" if it is an example of parametric data or "NP" if it is an example of nonparametric data.

4. In a comparison of two towns, is the average height of their residents the same? _____

5. A manufacturer produces a batch of memory chips (RAM) and measures the mean-time-between-failures (MTBF). The manufacturer then changes a manufacturing process and produces another batch and again measures the MTBF. Did the change to the process improve the MTBF? _____

6. The average life span of a dog is proportional to the amount of calcium consumed. _____

7. The correlation between gene disorders and certain diseases. _____

8. From a written survey where the respondents were asked to rate an individual on a scale of 1 to 5, one group rated an individual a 3.7, another group rated the individual a 4.3. Is the difference statistically significant? _____

9. A study of vehicle accidents on a military installation compared to drivers' rank. _____

10. Show that the numbers drawn in a state lottery are truly random. _____

11. There is a direct correlation between a student's grade and the student's rating of an instructor. _____

12. What assumptions are required in using the multiple regression model?

13. In simple linear regression, the regression equation is a straight line. In multiple regression, what geometric form is taken by the regression equation when there are two independent variables? When there are three or more independent variables?