# Linear Regression and Correlation

a. What is linear regression?

b. What can linear regression do for you - both in a general business sense and specifically to your place of employment, or circle of influence?

c. What are some of the limitations of regression analysis?

a. What is correlation analysis and why is it important to us when we are using regression analysis?

b. Provide a definition of R and R squared. What is the relationship between the two numbers?

c. What is the equation for a straight line? Give a brief example of each of the variables involved.

a. What is the difference between a strong negative and a strong positive R?

b. What does zero correlation tell you? What about a correlation of positive or negative one?

c. What is the relationship between the independent and dependent variable? Can the independent and dependent variables be interchanged?

#### Solution Preview

Please see the attached response for best formatting (also reproduced below), including examples. I hope this helps, and take care.

RESPONSE:

1. a. What is linear regression?

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model.

Before attempting to fit a linear model to observed data, a modeler should first determine whether or not there is a relationship between the variables of interest. This does not necessarily imply that one variable causes the other (for example, higher SAT scores do not cause higher college grades), but that there is some significant association between the two variables. A scatterplot can be a helpful tool in determining the strength of the relationship between two variables. If there appears to be no association between the proposed explanatory and dependent variables (i.e., the scatterplot does not indicate any increasing or decreasing trends), then fitting a linear regression model to the data probably will not provide a useful model. A valuable numerical measure of association between two variables is the correlation coefficient, which is a value between -1 and 1 indicating the strength of the association of the observed data for the two variables.
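As a rough illustration of the correlation coefficient mentioned above, the sketch below computes the Pearson correlation for two small sets of points. The function name `pearson_r` and the toy values are made up for this example and are not from the original response; they are chosen only to show the two ends of the -1 to 1 range.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up toy values (an assumption for illustration, not real data).
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # y increases in step with x
falling = [10, 8, 6, 4, 2]   # y decreases in step with x

print(pearson_r(x, rising))   # close to +1: perfect increasing trend
print(pearson_r(x, falling))  # close to -1: perfect decreasing trend
```

Values near zero would instead suggest that a straight line is unlikely to describe the data well.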

A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0) (http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm).
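The response does not say how a and b are obtained. A common choice, assumed here, is ordinary least squares, where b is the sum of cross-deviations divided by the sum of squared X deviations, and a follows from the means. The function name `fit_line` and the sample points are hypothetical and serve only as a minimal sketch.

```python
def fit_line(xs, ys):
    """Return least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept: the fitted Y when X = 0
    return a, b


# Made-up points that lie exactly on Y = 1 + 2X (an assumption for illustration).
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
a, b = fit_line(x, y)
print(f"Y = {a} + {b}X")
```

Because the toy points lie exactly on a line, the fit should recover that line; with real, noisy data the estimates only approximate the underlying relationship.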

Some definitions are as follows:

· A statistical procedure for predicting the value of a dependent variable from an independent variable when the relationship between the variables can be described with a linear model. A linear regression equation can be written as Yp = mX + b, where Yp is the predicted value of the dependent variable, m is the slope of the regression line, and b is the Y-intercept of the regression line. In Microsoft Excel, the LINEST function is used to perform linear regression.

aa.uncw.edu/ward/chm255/glossary.htm

· The relation between variables when the regression equation is linear: e.g., y = ax + b

wordnet.princeton.edu/perl/webwn

· A statistical technique used to find the best-fitting linear relationship between a target (dependent) variable and its predictors (independent variables).

www.cs.ualberta.ca/~zaiane/courses/cmput690/glossary.html

· The process of finding the equation of a straight line that best fits the data.

highered.mcgraw-hill.com/sites/0072480823/student_view0/glossary.html

· In statistics, linear regression is a method of estimating the conditional expected value of one variable y given the values of some other variable or variables x. The variable of interest, y, is conventionally called the "dependent variable".

The terms "endogenous variable" and "output variable" are also used. The other variables x are called the "independent variables". The terms "exogenous variables" and "input variables" are also used. The dependent and independent variables may be scalars or vectors.

en.wikipedia.org/wiki/Linear_regression

(http://66.102.7.104/search?q=cache:XNgvkk-DG_YJ:www.lsbu.ac.uk/psycho/teaching/ppfiles/rm3-7-04-05.ppt+limitations+of+regression+analysis&hl=en)

b. What can linear regression do for you - both in a general business sense and specifically to your place of employment, or circle of influence?

i) The International Rice Research Institute in the Philippines wants to relate the grain yield of rice varieties, y, to the tiller number, x. They conducted experiments for some rice varieties and tillers (see example attached).

ii) Participatory-style management (x) predicts employee behavior (y).

iii) A trendline shows the trend in a data set and is typically associated with regression analysis. Creating a trendline and calculating its coefficients allows for the quantitative analysis of the underlying data and the ability to both interpolate and extrapolate the data for forecast purposes. It is probably best to illustrate the problem with a simple example.

Consider monthly sales as shown in Table 1.

| Month | Sales |
|-------|-------|
| 1 | 3100 |
| 2 | 4500 |
| 3 | 4400 |
| 4 | 5400 |
| 5 | 7500 |
| 6 | 8100 |

Table 1 (see attached article)

From http://www.tushar-mehta.com/excel/tips/trendline_coefficients.htm
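To make the trendline idea concrete, the sketch below fits a straight line to the six monthly sales figures from Table 1 and extrapolates one month ahead. It assumes an ordinary least-squares trendline; the function name `linear_trend` and the choice of month 7 as the forecast point are illustrative and are not taken from the cited article.

```python
# Monthly sales from Table 1.
months = [1, 2, 3, 4, 5, 6]
sales = [3100, 4500, 4400, 5400, 7500, 8100]


def linear_trend(xs, ys):
    """Return (intercept, slope, r_squared) for a least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable linear fit, R squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


intercept, slope, r_squared = linear_trend(months, sales)

# Extrapolate the trend one month past the observed data (hypothetical forecast point).
forecast_month = 7
forecast = intercept + slope * forecast_month

print(f"Sales = {intercept:.0f} + {slope:.0f} * Month")
print(f"R squared = {r_squared:.3f}")
print(f"Forecast for month {forecast_month}: {forecast:.0f}")
```

For these figures the fitted trend should come out to a slope of 1000 and an intercept of 2000, so sales rise by about 1000 per month and the month 7 forecast is 9000. Extrapolating further from only six observations becomes increasingly unreliable.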

c. What are some of the limitations of regression analysis?

i) The above ...

#### Solution Summary

Based on the questions, this solution provides a detailed discussion of linear regression and correlation analysis. It also compares aspects of the independent and dependent variables.