# Statistics: linear and multilinear regressions

The personnel director for a local manufacturing firm has received complaints from the employees in a certain shop regarding what they perceive to be inequities in the annual salary for employees who have similar performance ratings, years of service and relevant certifications. The personnel director believes that an employee's pay in this particular shop should be positively correlated to their prior performance rating, years of service and relevant certifications. The personnel director has collected the data shown in the following table pertaining to the employees within the shop.

Request: please provide the answers in Excel format, as an attachment, to show the formulas used.

Week Three Homework Assignment - Linear Regression


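| Employee | Current Annual Salary (Thousands) | Average Performance Rating for Past 3 Years (5 point scale) | Years of Service | Number of Relevant Certifications |
|---|---|---|---|---|
| 1 | 48.2 | 2.18 | 9 | 6 |
| 2 | 55.3 | 3.31 | 20 | 6 |
| 3 | 53.7 | 3.18 | 18 | 7 |
| 4 | 61.8 | 3.62 | 33 | 7 |
| 5 | 56.4 | 2.62 | 31 | 8 |
| 6 | 52.5 | 3.75 | 13 | 6 |
| 7 | 54.0 | 4.25 | 25 | 6 |
| 8 | 55.7 | 3.43 | 30 | 4 |
| 9 | 45.1 | 1.93 | 5 | 6 |
| 10 | 67.9 | 4.5 | 47 | 8 |
| 11 | 53.2 | 2.81 | 25 | 5 |
| 12 | 46.8 | 3.06 | 11 | 6 |
| 13 | 58.3 | 5 | 23 | 8 |
| 14 | 59.1 | 4.06 | 35 | 7 |
| 15 | 57.8 | 4.12 | 39 | 5 |
| 16 | 48.6 | 2.31 | 21 | 4 |
| 17 | 49.2 | 3.87 | 7 | 6 |
| 18 | 63.0 | 4.37 | 40 | 7 |
| 19 | 53.0 | 2.5 | 35 | 6 |
| 20 | 50.9 | 2.81 | 23 | 4 |
| 21 | 55.4 | 3.68 | 33 | 5 |
| 22 | 51.8 | 3.5 | 27 | 4 |
| 23 | 60.2 | 3 | 34 | 8 |
| 24 | 50.1 | 2.43 | 15 | 5 |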

The personnel director is interested in creating a linear regression model that can be used to estimate the annual salary an employee might expect to receive based upon his or her past performance, years of service and/or number of relevant certifications. The regression model will be used as a basis for determining whether or not there is any validity to the employees' complaints regarding salary inequities.

Perform each of the following seven regression analyses using a 95% confidence level.

• Annual salary vs. average performance rating for the past 3 years

• Annual salary vs. years of service

• Annual salary vs. number of relevant certifications

• Annual salary vs. average performance rating for the past 3 years and years of service

• Annual salary vs. average performance rating for the past 3 years and number of relevant certifications

• Annual salary vs. years of service and number of relevant certifications

• Annual salary vs. average performance rating for the past 3 years, years of service and number of relevant certifications
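The seven analyses listed above can be sketched in plain Python as a cross-check on a spreadsheet. This is only an illustration, not the assignment's required Excel workbook: the names `ROWS`, `NAMES`, `solve`, `fit` and `models` are invented here, and the fit uses ordinary least squares via the normal equations, which is assumed to match what a spreadsheet regression tool computes. It reports R², adjusted R² and the standard error of the estimate for each model; significance values are not computed.

```python
from itertools import combinations

# Salary (thousands), performance rating, years of service, certifications,
# transcribed from the assignment's data table.
ROWS = [
    (48.2, 2.18, 9, 6), (55.3, 3.31, 20, 6), (53.7, 3.18, 18, 7),
    (61.8, 3.62, 33, 7), (56.4, 2.62, 31, 8), (52.5, 3.75, 13, 6),
    (54.0, 4.25, 25, 6), (55.7, 3.43, 30, 4), (45.1, 1.93, 5, 6),
    (67.9, 4.50, 47, 8), (53.2, 2.81, 25, 5), (46.8, 3.06, 11, 6),
    (58.3, 5.00, 23, 8), (59.1, 4.06, 35, 7), (57.8, 4.12, 39, 5),
    (48.6, 2.31, 21, 4), (49.2, 3.87, 7, 6), (63.0, 4.37, 40, 7),
    (53.0, 2.50, 35, 6), (50.9, 2.81, 23, 4), (55.4, 3.68, 33, 5),
    (51.8, 3.50, 27, 4), (60.2, 3.00, 34, 8), (50.1, 2.43, 15, 5),
]
NAMES = ["rating", "years", "certs"]  # hypothetical short labels


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(cols):
    """Least-squares fit of salary on the chosen predictor columns of ROWS."""
    y = [r[0] for r in ROWS]
    X = [[1.0] + [r[c] for c in cols] for r in ROWS]  # leading 1.0 is the intercept
    n, k = len(y), len(cols)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k + 1)]
           for i in range(k + 1)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k + 1)]
    beta = solve(xtx, xty)  # [intercept, slope, ...]
    pred = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - sse / sst
    return {
        "beta": beta,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "std_err": (sse / (n - k - 1)) ** 0.5,
    }


# Three univariate, three bivariate and one trivariate model: seven in total.
models = {}
for size in (1, 2, 3):
    for cols in combinations((1, 2, 3), size):
        label = " + ".join(NAMES[c - 1] for c in cols)
        models[label] = fit(cols)

for label, m in models.items():
    print(f"{label:24s} R2={m['r2']:.4f}  adjR2={m['adj_r2']:.4f}  SE={m['std_err']:.3f}")
```

A point estimate for a univariate model then follows from its coefficients, for example `models["rating"]["beta"][0] + models["rating"]["beta"][1] * 3.9` for a performance rating of 3.9 (in thousands of dollars).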

Hint: Refer to the handouts posted on Blackboard pertaining to interpreting regression statistics in order to determine if a given regression model is acceptable. This same handout also provides guidance regarding how to select a preferred regression model from amongst multiple acceptable regression models, including models with differing numbers of independent variables.

Hint: For the purposes of this homework assignment, the minimum difference between the R² or Adjusted R² values for two acceptable models with differing numbers of independent variables that would favor selecting the model with the larger number of independent variables is 0.03. Please ensure that you fully understand the process for selecting a preferred model before attempting to apply this criterion.
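The 0.03 criterion in this hint can be expressed as a one-line comparison. The sketch below covers only that threshold step and assumes both models have already been judged acceptable under the handout's rules, which are not reproduced here; the function name `prefer_larger_model` and the sample values are invented for illustration.

```python
def prefer_larger_model(r2_smaller, r2_larger, min_gain=0.03):
    """Return True when the model with more independent variables improves
    R-squared (or adjusted R-squared) by at least min_gain.

    Assumes both models were already found acceptable; pass the same kind
    of statistic for both arguments.
    """
    return (r2_larger - r2_smaller) >= min_gain


# Hypothetical values, not results from the assignment's data.
print(prefer_larger_model(0.70, 0.80))  # gain of 0.10 favors the larger model
print(prefer_larger_model(0.70, 0.71))  # gain of 0.01 favors the smaller model
```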

Hint: Question 19 is intended to have you demonstrate that you understand how to determine which univariate models are acceptable, and then select a preferred univariate model from amongst the acceptable univariate models. Question 20 is intended to have you demonstrate that you understand how to determine which bivariate models are acceptable, and then select a preferred bivariate model from amongst the acceptable bivariate models. Question 21 is intended to have you demonstrate that you understand how to select a preferred model from amongst multiple acceptable models that have differing numbers of independent variables. Question 22 is intended to have you demonstrate that you understand how to determine if the trivariate model is acceptable, and then select a preferred model from amongst multiple acceptable models.

Hint: For questions 24 and 25, you need to use the regression equation associated with the preferred model selected for question 22 in order to calculate the predicted salary for each of the 24 employees. In order to answer questions 24 and 25 you need to keep in mind that the predicted salary value for each employee is only a point estimate (this concept was discussed in week one relative to the mean). While a point estimate is a precise value, it is not necessarily an accurate value since the standard error value tells us there is some potential degree of error associated with using the preferred regression model to predict salary values. In order to answer questions 24 and 25 you will need to create an interval estimate (this concept was also discussed during week one relative to the mean) for the predicted salary for each of the 24 employees. To calculate the interval estimate for each employee, simply multiply the standard error value for the preferred regression model by 1.5 and then subtract this value from the predicted point estimate salary value to define the lower limit of the interval estimate and add this value to the predicted point estimate salary value to define the upper limit for the interval estimate. Once you have created an interval estimate for each employee, you will then need to compare each employee's current salary to their corresponding interval estimate in order to determine if each employee's current salary falls within their predicted interval estimate.
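The interval procedure described in this hint can be sketched as follows. The function names `fair_range` and `classify` are invented for illustration, and the numbers in the usage lines are placeholders rather than values from any of the regression models.

```python
def fair_range(predicted, std_err, multiplier=1.5):
    """Interval estimate: the point estimate plus or minus multiplier * standard error."""
    margin = multiplier * std_err
    return predicted - margin, predicted + margin


def classify(actual, predicted, std_err, multiplier=1.5):
    """Report whether an actual salary is below, within, or above the interval."""
    low, high = fair_range(predicted, std_err, multiplier)
    if actual < low:
        return "below"
    if actual > high:
        return "above"
    return "within"


# Placeholder figures in thousands of dollars (not taken from the assignment).
print(fair_range(50.0, 2.0))
print(classify(46.0, 50.0, 2.0))
```

Applying `classify` to each of the 24 employees, using the predicted salary and standard error from the preferred model, and counting the "below" and "above" results gives the tallies asked for in questions 24 and 25.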

Use the results for the univariate regression analysis for annual salary vs. average performance rating for the past 3 years in order to answer questions 1 through 14.

1. What is the degree of correlation between the dependent variable and the independent variable?

o 0.8198

o 0.6672

o 0.7862

o 0.6523

2. Does the regression model confirm a positive correlation between the dependent variable and the independent variable as hypothesized?

o Yes

o No

3. What is the desired statistical significance for the regression model?

o 0.00

o 0.01

o 0.05

o 0.10

4. Is the statistical significance of the model as a whole less than the desired statistical significance for the regression model?

o Yes

o No

5. What is the actual confidence level for the regression model as a whole?

o 99.97%

o 95%

o 90.5%

o 80.2%

6. Is the statistical significance of the linear relationship between the dependent and independent variables less than the desired statistical significance for the regression model?

o Yes

o No

7. Should the coefficient of determination or adjusted coefficient of determination be used to evaluate this regression model?

o Coefficient of determination

o Adjusted coefficient of determination

8. What percentage of the observed variation between the actual values of the dependent variable and the mean value of the dependent variable in the sample data set is explained by the regression model?

o 44.51%

o 66.72%

o 41.99%

o 76.24%

9. What is the amount by which we will be off on average when predicting values for the dependent variable using the regression model?

o $12,287

o $32,966

o $4,169

o $25,896

10. What is the coefficient for the y-intercept for the regression model?

o 39.38

o 19.94

o 12.29

o 2.96

11. What is the coefficient for the independent variable for the regression model?

o 63.08

o 4.52

o 12.29

o 2.96

12. What is the point estimate for the predicted salary for an employee with an average performance rating of 3.9?

o $71,562

o $57,006

o $41,299

o $50,896

13. What is the interval estimate for the predicted salary for an employee with an average performance rating of 3.9 based upon taking into consideration the standard error?

o $61,913 - $71,562

o $57,006 - $64,159

o $52,837 - $61,175

o $54,896 - $60,873

14. What is the 95% confidence level interval estimate for the salary for an employee with an average performance rating of 3.9?

o $60,265 - $85,789

o $75,412 - $78,523

o $40,636 - $73,376

o $65,141 - $72,269

Perform a correlation analysis between the dependent variable and each of the three independent variables. Use the results of the correlation analysis to answer questions 15 and 16.

15. Which independent variables evidence a positive correlation with the dependent variable?

o Average performance rating for the past 3 years

o Years of service

o Number of relevant certifications

o All of the above

o None of the above

16. Which independent variable evidences the highest degree of correlation with the dependent variable?

o Average performance rating for the past 3 years

o Years of service

o Number of relevant certifications

o All of the above

o None of the above

Perform a correlation analysis between each of the three pairs of independent variables. Use the results of the correlation analyses to answer question 17.

17. Which pair of independent variables evidences a degree of collinearity that should be cause for concern when performing multivariate linear regression (i.e., evidences a degree of correlation in excess of 0.5)?

o Average performance rating for the past 3 years vs. years of service

o Average performance rating for the past 3 years vs. number of relevant certifications

o Years of service vs. number of relevant certifications

o All of the above

o None of the above
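Both correlation analyses (salary against each independent variable, and each pair of independent variables against one another) can be sketched with a hand-written Pearson coefficient. The names `pearson`, `with_salary`, `between` and `flagged` are invented here, and comparing the absolute value of the coefficient against 0.5 is an assumption about how the collinearity threshold is meant to be applied.

```python
from itertools import combinations
from math import sqrt

# Salary (thousands), performance rating, years of service, certifications,
# transcribed from the assignment's data table.
ROWS = [
    (48.2, 2.18, 9, 6), (55.3, 3.31, 20, 6), (53.7, 3.18, 18, 7),
    (61.8, 3.62, 33, 7), (56.4, 2.62, 31, 8), (52.5, 3.75, 13, 6),
    (54.0, 4.25, 25, 6), (55.7, 3.43, 30, 4), (45.1, 1.93, 5, 6),
    (67.9, 4.50, 47, 8), (53.2, 2.81, 25, 5), (46.8, 3.06, 11, 6),
    (58.3, 5.00, 23, 8), (59.1, 4.06, 35, 7), (57.8, 4.12, 39, 5),
    (48.6, 2.31, 21, 4), (49.2, 3.87, 7, 6), (63.0, 4.37, 40, 7),
    (53.0, 2.50, 35, 6), (50.9, 2.81, 23, 4), (55.4, 3.68, 33, 5),
    (51.8, 3.50, 27, 4), (60.2, 3.00, 34, 8), (50.1, 2.43, 15, 5),
]
salary, rating, years, certs = (list(col) for col in zip(*ROWS))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


predictors = {"rating": rating, "years": years, "certs": certs}

# Dependent variable against each independent variable (questions 15 and 16).
with_salary = {name: pearson(salary, xs) for name, xs in predictors.items()}

# Each pair of independent variables (question 17).
between = {
    (a, b): pearson(predictors[a], predictors[b])
    for a, b in combinations(predictors, 2)
}

# Assumption: "in excess of 0.5" is read as |r| > 0.5.
flagged = [pair for pair, r in between.items() if abs(r) > 0.5]

for name, r in with_salary.items():
    print(f"salary vs {name}: r = {r:.4f}")
for (a, b), r in between.items():
    print(f"{a} vs {b}: r = {r:.4f}")
print("pairs above the 0.5 threshold:", flagged)
```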

Use the regression statistics pertaining to all seven regression analyses in order to answer questions 18 through 23.

18. Of the seven regression models, which model both accounts for the lowest percentage of the observed variation between the actual values of the dependent variable and the mean value of the dependent variable in the sample data set and evidences the highest degree of error for predicting values for the dependent variable?

o Annual salary vs. average performance rating for the past 3 years

o Annual salary vs. years of service

o Annual salary vs. number of relevant certifications

o Annual salary vs. average performance rating for the past 3 years and years of service

o Annual salary vs. average performance rating for the past 3 years and number of relevant certifications

o Annual salary vs. years of service and number of relevant certifications

o Annual salary vs. average performance rating for the past 3 years, years of service and number of relevant certifications

19. If you were to consider only the three regression models that are based upon a single independent variable, which of the following models would be your preferred model?

o Annual salary vs. average performance rating for the past 3 years

o Annual salary vs. years of service

o Annual salary vs. number of relevant certifications

20. If you were to consider only the three regression models that are based upon two independent variables, which of the following models would be your preferred model?

o Annual salary vs. average performance rating for the past 3 years and years of service

o Annual salary vs. average performance rating for the past 3 years and number of relevant certifications

o Annual salary vs. years of service and number of relevant certifications

21. If you were to compare the preferred regression model based upon a single independent variable with the preferred regression model based upon two independent variables, which model would be preferred overall?

o Preferred regression model based upon a single independent variable

o Preferred regression model based upon two independent variables

22. When you consider all seven regression models, which is the overall preferred regression model?

o Annual salary vs. average performance rating for the past 3 years

o Annual salary vs. years of service

o Annual salary vs. number of relevant certifications

o Annual salary vs. average performance rating for the past 3 years and years of service

o Annual salary vs. average performance rating for the past 3 years and number of relevant certifications

o Annual salary vs. years of service and number of relevant certifications

o Annual salary vs. average performance rating for the past 3 years, years of service and number of relevant certifications

23. Do any of the regression models offer a higher confidence level for the model as a whole, or a lower standard error in comparison to the overall preferred model?

o Yes

o No

The personnel director is interested in comparing each employee's actual salary to their predicted salary in order to determine if there are any prevailing salary inequities. Suppose the personnel director considers an employee's current salary to be fair and reasonable if it is within plus or minus 1.5 standard errors of the value estimated by the regression model selected in response to question 22. For each individual employee, calculate his or her estimated salary using the regression model selected in response to question 22, as well as his or her upper and lower limits for a fair and reasonable salary, in order to answer questions 24 and 25.

24. Of the 24 employees, how many employees' current salary is below what is considered fair and reasonable?

o 0

o 1

o 2

o 3

o 4

o 5

25. Of the 24 employees, how many employees' current salary is above what is considered fair and reasonable?

o 0

o 1

o 2

o 3

o 4

o 5

https://brainmass.com/math/probability/statistics-linear-and-multilinear-regressions-571022


#### Solution Summary

We had to compare several regressions.

A hypothetical multiple regression (prediction)

Making Predictions Using Regression

Develop a hypothetical multiple regression (prediction) equation to predict something in your area of professional or personal interest. Please complete the following:

a. Identify the Variables: First you should identify the dependent (criterion) variable that you are interested in predicting. What variable do you plan to predict? Next, choose two variables (called independent) that you will use to predict your chosen dependent criterion variable. List and describe your two chosen independent predictor variables, and why you feel that they are appropriate for predicting your dependent variable. (Follow the example provided)

b. Quantify Your Predictor/Independent Variables: Fill out the following table to quantify each of your two independent or predictor variables. If you find that you have chosen a variable that cannot be quantified, select a different one and update your solution above. Fill out this table.

(Follow the example provided)

c. Correlation and Prediction: In order to use a certain variable to predict another variable, there must be a strong correlation between those two variables. Why is this true? Which of your two independent predictor variables do you think has the strongest correlation with your dependent criterion variable? Explain. (Follow the example provided)

d. Create the Prediction Multiple Regression Equation: Using your dependent criterion variable as "y," and your predictor independent variables as x1 and x2, create a pretend multiple regression prediction equation. Next, choose any values for your independent variables x1 and x2, and predict the corresponding value for the dependent variable "y." Use the standard equation Ŷ = a + b1x1 + b2x2. (Follow the example provided)

Follow this example

Help and Explanations

For example, I might be interested in predicting a person's BMI or Body Mass Index. Therefore, BMI would be my dependent criterion "y" variable that I will be predicting.

Next, I will choose two independent variables, (x1) hours of weekly exercise and (x2) ounces of meat eaten per week.

| Predictor Name | Min value | Max value |
|---|---|---|
| 1. hours of exercise | 0 | 28 |
| 2. oz. meat eaten per week | 0 | 84 |

Why did I choose these two predictor independent variables to predict my dependent variable of BMI?

In this case, I feel that both of these independent variables, exercise and meat, are highly correlated with BMI. If I had to guess, I would say that meat eaten is the more strongly correlated of the two.

Next, I will create the multiple regression prediction equation, which will have the form:

Ŷ = a + b1x1 + b2x2

In my example, BMI is my "y" and is what I am predicting.

My x1 is an independent variable that represents the hours of weekly exercise and my x2 is an independent variable that represents ounces of meat eaten per week. The values of my constant "a," and my coefficients "b1" and "b2" are all numbers and are in fact weights for my independent variables.

For example, my regression prediction equation might be:

Ŷ = 25 - .3(x1) + .7(x2)

NOTE: The numbers you will use here are "invented." But, for the Assignment, you will use SPSS to generate the correct numbers.

Using my prediction equation:

Ŷ = 25 - .3(x1) + .7(x2)

I will use the following values

(1) The person exercises 6 hours per week (x1 = 6)

(2) The person eats 2 oz. of meat per week (x2 = 2)

Now, I will plug these values into my hypothetical regression equation to come up with the value of the BMI (dependent variable):

Ŷ = 25 - .3(6) + .7(2)

Now solving for y (BMI), I get a final value of: Ŷ = 24.6

In conclusion I predict a BMI of 24.6, assuming a person exercises 6 hours per week and eats 2 oz. of meat per week.
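The arithmetic in this example can be checked with a short function. The name `predict_bmi` is invented for illustration, and the coefficients are the "invented" ones from the example equation, not fitted values.

```python
def predict_bmi(hours_exercise, oz_meat, a=25.0, b1=-0.3, b2=0.7):
    """Evaluate Y-hat = a + b1*x1 + b2*x2 with the example's made-up weights."""
    return a + b1 * hours_exercise + b2 * oz_meat


# x1 = 6 hours of exercise per week, x2 = 2 oz. of meat per week
print(round(predict_bmi(6, 2), 1))  # 24.6
```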
