# Correlation and Regression Questions

In this problem set you will get some practice performing a linear regression analysis. If you use Statdisk or Excel to perform any portion of these analyses, please include the results, label them, and refer to them accordingly in your interpretations.

Listed below are the overhead widths (in cm) of seals measured from photographs and the weights of the seals (in kg). The data are based on "Mass Estimation of Weddell Seals Using Techniques of Photogrammetry," by R. Garrott of Montana State University. The goal of the study is to explore the relationship between the overhead widths and the weights of the seals and to determine whether there is enough evidence to conclude that it is reasonable to use the overhead width to estimate the weight of a seal.

| Overhead width (cm) | Weight (kg) |
|---|---|
| 7.2 | 116 |
| 7.4 | 154 |
| 9.8 | 245 |
| 9.4 | 202 |
| 8.8 | 200 |
| 8.4 | 191 |

1. Find the correlation coefficient and the critical value of r at the 5% significance level. Is there sufficient evidence to conclude that there is a linear relationship between the overhead width and the weight of the seals? Explain this using the value of the correlation coefficient and the critical value of r.
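Here is a minimal Python sketch, using only the standard library, for checking the correlation coefficient and the critical value of r. The critical r is derived from the two-tailed t critical value for df = n - 2 with r_crit = t / sqrt(t² + df). The value t = 2.776 for α = 0.05 and df = 4 comes from a t table and is hard-coded here as an assumption. The function names are illustrative only.

```python
import math

# Seal data from the problem: overhead width (cm) and weight (kg).
widths = [7.2, 7.4, 9.8, 9.4, 8.8, 8.4]
weights = [116, 154, 245, 202, 200, 191]

def pearson_r(x, y):
    """Pearson correlation coefficient from centered sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def critical_r(t_crit, n):
    """Convert a two-tailed t critical value (df = n - 2) into a critical r."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

n = len(widths)
r = pearson_r(widths, weights)
# 2.776: tabled two-tailed t value for alpha = 0.05 and df = 4 (assumed from a t table).
r_crit = critical_r(2.776, n)

print(f"r = {r:.4f}, critical r = {r_crit:.3f}")
print("linear relationship supported" if abs(r) > r_crit else "not enough evidence")
```

For these six seals r is about 0.948. That exceeds the critical value of about 0.811, so the data support a linear relationship at the 5% significance level.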

2. Find the explained variation. Explain the meaning of the explained variation in the context of this situation.

3. Find the unexplained variation. Explain the meaning of the unexplained variation in the context of this situation.

4. Find the total variation. Demonstrate and explain the relationship between the explained variation, the unexplained variation, and the total variation in the context of this situation.

5. Find the coefficient of determination. Demonstrate and explain the relationship between the explained variation, the total variation, and the coefficient of determination in the context of this situation. Explain the meaning of the coefficient of determination in the context of this situation.
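The sketch below cross-checks the explained, unexplained, and total variation, along with the coefficient of determination. It uses the standard definitions: explained = Σ(ŷ - ȳ)², unexplained = Σ(y - ŷ)², total = Σ(y - ȳ)², and r² = explained / total. The helper name `variation_breakdown` is hypothetical and is not a Statdisk or Excel function.

```python
# Seal data from the problem: overhead width (cm) and weight (kg).
widths = [7.2, 7.4, 9.8, 9.4, 8.8, 8.4]
weights = [116, 154, 245, 202, 200, 191]

def variation_breakdown(x, y):
    """Return (explained, unexplained, total) variation for a least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    explained = sum((f - my) ** 2 for f in fitted)              # sum of (y_hat - y_bar)^2
    unexplained = sum((b - f) ** 2 for b, f in zip(y, fitted))  # sum of (y - y_hat)^2
    total = sum((b - my) ** 2 for b in y)                       # sum of (y - y_bar)^2
    return explained, unexplained, total

explained, unexplained, total = variation_breakdown(widths, weights)
r_squared = explained / total  # coefficient of determination

print(f"explained   = {explained:.4f}")
print(f"unexplained = {unexplained:.4f}")
print(f"total       = {total:.4f} (explained + unexplained = {explained + unexplained:.4f})")
print(f"r^2         = {r_squared:.4f}")
```

With these data the explained variation is about 8880.2, the unexplained variation is about 991.2, and the total is about 9871.3. The first two add up to the third. The coefficient of determination is r² ≈ 0.900, so about 90% of the variation in seal weight is accounted for by the linear relationship with overhead width.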

6. Find the standard error of estimate.
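The standard error of estimate is s_e = sqrt(Σ(y - ŷ)² / (n - 2)), the typical size of a residual in kilograms. Here is a short sketch using standard-library Python; the function name is illustrative.

```python
import math

# Seal data from the problem: overhead width (cm) and weight (kg).
widths = [7.2, 7.4, 9.8, 9.4, 8.8, 8.4]
weights = [116, 154, 245, 202, 200, 191]

def standard_error_of_estimate(x, y):
    """s_e = sqrt(sum of squared residuals / (n - 2)) for a least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return math.sqrt(sse / (n - 2))

se = standard_error_of_estimate(widths, weights)
print(f"s_e = {se:.4f} kg")
```

For these data s_e comes out near 15.74 kg.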

7. Write the equation for the regression line. Explain the meaning of the slope of this line in the context of this situation. Find the predicted weight in kg of a seal given that the width from an overhead photograph is 9.0 cm.
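This sketch computes the least-squares fit and the prediction at 9.0 cm; the helper name is illustrative. The slope is b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², and the intercept is b0 = ȳ - b1·x̄.

```python
# Seal data from the problem: overhead width (cm) and weight (kg).
widths = [7.2, 7.4, 9.8, 9.4, 8.8, 8.4]
weights = [116, 154, 245, 202, 200, 191]

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y_hat = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return b0, b1

b0, b1 = fit_line(widths, weights)
predicted = b0 + b1 * 9.0  # predicted weight (kg) for a 9.0 cm overhead width

print(f"y_hat = {b0:.2f} + {b1:.2f}x")
print(f"predicted weight at 9.0 cm: {predicted:.1f} kg")
```

The fit is approximately ŷ = -156.88 + 40.18x. Each additional centimeter of overhead width is therefore associated with about 40.2 kg more predicted weight, and the predicted weight at 9.0 cm is about 204.8 kg. Because 9.0 cm lies inside the observed widths (7.2 to 9.8 cm), this prediction is not an extrapolation.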

8. Use the prediction interval spreadsheet to find both a 95% prediction interval estimate and a 95% confidence interval estimate of the weight in kg of a seal given that the width from an overhead photograph is 9.0 cm. Explain the meaning of these interval estimates in the context of this situation. Explain the difference between a 95% prediction interval estimate and a 95% confidence interval estimate for any given situation.
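The prediction interval spreadsheet is not reproduced here. The sketch below is a stand-in that uses the usual textbook formulas for both intervals, with t = 2.776 (df = 4, 95%) hard-coded from a t table as an assumption. The function name `intervals_at` is hypothetical.

```python
import math

# Seal data from the problem: overhead width (cm) and weight (kg).
widths = [7.2, 7.4, 9.8, 9.4, 8.8, 8.4]
weights = [116, 154, 245, 202, 200, 191]

def intervals_at(x, y, x0, t_crit):
    """Return (y_hat, confidence interval, prediction interval) at x0.

    CI for the mean response:  y_hat +/- t * s_e * sqrt(1/n + (x0 - x_bar)^2 / Sxx)
    PI for one new response:   y_hat +/- t * s_e * sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx)
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    se = math.sqrt(sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)) / (n - 2))
    y_hat = b0 + b1 * x0
    leverage = 1 / n + (x0 - mx) ** 2 / sxx
    ci_half = t_crit * se * math.sqrt(leverage)
    pi_half = t_crit * se * math.sqrt(1 + leverage)
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)

# 2.776: tabled two-tailed t value for 95% confidence with df = n - 2 = 4 (assumed).
y_hat, ci, pi = intervals_at(widths, weights, 9.0, 2.776)

print(f"point estimate: {y_hat:.1f} kg")
print(f"95% confidence interval: {ci[0]:.1f} to {ci[1]:.1f} kg")
print(f"95% prediction interval: {pi[0]:.1f} to {pi[1]:.1f} kg")
```

For a width of 9.0 cm, the confidence interval is roughly 184.6 to 224.9 kg; it estimates the mean weight of all seals with that width. The prediction interval is roughly 156.6 to 252.9 kg; it estimates the weight of one individual seal. The prediction interval is wider because the extra 1 under the square root adds the scatter of an individual seal around the line to the uncertainty in the line itself.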

© BrainMass Inc. brainmass.com October 25, 2018, 8:32 am https://brainmass.com/statistics/regression-analysis/correlation-and-regression-questions-543779

#### Solution Summary

This solution consists of a detailed explanation of testing the correlation coefficient with a t test and of finding the confidence and prediction intervals from the regression model. Excel's Data Analysis tool is used for the regression analysis. The correlation coefficient is tested manually, with all formulas and calculations shown. A full interpretation of the regression analysis is given.

# Correlation, Regression and Normal Distribution Practice Questions

a. The headline of a January 31, 2005 USA Today article read, "'January Barometer' predicts a pretty lousy year." Referring to the stock market (and in particular the S&P 500), the article goes on to say that the month of January was "turning out to be a loser and chances are 2005 will be, too." To support the claim, the article presents S&P 500 performance data for the past 10 years. Besides the year, the columns are the S&P 500 returns for the month of January and for the entire year.

As you can see, increases (shown in green) in January are typically associated with increases for the full year, while decreases (shown in red) in January are typically coupled with decreases for the full year. There is, however, a possibility that the association seen in the data is due to random chance. What could we do to see if the asserted relationship between January returns and the full year's returns is real (i.e., not due only to sampling error)? Using cell Q5, indicate your answer using the number associated with the best choice below.

1. Use Excel's regression analysis tool to estimate the relationship between the full year's returns (the dependent variable) and January's returns (the independent variable). If the Significance F value is large (bigger than .05, say), the relationship is real.

2. Use Excel's regression analysis tool to estimate the relationship between the full year's returns (the dependent variable) and January's returns (the independent variable). If the Lower 95% value associated with January's returns is negative and the Upper 95% value is positive, the relationship is real.

3. Draw a scatter diagram with January's returns on the x-axis and the full year's returns on the y-axis and use Excel's Add Trendline feature to estimate the relationship between the two variables. If the slope of the line is not zero, then the relationship is real.

4. Use Excel's regression analysis tool to estimate the relationship between the full year's returns (the dependent variable) and January's returns (the independent variable). If the coefficient associated with January returns is not zero, then the relationship is real.

5. Use Excel's regression analysis tool to estimate the relationship between the full year's returns (the dependent variable) and January's returns (the independent variable). If the p-value associated with January returns is less than our level of significance, the relationship is real.

b. When analyzing the slope in a regression analysis (i.e., the relationship between the dependent variable and one of the independent variables), which of the following would be a Type II error? Indicate your answer in cell Q11.

1. To conclude that the slope (relationship) is not significant when it really is.

2. To conclude that the slope (relationship) is significant when it really is.

3. To conclude that the slope (relationship) is not significant when it really is not.

4. To conclude that the slope (relationship) is significant when it really is not.

5. There cannot be a Type II error in this situation.

c. Which of the following is **not** something we can learn from a scatter plot? Give your answer in cell Q17.

1. Whether or not there are outliers in the data.

2. Whether or not there is any relationship between the two variables.

3. Whether or not there is a curved relationship between the two variables.

4. Whether or not there is a causal relationship between the two variables.

5. All of the above can be learned from a scatter plot.

d. The weekly demand for a particular automobile manufacturer's cars follows a normal distribution with a mean of 50,000 cars and a standard deviation of 10,000 cars. There is a 2% chance that this company will sell more than what number of cars during the next week? Report your answer as an integer. Put your answer in cell S23.
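Finding the cutoff means finding the 98th percentile of the demand distribution. Python's standard-library `statistics.NormalDist` provides an `inv_cdf` method for this. A minimal sketch:

```python
from statistics import NormalDist

# Weekly demand model from the problem: mean 50,000 cars, standard deviation 10,000 cars.
demand = NormalDist(mu=50_000, sigma=10_000)

# The value exceeded with 2% probability is the 98th percentile.
cutoff = demand.inv_cdf(0.98)
print(round(cutoff))
```

This gives about 70,537 cars. A table-based z of 2.05 would give 70,500 instead, so the integer you report depends on how precisely z is taken.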

e. An automotive repair shop has determined that the average service time on an automobile is 1.5 hours with a standard deviation of 35 minutes. A random sample of 70 services is selected. What is the probability of finding a sample mean of 96 minutes or larger if the population mean is still 1.5 hours? Give your answer to 4 decimal places. Put your answer in cell S24.
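By the central limit theorem, with n = 70 the sample mean is approximately normal. Its mean is 90 minutes (1.5 hours converted to minutes) and its standard deviation is 35/√70 minutes. A sketch using `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

mu = 90.0     # population mean service time: 1.5 hours = 90 minutes
sigma = 35.0  # population standard deviation in minutes
n = 70        # sample size

# Sampling distribution of the mean: approximately normal with sd sigma / sqrt(n).
sampling_dist = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))
p = 1 - sampling_dist.cdf(96)  # P(sample mean >= 96 minutes)
print(f"{p:.4f}")
```

The result is about 0.0757.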

f. A news account of a nationwide survey taken by Lou Harris (a well-known and reputable opinion polling organization) says 25% of the 1,604 persons responding named the Democrats as the best able to handle the nation's problems. The news report does not give a margin of error. Based on the information available, compute the margin of error (for 95% confidence). If you believe that there is not enough information to compute the margin of error, enter 0 as your answer. Otherwise, give your answer to 4 decimal places. Put your answer in cell S25.
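The sample proportion and the sample size are both given, so the usual large-sample margin of error z·√(p̂(1 - p̂)/n) can be computed. The sketch assumes a simple random sample; this is the standard textbook assumption, not something stated in the news report.

```python
import math
from statistics import NormalDist

p_hat = 0.25  # sample proportion naming the Democrats
n = 1604      # number of respondents

z = NormalDist().inv_cdf(0.975)  # two-tailed critical value for 95% confidence, about 1.96
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"{margin:.4f}")
```

This comes to about 0.0212, or roughly ±2.1 percentage points.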

g. Carpetland salespersons have averaged $8000 per week in sales. Steve Conois, the firm's vice president, proposes a compensation plan with new selling incentives. Steve hopes that the results of a trial selling period will enable him to conclude (prove) that the compensation plan increases the average sales per salesperson. Which of the following would be a Type I error? Enter your answer in cell S26.

1. To conclude that the average sales per salesperson have not increased when they really have.

2. To conclude that the average sales per salesperson have not increased when they really have not.

3. To conclude that the average sales per salesperson have increased when they really have.

4. To conclude that the average sales per salesperson have increased when they really have not.

5. There cannot be a Type I error in this situation.

h. Consider two experiments. First, from a population that is normally distributed with mean 10, we select one item and find its weight. Let D1 be the distribution of possible outcomes from experiment 1. Second, we take a sample of 5 items from the same population and calculate the average weight of the 5 items. Let D2 be the distribution of possible outcomes (sample averages) resulting from experiment 2. Indicate whether each of the following statements is true or not true. Use cells Q34:Q38 to make your selections.

1. D1 and D2 have the same mean.

2. D1 and D2 are both normally distributed.

3. D1 is wider than D2.

4. D1 is narrower than D2.

5. D1 and D2 have the same spread (standard deviation).
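A quick simulation makes the comparison concrete. The problem does not give the population standard deviation, so the sketch assumes σ = 2 purely for illustration. By theory, D2 has the same mean as D1 but a standard deviation smaller by a factor of √5, whatever σ is.

```python
import random
import statistics

random.seed(1)   # fixed seed so the run is repeatable
MEAN = 10
SIGMA = 2        # hypothetical population standard deviation (not given in the problem)
TRIALS = 20_000

# Experiment 1: weigh a single item.
d1 = [random.gauss(MEAN, SIGMA) for _ in range(TRIALS)]
# Experiment 2: average the weights of 5 items.
d2 = [statistics.fmean(random.gauss(MEAN, SIGMA) for _ in range(5)) for _ in range(TRIALS)]

print(f"D1: mean {statistics.fmean(d1):.3f}, sd {statistics.stdev(d1):.3f}")
print(f"D2: mean {statistics.fmean(d2):.3f}, sd {statistics.stdev(d2):.3f}")
ratio = statistics.stdev(d1) / statistics.stdev(d2)
print(f"sd ratio D1/D2: {ratio:.3f} (theory: sqrt(5) = {5 ** 0.5:.3f})")
```

Both simulated distributions center near 10. The sample-mean distribution D2 is narrower by a factor close to √5 ≈ 2.236.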

i. Suppose that we have sampled n observations from a normal distribution and found a 99% confidence interval for a population mean. Suppose the sample size decreases and the confidence level decreases from 99% to 95%. Indicate whether the interval will definitely get wider, definitely get narrower, or could change either way. Assume that the sample standard deviation stays the same with the new sample size. Use cell Q40.
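The two changes pull in opposite directions: a smaller sample widens the interval, while a lower confidence level narrows it. A quick numeric check with two hypothetical sample sizes makes this concrete. For simplicity the sketch uses z critical values and a fixed sample standard deviation of 1; both are assumptions for illustration. The t values appropriate for a normal sample would make the n = 10 interval even wider, so the comparison does not change.

```python
import math
from statistics import NormalDist

def half_width(confidence, n, s=1.0):
    """Half-width z * s / sqrt(n) of a large-sample confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * s / math.sqrt(n)

# Hypothetical sample sizes; both scenarios move from 99% to 95% confidence.
original = half_width(0.99, 100)
small_drop = half_width(0.95, 90)  # sample size falls a little
large_drop = half_width(0.95, 10)  # sample size falls a lot

print(f"99%, n=100: {original:.3f}")
print(f"95%, n=90:  {small_drop:.3f}  (narrower)")
print(f"95%, n=10:  {large_drop:.3f}  (wider)")
```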