The term regression was originally used in 1885 by Sir Francis Galton in his analysis of the relationship between the heights of children and parents. He formulated the "law of universal regression," which specifies that "each peculiarity in a man is shared by his kinsmen, but on average in a less degree." (Evidently, people spoke this way in 1885.) In 1903, two statisticians, K. Pearson and A. Lee, took a random sample of 1,078 father-son pairs to examine Galton's law ("On the Laws of Inheritance in Man, I. Inheritance of Physical Characteristics," Biometrika 2:457-462). Their sample regression line was
Son's height = 33.73 + 0.516 × Father's height
a. Interpret the coefficients.
b. What does the regression line tell you about the heights of sons of tall fathers?
c. What does the regression line tell you about the heights of sons of short fathers?
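The arithmetic behind parts (b) and (c) can be checked with a short Python sketch. It assumes heights are measured in inches (the unit is not stated above), and the father heights 64 and 72 are illustrative values, not data from Pearson and Lee.

```python
def predicted_son_height(father: float) -> float:
    """Pearson-Lee sample regression line (heights assumed to be in inches)."""
    return 33.73 + 0.516 * father


short_son = predicted_son_height(64)   # ≈ 66.75: predicted taller than a short father
tall_son = predicted_son_height(72)    # ≈ 70.88: predicted shorter than a tall father

# Father height at which the predicted son height equals the father's height.
# Because the slope 0.516 is below 1, fathers above this height have sons
# predicted shorter than themselves and fathers below it have sons predicted
# taller: Galton's "regression".
crossover = 33.73 / (1 - 0.516)        # ≈ 69.69
```

The slope says each extra inch of father's height adds only about half an inch to the predicted son's height. The intercept 33.73 is the prediction for a father of height zero, far outside the data, so it has no practical interpretation.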
Florida condominiums are popular winter retreats for many North Americans. In recent years, prices have steadily increased. A real estate agent wanted to know why the prices of similar-sized apartments in the same building vary. One possible answer lies in the floor: it may be that the higher the floor, the greater the sale price of the apartment. He recorded the price (in $1,000s) and the floor number of 1,200-sq.-ft. condominiums that have sold recently in several buildings in the same location.
a. Determine the regression line.
b. What do the coefficients tell you about the relationship between the two variables?
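Part (a) asks for the least squares line. A minimal Python sketch of the computational formulas follows; the function name is hypothetical, and because the agent's data are not reproduced here, it would be called with the floor numbers as `x` and the prices (in $1,000s) as `y`.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b0 = (sum_y - b1 * sum_x) / n                                   # intercept
    return b0, b1


# Synthetic check on points lying exactly on y = 2 + 3x
print(least_squares_line([1, 2, 3, 4], [5, 8, 11, 14]))  # (2.0, 3.0)
```

With floor as `x`, `b1` estimates the average change in sale price (in $1,000s) for each additional floor.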
Refer to Exercise 16.6.
a. What is the standard error of estimate? Interpret its value.
b. Describe how strongly the memory test scores and the length of the television commercial are linearly related.
c. Are the memory test scores and the length of the commercial linearly related? Test using a 5% significance level.
d. Estimate the slope coefficient with 90% confidence.
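Parts (a) through (d) all follow from three deviation sums. The Python sketch below is one way to organize the calculation; the function name is hypothetical, and because the standard library has no t-distribution quantile function, the critical value `t_crit` (with n - 2 degrees of freedom) is assumed to come from a t table.

```python
import math


def slope_inference(n, ss_x, ss_y, ss_xy, t_crit):
    """Simple-regression inference from deviation sums.

    ss_x = sum (x - x_bar)^2, ss_y = sum (y - y_bar)^2,
    ss_xy = sum (x - x_bar)(y - y_bar); t_crit uses n - 2 degrees of freedom.
    """
    b1 = ss_xy / ss_x
    sse = ss_y - b1 * ss_xy                 # unexplained (residual) variation
    s_e = math.sqrt(sse / (n - 2))          # standard error of estimate
    r_sq = ss_xy ** 2 / (ss_x * ss_y)       # coefficient of determination
    se_b1 = s_e / math.sqrt(ss_x)           # standard error of the slope
    t_stat = b1 / se_b1                     # test statistic for H0: beta1 = 0
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return s_e, r_sq, t_stat, ci
```

For part (c), compare |`t_stat`| with t(0.025, n - 2); for part (d), pass t(0.05, n - 2) as `t_crit` so that `ci` is a 90% interval.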
Pick one (or more) of the 11 exercises above and briefly explain why the prediction interval is so wide.
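In simple linear regression the prediction interval at x0 has half-width t · s_e · sqrt(1 + 1/n + (x0 - x_bar)^2 / SS_x), while the confidence interval for the mean of y drops the leading 1. The Python sketch below compares the two; the function name and the numeric inputs are hypothetical.

```python
import math


def interval_half_widths(s_e, n, x0, x_bar, ss_x, t_crit):
    """Half-widths at x0 of (1) the confidence interval for the mean of y and
    (2) the prediction interval for one new y, in simple linear regression."""
    core = 1 / n + (x0 - x_bar) ** 2 / ss_x
    ci_half = t_crit * s_e * math.sqrt(core)
    # The extra "1 +" is the scatter of a single new observation around the line;
    # it does not shrink as n grows, so the prediction interval stays wide.
    pi_half = t_crit * s_e * math.sqrt(1 + core)
    return ci_half, pi_half


# Hypothetical inputs: s_e = 10, n = 100, prediction made at the mean of x
ci_half, pi_half = interval_half_widths(s_e=10, n=100, x0=5, x_bar=5, ss_x=50, t_crit=2)
# ci_half = 2.0, pi_half ≈ 20.1
```

Even with a large sample, the prediction interval cannot be narrower than about ±t · s_e, so a large standard error of estimate makes it wide.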
Pat Statsdud, a student ranking near the bottom of the statistics class, decided that a certain amount of studying could actually improve final grades. However, too much studying would not be warranted because Pat's ambition (if that's what one could call it) was to ultimately graduate with the absolute minimum level of work. Pat was registered in a statistics course that had only 3 weeks to go before the final exam and for which the final grade was determined in the following way:
Total mark = 20% (Assignment)
+ 30% (Midterm test)
+ 50% (Final exam)
To determine how much work to do in the remaining 3 weeks, Pat needed to be able to predict the final exam mark on the basis of the assignment mark (worth 20 points) and the midterm mark (worth 30 points). Pat's marks on these were 12/20 and 14/30, respectively. Accordingly, Pat undertook the following analysis. The final exam mark, assignment mark, and midterm test mark for 30 students who took the statistics course last year were collected.
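Pat's position can be expressed with the weighting scheme above. In this hypothetical Python sketch, the scale of the final exam (out of 100 here) is an assumption, since the text gives only the assignment and midterm scales; the final exam mark predicted by Pat's regression would be plugged in as the last component.

```python
def weighted_total(components):
    """Total mark out of 100.

    Each component is (score, out_of, weight_percent); for example, 12/20 on an
    assignment worth 20% contributes 20 * 12 / 20 = 12 marks.
    """
    return sum(weight * score / out_of for score, out_of, weight in components)


known = [(12, 20, 20), (14, 30, 30)]     # Pat's assignment (12/20) and midterm (14/30)
print(weighted_total(known))             # 26.0 marks banked before the final

# Hypothetical final exam score of 60 out of 100 (worth 50%)
print(weighted_total(known + [(60, 100, 50)]))   # 56.0
```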
When one company buys another company, it is not unusual that some workers are terminated. The severance benefits offered to the laid-off workers are often the subject of dispute. Suppose that the Laurier Company recently bought the Western Company and subsequently terminated 20 of Western's employees. As part of the buyout agreement, it was promised that the severance packages offered to the former Western employees would be equivalent to those offered to Laurier employees who had been terminated in the past year. Thirty-six-year-old Bill Smith, a Western employee for the past 10 years, earning $32,000 per year, was one of those let go. His severance package included an offer of 5 weeks' severance pay. Bill complained that this offer was less than that offered to Laurier's employees when they were laid off, in contravention of the buyout agreement. A statistician was called in to settle the dispute. The statistician was told that severance is determined by three factors: age, length of service with the company, and pay. To determine how generous the severance package had been, a random sample of 50 Laurier ex-employees was taken. For each, the following variables were recorded:
Number of weeks of severance pay
Age of employee
Number of years with the company
Annual pay (in thousands of dollars)
a. Determine the regression equation.
b. Comment on how well the model fits the data.
c. Do all the independent variables belong in the equation? Explain.
d. Perform an analysis to determine whether Bill is correct in his assessment of the severance package.
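One way to approach this exercise is to fit the three-variable model and check whether Bill's 5 weeks falls below the prediction interval for an employee with his profile (age 36, 10 years of service, pay 32 in $1,000s). The pure-Python sketch below assumes the 50-employee sample is supplied as `rows` of (age, years, pay) and `y` (weeks of severance pay); the function names are hypothetical, and `t_crit` (t with n - k - 1 = 46 degrees of freedom for a 95% interval) is assumed to come from a t table.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[-1] for row in m]


def ols_prediction_interval(rows, y, x_new, t_crit):
    """Fit y = b0 + b1*x1 + ... + bk*xk by least squares and return
    (coefficients, R^2, point prediction, PI lower, PI upper) at x_new."""
    X = [[1.0] + list(r) for r in rows]           # prepend the intercept column
    n, p = len(X), len(X[0])                       # p = k + 1 parameters
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)                         # normal equations (X'X) b = X'y
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s_e = math.sqrt(sse / (n - p))                 # standard error of estimate
    x0 = [1.0] + list(x_new)
    leverage = sum(a * b for a, b in zip(x0, solve(xtx, x0)))  # x0'(X'X)^-1 x0
    y_hat = sum(b * v for b, v in zip(beta, x0))
    half = t_crit * s_e * math.sqrt(1 + leverage)
    return beta, 1 - sse / sst, y_hat, y_hat - half, y_hat + half
```

Calling `ols_prediction_interval(rows, y, (36, 10, 32), t_crit)` gives the predicted severance for Bill's profile; if 5 weeks lies below the lower limit, the data would support his complaint. Part (c) also needs the t-test on each coefficient, which this sketch does not compute.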
Multiple Regression Models and Simple Linear Regression Models (21 Problems): Least Squares, Durbin-Watson, Correlation Coefficient, Standard Error and p-Values
1. The y-intercept (b0) represents the
a. predicted value of Y when X = 0.
b. change in estimated average Y per unit change in X.
c. predicted value of Y.
d. variation around the sample regression line.
2. The least squares method minimizes which of the following?
d. All of the above
A candy bar manufacturer is interested in trying to estimate how sales are
influenced by the price of their product. To do this, the company randomly
chooses 6 small cities and offers the candy bar at different prices. Using
candy bar sales as the dependent variable, the company will conduct a
simple linear regression on the data below:
Table 1
City          Price ($)   Sales
River Falls     1.30       100
Hudson          1.60        90
Ellsworth       1.80        90
Prescott        2.00        40
Rock Elm        2.40        38
Stillwater      2.90        32
3. Referring to Table 1, what is the estimated slope parameter for the
candy bar price and sales data?
4. Referring to Table 1, what is the percentage of the total variation in
candy bar sales explained by the regression model?
5. Referring to Table 1, what is the standard error of the estimate, SYX,
for the data?
6. Referring to Table 1, if the price of the candy bar is set at $2, the
predicted sales will be
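Questions 3 through 6 can be checked numerically. The Python sketch below applies the usual least squares formulas to the Table 1 data; the variable names are illustrative.

```python
import math

price = [1.30, 1.60, 1.80, 2.00, 2.40, 2.90]   # X, dollars (Table 1)
sales = [100, 90, 90, 40, 38, 32]              # Y

n = len(price)
x_bar, y_bar = sum(price) / n, sum(sales) / n
ss_x = sum((x - x_bar) ** 2 for x in price)
ss_y = sum((y - y_bar) ** 2 for y in sales)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(price, sales))

b1 = ss_xy / ss_x                          # Q3: slope, ≈ -48.19
b0 = y_bar - b1 * x_bar                    # intercept, ≈ 161.39
r_squared = ss_xy ** 2 / (ss_x * ss_y)     # Q4: ≈ 0.784 of the variation explained
sse = ss_y - b1 * ss_xy                    # unexplained variation
s_yx = math.sqrt(sse / (n - 2))            # Q5: standard error of estimate, ≈ 16.30
predicted_at_2 = b0 + b1 * 2.00            # Q6: ≈ 65.0
```

Because $2.00 is the mean price in Table 1, the prediction at $2 equals the mean of sales.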
7. If the Durbin-Watson statistic has a value close to 0, which
assumption is violated?
a. Normality of the errors.
b. Independence of errors.
d. None of the above.
8. If the Durbin-Watson statistic has a value close to 4, which
assumption is violated?
a. Normality of the errors.
b. Independence of errors.
d. None of the above.
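Questions 7 and 8 depend on how the Durbin-Watson statistic is computed from the residuals. It lies between 0 and 4: values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. The Python sketch below uses made-up residual patterns for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Made-up residual patterns
drifting = [2, 1.5, 1, 0.5, -0.5, -1, -1.5, -2]   # neighbours similar: positive autocorrelation
alternating = [1, -1, 1, -1, 1, -1, 1, -1]        # neighbours flip sign: negative autocorrelation
print(durbin_watson(drifting))      # ≈ 0.167, close to 0
print(durbin_watson(alternating))   # 3.5, close to 4
```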
9. If the correlation coefficient (r) = 1.00, then
a. the y-intercept (b0) must equal 0.
b. the explained variation equals the unexplained variation.
c. there is no unexplained variation.
d. there is no explained variation.
10. In a simple linear regression problem, r and b1
a. may have opposite signs.
b. must have the same sign.
c. must have opposite signs.
d. are equal.
11. The strength of the linear relationship between two numerical
variables may be measured by the
a. scatter diagram.
d. coefficient of correlation.
12. The width of the prediction interval estimate for the predicted value
of Y is dependent on
a. the standard error of the estimate.
b. the value of X for which the prediction is being made.
c. the sample size.
d. All of the above.
The following Excel tables are obtained when "Score received on an
exam (measured in percentage points)" (Y) is regressed on
"percentage attendance" (X) for 22 students in a Statistics for
Business and Economics course.
Table 2
Multiple R            0.142620229
R Square              0.02034053
Adjusted R Square    -0.028642444
Standard Error       20.25979924

             Coefficients   Standard Error   t Stat        p-value
Intercept    39.39027309    37.24347659      1.057642216   0.302826622
Attendance    0.340583573    0.52852452      0.644404489   0.526635689
13. Referring to Table 2, which of the following statements is true?
a. -2.86% of the total variability in score received can be
explained by percentage attendance.
b. -2.86% of the total variability in percentage attendance can
be explained by score received.
c. 2% of the total variability in score received can be explained
by percentage attendance.
d. 2% of the total variability in percentage attendance can be
explained by score received.
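The summary figures in Table 2 agree with the usual formulas, which helps with questions 13 and 15. A small Python check (the function name is hypothetical):

```python
def adjusted_r_squared(r_sq: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r_sq) * (n - 1) / (n - k - 1)


adj = adjusted_r_squared(0.02034053, n=22, k=1)   # ≈ -0.0286, as printed in Table 2
t_attendance = 0.340583573 / 0.52852452            # coefficient / standard error, ≈ 0.644
```

R Square is the proportion of variation in score received explained by attendance and lies between 0 and 1. Adjusted R Square subtracts a penalty for the number of independent variables, so with a fit this weak it falls below zero and is no longer a proportion of variation explained.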
14. In a multiple regression problem involving two independent
variables, if b1 is computed to be +2.0, it means that
a. the relationship between X1 and Y is significant.
b. the estimated average of Y increases by 2 units for each
increase of 1 unit of X1, holding X2 constant.
c. the estimated average of Y increases by 2 units for each
increase of 1 unit of X1, without regard to X2.
d. the estimated average of Y is 2 when X1 equals zero.
15. In a multiple regression model, which of the following is correct
regarding the value of the adjusted r2?
a. It can be negative.
b. It has to be positive.
c. It has to be larger than the coefficient of multiple determination.
d. It can be larger than 1.
16. A manager of a product sales group believes the number of sales
made by an employee (Y) depends on how many years that employee
has been with the company (X1) and how he/she scored on a business
aptitude test (X2). A random sample of 8 employees provides the following data:
Table 3
Employee    Y    X1   X2
1         100    10    7
2          90     3   10
3          80     8    9
4          70     5    4
5          60     5    8
6          50     7    5
7          40     1    4
8          30     1    1
Referring to Table 3, for these data, what is the value for the
regression constant, b0?
17. Referring to Table 3, if an employee who had been with the company
5 years scored a 9 on the aptitude test, what would his estimated
expected sales be?
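With two independent variables, the least squares coefficients solve two normal equations written in deviation sums; for Table 3 these are 74 b1 + 32 b2 = 380 and 32 b1 + 64 b2 = 400. A standard-library Python sketch (the helper names are hypothetical):

```python
y = [100, 90, 80, 70, 60, 50, 40, 30]   # sales (Table 3)
x1 = [10, 3, 8, 5, 5, 7, 1, 1]          # years with the company
x2 = [7, 10, 9, 4, 8, 5, 4, 1]          # aptitude test score


def mean(v):
    return sum(v) / len(v)


def s(a, b):
    """Sum of cross-products of deviations about the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


s11, s22, s12 = s(x1, x1), s(x2, x2), s(x1, x2)
s1y, s2y = s(x1, y), s(x2, y)
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det            # ≈ 3.103
b2 = (s2y * s11 - s1y * s12) / det            # ≈ 4.698
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)  # Q16: ≈ 21.29
y_hat = b0 + b1 * 5 + b2 * 9                  # Q17: X1 = 5 years, X2 = 9, ≈ 79.09
```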
An economist is interested in how consumption for an economy (in $ billions) is
influenced by gross domestic product ($ billions) and aggregate price (consumer
price index). The Microsoft Excel output of this regression is partially
reproduced below.
Table 4
Multiple R          0.991
R Square            0.982
Adjusted R Square   0.976
Standard Error      0.299

             df   SS        MS        F         Signif F
Regression    2   33.4163   16.7082   186.325   0.0001
Residual      7    0.6277    0.0897
Total         9   34.0440

             Coeff     StdError   t Stat    P-value
Intercept    -0.0861   0.5674     -0.152    0.8837
GDP           0.7654   0.0574     13.340    0.0001
Price        -0.0006   0.0028     -0.219    0.8330
18. Referring to Table 4, when the economist used a simple linear
regression model with consumption as the dependent variable and
GDP as the independent variable, he obtained an r2 value of 0.971.
What additional percentage of the total variation of consumption
has been explained by including aggregate prices in the multiple
regression model?
19. Referring to Table 4, what is the predicted consumption level for an
economy with GDP equal to $4 billion and an aggregate price index
a. $1.39 billion
b. $2.89 billion
c. $4.75 billion
d. $9.45 billion
20. Referring to Table 4, to test for the significance of the coefficient on
aggregate price index, the value of the relevant t-statistic is
21. Referring to Table 4, to test whether gross domestic product has a
positive impact on consumption, the p-value is
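The quantities in questions 18 through 21 can be read off or recomputed from Table 4. A Python sketch using the printed figures; because those figures are rounded, recomputed values differ slightly from Excel's, and the price index value in question 19 is missing from this copy, so `predict` leaves it as an argument.

```python
# Figures copied from Table 4
ss_reg, ss_res, ss_total = 33.4163, 0.6277, 34.0440
df_reg, df_res = 2, 7
coef = {"Intercept": -0.0861, "GDP": 0.7654, "Price": -0.0006}
std_err = {"Intercept": 0.5674, "GDP": 0.0574, "Price": 0.0028}

r_sq = ss_reg / ss_total                        # ≈ 0.9816, printed as 0.982
f_stat = (ss_reg / df_reg) / (ss_res / df_res)  # ≈ 186.3

# Q18: gain over the GDP-only model (r^2 = 0.971)
extra = 0.982 - 0.971                           # ≈ 0.011, about 1.1 percentage points

# Q20: t = b / s_b; ≈ -0.214 from the rounded figures (Excel prints -0.219)
t_price = coef["Price"] / std_err["Price"]

# Q21: Excel's p-values are two-tailed; for H1: beta_GDP > 0 with a positive
# estimate, the one-tailed p-value is half the printed (rounded) 0.0001
p_one_tailed_gdp = 0.0001 / 2


def predict(gdp, price_index):
    """Q19: fitted consumption in $ billions; the price index must be supplied."""
    return coef["Intercept"] + coef["GDP"] * gdp + coef["Price"] * price_index
```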