    Multiple Regression Analysis & Time-Series Forecasting

    See attached file for questions 14.5, 14.7, 14.25, 14.41, 14.43, 14.49, 16.7, 16.13, 16.15, 16.47

    Please provide detailed explanations in the answers to the questions, showing:

    1. all work in excel
    2. explanations in MS word

    14.5
    A consumer organization wants to develop a regression model to predict mileage (as measured by miles per gallon) based on the horsepower of the car's engine and the weight of the car (in pounds). Data were collected from a sample of 50 recent car models, and the results are organized and stored in Auto.
    A. State the multiple regression equation
    B. Interpret the meaning of the slopes, b₁ and b₂, in this problem
    C. Explain why the regression coefficient, b₀, has no practical meaning in the context of this problem
    D. Predict the miles per gallon for cars that have 60 horsepower and weigh 2,000 pounds
    E. Construct a 95% confidence interval estimate for the mean miles per gallon for cars that have 60 horsepower and weigh 2,000 pounds
    F. Construct a 95% prediction interval for the miles per gallon for an individual car that has 60 horsepower and weighs 2,000 pounds

    14.7
    The business problem facing the director of broadcasting operations for a television station was the issue of standby hours (i.e. hours in which unionized graphic artists at the station are paid but are not actually involved in any activity) and what factors were related to standby hours. The study included the following variables:
    Standby hours (Y): Total number of standby hours in a week
    Total staff present (X₁): Weekly total of people-days
    Remote hours (X₂): Total number of hours worked by employees at locations away from the central plant
    Data were collected for 26 weeks; these data are organized and stored in Standby.
    A. State the multiple regression equation
    B. Interpret the meaning of the slopes, b₁ and b₂, in this problem
    C. Explain why the regression coefficient, b₀, has no practical meaning in the context of this problem
    D. Predict the standby hours for a week in which the total staff present have 310 people-days and the remote hours are 400
    E. Construct a 95% confidence interval estimate for the mean standby hours for weeks in which the total staff present have 310 people-days and the remote hours are 400.
    F. Construct a 95% prediction interval for the standby hours for a single week in which the total staff present have 310 people-days and the remote hours are 400

    14.25
    Use the following results:
    Variable    Coefficient   Standard Error   t Statistic   p-Value
    INTERCEPT   -0.02686      0.06905          -0.39         0.7034
    FOREIMP      0.79116      0.06295          12.57         0.0000
    MIDSOLE      0.60484      0.07174           8.43         0.0000

    A. Construct a 95% confidence interval estimate of the population slope between durability and forefoot shock-absorbing capability
    B. At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these results, indicate the independent variables to include in this model.

    14.41
    The marketing manager of a large supermarket chain faced the business problem of determining the effect on the sales of pet food of shelf space and whether the product was placed at the front (=1) or back (=0) of the aisle. Data are collected from a random sample of equal-sized stores. The results are shown in the following table (and organized and stored in Petfood):
    Store Shelf Space (Feet) Location Weekly Sales (Dollars)
    1 5 Back 160
    2 5 Back 220
    3 5 Back 140
    4 10 Back 190
    5 10 Back 240
    6 10 Front 260
    7 15 Back 230
    8 15 Back 270
    9 15 Front 280
    10 20 Back 260
    11 20 Back 290
    12 20 Front 310

    For (a) through (m), do not include an interaction term.
    A. State the multiple regression equation that predicts sales based on shelf space and location.
    B. Interpret the regression coefficients in (a).
    C. Predict the weekly sales of pet food for a store with 8 feet of shelf space situated at the back of the aisle. Construct a 95% confidence interval estimate and a 95% prediction interval.
    D. Perform a residual analysis on the results and determine whether the regression assumptions are valid.
    E. Is there a significant relationship between sales and the two independent variables (shelf space and aisle position) at the 0.05 level of significance?
    F. At the 0.05 level of significance, determine whether each independent variable makes a contribution to the regression model. Indicate the most appropriate regression model for this set of data.
    G. Construct and interpret 95% confidence interval estimates of the population slope for the relationship between sales and shelf space and between sales and aisle location.
    H. Compare the slope in (b) with the slope for the simple linear regression model of Problem 13.4 on page 481. Explain the difference in the results.
    I. Compute and interpret the meaning of the coefficient of multiple determination, r².
    J. Compute and interpret the adjusted r²
    K. Compare r² with the r² value computed in Problem 13.16(a) on page 487
    L. Compute the coefficients of partial determination and interpret their meaning
    M. What assumption about the slope of shelf space with sales do you need to make in this problem?
    N. Add an interaction term to the model and, at the 0.05 level of significance, determine whether it makes a significant contribution to the model
    O. On the basis of the results of (f) and (n), which model is most appropriate? Explain

    14.43
    The owner of a moving company typically has his most experienced manager predict the total number of labor hours that will be required to complete an upcoming move. This approach has proved useful in the past, but the owner has the business objective of developing a more accurate method of predicting labor hours. In a preliminary effort to provide a more accurate method, the owner decided to use the number of cubic feet moved and whether there is an elevator in the apartment building as the independent variables and has collected data for 36 moves in which the origin and destination were within the borough of Manhattan in New York City and the travel time was an insignificant portion of the hours worked. The data are organized and stored in Moving. For (a) through (k), do not include an interaction term.
    A. State the multiple regression equation for predicting labor hours, using the number of cubic feet moved and whether there is an elevator
    B. Interpret the regression coefficients in (a)
    C. Predict the labor hours for moving 500 cubic feet in an apartment building that has an elevator and construct a 95% confidence interval estimate and a 95% prediction interval
    D. Perform a residual analysis on the results and determine whether the regression assumptions are valid
    E. Is there a significant relationship between labor hours and the two independent variables (cubic feet moved and whether there is an elevator in the apartment building) at the 0.05 level of significance?
    F. At the 0.05 level of significance, determine whether each independent variable makes a contribution to the regression model. Indicate the most appropriate regression model for this set of data.
    G. Construct a 95% confidence interval estimate of the population slope for the relationship between labor hours and cubic feet moved
    H. Construct a 95% confidence interval estimate for the relationship between labor hours and the presence of an elevator
    I. Compute and interpret the adjusted r²
    J. Compute the coefficients of partial determination and interpret their meaning
    K. What assumption do you need to make about the slope of labor hours with cubic feet moved?
    L. Add an interaction term to the model and, at the 0.05 level of significance, determine whether it makes a significant contribution to the model
    M. On the basis of the results of (f) and (l), which model is most appropriate? Explain

    14.49
    The director of a training program for a large insurance company has the business objective of determining which training method is best for training underwriters. The three methods to be evaluated are traditional, CD-ROM based, and Web based. The 30 trainees are divided into three randomly assigned groups of 10. Before the start of the training, each trainee is given a proficiency exam that measures mathematics and computer skills. At the end of the training, all students take the same end-of-training exam. The results are organized and stored in Underwriting. Develop a multiple regression model to predict the score on the end-of-training exam, based on the score on the proficiency exam and the method of training used. For (a) through (k), do not include an interaction term.
    A. State the multiple regression equation
    B. Interpret the regression coefficient in (a).
    C. Predict the end-of-training exam score for a student with a proficiency exam score of 100 who had Web-based training
    D. Perform a residual analysis on your results and determine whether the regression assumptions are valid
    E. Is there a significant relationship between the end-of-training exam score and the independent variables (proficiency score and training method) at the 0.05 level of significance
    F. At the 0.05 level of significance, determine whether each independent variable makes a contribution to the regression model for this set of data.
    G. Construct and interpret a 95% confidence interval estimate of the population slope for the relationship between end-of-training exam score and proficiency exam.
    H. Construct and interpret a 95% confidence interval estimate of the population slope for the relationship between end-of-training exam score and type of training method.
    I. Compute and interpret the adjusted r²
    J. Compute the coefficients of partial determination and interpret their meaning
    K. What assumption about the slope of proficiency score with end-of-training exam score do you need to make in this problem?
    L. Add interaction terms to the model and, at the 0.05 level of significance, determine whether any interaction terms make a significant contribution to the model
    M. On the basis of the results of (f) and (l), which model is most appropriate? Explain

    16.7
    The following data (stored in Treasury) represent the three-month Treasury bill rates in the United States from 1991 to 2008:
    Year Rate Year Rate
    1991 5.38 2000 5.82
    1992 3.43 2001 3.40
    1993 3.00 2002 1.61
    1994 4.25 2003 1.01
    1995 5.49 2004 1.37
    1996 5.01 2005 3.15
    1997 5.06 2006 4.73
    1998 4.78 2007 4.36
    1999 4.64 2008 1.37
    A. Plot the data
    B. Fit a three-year moving average to the data and plot the results
    C. Using a smoothing coefficient of W = 0.50, exponentially smooth the series and plot the results
    D. What is your exponentially smoothed forecast for 2009?
    E. Repeat (c) and (d), using a smoothing coefficient of W = 0.25
    F. Compare the results of (d) and (e)
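
    The moving-average and exponential-smoothing steps in (b) through (e) can be sketched in Python as a cross-check on the Excel work. This is only an illustration: it assumes a centered three-year moving average and the common convention that the first smoothed value equals the first observation, and the function names are hypothetical.

```python
# Three-month Treasury bill rates, 1991 through 2008, from the table above.
RATES = [5.38, 3.43, 3.00, 4.25, 5.49, 5.01, 5.06, 4.78, 4.64,
         5.82, 3.40, 1.61, 1.01, 1.37, 3.15, 4.73, 4.36, 1.37]
YEARS = list(range(1991, 2009))


def moving_average_3(series):
    """Centered three-year moving average (no value for the first and last year)."""
    return [sum(series[i - 1:i + 2]) / 3 for i in range(1, len(series) - 1)]


def exponential_smooth(series, w):
    """E_1 = Y_1 (assumed start), then E_i = w * Y_i + (1 - w) * E_(i-1)."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(w * value + (1 - w) * smoothed[-1])
    return smoothed


ma3 = moving_average_3(RATES)
es_050 = exponential_smooth(RATES, 0.50)
es_025 = exponential_smooth(RATES, 0.25)

# The forecast for 2009 is the last smoothed value (the one for 2008).
forecast_2009_w050 = es_050[-1]
forecast_2009_w025 = es_025[-1]

for year, value in zip(YEARS[1:-1], ma3):
    print(f"{year}: MA(3) = {value:.2f}")
print(f"2009 forecast, W = 0.50: {forecast_2009_w050:.2f}")
print(f"2009 forecast, W = 0.25: {forecast_2009_w025:.2f}")
```

    A smaller W gives more weight to the older smoothed values, so the W = 0.25 series reacts more slowly to the sharp drop in 2008.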

    16.13
    Gross domestic product (GDP) is a major indicator of a nation's overall economic activity. It consists of personal consumption expenditures, gross domestic investment, net exports of goods and services, and government consumption expenditures. The GDP (in billions of current dollars) for the United States from 1980 to 2008 is stored in GDP.
    A. Plot the data
    B. Compute a linear trend forecasting equation and plot the trend line
    C. What are your forecasts for 2009 to 2010?
    D. What conclusions can you reach concerning the trend in GDP?

    16.15
    The data in Strategic represent the amount of oil, in billions of barrels, held in the U.S. strategic oil reserve, from 1981 through 2008.
    A. Plot the data
    B. Compute a linear trend forecasting equation and plot the trend line.
    C. Compute a quadratic trend forecasting equation and plot the results
    D. Compute an exponential trend forecasting equation and plot the results
    E. Which model is the most appropriate?
    F. Using the most appropriate model, forecast the number of barrels, in billions, in 2009. Check how accurate your forecast is by locating the true value for 2009 on the Internet or in your library

    16.47
    The following data (stored in Credit) are monthly credit card charges (in millions of dollars) for a popular credit card issued by a large bank (the name of which is not disclosed at its request):
    Month       2007   2008   2009
    January     31.9   39.4   45.0
    February    27.0   36.2   39.6
    March       31.3   40.5
    April       31.0   44.6
    May         39.4   46.8
    June        40.7   44.7
    July        42.3   52.2
    August      49.5   54.0
    September   45.0   48.8
    October     50.0   55.8
    November    50.9   58.7
    December    58.5   63.4

    A. Construct the time-series plot
    B. Describe the monthly pattern that is evident in the data
    C. In general, would you say that the overall dollar amounts charged on the bank's credit cards are increasing or decreasing? Explain
    D. Note that December 2008 charges were more than $63 million, but those for February 2009 were less than $40 million. Was February's total close to what you would have expected?
    E. Develop an exponential trend forecasting equation with monthly components.
    F. Interpret the monthly compound growth rate.
    G. Interpret the January multiplier
    H. What is the predicted value for March 2009?
    I. What is the predicted value for April 2009?
    J. How can this type of time-series forecasting benefit the bank?

    Appendix A
    14.4 Auto
    MPG Horsepower Weight
    43.1 48 1985
    19.9 110 3365
    19.2 105 3535
    17.7 165 3445
    18.1 139 3205
    20.3 103 2830
    21.5 115 3245
    16.9 155 4360
    15.5 142 4054
    18.5 150 3940
    27.2 71 3190
    41.5 76 2144
    46.6 65 2110
    23.7 100 2420
    27.2 84 2490
    39.1 58 1755
    28.0 88 2605
    24.0 92 2865
    20.2 139 3570
    20.5 95 3155
    28.0 90 2678
    34.7 63 2215
    36.1 66 1800
    35.7 80 1915
    20.2 85 2965
    23.9 90 3420
    29.9 65 2380
    30.4 67 3250
    36.0 74 1980
    22.6 110 2800
    36.4 67 2950
    27.5 95 2560
    33.7 75 2210
    44.6 67 1850
    32.9 100 2615
    38.0 67 1965
    24.2 120 2930
    38.1 60 1968
    39.4 70 2070
    25.4 116 2900
    31.3 75 2542
    34.1 68 1985
    34.0 88 2395
    31.0 82 2720
    27.4 80 2670
    22.3 88 2890
    28.0 79 2625
    17.6 85 3465
    34.4 65 3465
    20.6 105 3380

    14.7 Standby
    Standby Total Staff Remote Dubner Total Labor
    245 338 414 323 2001
    177 333 598 340 2030
    271 358 656 340 2226
    211 372 631 352 2154
    196 339 528 380 2078
    135 289 409 339 2080
    195 334 382 331 2073
    118 293 399 311 1758
    116 325 343 328 1624
    147 311 338 353 1889
    154 304 353 518 1988
    146 312 289 440 2049
    115 283 388 276 1796
    161 307 402 207 1720
    274 322 151 287 2056
    245 335 228 290 1890
    201 350 271 355 2187
    183 339 440 300 2032
    237 327 475 284 1856
    175 328 347 337 2068
    152 319 449 279 1813
    188 325 336 244 1808
    188 322 267 253 1834
    197 317 235 272 1973
    261 315 164 223 1839
    232 331 270 272 1935

    13.4
    The marketing manager of a large supermarket chain would like to use shelf space to predict the sales of pet food. A random sample of 12 equal-sized stores is selected, with the following results (stored in Petfood):

    Store Shelf Space (Feet) Location Weekly Sales (Dollars)
    1 5 Back 160
    2 5 Back 220
    3 5 Back 140
    4 10 Back 190
    5 10 Back 240
    6 10 Front 260
    7 15 Back 230
    8 15 Back 270
    9 15 Front 280
    10 20 Back 260
    11 20 Back 290
    12 20 Front 310

    a) Construct a scatter plot. For these data, b₀ = 145 and b₁ = 7.4.
    b) Interpret the meaning of the slope, b₁, in this problem
    c) Predict the weekly sales of pet food for stores with 8 feet of shelf space for pet food.

    13.16
    In Problem 13.4, the marketing manager used shelf space for pet food to predict weekly sales (stored in Petfood). For those data, SSR = 20,535 and SST = 30,025.
    a) Determine the coefficient of determination, r², and interpret its meaning.
    b) Determine the standard error of the estimate
    c) How useful do you think this regression model is for predicting sales?

    © BrainMass Inc. brainmass.com October 10, 2019, 3:18 am ad1c9bdddf
    https://brainmass.com/statistics/descriptive-statistics/multiple-regression-analysis-time-series-forecasting-414477

    Solution Preview

    Please see the attachments.

    Multiple Regression & Time-Series Forecasting
    14.5. A consumer organization wants to develop a regression model to predict mileage (as measured by miles per gallon) based on the horsepower of the car's engine and the weight of the car (in pounds). Data were collected from a sample of 50 recent car models, and the results are organized and stored in the attached spreadsheet Auto.
                 Coefficients    Standard Error   t Stat          P-value
    Intercept    58.15708245     2.658248208      21.87797297     2.761E-26
    Horsepower   -0.117525467    0.032643428      -3.600279529    0.000763128
    Weight       -0.006870645    0.001401173      -4.903494042    1.16491E-05
    a. State the multiple regression equation.
    The population model is
    Mileage = β0 + β1*Horsepower + β2*Weight + Error
    The estimated regression equation is
    Mileage = 58.157 - 0.1175*Horsepower - 0.00687*Weight
    b. Interpret the meaning of the slopes, b1 and b2, in this problem.
    The estimated slopes for horsepower (b1) and weight (b2) can be interpreted as follows:
    • For each one-unit increase in horsepower, the predicted mileage decreases by 0.1175 mpg, holding weight constant.
    • For each one-pound increase in weight, the predicted mileage decreases by 0.00687 mpg, holding horsepower constant.
    c. Explain why the regression coefficient, b0, has no practical meaning in the context of this problem.
    The regression coefficient b0 represents the estimated mileage when horsepower and weight are both zero. Clearly a car cannot have zero horsepower or zero weight. Thus b0 has no practical meaning.
    d. Predict the miles per gallon for cars that have 60 horsepower and weigh 2,000 pounds.
    Mileage = 58.157 - 0.1175*60 - 0.00687*2000 = 37.367 mpg (37.364 mpg with the unrounded coefficients)
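
    The substitution in (d) can be checked with a few lines of Python. This is a minimal sketch that copies the full-precision coefficients from the regression output above; the function name `predict_mpg` is hypothetical.

```python
# Coefficients copied from the regression output above (full precision).
B0 = 58.15708245    # intercept
B1 = -0.117525467   # slope for horsepower
B2 = -0.006870645   # slope for weight (pounds)


def predict_mpg(horsepower: float, weight: float) -> float:
    """Evaluate the fitted equation b0 + b1*horsepower + b2*weight."""
    return B0 + B1 * horsepower + B2 * weight


mpg_60hp_2000lb = predict_mpg(60, 2000)
print(f"Predicted mileage: {mpg_60hp_2000lb:.3f} mpg")
```

    Using the unrounded coefficients gives a value that agrees with the Predicted Y (YHat) shown in the worksheet output for parts (e) and (f); the small difference from the hand calculation comes from rounding the slopes.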
    e. Construct a 95% confidence interval estimate for the mean miles per gallon for cars that have 60 horsepower and weigh 2,000 pounds.
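    The confidence interval is given by $\hat{Y} \pm t_{\alpha/2,\,n-3}\,SE(\hat{Y})$, where $SE(\hat{Y})$ is given by the formula $SE(\hat{Y}) = S_{YX}\sqrt{X_G'(X'X)^{-1}X_G}$. Here $X_G$ is the vector of given values (1, 60, 2000) and $S_{YX}$ is the standard error of the estimate.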
    Details

    Confidence Interval Estimate and Prediction Interval

    Data
    Confidence Level 95%
    1
    Horsepower given value 60
    Weight given value 2000

    X'X 50 4542 137826
    4542 449004 13149387
    137826 13149387 4E+08

    Inverse of X'X 0.405084 -0.00019 -0.00013
    -0.00019 6.11E-05 -1.9E-06
    -0.00013 -1.9E-06 1.13E-07

    X'G times Inverse of X'X 0.12679 -0.00041 -2.5E-05

    [X'G times Inverse of X'X] times XG 0.051745
    t Statistic 2.01174
    Predicted Y (YHat) 37.36427

    For Average Predicted Y (YHat)
    Interval Half Width 1.911297
    Confidence Interval Lower Limit 35.45297
    Confidence Interval Upper Limit 39.27556

    f. Construct a 95% prediction interval for the miles per gallon for an individual car that has 60 horsepower and weighs 2,000 pounds.
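    The prediction interval is given by $\hat{Y} \pm t_{\alpha/2,\,n-3}\,SE(\hat{Y})$, where for an individual response $SE(\hat{Y})$ is given by the formula $SE(\hat{Y}) = S_{YX}\sqrt{1 + X_G'(X'X)^{-1}X_G}$. Here $X_G$ is the vector of given values (1, 60, 2000) and $S_{YX}$ is the standard error of the estimate.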
    Details
    Confidence Interval Estimate and Prediction Interval

    Data
    Confidence Level 95%
    1
    Horsepower given value 60
    Weight given value 2000

    X'X 50 4542 137826
    4542 449004 13149387
    137826 13149387 4E+08

    Inverse of X'X 0.405084 -0.00019 -0.00013
    -0.00019 6.11E-05 -1.9E-06
    -0.00013 -1.9E-06 1.13E-07

    X'G times Inverse of X'X 0.12679 -0.00041 -2.5E-05

    [X'G times Inverse of X'X] times XG 0.051745
    t Statistic 2.01174
    Predicted Y (YHat) 37.36427

    For Individual Response Y
    Interval Half Width 8.616883
    Prediction Interval Lower Limit 28.74738
    Prediction Interval Upper Limit 45.98115
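
    The two half widths in (e) and (f) differ only by the extra 1 under the square root. The sketch below illustrates that relationship using numbers copied from the worksheet output; the standard error of the estimate is not printed in the preview, so it is backed out from the confidence-interval half width, and that derived value is an assumption rather than a reported figure.

```python
import math

# Values copied from the worksheet output above.
T_CRIT = 2.01174          # t statistic for 95% confidence, n - 3 = 47 d.f.
H = 0.051745              # [X'G times Inverse of X'X] times XG
CI_HALF_WIDTH = 1.911297  # half width for the mean response
Y_HAT = 37.36427          # Predicted Y (YHat)

# Derived, not reported: standard error of the estimate implied by the CI.
s_yx = CI_HALF_WIDTH / (T_CRIT * math.sqrt(H))

# Prediction interval for an individual car: add 1 under the square root.
pi_half_width = T_CRIT * s_yx * math.sqrt(1 + H)
pi_lower = Y_HAT - pi_half_width
pi_upper = Y_HAT + pi_half_width

print(f"Implied S_YX: {s_yx:.4f}")
print(f"PI half width: {pi_half_width:.4f}")
print(f"95% PI: ({pi_lower:.3f}, {pi_upper:.3f})")
```

    The recomputed prediction-interval half width agrees with the worksheet value to about four significant digits; the remaining gap comes from the rounded inputs.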

    14.7. The business problem facing the director of broadcasting operations for a television station was the issue of standby hours (i.e., hours in which unionized graphic artists at the station are paid but are not actually involved in any activity) and what factors were related to standby hours. The study included the following variables:
    Standby hours (Y), Total number of standby hours in a week
    Total staff present (X1), Weekly total of people-days
    Remote hours (X2), Total number of hours worked by employees at locations away from the central plant
    Data was collected for 26 weeks; this is organized and stored in the attached spreadsheet Standby.
    Coefficients Standard Error t Stat P-value
    Intercept -330.675 116.4802 -2.83889 0.009299
    Total Staff 1.764865 0.379036 4.656194 0.00011
    Remote -0.13897 0.058798 -2.36347 0.026932

    a. State the multiple regression equation
    Standby hours = -330.675 + 1.764865*Total Staff - 0.13897*Remote
    b. Interpret the meaning of the slopes b1 and b2, in this problem.
    For each additional people-day of total staff present, the predicted standby hours increase by 1.7649 hours, holding remote hours constant.
    For each additional remote hour, the predicted standby hours decrease by 0.13897 hours, holding total staff present constant.
    c. Explain why the regression coefficient, b0, has no practical meaning in the context of this problem.
    The regression coefficient b0 represents the estimated standby hours when total staff present and remote hours are both zero. Here that estimate is negative (-330.675), which is impossible for a number of hours, and a week with no staff present lies far outside the range of the data. Thus b0 has no practical meaning.
    d. Predict the standby hours for a week in which the total staff present have 310 people-days and the remote hours are 400.
    Standby hours = -330.675 + 1.764865*310 - 0.13897*400 = 160.845 (160.8465 with the unrounded coefficients)
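    e. Construct a 95% confidence interval estimate for the mean standby hours for weeks in which the total staff present have 310 people-days and the remote hours are 400.
    The confidence interval is given by $\hat{Y} \pm t_{\alpha/2,\,n-3}\,SE(\hat{Y})$, where $SE(\hat{Y})$ is given by the formula $SE(\hat{Y}) = S_{YX}\sqrt{X_G'(X'X)^{-1}X_G}$. Here $X_G$ is the vector of given values (1, 310, 400) and $S_{YX}$ is the standard error of the estimate.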
    Confidence Interval Estimate and Prediction Interval

    Data
    Confidence Level 95%
    1
    Total Staff given value 310
    Remote given value 400

    X'X 26 8428 9763
    8428 2742160 3189708
    9763 3189708 4089525

    Inverse of X'X 10.83449 -0.03465 0.001158
    -0.03465 0.000115 -6.8E-06
    0.001158 -6.8E-06 2.76E-06

    X'G times Inverse of X'X 0.557219 -0.00179 0.000163

    [X'G times Inverse of X'X] times XG 0.067798
    t Statistic 2.068658
    Predicted Y (YHat) 160.8465

    For Average Predicted Y (YHat)
    Interval Half Width 19.06093
    Confidence Interval Lower Limit 141.7855
    Confidence Interval Upper Limit 179.9074
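
    The quantity labeled "[X'G times Inverse of X'X] times XG" can be recomputed directly from the X'X matrix printed above. The sketch below solves (X'X) z = XG in exact rational arithmetic instead of forming the rounded inverse; the helper name `solve` is hypothetical, and plain Gauss-Jordan elimination is used only for illustration.

```python
from fractions import Fraction

# X'X and the given values (1, total staff, remote) from the worksheet above.
XTX = [
    [26, 8428, 9763],
    [8428, 2742160, 3189708],
    [9763, 3189708, 4089525],
]
X_G = [1, 310, 400]


def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Swap in a row with a nonzero pivot, then scale the pivot to 1.
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


z = solve(XTX, X_G)                               # (X'X)^-1 times XG
h = float(sum(x * zi for x, zi in zip(X_G, z)))   # XG' (X'X)^-1 XG

print("XG' (X'X)^-1 =", [round(float(v), 6) for v in z])
print(f"h = {h:.6f}")
```

    Because X'X is symmetric, the solved vector matches the row labeled "X'G times Inverse of X'X", and h matches the scalar printed below it up to rounding.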

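    f. Construct a 95% prediction interval for the standby hours for a single week in which the total staff present have 310 people-days and the remote hours are 400.
    The prediction interval is given by $\hat{Y} \pm t_{\alpha/2,\,n-3}\,SE(\hat{Y})$, where for an individual response $SE(\hat{Y})$ is given by the formula $SE(\hat{Y}) = S_{YX}\sqrt{1 + X_G'(X'X)^{-1}X_G}$. Here $X_G$ is the vector of given values (1, 310, 400) and $S_{YX}$ is the standard error of the estimate.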

    Confidence Interval Estimate and Prediction Interval

    Data
    Confidence Level 95%
    1
    Total Staff given value 310
    Remote given value 400

    X'X 26 8428 9763
    8428 2742160 3189708
    9763 3189708 4089525

    Inverse of X'X 10.83449 -0.03465 0.001158
    -0.03465 0.000115 -6.8E-06
    0.001158 -6.8E-06 2.76E-06

    X'G times Inverse of X'X 0.557219 -0.00179 0.000163

    [X'G times Inverse of X'X] times XG 0.067798
    t Statistic 2.068658
    Predicted Y (YHat) 160.8465

    For Individual Response Y
    Interval Half Width 75.64515
    Prediction Interval Lower Limit 85.20131
    Prediction Interval Upper Limit 236.4916

    14.25. Use the following results:
    Variable    Coefficient   Standard Error   t Statistic   p-Value
    Intercept   -0.02686      0.06905          -0.39         0.7034
    FOREIMP      0.79116      0.06295          12.57         0.0000
    MIDSOLE      0.60484      0.07174           8.43         0.0000

    a. Construct a 95% confidence interval estimate of the population slope between durability and forefoot shock absorbing capability.
    The confidence interval is given by $b_j \pm t_{\alpha/2,\,n-3}\,S_{b_j}$, where $t_{\alpha/2,\,n-3}$ is calculated at n-3 = 12 d.f.
    Variable   Coefficient   Standard Error   t (12 d.f.)   LCL         UCL
    FOREIMP    0.79116       0.06295          2.179         0.653992    0.928328
    MIDSOLE    0.60484       0.07174          2.179         0.4485185   0.761161

    b. At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these results, indicate the independent variables to include in this model.
    The significance of the regression coefficients of the explanatory variables can be tested using the t test and its p-value. Here both explanatory variables are highly significant, as each p-value is less than 0.05. Thus we can conclude that both independent variables make a significant contribution to the regression model, and both should be included in it.
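
    The intervals in (a) and the t tests in (b) follow from the coefficient table alone. The sketch below reproduces them in Python, taking the critical value 2.179 (12 d.f.) as given in the solution; the function names are hypothetical.

```python
T_CRIT = 2.179  # two-tailed critical t at the 0.05 level, 12 d.f. (from the solution)

# (coefficient, standard error) pairs copied from the results table.
COEFS = {
    "FOREIMP": (0.79116, 0.06295),
    "MIDSOLE": (0.60484, 0.07174),
}


def slope_interval(b, se, t_crit=T_CRIT):
    """95% confidence interval b +/- t * SE for a population slope."""
    half_width = t_crit * se
    return b - half_width, b + half_width


def is_significant(b, se, t_crit=T_CRIT):
    """Two-tailed t test of H0: slope = 0, rejecting when |b / SE| > t."""
    return abs(b / se) > t_crit


for name, (b, se) in COEFS.items():
    lower, upper = slope_interval(b, se)
    print(f"{name}: t = {b / se:.2f}, 95% CI = ({lower:.6f}, {upper:.6f}), "
          f"significant = {is_significant(b, se)}")
```

    Neither interval contains zero, which is the same conclusion the p-values give.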

    14.41. The marketing manager of a large supermarket chain faced the business problem of determining the effect on the sales of pet food of shelf space and whether the product was placed at the front (=1) or back (=0) of the aisle. Data are collected from a random sample of 12 equal sized stores. The results are stored in the attached spreadsheet Petfood.
    Store Shelf Space (Feet) Location Weekly Sales (Dollars)
    1 5 Back 160
    2 5 Back 220
    3 5 Back 140
    4 10 Back 190
    5 10 Back 240
    6 10 Front 260
    7 15 Back 230
    8 15 Back 270
    9 15 Front 280
    10 20 Back 260
    11 20 Back 290
    12 20 Front 310

    For (a) through (m), do not include an interaction term.
                     Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
    Intercept        130            15.6894          8.285847   1.67E-05   94.5081     165.4919
    Shelf Space      7.4            1.100841         6.722131   8.63E-05   4.909724    9.890276
    Aisle Location   45             13.05437         3.447121   0.007308   15.46896    74.53104

    a) State the multiple regression equation that predicts sales based on shelf space and location.
    The estimated regression model is
    Sales = 130 +7.4*Shelf Space +45*Aisle Location
    b) Interpret the regression coefficients in ...

    Solution Summary

    The solution provides a step-by-step method for fitting the multiple regression models and the trend for a time-series model. The formulas for the calculations and interpretations of the results are also included.
