Gallop Marketing has been gathering data on people's television viewing habits in smaller metropolitan areas. Radhika Nanda, an analyst at Gallop, is trying to predict the number of households that tune in to a given television station at any time during a given calendar week. She has gathered data for 25 different stations/broadcast areas and has run a simple linear regression model in which the dependent variable is the number of households (in 10,000s) that tune in to a station sometime during the week. The independent variable is the number of households (in 10,000s) with televisions in the broadcast area. The resulting regression model output appears below. Radhika has looked at the output and is discouraged by the results.
(a) Based on the above regression output, why might this regression not be a good model?
Radhika has decided to give her factors some more thought, and has come upon the idea that the number of households that tune in to a particular station during the week might also depend on whether the station's channel is VHF or UHF. For example, most VHF stations are major networks (like ABC, CBS, or NBC), which are viewed more often regardless of the size of the broadcast area. Radhika has therefore included a dummy variable for whether a station broadcasts on VHF (VHF = 1, UHF = 0).
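To see what a 0/1 dummy variable does in a linear model, here is a minimal sketch. The function name `predict_viewers` and the coefficient values `B0`, `B1`, `B2` are hypothetical placeholders chosen for illustration; they are not the estimates from Radhika's regression output.

```python
def predict_viewers(tv_households, is_vhf, b0, b1, b2):
    """Predicted viewing households (10,000s) from a model with one dummy.

    tv_households: households with televisions in the area (10,000s)
    is_vhf:        True for a VHF station, False for a UHF station
    b0, b1, b2:    intercept, slope on tv_households, coefficient on the dummy
    """
    vhf_dummy = 1 if is_vhf else 0  # VHF = 1, UHF = 0, as in the problem
    return b0 + b1 * tv_households + b2 * vhf_dummy


# Hypothetical coefficients for illustration only (NOT from the regression output).
B0, B1, B2 = 1.0, 0.5, 2.0

# Two stations in broadcast areas of the same size, differing only in band.
vhf_pred = predict_viewers(30.0, True, B0, B1, B2)
uhf_pred = predict_viewers(30.0, False, B0, B1, B2)

print(vhf_pred, uhf_pred, vhf_pred - uhf_pred)
```

For any fixed area size, the two predictions differ by exactly the dummy coefficient, so the dummy shifts the intercept for VHF stations while leaving the slope on households unchanged.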
The results of her multiple linear regression are as follows:
(b) Write a complete equation for the multiple linear regression that incorporates the estimated coefficients provided by the second regression output. Define in words all the variables used in the equation. Do the signs of the regression coefficients make sense? Why or why not?
(d) Below are residual plots that Radhika has produced based on the second regression model. Do you see any problems with the model based on looking at these plots? Why or why not?
(e) Radhika has computed the sample correlation between the "Number of Households" and "UHF/VHF" data. The results of her correlation computations are shown below.
                                 Number of Households (10,000s)   UHF, VHF
Number of Households (10,000s)   1.0000
UHF, VHF                         0.0730                           1.0000
Do these numbers indicate any possible problems with the regression? Why or why not? To check the validity of the second regression model, describe what additional test you would perform, and why.
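One common way to read a pairwise correlation like the 0.0730 above is through the variance inflation factor (VIF). The sketch below assumes a model with exactly two predictors, in which case both predictors share the same VIF, 1 / (1 - r^2). The helper name `vif_two_predictors` is hypothetical, and VIF is only one possible diagnostic, not necessarily the one the problem intends.

```python
def vif_two_predictors(r):
    """VIF for either predictor in a model with exactly two predictors.

    r is the sample correlation between the two predictors.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


r = 0.0730  # correlation between households and the UHF/VHF dummy (from the table)
vif = vif_two_predictors(r)

print(f"r^2 = {r * r:.6f}, VIF = {vif:.4f}")
# prints: r^2 = 0.005329, VIF = 1.0054
```

A VIF near 1 means the two predictors carry almost no overlapping linear information. Rules of thumb often treat values above roughly 5 or 10 as a warning sign, though those cutoffs are conventions rather than formal tests.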
See attached file for full problem description.
A Word document works through the regression-centered problems for Gallop Marketing and evaluates the model based on the prepared residual plots.