How does car mileage vary for various car models?
Variation in gasoline mileage among makes and models of automobiles is influenced substantially by the weight and horsepower of the vehicle. The data you will analyze is provided by the U.S. Environmental Protection Agency. The variables are:
VOL: Cubic feet of cab space
HP: Engine horsepower
MPG: Average miles per gallon
SP: Top speed (mph)
WT: Vehicle weight (100 lb)
Data <- read.table("car_milage.txt", header = TRUE)
In the analysis below, we investigate the association of the independent (explanatory) variables with average miles per gallon (the response variable) using multiple linear regression, with a focus on variable selection.
Question 1: Exploratory Data Analysis
- Using a scatterplot, describe the relationship between mileage and the four independent variables. Describe the general trend (direction and form). Based on this analysis, would you suggest that there is a linear relationship between mileage and the four independent variables? If not, what transformation of the response variable would you suggest?
Question 2: Fitting the Linear Regression Model
- Fit a linear regression to evaluate the relationship between mileage and the four independent variables. If you suggested a transformation in Question 1, use that transformation. Also include a second-order term for the weight predictor and transform horsepower using the logarithmic transformation. Why did I suggest these two model revisions? Write down the equation for the regression line and interpret the estimated values of the parameters in the context of the problem (include their standard errors in your interpretation).
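A minimal R sketch of the Question 2 model, assuming the log transformation of MPG is the one suggested in Question 1 (an assumption; the text does not settle it). The names `fit_q2`, `b`, and `slope_wt` are hypothetical. Because reading the data needs `car_milage.txt`, the block falls back to a clearly synthetic stand-in when the file is missing; those numbers say nothing about real cars.

```r
# Use the EPA file if present; otherwise build a HYPOTHETICAL synthetic
# stand-in with the same columns (illustration only, not the real data).
if (file.exists("car_milage.txt")) {
  Data <- read.table("car_milage.txt", header = TRUE)
} else {
  set.seed(42)
  n0 <- 82
  WT <- runif(n0, 20, 55)
  HP <- runif(n0, 50, 300)
  Data <- data.frame(
    MPG = exp(4.5 - 0.025 * WT - 0.1 * log(HP) + rnorm(n0, sd = 0.08)),
    HP = HP, SP = 80 + 0.25 * HP + rnorm(n0, sd = 5),
    VOL = runif(n0, 50, 160), WT = WT)
}

# Question 2 model: log response (assumed), log(HP), and a quadratic term in WT.
fit_q2 <- lm(log(MPG) ~ log(HP) + SP + VOL + WT + I(WT^2), data = Data)
print(summary(fit_q2))  # estimates, standard errors, t tests, R^2

# With a quadratic in WT, the slope of log(MPG) in WT depends on WT itself:
# d log(MPG) / d WT = b_WT + 2 * b_WT2 * WT, evaluated here at the mean weight.
b <- coef(fit_q2)
slope_wt <- unname(b["WT"] + 2 * b["I(WT^2)"] * mean(Data$WT))
cat("Slope of log(MPG) in WT at mean weight:", slope_wt, "\n")
```

On the log scale, a coefficient b on SP or VOL means a one-unit increase multiplies predicted MPG by exp(b) (roughly a 100·b percent change when b is small), and the coefficient on log(HP) reads as an elasticity: a 1% increase in horsepower changes MPG by roughly b percent.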
Question 3: Variable Selection
Are all predictors statistically significantly associated with the response variable? Using Mallows' Cp and BIC as criteria, select the best sub-model. To search through the models, try three strategies: (i) all possible models, (ii) forward stepwise selection, and (iii) backward stepwise selection. Summarize your findings. Compare the results with variable selection using the lasso.
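One way to run these searches in base R, sketched on the four original predictors of the model fitted below (extending `preds` to the transformed terms works the same way). Mallows' Cp is computed by hand as $C_p = \mathrm{RSS}_p/\hat\sigma^2_{\text{full}} - n + 2p$, where $p$ counts the coefficients including the intercept; `BIC()` and `step(..., k = log(n))` are standard `stats` functions. The lasso part assumes the `glmnet` package is installed and is skipped otherwise. The names `preds`, `subsets`, `fwd`, `bwd`, and `cv_lasso` are hypothetical, and synthetic stand-in data is used when `car_milage.txt` is absent.

```r
# Use the EPA file if present; otherwise build a HYPOTHETICAL synthetic
# stand-in with the same columns (illustration only, not the real data).
if (file.exists("car_milage.txt")) {
  Data <- read.table("car_milage.txt", header = TRUE)
} else {
  set.seed(42)
  n0 <- 82
  WT <- runif(n0, 20, 55)
  HP <- runif(n0, 50, 300)
  Data <- data.frame(
    MPG = exp(4.5 - 0.025 * WT - 0.1 * log(HP) + rnorm(n0, sd = 0.08)),
    HP = HP, SP = 80 + 0.25 * HP + rnorm(n0, sd = 5),
    VOL = runif(n0, 50, 160), WT = WT)
}

preds <- c("HP", "SP", "VOL", "WT")
full  <- lm(MPG ~ HP + SP + VOL + WT, data = Data)
n     <- nrow(Data)
s2    <- summary(full)$sigma^2  # full-model MSE, the sigma^2 estimate in Cp

# (i) All possible models: score every non-empty subset by Cp and BIC.
rows <- list()
for (k in seq_along(preds)) {
  for (vars in combn(preds, k, simplify = FALSE)) {
    fit <- lm(reformulate(vars, response = "MPG"), data = Data)
    p   <- length(coef(fit))  # parameters, including the intercept
    rows[[length(rows) + 1]] <- data.frame(
      model = paste(vars, collapse = "+"), p = p,
      Cp = deviance(fit) / s2 - n + 2 * p, BIC = BIC(fit))
  }
}
subsets <- do.call(rbind, rows)
print(subsets[order(subsets$BIC), ], row.names = FALSE)

# (ii) Forward and (iii) backward stepwise search, penalised by BIC (k = log(n)).
null_fit <- lm(MPG ~ 1, data = Data)
fwd <- step(null_fit, scope = formula(full), direction = "forward",
            k = log(n), trace = 0)
bwd <- step(full, direction = "backward", k = log(n), trace = 0)
print(formula(fwd))
print(formula(bwd))

# Lasso comparison (optional; requires the glmnet package).
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)  # cv.glmnet assigns cross-validation folds at random
  cv_lasso <- glmnet::cv.glmnet(as.matrix(Data[, preds]), Data$MPG, alpha = 1)
  print(coef(cv_lasso, s = "lambda.1se"))  # predictors shown as "." were dropped
}
```

For the full model Cp equals its own parameter count, so a sub-model with Cp close to its p and the lowest BIC is the usual pick.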
Question 4: Checking the Assumptions of the Model
Produce the relevant residual plots to check the model assumptions for the model selected in the previous question. Enumerate the assumptions and describe what graphical techniques you used. Interpret the displays with respect to the assumptions of the linear regression model. In other words, comment on whether there are any apparent departures from the assumptions of the linear regression model. Are there any extreme outliers in the data/residuals?
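A minimal base-R sketch of the standard displays, assuming (hypothetically) that the selected model is `MPG ~ HP + SP + WT`, i.e. the four-predictor model with the non-significant VOL dropped; substitute whichever model Question 3 selects. `plot()` on an `lm` object draws residuals vs fitted (linearity, constant variance), a normal Q-Q plot (normality), scale-location (constant variance), and residuals vs leverage with Cook's distance contours (influence). Independence cannot really be judged from these plots unless the rows have a meaningful order. The cutoffs |studentized residual| > 3 and Cook's distance > 4/n are common rules of thumb, not fixed standards. Synthetic stand-in data is used when `car_milage.txt` is absent.

```r
# Use the EPA file if present; otherwise build a HYPOTHETICAL synthetic
# stand-in with the same columns (illustration only, not the real data).
if (file.exists("car_milage.txt")) {
  Data <- read.table("car_milage.txt", header = TRUE)
} else {
  set.seed(42)
  n0 <- 82
  WT <- runif(n0, 20, 55)
  HP <- runif(n0, 50, 300)
  Data <- data.frame(
    MPG = exp(4.5 - 0.025 * WT - 0.1 * log(HP) + rnorm(n0, sd = 0.08)),
    HP = HP, SP = 80 + 0.25 * HP + rnorm(n0, sd = 5),
    VOL = runif(n0, 50, 160), WT = WT)
}

fit <- lm(MPG ~ HP + SP + WT, data = Data)  # hypothetical selected model

op <- par(mfrow = c(2, 2))
plot(fit)  # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(op)

sw <- shapiro.test(residuals(fit))  # formal check of residual normality
print(sw)

r_stud <- rstudent(fit)         # externally studentized residuals
cooks  <- cooks.distance(fit)   # influence of each observation
cat("Possible outliers (|r_stud| > 3):", names(which(abs(r_stud) > 3)), "\n")
cat("Possibly influential (Cook's D > 4/n):",
    names(which(cooks > 4 / nrow(Data))), "\n")
```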
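library(car)  # scatterplotMatrix() comes from the car package (arguments below use the car 2.x names)
scatterplotMatrix(~MPG+HP+SP+VOL+WT, reg.line=lm, smooth=FALSE, spread=FALSE, span=0.5, diagonal = 'none', data=Data)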
The scatterplot matrix suggests a nonlinear relationship between MPG and HP and between MPG and SP. We may therefore include squared terms to improve model adequacy.
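A base-R alternative to `scatterplotMatrix()`, sketched with `pairs()` and a lowess smoother in each panel. It also adds log(MPG) as a row so the response transformation asked about in Question 1 can be judged by eye. The log is one common choice for a positive, curved response; treat it as an assumption to check, not a conclusion. The column `logMPG` and the object `cm` are hypothetical names, and synthetic stand-in data is used when `car_milage.txt` is absent.

```r
# Use the EPA file if present; otherwise build a HYPOTHETICAL synthetic
# stand-in with the same columns (illustration only, not the real data).
if (file.exists("car_milage.txt")) {
  Data <- read.table("car_milage.txt", header = TRUE)
} else {
  set.seed(42)
  n0 <- 82
  WT <- runif(n0, 20, 55)
  HP <- runif(n0, 50, 300)
  Data <- data.frame(
    MPG = exp(4.5 - 0.025 * WT - 0.1 * log(HP) + rnorm(n0, sd = 0.08)),
    HP = HP, SP = 80 + 0.25 * HP + rnorm(n0, sd = 5),
    VOL = runif(n0, 50, 160), WT = WT)
}

Data$logMPG <- log(Data$MPG)  # candidate transformed response (assumed)

# Scatterplot matrix with a lowess curve per panel to expose curvature.
pairs(~ MPG + logMPG + HP + SP + VOL + WT, data = Data, panel = panel.smooth)

# Correlations as a numeric companion to the plot.
cm <- cor(Data[, c("MPG", "logMPG", "HP", "SP", "VOL", "WT")])
print(round(cm, 2))
```

If the lowess curves against WT and HP look straighter in the logMPG row than in the MPG row, that supports transforming the response.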
New variables are created using the following transformations:
Data$Ln_HP <- with(Data, log(HP))
Data$HP2 <- with(Data, HP^2)
Data$SP2 <- with(Data, SP^2)
Data$WT2 <- with(Data, WT^2)
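The same derived terms can also be written inside the model formula with `log()` and `I()`, so that `predict()` recomputes them automatically for new data instead of relying on precomputed columns. A minimal sketch checking that both forms give the same fit; `fit_cols` and `fit_inline` are hypothetical names, and synthetic stand-in data is used when `car_milage.txt` is absent.

```r
# Use the EPA file if present; otherwise build a HYPOTHETICAL synthetic
# stand-in with the same columns (illustration only, not the real data).
if (file.exists("car_milage.txt")) {
  Data <- read.table("car_milage.txt", header = TRUE)
} else {
  set.seed(42)
  n0 <- 82
  WT <- runif(n0, 20, 55)
  HP <- runif(n0, 50, 300)
  Data <- data.frame(
    MPG = exp(4.5 - 0.025 * WT - 0.1 * log(HP) + rnorm(n0, sd = 0.08)),
    HP = HP, SP = 80 + 0.25 * HP + rnorm(n0, sd = 5),
    VOL = runif(n0, 50, 160), WT = WT)
}

# Precomputed columns, as in the text above.
Data$Ln_HP <- with(Data, log(HP))
Data$HP2   <- with(Data, HP^2)
Data$SP2   <- with(Data, SP^2)
Data$WT2   <- with(Data, WT^2)
fit_cols <- lm(MPG ~ HP + Ln_HP + HP2 + SP + SP2 + VOL + WT + WT2, data = Data)

# Same terms written inline; I() stops ^ from taking its formula meaning.
fit_inline <- lm(MPG ~ HP + log(HP) + I(HP^2) + SP + I(SP^2) + VOL + WT + I(WT^2),
                 data = Data)

print(all.equal(unname(fitted(fit_cols)), unname(fitted(fit_inline))))
```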
Regression model with HP, SP, VOL, and WT as explanatory variables:
Call:
lm(formula = MPG ~ HP + SP + VOL + WT, data = Data)

Residuals:
    Min      1Q  Median      3Q     Max
-9.0108 -2.7731  0.2733  1.8362 11.9854

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 192.43775   23.53161   8.178 4.62e-12 ***
HP            0.39221    0.08141   4.818 7.13e-06 ***
SP           -1.29482    0.24477  -5.290 1.11e-06 ***
VOL          -0.01565    0.02283  -0.685    0.495
WT           -1.85980    0.21336  -8.717 4.22e-13 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.653 on 77 degrees of freedom
Multiple R-squared:  0.8733,  Adjusted R-squared:  0.8667
F-statistic: 132.7 on 4 and 77 DF,  p-value: < 2.2e-16
The estimated regression model is

$$\widehat{MPG} = 192.438 + 0.392\,HP - 1.295\,SP - 0.016\,VOL - 1.860\,WT$$
This model explains 87.33% of the variability in MPG (adjusted R² = 0.8667). The t tests for the individual regression coefficients show that all predictors except VOL (p = 0.495) are highly significant (p < 0.001). The residual standard error of this model is 3.653 on 77 degrees of freedom.
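Question 2 asks for interpretations that include standard errors, and a confidence interval carries both at once. A minimal sketch that rebuilds the fitted equation from `coef()` and computes 95% intervals, estimate ± t(0.975, df) × SE, both with `confint()` and by hand. The names `fit`, `b`, `ci`, and `manual` are hypothetical; with the synthetic stand-in used when `car_milage.txt` is absent, the numbers will not match the output above.

```r
# Use the EPA file if present; otherwise build a HYPOTHETICAL synthetic
# stand-in with the same columns (illustration only, not the real data).
if (file.exists("car_milage.txt")) {
  Data <- read.table("car_milage.txt", header = TRUE)
} else {
  set.seed(42)
  n0 <- 82
  WT <- runif(n0, 20, 55)
  HP <- runif(n0, 50, 300)
  Data <- data.frame(
    MPG = exp(4.5 - 0.025 * WT - 0.1 * log(HP) + rnorm(n0, sd = 0.08)),
    HP = HP, SP = 80 + 0.25 * HP + rnorm(n0, sd = 5),
    VOL = runif(n0, 50, 160), WT = WT)
}

fit <- lm(MPG ~ HP + SP + VOL + WT, data = Data)
b <- coef(fit)

# Fitted equation assembled from the estimates.
cat(sprintf("MPG-hat = %.3f %+.3f*HP %+.3f*SP %+.4f*VOL %+.3f*WT\n",
            b[1], b[2], b[3], b[4], b[5]))

# 95% intervals from confint(), and the same thing by hand: b +/- t * SE.
ci <- confint(fit, level = 0.95)
se <- coef(summary(fit))[, "Std. Error"]
tq <- qt(0.975, df.residual(fit))
manual <- cbind(b - tq * se, b + tq * se)
print(ci)
```

Worked by hand with the reported WT estimate (-1.860, SE 0.213, 77 df, t about 1.99), the interval is -1.860 ± 0.425, i.e. about (-2.28, -1.43): each additional 100 lb of weight is associated with a drop of roughly 1.4 to 2.3 mpg, holding HP, SP, and VOL fixed.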
RegModel <- lm(MPG ~ HP + Ln_HP + HP2 + SP + SP2 + VOL + WT + WT2, data = Data)
step(RegModel, direction = "backward")

Start:  AIC=216.8
MPG ~ HP + Ln_HP + HP2 + SP + SP2 + VOL + WT + WT2

        Df Sum of Sq    RSS    AIC
- SP     1     8.206 719.41 213.33
- WT     1    14.549 725.75 214.05
- SP2    1    14.872 726.07 214.09
- VOL    1    15.795 726.99 214.19
- WT2    1    16.628 727.83 214.29
<none>               711.20 216.80
- Ln_HP  1    44.914 756.11 217.41
- HP     1    72.260 783.46 220.33
- HP2    1    91.945 803.14 222.36

Step:  AIC=213.33
MPG ~ HP + Ln_HP + HP2 + SP2 + VOL + WT + WT2

        Df Sum of Sq     RSS    AIC
- WT2    1      9.78  729.18 210.03
- WT     1     12.21  731.61 210.31
- VOL    1     15.75  735.16 210.70
- SP2    1     20.58  739.98 211.24
<none>                719.41 213.33
- HP     1     89.46  808.87 218.54
- HP2    1    165.97  885.38 225.95
- Ln_HP  1    318.17 1037.57 ...
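The AIC column that `step()` prints for an `lm` fit comes from `extractAIC()`, which computes $n\log(\mathrm{RSS}/n) + k\cdot\mathrm{edf}$ with $k = 2$, where edf is the number of estimated coefficients. It differs from `AIC()` only by a constant for a fixed data set, so only differences between rows matter. A minimal sketch that checks this formula by hand and reruns the backward search with the BIC penalty $k = \log n$, the criterion Question 3 asks for. The names `fit8`, `manual`, and `bwd_bic` are hypothetical; with the synthetic stand-in used when `car_milage.txt` is absent, the numbers will not match the table above.

```r
# Use the EPA file if present; otherwise build a HYPOTHETICAL synthetic
# stand-in with the same columns (illustration only, not the real data).
if (file.exists("car_milage.txt")) {
  Data <- read.table("car_milage.txt", header = TRUE)
} else {
  set.seed(42)
  n0 <- 82
  WT <- runif(n0, 20, 55)
  HP <- runif(n0, 50, 300)
  Data <- data.frame(
    MPG = exp(4.5 - 0.025 * WT - 0.1 * log(HP) + rnorm(n0, sd = 0.08)),
    HP = HP, SP = 80 + 0.25 * HP + rnorm(n0, sd = 5),
    VOL = runif(n0, 50, 160), WT = WT)
}

Data$Ln_HP <- log(Data$HP)
Data$HP2   <- Data$HP^2
Data$SP2   <- Data$SP^2
Data$WT2   <- Data$WT^2
fit8 <- lm(MPG ~ HP + Ln_HP + HP2 + SP + SP2 + VOL + WT + WT2, data = Data)

# AIC as step() reports it for lm: n * log(RSS / n) + 2 * edf.
n   <- nrow(Data)
edf <- length(coef(fit8))
manual <- n * log(deviance(fit8) / n) + 2 * edf
print(c(manual = manual, extractAIC = extractAIC(fit8)[2]))

# Backward elimination with the BIC penalty instead of AIC's k = 2.
bwd_bic <- step(fit8, direction = "backward", k = log(n), trace = 0)
print(formula(bwd_bic))
```

Because log(n) exceeds 2 once n is 8 or more, the BIC search penalises each extra term more heavily and tends to stop at a smaller model than the AIC search.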