# Linear Regression

The goal is to test whether the FVC (forced vital capacity) of elderly subjects can be predicted from their height, that is, whether FVC depends functionally on height.

To test the regression assumptions that the relationship between height and FVC is linear and that the residuals in FVC are normally distributed with equal variance along the regression line, we examined a histogram, a P-P plot of the regression standardized residuals, and a scatterplot. The graphs indicated a linear relationship with approximately normally distributed residuals.

Further, a linear regression analysis was performed to test whether the proportion of variance in FVC predicted by height is statistically significant. The proportion was significant (F = 699.4, df = 1, 798, p < .001), indicating that height has predictive ability for FVC.

Table 3. ANOVA for the Regression Equation, Height (cm) on Forced Vital Capacity (L)

Table 3 shows the computed proportion of variance in FVC. The residual sum of squares of the FVC (328.90) subtracted from its total variability (617.18) gives the amount of FVC predicted variance (288.28). The variance of FVC that is predicted (288.28) divided by the total FVC variability (617.18) gives the coefficient of determination (.467). The F ratio (699.43) provides the test of statistical significance.

| Source | Sum of Squares | df | Mean Square | F |
|---|---|---|---|---|
| Regression | 288.28 | 1 | 288.28 | 699.43** |
| Residual | 328.90 | 798 | .41 | |
| Total | 617.18 | 799 | | |

** p < 0.01
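As a quick check on the arithmetic described for Table 3, the following minimal sketch recomputes the predicted sum of squares, the coefficient of determination, and the F ratio from the rounded table entries. The variable names are illustrative only, and because the inputs are rounded, the F ratio agrees with the reported 699.43 only to within rounding.

```python
# Values copied from Table 3 (rounded as printed there).
ss_total = 617.18
ss_residual = 328.90
df_regression = 1
df_residual = 798

# Predicted (regression) sum of squares: total minus residual.
ss_regression = ss_total - ss_residual

# Coefficient of determination: share of FVC variability predicted by height.
r_squared = ss_regression / ss_total

# Mean squares and the F ratio (regression mean square over residual mean square).
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

print(f"SS regression = {ss_regression:.2f}")
print(f"R squared     = {r_squared:.3f}")
print(f"MS residual   = {ms_residual:.2f}")
print(f"F ratio       = {f_ratio:.1f}")
```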

- the regression equation and 95% CI for the Height coefficient with an accompanying narrative

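The 95% CI for the Height coefficient is .057 to .066 and was calculated as:

95% CI for Height coefficient = unstandardized coefficient (Height) ± (1.96 x Std. Error)

= .062 ± (1.96 x .002)

The reported limits presumably reflect the unrounded coefficient and standard error; the rounded values shown here give approximately .058 to .066.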
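The interval calculation can be sketched in a few lines. This is only an illustration using the rounded coefficient (.062) and standard error (.002) quoted in the text, so its lower limit comes out near .058 rather than the reported .057; the variable names are assumptions, not part of the original output.

```python
# Rounded values as quoted in the text.
b_height = 0.062   # unstandardized coefficient for Height
se_height = 0.002  # standard error of that coefficient
z_95 = 1.96        # normal critical value for a 95% interval

# Margin of error and the two interval limits.
margin = z_95 * se_height
ci_lower = b_height - margin
ci_upper = b_height + margin

print(f"95% CI for Height coefficient: {ci_lower:.3f} to {ci_upper:.3f}")
```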

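Y = FVC

Constant (b0) = -7.193

B1 = .062 (unstandardized coefficient for Height; the standardized coefficient, Beta, is .683)

E = .385 (standard error of the constant)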

- an example prediction using the regression equation for height = 160 cm, to include a confidence interval around the individual predicted value using the standard error of the estimate (see Model Summary)

Model Summary(b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .683(a) | .467 | .466 | .64199 |

a Predictors: (Constant), Height (cm)

b Dependent Variable: Forced Vital Capacity (L)
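One way the requested prediction for height = 160 cm might be worked is sketched below. It assumes the constant (-7.193) and the rounded unstandardized Height coefficient (.062) reported earlier, and it approximates the interval around an individual predicted value as the prediction ± 1.96 x the standard error of the estimate, ignoring the additional uncertainty in the coefficients. Because the slope is rounded, the result may differ slightly from software output.

```python
# Reported (rounded) regression quantities; names are illustrative.
b0 = -7.193      # constant
b1 = 0.062       # unstandardized coefficient for Height
see = 0.64199    # standard error of the estimate (Model Summary)
height_cm = 160

# Point prediction from FVC = b0 + b1 * Height.
predicted_fvc = b0 + b1 * height_cm

# Approximate 95% interval around an individual predicted value.
margin = 1.96 * see
lower = predicted_fvc - margin
upper = predicted_fvc + margin

print(f"Predicted FVC at {height_cm} cm: {predicted_fvc:.3f} L")
print(f"Approximate 95% interval: {lower:.2f} to {upper:.2f} L")
```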

- Discussion of the outliers and their impact on the model, to include

  - a comparison table (Outliers Included vs. Outliers Excluded) that includes all Model Summary statistics (R, R², Adjusted R², Std. Error of the Estimate) and the two different regression equations

Model Summary

| Model | R (outlier ~= 1.00, Selected) | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .723(a) | .523 | .522 | .59941 |

a Predictors: (Constant), Height (cm)
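The requested comparison of Model Summary statistics could be laid out along the following lines. This sketch assumes that the summary for the selected cases (outlier ~= 1.00) is the run with outliers excluded; the dictionary layout and labels are illustrative. The regression equation for the outliers-excluded model is not given in the text, so it is not included.

```python
# Model Summary statistics copied from the two tables in the text.
# Assumption: the "outlier ~= 1.00 (Selected)" run is the one with outliers excluded.
summaries = {
    "Outliers included": {
        "R": 0.683,
        "R Square": 0.467,
        "Adjusted R Square": 0.466,
        "Std. Error of the Estimate": 0.64199,
    },
    "Outliers excluded": {
        "R": 0.723,
        "R Square": 0.523,
        "Adjusted R Square": 0.522,
        "Std. Error of the Estimate": 0.59941,
    },
}

# Print a side-by-side table with the change (excluded minus included).
header = f"{'Statistic':<28}" + "".join(f"{name:>20}" for name in summaries) + f"{'Change':>12}"
print(header)

changes = {}
for stat in summaries["Outliers included"]:
    included = summaries["Outliers included"][stat]
    excluded = summaries["Outliers excluded"][stat]
    changes[stat] = excluded - included
    print(f"{stat:<28}{included:>20.5f}{excluded:>20.5f}{changes[stat]:>+12.5f}")
```

Under that assumption, R Square rises and the standard error of the estimate falls when the flagged cases are left out.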

Y = b0 + b1X + e

FVC = unstandardized coefficient (Constant) + unstandardized coefficient (Height) x Height

Descriptive Statistics

| Variable | Mean | Std. Deviation |
|---|---|---|
| Forced Vital Capacity (L) | 2.9618 | .87888 |
| Height (cm) | 164.8108 | 9.74889 |
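In simple linear regression the slope equals R multiplied by the ratio of the standard deviations, and the constant equals the mean of Y minus the slope times the mean of X. The sketch below uses the descriptive statistics together with R = .683 from the Model Summary to recover the coefficients approximately; since R is rounded to three decimals, the constant lands near, but not exactly on, the reported -7.193. Variable names are illustrative.

```python
# Descriptive statistics and R as reported in the text.
mean_fvc = 2.9618
sd_fvc = 0.87888
mean_height = 164.8108
sd_height = 9.74889
r = 0.683  # correlation between Height and FVC (Model Summary)

# Slope: R times the ratio of the standard deviations (Y over X).
b1 = r * sd_fvc / sd_height

# Constant: the fitted line passes through the point of means.
b0 = mean_fvc - b1 * mean_height

print(f"b1 (Height) = {b1:.4f}")
print(f"b0 (Constant) = {b0:.3f}")
```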

See attached file for full problem description.

#### Solution Preview

***I moved the work you've already done into the file called second, so that everything is easy to follow. My comments are in red.***

Prepare a short summary report that includes the following elements:

- The goal of the second phase of your study (sentence or two)

The goal of this study is to test an equation that will predict the FVC of elderly subjects based on their height.

- Selection/justification of your basic statistical approach

To test the regression assumptions that the relationship between height and FVC is linear and that the residuals in FVC are normally distributed with equal variance along the regression line, we examined a histogram, a P-P plot of the regression standardized residuals, and a scatterplot. The graphs indicated a linear relationship with approximately normally distributed residuals.

Further, a linear regression analysis was performed to test whether the proportion of variance in FVC predicted by height is statistically significant. The proportion was significant (F = 699.4, df = 1, 798, p < .001), indicating that height has predictive ability for FVC.

- Description of findings (paragraph or two), to include

  - a properly formatted ANOVA table for the regression with an ...

#### Solution Summary

The solution is an edited term paper involving a linear regression analysis. This paper includes a justification of the statistical approach, an ANOVA, the regression model, a 95% confidence interval for the coefficient in the model, a prediction based on the model, and a discussion of how any outliers affect the analysis.