# General Statistics

I need help interpreting what is being asked and how to solve these problems.

#### Solution Summary

1. A real estate agent named Betsy has gathered data on 150 houses that were recently sold in Portland, Oregon. Included in this data set are observations for each of the following variables:

- the appraised value of each house (in thousands of dollars),
- the selling price of each house (in thousands of dollars),
- the size of each house (in hundreds of square feet), and
- the number of bedrooms in each house.

Betsy wants to understand the relationship between the selling prices (Y) and the appraised values (X) of homes in the Portland area. Use the following scatterplot and regression output to answer Betsy's questions.

a) Is there evidence of a linear relationship between the selling price and appraised value? If so, characterize the relationship (i.e., indicate whether the relationship is a positive or negative one, a strong or weak one, etc.).

b) Identify any unusual observations by circling them in the scatterplot. What would you recommend Betsy do with these data points?

c) The true or population regression model is

$Y = \beta_0 + \beta_1 X + \varepsilon$

What is the estimated regression model? Briefly explain the difference between the population and estimated models.

d) Interpret each of the following terms using the output below.

- The standard error of estimate, $s_e$
- The coefficient of determination, $R^2$
- The slope of the estimated least squares line

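Results of multiple regression for Price

Summary measures

| Statistic | Result |
| --- | --- |
| Multiple R | 0.8412 |
| R-Square | 0.7077 |
| Adj R-Square | 0.7057 |
| StErr of Est | 7.9406 |

ANOVA Table

| Source | df | SS | MS | F | p-value |
| --- | --- | --- | --- | --- | --- |
| Explained | 1 | 22593.2835 | 22593.2835 | 358.3182 | 0.0000 |
| Unexplained | 148 | 9331.9463 | 63.0537 | | |

Regression coefficients

| Variable | Coefficient | Std Err | t-value | p-value |
| --- | --- | --- | --- | --- |
| Constant | 7.7083 | 6.6543 | 1.1584 | 0.2486 |
| Value | 0.9482 | 0.0501 | 18.9293 | 0.0000 |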
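As a quick check on how the summary measures relate to the ANOVA table, the following sketch recomputes R-Square, the standard error of estimate, the F statistic, and the slope's t-value from the numbers printed above. All inputs are copied from the output; the variable names are illustrative only. Small differences in the last digits come from the rounding in the printed output.

```python
import math

# Values copied from the ANOVA and coefficient tables above.
ss_explained = 22593.2835
ss_unexplained = 9331.9463
df_explained = 1
df_unexplained = 148

# R-Square is the share of total variation explained by the regression.
ss_total = ss_explained + ss_unexplained
r_square = ss_explained / ss_total

# The standard error of estimate is the square root of the unexplained mean square.
ms_unexplained = ss_unexplained / df_unexplained
std_err_of_est = math.sqrt(ms_unexplained)

# F is the ratio of the explained mean square to the unexplained mean square.
f_stat = (ss_explained / df_explained) / ms_unexplained

# The t-value of the slope is the coefficient divided by its standard error.
slope, slope_std_err = 0.9482, 0.0501
t_slope = slope / slope_std_err

print(f"R-Square     = {r_square:.4f}")
print(f"StErr of Est = {std_err_of_est:.4f}")
print(f"F            = {f_stat:.2f}")
print(f"t (slope)    = {t_slope:.2f}")
```

Read this way, about 71% of the variation in selling price is explained by appraised value, a typical prediction error is roughly 7.94 thousand dollars, and each additional thousand dollars of appraised value is associated with about 0.95 thousand dollars of additional selling price.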

e) In terms of finding houses that are good deals, would Betsy be more interested in the points above or below the regression line? Explain.
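One way to think about part e) is through residuals: the actual selling price minus the price predicted from the appraised value. The sketch below uses the estimated line from the output above. The two houses are hypothetical and are not taken from Betsy's data.

```python
# Estimated line from the simple regression output (thousands of dollars):
# predicted Price = 7.7083 + 0.9482 * Value
INTERCEPT = 7.7083
SLOPE = 0.9482


def predicted_price(appraised_value):
    """Predicted selling price for a given appraised value."""
    return INTERCEPT + SLOPE * appraised_value


def residual(selling_price, appraised_value):
    """Actual selling price minus predicted selling price."""
    return selling_price - predicted_price(appraised_value)


# Hypothetical houses, for illustration only: (appraised value, selling price).
houses = {"House A": (120.0, 110.0), "House B": (120.0, 130.0)}

for name, (value, price) in houses.items():
    r = residual(price, value)
    position = "below" if r < 0 else "above"
    print(f"{name}: residual = {r:+.2f}, point lies {position} the line")
```

A negative residual means the house sold for less than its appraised value would predict, which is the sense in which a buyer might call it a good deal.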

f) Betsy proposes to include the two remaining variables, the size of the home and the number of bedrooms in the home, in the regression analysis. Given the output below, should Betsy have included these two variables?

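Results of multiple regression for Price

Summary measures

| Statistic | Result |
| --- | --- |
| Multiple R | 0.8783 |
| R-Square | 0.7714 |
| Adj R-Square | 0.7667 |
| StErr of Est | 7.0702 |

ANOVA Table

| Source | df | SS | MS | F | p-value |
| --- | --- | --- | --- | --- | --- |
| Explained | 3 | 24626.9734 | 8208.9911 | 164.2190 | 0.0000 |
| Unexplained | 146 | 7298.2563 | 49.9881 | | |

Regression coefficients

| Variable | Coefficient | Std Err | t-value | p-value |
| --- | --- | --- | --- | --- |
| Constant | 2.5174 | 6.5519 | 0.3842 | 0.7014 |
| Value | 0.6841 | 0.0621 | 11.0128 | 0.0000 |
| Square_Footage | 2.4931 | 0.4627 | 5.3884 | 0.0000 |
| Number_Bedrooms | -1.2086 | 1.1094 | -1.0894 | 0.2778 |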
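The problem does not say which method to use for part f). One common approach, assumed here, is to compare adjusted R-Square across the two models and to run a partial F test on the two added variables jointly. The sketch below uses only the sums of squares and degrees of freedom printed in the two ANOVA tables.

```python
# Unexplained sums of squares and degrees of freedom from the two ANOVA tables.
sse_reduced, df_reduced = 9331.9463, 148  # Value only
sse_full, df_full = 7298.2563, 146        # Value, Square_Footage, Number_Bedrooms

# Total sum of squares is the same for both models (same 150 houses).
ss_total = 22593.2835 + 9331.9463
df_total = 149


def adjusted_r_square(sse, df_resid):
    """1 minus the ratio of unexplained to total mean squares."""
    return 1 - (sse / df_resid) / (ss_total / df_total)


adj_reduced = adjusted_r_square(sse_reduced, df_reduced)
adj_full = adjusted_r_square(sse_full, df_full)

# Partial F: drop in unexplained variation per added variable,
# relative to the unexplained mean square of the full model.
n_added = df_reduced - df_full
partial_f = ((sse_reduced - sse_full) / n_added) / (sse_full / df_full)

print(f"Adj R-Square, reduced model: {adj_reduced:.4f}")
print(f"Adj R-Square, full model:    {adj_full:.4f}")
print(f"Partial F for the {n_added} added variables: {partial_f:.2f}")
```

The partial F statistic is far above the 5% critical value for 2 and 146 degrees of freedom (roughly 3.1), so the two variables help as a group. Individually, though, only Square_Footage is significant; the p-value of 0.2778 for Number_Bedrooms suggests it adds little once the other two variables are in the model.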

2. Rob needs to buy a PC to replace his aging Mac and has collected data on 24 computers. For each computer he has recorded:

- the speed, measured in megahertz,
- the time, in minutes, that the battery maintains its charge,
- the RAM, measured in megabytes,
- the chip type (DX, SX, or SL), encoded as three dummy variables: Chip Type DX, Chip Type SX, and Chip Type SL, and
- the monitor type, either Color or Mono, where Color is coded 1 and Mono is coded 0.

a) Why does StatPro display an error message when Rob tries to include the 3 chip type dummy variables in a regression analysis?
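The likely cause in part a) is exact multicollinearity, often called the dummy variable trap: every computer has exactly one chip type, so the three dummies always sum to 1, which duplicates the constant term. The sketch below illustrates this with a few hypothetical chip types (not Rob's actual data).

```python
# Hypothetical chip types for five computers, for illustration only.
chips = ["DX", "SX", "SL", "DX", "SL"]

# Each row: [constant, Chip Type DX, Chip Type SX, Chip Type SL]
rows = [[1, int(c == "DX"), int(c == "SX"), int(c == "SL")] for c in chips]

# In every row the three dummies add up to the constant column, so one
# column is an exact linear combination of the others and least squares
# has no unique solution.
dummies_sum_to_constant = all(row[0] == row[1] + row[2] + row[3] for row in rows)
print("Dummies always sum to the constant:", dummies_sum_to_constant)

# Dropping one dummy (here SX, which then serves as the baseline) removes
# the exact dependence.
rows_without_sx = [[row[0], row[1], row[3]] for row in rows]
still_dependent = all(row[0] == row[1] + row[2] for row in rows_without_sx)
print("Still exactly dependent after dropping SX:", still_dependent)
```

This matches the output in part b), which includes Chip Type DX and Chip Type SL but omits Chip Type SX, so the DX and SL coefficients are read relative to an SX machine.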

b) Interpret the coefficients for Charge, Chip Type DX, and Monitor Type below.

Results of multiple regression for Price

Summary measures

| Statistic | Result |
| --- | --- |
| Multiple R | 82% |
| R-Square | 67% |
| Adj R-Square | 54% |
| StErr of Est | $ 929 |

Regression coefficients

| Variable | Coefficient | p-value |
| --- | --- | --- |
| Constant | $ 2,135 | 0.0357 |
| Speed | $ 63 | 0.1348 |
| Charge | $ 6 | 0.2727 |
| RAM | $ 26 | 0.4412 |
| Chip Type DX | $ 2,839 | 0.0468 |
| Chip Type SL | $ 302 | 0.6696 |
| Monitor Type | $ 2,130 | 0.0005 |

c) Use the estimated regression equation to predict the price of a laptop computer with the following features: a 50-megahertz processor, a battery that holds its charge for 180 minutes, 20 megabytes of RAM, a DX chip, and a color monitor.
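The prediction in part c) is a matter of plugging the laptop's features into the estimated equation. The sketch below does this with the coefficients from the output above; the dictionary layout is just one convenient way to organize the arithmetic.

```python
# Coefficients from the regression output above (in dollars).
coefficients = {
    "Constant": 2135,
    "Speed": 63,
    "Charge": 6,
    "RAM": 26,
    "Chip Type DX": 2839,
    "Chip Type SL": 302,
    "Monitor Type": 2130,
}

# Features of the laptop described in part c).
laptop = {
    "Speed": 50,        # megahertz
    "Charge": 180,      # minutes
    "RAM": 20,          # megabytes
    "Chip Type DX": 1,  # DX chip
    "Chip Type SL": 0,  # not an SL chip
    "Monitor Type": 1,  # color monitor
}

predicted_price = coefficients["Constant"] + sum(
    coefficients[name] * value for name, value in laptop.items()
)
print(f"Predicted price: ${predicted_price:,}")
```

Term by term this is 2,135 + 3,150 + 1,080 + 520 + 2,839 + 0 + 2,130, for a predicted price of $11,854.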

d) Find the 95% prediction interval for the price of the laptop characterized in part c.
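An exact prediction interval needs the full data (the design matrix), which is not part of the output shown. A common approximation, assumed here, is the point prediction plus or minus a t multiple of the standard error of estimate. The sketch below uses 24 - 6 - 1 = 17 degrees of freedom and a critical value of 2.110 taken from a standard t table; because it ignores the uncertainty in the estimated coefficients, the true interval would be somewhat wider.

```python
# Point prediction for the laptop in part c), from the regression output.
predicted_price = 2135 + 63 * 50 + 6 * 180 + 26 * 20 + 2839 * 1 + 302 * 0 + 2130 * 1

# Residual degrees of freedom: observations minus explanatory variables minus 1.
n_obs = 24
n_predictors = 6
df_resid = n_obs - n_predictors - 1

# Standard error of estimate from the output (dollars).
std_err_of_est = 929

# Two-sided 95% critical value for 17 degrees of freedom, from a t table.
T_CRIT_95_DF17 = 2.110

# Approximate interval: ignores the extra term for coefficient uncertainty.
margin = T_CRIT_95_DF17 * std_err_of_est
lower = predicted_price - margin
upper = predicted_price + margin

print(f"Approximate 95% prediction interval: ${lower:,.0f} to ${upper:,.0f}")
```

Under these assumptions the interval runs from about $9,894 to about $13,814.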