# Predicting stroke risk from a regression model

A 10-year study by the American Heart Association provided data on how age, blood pressure, and smoking relate to the risk of stroke. Data from a portion of this study are shown below. Risk is interpreted as the probability (times 100) that a person will have a stroke over the next 10-year period. For the Smoker variable, 1 indicates that the person is a smoker and 0 indicates a non-smoker.

A) If you could choose only 1 of the variables to help you predict the Risk, which variable would you choose? Why? How good would that model be, in terms of predicting the Risk?
B) Are there any independent variables that should not be in the model together when you are trying to predict the Risk? If so, which variables? Why?
C) Develop the "best" regression model possible using this set of variables to help you predict the Risk. State your final model. Interpret the coefficients for the model (i.e. what do the numbers mean?). Finally, tell me how to use this model for predicting a person's risk of a stroke. Illustrate this with numbers and interpret its meaning.
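
For reference, if all three variables were kept in part C, the model would take the general form below. This is only a sketch of the structure: the symbols $b_0, \dots, b_3$ are placeholder names introduced here for the fitted coefficients, not values from the study, and the "best" model may well use fewer variables.

$$
\widehat{\text{Risk}} = b_0 + b_1\,\text{Age} + b_2\,\text{Blood Pressure} + b_3\,\text{Smoker}
$$

Under this form, $b_1$ is the change in predicted Risk for each additional year of age with blood pressure and smoking status held fixed, $b_2$ is the change per one-unit increase in blood pressure, and $b_3$ is the difference in predicted Risk between a smoker (Smoker = 1) and a non-smoker (Smoker = 0) of the same age and blood pressure. Because Risk is a probability times 100, a predicted value of, say, 25 would be read as roughly a 25% chance of a stroke over the next 10 years.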

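| Risk | Age | Smoker | Blood Pressure |
|------|-----|--------|----------------|
| 12 | 57 | 0 | 152 |
| 24 | 67 | 0 | 163 |
| 13 | 58 | 0 | 155 |
| 56 | 86 | 1 | 177 |
| 28 | 59 | 0 | 196 |
| 51 | 76 | 1 | 189 |
| 18 | 56 | 1 | 155 |
| 31 | 78 | 0 | 120 |
| 37 | 80 | 1 | 135 |
| 15 | 78 | 1 | 98 |
| 22 | 71 | 0 | 152 |
| 36 | 70 | 1 | 173 |
| 15 | 67 | 1 | 135 |
| 48 | 77 | 1 | 209 |
| 15 | 60 | 0 | 199 |
| 36 | 82 | 1 | 119 |
| 8 | 66 | 0 | 166 |
| 34 | 80 | 0 | 125 |
| 3 | 62 | 0 | 117 |
| 37 | 59 | 1 | 207 |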
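
One possible way to work through parts A to C numerically is sketched below, assuming Python with NumPy is available. The variable names (`single_r`, `pred_corr`, `predict_risk`) and the example person passed to `predict_risk` are illustrative choices made here, not part of the original question, and plain least squares on all three variables is just a starting point rather than necessarily the "best" model.

```python
import numpy as np

# Columns: Risk, Age, Smoker, Blood Pressure (copied from the data above)
data = np.array([
    [12, 57, 0, 152], [24, 67, 0, 163], [13, 58, 0, 155], [56, 86, 1, 177],
    [28, 59, 0, 196], [51, 76, 1, 189], [18, 56, 1, 155], [31, 78, 0, 120],
    [37, 80, 1, 135], [15, 78, 1, 98],  [22, 71, 0, 152], [36, 70, 1, 173],
    [15, 67, 1, 135], [48, 77, 1, 209], [15, 60, 0, 199], [36, 82, 1, 119],
    [8, 66, 0, 166],  [34, 80, 0, 125], [3, 62, 0, 117],  [37, 59, 1, 207],
], dtype=float)
risk, age, smoker, bp = data.T

# Part A: correlation of each predictor with Risk.
# For a one-variable linear model, R^2 equals the squared correlation.
predictors = {"Age": age, "Smoker": smoker, "Blood Pressure": bp}
single_r = {name: np.corrcoef(x, risk)[0, 1] for name, x in predictors.items()}
for name, r in single_r.items():
    print(f"{name:15s} r = {r:6.3f}   one-variable R^2 = {r**2:5.3f}")

# Part B: correlations among the predictors themselves
# (rows/columns ordered Age, Smoker, Blood Pressure).
# Large off-diagonal values would point to multicollinearity.
pred_corr = np.corrcoef(np.vstack([age, smoker, bp]))
print(np.round(pred_corr, 3))

# Part C: ordinary least squares with an intercept and all three predictors.
X = np.column_stack([np.ones(len(risk)), age, smoker, bp])
coef, *_ = np.linalg.lstsq(X, risk, rcond=None)
fitted = X @ coef
ss_res = np.sum((risk - fitted) ** 2)
ss_tot = np.sum((risk - risk.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print("intercept, Age, Smoker, Blood Pressure coefficients:", np.round(coef, 4))
print("R^2 of full model:", round(r_squared, 3))


def predict_risk(age_years, is_smoker, blood_pressure):
    """Predicted Risk (probability times 100) from the fitted full model."""
    return float(coef @ np.array([1.0, age_years, is_smoker, blood_pressure]))


# Hypothetical person chosen for illustration: a 68-year-old smoker
# with a blood pressure reading of 175.
print("Predicted risk:", round(predict_risk(68, 1, 175), 1))
```

Comparing the one-variable R² values addresses part A, the predictor correlation matrix informs part B, and the fitted coefficients with `predict_risk` give a concrete prediction to interpret in part C. Judging which variables to keep would normally also involve significance tests on the coefficients, which this sketch does not compute.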

