# Multiple Regression

Q4data.xls could not be uploaded, but I copied the data into a Word document, data.doc, for you.

Question 4: The file Q4Data.xls contains sales data (in thousands of \$) for a medical supplies company that sells its products in three regions: South (coded 1), West (coded 2) and Midwest (coded 3). Each region is divided into a number of sales territories, for a total of 25 territories. Data are also given for advertising (in hundreds of \$) and bonuses paid to the salespeople (in hundreds of \$). Your task is to prepare an analysis for management showing how sales vary with advertising and bonus amounts, and across the three sales regions.

a) As a first step, compute summary statistics (mean and five number summary) for sales separately for the three regions. Compare the sales for the three regions based on these statistics.
b) Make side-by-side box plots of sales for the three regions and compare. Are there any outlier sales territories?
c) Run a regression of sales on advertising, bonus and region. Explain why the 1-2-3 coding for region is not correct for fitting the model. What is the correct way to code the regions? Use the correct coding. Choose Midwest as the base region. Is the model a good fit to the data?
d) From the regression output, write separate equations for the three regions. For any fixed amounts spent on advertising and bonuses, which region has the highest sales and by how much compared to the other two regions?
e) Suppose we chose South as the base region instead of Midwest. Without actually running the regression, state what the new coefficients for Midwest and West would be. Explain your answer.
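
The recoding asked for in part c) can be sketched as follows. The 1-2-3 coding treats region as a numeric variable, which imposes an ordering and equal spacing between regions that the data do not justify; the usual remedy is one 0/1 indicator per non-base region. This is a minimal sketch, not the assignment's required method: the helper name `dummy_code` and the sample codes are hypothetical, and only the South/West/Midwest coding comes from the question.

```python
# Region coding as given in the question: South = 1, West = 2, Midwest = 3.
REGION_NAMES = {1: "South", 2: "West", 3: "Midwest"}


def dummy_code(region_codes, base="Midwest"):
    """Return one 0/1 indicator column per non-base region.

    `dummy_code` is a hypothetical helper written for illustration;
    it is not part of any library.
    """
    levels = [name for name in REGION_NAMES.values() if name != base]
    columns = {name: [] for name in levels}
    for code in region_codes:
        region = REGION_NAMES[code]
        for name in levels:
            # 1 if the territory lies in this region, 0 otherwise.
            columns[name].append(1 if region == name else 0)
    return columns


# Made-up region codes for five territories (NOT the Q4Data values).
example_codes = [1, 2, 3, 3, 1]
print(dummy_code(example_codes))
```

With Midwest as the base, sales would be regressed on advertising, bonus and the two indicators. Each indicator's coefficient then measures how that region's sales differ from Midwest sales at fixed advertising and bonus amounts. Passing `base="South"` produces the West and Midwest indicators needed for part e).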

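The five-number summary of sales (in thousands of \$) for each region is as follows:

| Statistic | Region 1 (South) | Region 2 (West) | Region 3 (Midwest) |
|---|---|---|---|
| Maximum | 1159.25 | 1294.25 | 1634.75 |
| Upper Quartile | 1074.875 | 1208.375 | 1572.125 |
| Median | 1057.25 | 1154 | 1550 |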
Lower ...
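
A five-number summary like the one above can be computed per region with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `five_number_summary` and the sample values are hypothetical, and the quartile method (`"inclusive"` interpolation) is an assumption. Quartile conventions differ between tools, so the results need not match the figures quoted above exactly.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers.

    Assumption: quartiles use the "inclusive" interpolation method of
    statistics.quantiles; other software may use a different convention.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# Made-up numbers for illustration only (NOT the Q4Data sales values).
example_sales = [1, 2, 3, 4, 5]
print(five_number_summary(example_sales))
```

Applying the function to the sales of each region separately, together with the mean, gives the statistics requested in part a).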

