1.Open the file pamsue.xls. First, move the column for sales so that it is the rightmost column (it is now to the right of comtype). If the old sales column remains but appears empty, delete that column.
2.Obtain a scatterplot of the sales on the vertical axis against comtype on the horizontal axis. This will give you a good idea of whether different categories of comtype appear to differ in sales. In the scatterplot, you should see that sales in the middle categories 3 - 6 are in similar ranges on the vertical axis, but 1 and 2 have somewhat higher sales, and category 7 appears to have somewhat lower sales. This implies that, when you create dummy variables for comtype, dummy variables for categories 1, 2, 7 are likely to be statistically significant in the multiple regression model (and dummy variables for categories 3 - 6 are likely to be not significant). Although it would be desirable to also obtain the scatterplot of sales against every other X variable, and use the correlation coefficients instead (see step 4 below).
3.Insert seven new columns immediately to the left of comtype, and in these columns, create seven dummy variables to represent the seven categories of site types. Name them comtype1, comtype2, ..., comtype7. At this point, you have 40 columns of data in the spreadsheet with comtype and sales in the last two columns.
4.Use the Correlation facility under Data Analysis to obtain the correlation coefficients between sales and all of the other variables except store and comtype (why exclude comtype?). This will produce a matrix of correlation coefficients between sales and every X variable, as well as between every pair of X variables. To make them easy to read, you may want to format the cells to show numbers with 2 or 3 decimal places.
5.Write down the names of 10 quantitative X variables having the highest correlations with sales. From the correlations worksheet, move to the data worksheet. Select the following columns: sales, plus the 10 quantitative X variables you wrote down, plus comtype1, comptype2, comptype7 (here, you could include up to three more dummy variables, but they are likely to be statistically not significant, so you can save some work - see 2. above). Copy these onto a blank worksheet. Make sure there are no blank columns in within the data range in the new worksheet. Note: To prevent unexpected changes in copying data when formulas are involved, use Paste Special with Values selected when pasting data into a new worksheet.
6.Use Regression under Data Analysis to obtain the regression output table for sales using the variables in the columns you had selected, making sure that Labels and New Worksheet Ply checkboxes are checked, and leave the other boxes unchecked. On the name tab of the output sheet (at the bottom), change the name of the worksheet to Model1.
7.Using appropriate statistics in the regression output table, see if any of the X variables is statistically not significant. If there is at least one insignificant X variable, write down the most insignificant variable, move to the data sheet and delete that column, and re-run Regression without that variable. Repeat until there are no insignificant X variables. Name each output sheet Model2, Model3, and so on for easy identification.
8.When you get to a model in which all remaining X variables are statistically significant, you will have found the final regression equation for predicting sales. Re-run the last model, but this time checking the Residuals checkbox. This will reproduce the last regression table, but below it, you will see columns for Predicted sales and Residuals. Obtain a scatterplot of Residuals against Predicted sales. Also obtain a histogram of Residuals.
9.Use the final regression equation you found in the last step to predict sales at the two sites under consideration.
This assignment seems to involve using the Analysis ToolPak in Excel. It's a great tool for doing statistical analysis in ...
See the attachment for the results.