Problem Set 1:
Please view the attachment as well. The data set contains the following variables:
Sales: Total sales (in $) by each salesperson last year
Department: What department (home electronics, computers, etc) the salesperson works in.
Year born: The year in which the salesperson was born
Training Hours: Number of hours of training the individual has received during career.
Gender: Salesperson's Gender
Experience: Number of years the individual has worked with your company.
Education: Level of education completed by salesperson
Marital Status: Salesperson's Marital Status
Use the data provided to answer the following questions.
Problem 1.1: Use the data to determine the following:
(a) Which of the variables are categorical and which are numerical?
(b) Which of the categorical variables are nominal and which are ordinal?
(c) Which of the numerical variables are ratio and which are interval?
Problem 1.2: Use the salesperson data to do the following:
(a) Create a summary table and an ordered summary table of the departments the salespeople work in.
(b) Construct a bar chart, a pie chart, and a Pareto diagram.
(c) Which graphical method do you think is best to portray these data?
(d) Based on this data, what conclusions can you make about the departments your salespeople work in?
The easiest way to create the frequency summary table is with the COUNTIF command. The command works like this: =COUNTIF(x,y), where x is the set of cells you want to look in for a particular value, and y is the value you are looking for. For example, =COUNTIF(E:E, "Female") would look in the E column for all instances of the term "Female", count them up, and give you the number.
Excel doesn't have a "canned" option to create a Pareto Diagram, so this is what you need to do: First, construct a column chart with data for both percentage and cumulative percentage. Then, click on one of the columns that represents data from one of the cumulative percentages, choose "change series chart type," and set it to line.
Problem 1.3: Use the data to do the following:
(a) Construct a frequency distribution and a percentage distribution of the years of experience of your sales force.
(b) Construct a histogram and a percentage polygon.
(c) Plot a cumulative percentage polygon.
As a general rule of thumb, the fewer times you type out a formula, the better. If you can accomplish a task by writing one formula and then filling it down with the fill bar, do it that way so you can minimize your chances of making mistakes. One feature that is very useful for accomplishing this task is making use of relative and absolute cell references. Say, for example, in cell C1 you have the expression =A1+B1. If you fill C1 down to C2, C2 will have the equation =A2+B2...when filling down, Excel viewed your cell references as relative, as if you said in C1 to make C1 equal to the sum of the two cells to the left of it. When you fill down to C2, Excel said that C2 should equal the sum of the two cells to the left of it, in this case A2 and B2. In some cases this is exactly what you want Excel to do, but in others you do not want relative cell references. For example, cell D1 may have total nationwide sales for your company, A1:A51 may have the names of the 50 states (plus DC!), and B1:B51 may have total sales within each state. In column C you want to have the percentage of total sales within that state. If in C1 you type =B1/D1, you will get the correct result, but then if you fill D1 down to D51, you will get error messages (or wrong answers) everywhere else. The easiest fix is to use an absolute reference in your equation, which you accomplish with the dollar sign ($). The $ is essentially a way of "locking" in either a row or column in a cell reference. If you type in C1 =B1/D$1, and then fill down, Excel will "lock in" the first row in the reference, so all of your formulae will compute correctly! With knowledge and appropriate application of relative and absolute references, it is possible to create the table in part (a) by typing exactly 4 equations (and filling them down) and nothing else!
Problem 1.4: Use the salesperson data to do the following:
(a) Compute the mean, median, first quartile, and third quartile for the sales variable.
(b) Compute the variance, standard deviation, range, interquartile range, coefficient of variation, skewness, and Z scores for the sales variable.
(c) Are the data skewed? If so, how?
(d) Based on the results of (a) through (c), what conclusions can you reach concerning sales?
(e) Calculate the proportion of sales that are +/- 1, +/- 2, and +/- 3 standard deviations of the mean.
(f) Compare and contrast your findings with what would be expected on the basis of the empirical rule.
Excel has built-in functions to calculate the mean (AVERAGE), median (MEDIAN), quartiles (QUARTILE), variance (VAR for samples, VARP for populations), and standard deviation (STDEV for samples, STDEVP for populations). You might also think to use the MIN and MAX commands in calculating range. Search the Excel helpfile for the appropriate syntax of these commands.
Problem 1.5: Use the sales data to do the following:
(a) Compute the co-variance between sales and hours of training.
(b) Compute the coefficient of correlation between sales and hours of training.
(c) Construct a scatterplot between sales and hours of training with sales on the Y-Axis.
(d) Which of (a)-(c) do you think is the most valuable means to understand the relationship between sales and hours of training?
(e) What conclusions can you reach about the relationship between sales and hours of training? What sorts of business implications does this have?
Excel has built-in functions to calculate the co-variance (COVAR) and correlation coefficient (CORREL). Search the Excel helpfile for the appropriate syntax of these commands.
The solution provides a step by step method for the calculation of descriptive statistics and a preparation of a summary table, an ordered summary table, a bar chart, a pie chart, a Pareto diagram, a frequency distribution, a percentage distribution, a histogram, a percentage polygon, and a cumulative percentage polygon. Formulas for the calculations and Interpretations of the results are also included.