# Standard Deviation of a Sample

Excel Lab 2

For this assignment, choose a single variable data set that interests you from (except the

"Motion of Stars" data set which I'll use for my example):

http://college.hmco.com/mathematics/brase/understandable_statistics/8e/students/datasets

/sv/index.html or on the CD that came with your text. Follow the link, and click on the

Excel data file icon in the pop-up (on the far left). Save this data set in your Filespace

with a name that helps you remember what's in it.

As I mentioned above, I'll use the "Motion of Starts" data set which represents the

angular motions of stars across the sky due to the stars own velocity. A random sample of

stars from the M92 global cluster was used, and the units of the data are arc seconds per

century. (An arc second is 1/3600 of a degree across the sky). Open your data set in

Excel.

On the standard toolbar, click the Paste Function button that looks like this: and

you'll get a dialog box that looks like

Make sure that in the "Or select a category" box, that you have "Statistical" selected.

Now "Cancel" that dialog, we'll be back to it in a minute. Here's my spread sheet. Notice how I've created titled for the different statistics I want

to find and typed them in a titled column:

To find the measures of center and variation I have listed, I used the following functions

from the Insert Function dialog box:

Mean - AVERAGE(number1,number2,...)

Median - MEDIAN(number1,number2,...)

Mode - MODE(number1,number2,...)

StDev s - STDEV(number1,number2,...) - Note that this is the sample standard

deviation and assumes we have only a sample of the entire data population. This is true

for most of the data sets available from our source.

StdDev p - STDEVP(number1,number2,...) - This is the population standard deviation

which assumes the data you have is the entire population.

5% trim mean - TRIMMEAN(data range, percent as a decimal) - This function

returns a mean based on the total percentage of data removed from both the bottom and

the top of the ordered data values. If you want a 5% trimmed mean (and you do!),

implying that 5% of the bottom data and 5% of the top data will be removed before a

mean is calculated, then enter 0.10 for the percent in Excel.

Count data n - COUNT(number1,number2,...) - returns the number of data items

selected.

In my work sheet I have already found the Mean by typing =AVERAGE(A2:A85) in the

D3 cell. I could have also used the Insert Function dialog by first typing an "=" in the D3

cell and then clicking the button which allows me to then enter the range of values I

want to find the mean of. Hit "OK" and it's done. Let's step through my next computation, which is Mode. I click on cell D5 and type "="

then click . The dialog comes up, I select Mode from the list and hit "OK" to get this

new dialog:

I click in the Number 1 box and then go select my data cells A1:A85, then click OK. The

mode of my selected data shows up in the cell

Find all 7 statistics for your data set and write up your results. Explain what your data set

represents (there is an explanation on the download dialogue, you can paraphrase the wording from that). Summarize the results in an 8 row by 2 column table. To do this, go

to Table->Insert->Table and Select 8 rows and 2 columns, then OK.

Here's a good start. Yours should go something like this. Make sure to answer the extra

questions I have for you to answer.

The following is my analysis of a data set which representing the angular motions of stars

across the sky due to the stars own velocity. A random sample of stars from the M92

global cluster was used, and the units of the data are arc seconds per century. (An arc

second is 1/3600 of a degree across the sky).

Measurement Arc Seconds/Century

Mean 0.028904

Median 0.027

Mode 0.016

Sample Std Deviation 0.010681

Population Std Deviation 0.010617

5% Trim Mean 0.028421

Number of data points 84

By sorting my data from highest to lowest, I notice that the minimum value is 0.012 arc

seconds per century and maximum value is 0.06 arc seconds per century. There are no

extreme outliers in my data set which is reflected by the very slight difference between

the mean and the 5% trim mean which removes 5% of the data from each end of the

ordered data.

You should explain why your sample standard deviation is larger than the population

standard deviation. Finally, let's make a quick and dirty histogram of your data by

selecting the data in Excel then go to Tools->Data Analysis->Histogram. Select Chart

output and New Workbook:

Ignore the Bin Range and Excel will choose them for you. Just click OK and you'll get a

frequency table and a histogram that could use a few tweaks, but it's not needed. Make

sure to include your histogram in your write-up.

Here's mine (I selected a bar, then right clicked on it and selected Format data series-

>Options, and then set the Gap width to 0):

Arc Seconds per Century

0

20

40

0.012

0.022

0.033

0.044

0.054

Bin

Frequency

Frequency

It looks like data is distributed in a bell-shape and is almost symmetric (the mass is

slightly shifted to the left) around the mean at 0.027. This is to be expected, since the

mean and the median are so close. However, the mode is very low compared to both of

these measurements, so we should expect perfect symmetry (When mode, mean, and

median are very close, the distribution of the data in a histogram looks very symmetric

and bell-shaped.)

Include an analysis of your histogram in light of the seven statistics you computed before.

Is the standard deviation small compared to the range of the data, or large? Is there

anything surprising about the results? Did your trimmed mean jump significantly from

the full mean? Why? If so, should you toss out any values and do the analysis again (you

don't have to, you could just mention that it should be done and leave it at that.)

#### Solution Summary

This Solution contains calculations to aid you in understanding the Solution to this question.