# Analysing Frequency Distributions

Using Excel, prepare a frequency distribution from the data collected (see the attachment below).

Calculate the Standard Deviation of your data.

Is this a normal distribution?

Every day I leave my house 15 minutes before I need to be at work. What are the chances I will be late on any given day? There are several ways to conduct this statistical study. The best method for this analysis would be an experimental study, in which the investigator (me) plays an active role in manipulating the variables. In other words, I collect data by writing down how long it takes me to get to work every day, perhaps changing the route I take, and noting other variables.

I may also want to list the data I'll collect and classify each item as quantitative or qualitative. If my data is quantitative, my list of variables could consist of: total time of trip, time spent at stop lights, time spent at stop signs, average speed, time spent below the speed limit (or stuck in traffic), etc. Now that I have a list of variables, I can set up the rest of the experiment and get at the question of interest: what are the chances of being late to work? Here are the kinds of variables involved.

a. Response Variable - the variable that is monitored as characterizing system performance; in other words, the variable I'm interested in. Here, that is the total time it takes to get to work.

b. Supervised Variable - a variable over which the investigator exercises control, such as the speed at which I drive to work.

c. Controlled Variable - a supervised variable that is held fixed. For every drive to work that I record, I may wish to keep the speed of my car consistent; speed would then be the controlled variable.

d. Experimental Variable - a supervised variable that I purposely manipulate, such as the route I take to work.

In terms of probability, there are two types of variables I may be interested in: discrete random variables and continuous random variables.

e. Discrete Random Variable (DRV) - a random variable (a quantity whose value depends on chance phenomena) that can take only isolated, separated values. For example, the number of red lights I hit on a trip is a DRV, as is my total trip time if I record it only to the nearest whole minute.

f. Continuous Random Variable - a random variable that has an entire (continuous) interval of numbers as its possible values. For example, the time spent at stop lights could be any value within a wide range.

So in summary, a variable that is affected by another variable is said to be dependent, and a variable that is not affected by other variables is independent. The total time of the trip is dependent on almost all of the variables mentioned previously (speed of the car, time at a stop sign, etc.). However, the time at a stop sign is independent of the speed of my car: the speed of my car does not affect the time spent at a stop sign.

I would also need to specify a probability distribution. A distribution is the "pattern" followed by the values a variable takes on. For example, there is a 20% chance I arrive in 10 minutes, a 75% chance I arrive in 15 minutes, and a 5% chance I take more than 15 minutes. This "distribution" shows that I am most likely to arrive on time and least likely to be late.
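The example distribution above can be written down and checked in a few lines. This is a minimal sketch in Python rather than Excel, and the outcome labels are my own shorthand, not part of the assignment. It only confirms that the three probabilities sum to 1 and picks out the most and least likely outcomes.

```python
# Probabilities taken from the example in the text; the labels are shorthand.
arrival_distribution = {
    "10 min (early)": 0.20,
    "15 min (on time)": 0.75,
    "over 15 min (late)": 0.05,
}

# A valid probability distribution must sum to 1 (allowing for rounding error).
total = sum(arrival_distribution.values())
assert abs(total - 1.0) < 1e-9

most_likely = max(arrival_distribution, key=arrival_distribution.get)
least_likely = min(arrival_distribution, key=arrival_distribution.get)

print(f"Most likely outcome:  {most_likely}")
print(f"Least likely outcome: {least_likely}")
print(f"Chance of being late: {arrival_distribution['over 15 min (late)']:.0%}")
```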

Now there are two concepts to consider: theoretical models (what we expect will happen) and empirical models (what has happened and what we have observed). Using an empirical model, we can evaluate the likelihood that I will arrive in under 15 minutes, in exactly 15 minutes, or in more than 15 minutes, based on the data collected. Let's say my sample size is 25 (I've recorded my travel time to work 25 times). I can then say that 3/25 times I was late for work, 7/25 times I was exactly on time, and 15/25 times I was early. These relative frequencies estimate my probability of arriving late (12%), on time (28%), and early (60%). There are several distribution models for probability; this is a very simplified version representing the probability mass function.
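The relative-frequency calculation can be sketched as follows. The counts are the ones quoted in the text; the variable names and the use of Python (instead of Excel) are my own choices for illustration.

```python
from fractions import Fraction

# Counts from the 25 recorded trips described in the text.
counts = {"late": 3, "on time": 7, "early": 15}
sample_size = sum(counts.values())

# Relative frequency = count / sample size; this is the empirical estimate
# of each outcome's probability (a simple probability mass function).
empirical_pmf = {
    outcome: Fraction(n, sample_size) for outcome, n in counts.items()
}

for outcome, p in empirical_pmf.items():
    print(f"P({outcome}) = {p} = {float(p):.0%}")
```

With only 25 trips these are rough estimates; recording more trips would make the relative frequencies more reliable.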

What are the implications?

1. Your ability to describe a normal distribution, as evidenced by a bell-shaped curve.

2. Your ability to describe the information provided by the Standard Deviation.
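One rough way to approach both points is to build the frequency distribution, compute the standard deviation, and compare the data with the empirical rule (about 68% of normally distributed values fall within one standard deviation of the mean). The sketch below uses Python rather than Excel and a made-up list of commute times, since the attached data set is not reproduced here. `statistics.stdev` computes the sample standard deviation, the same quantity as Excel's `STDEV.S`.

```python
import statistics
from collections import Counter

# HYPOTHETICAL commute times in minutes (not the assignment's attached data).
travel_times = [12, 13, 13, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 17, 17, 18]

# Frequency distribution: how many trips took each whole number of minutes,
# printed as a simple text histogram.
frequency = Counter(travel_times)
for minutes in sorted(frequency):
    print(f"{minutes:>3} min | {'*' * frequency[minutes]}")

mean = statistics.mean(travel_times)
sd = statistics.stdev(travel_times)  # sample standard deviation (divides by n - 1)

# Share of trips within one standard deviation of the mean.
within_one_sd = sum(abs(t - mean) <= sd for t in travel_times) / len(travel_times)
print(f"mean = {mean:.2f}, sd = {sd:.2f}, within 1 SD = {within_one_sd:.0%}")
```

A roughly symmetric, single-peaked histogram and a within-one-SD share near 68% are consistent with a normal distribution, but with a small sample this is only an informal check, not a formal test.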

Background Reading:

Introduction to Frequency Distributions, Retrieved November 26, 2012, from http://infinity.cos.edu/faculty/woodbury/Stats/Tutorial/Data_Freq.htm

Khan Academy, Standard Deviation, Retrieved November 26, 2012, from http://www.khanacademy.org/math/statistics/v/statistics--standard-deviation

Khan Academy, Introduction to the Normal Distribution, Retrieved November 26, 2012, from http://www.khanacademy.org/math/statistics/v/introduction-to-the-normal-distribution

Slides on frequency distributions, Retrieved November 26, 2012, from http://campus.houghton.edu/orgs/psychology/stat3/

Frequency distributions, Retrieved November 26, 2012, from http://davidmlane.com/hyperstat/normal_distribution.html

Z-Table Calculator, Retrieved November 26, 2012, from http://davidmlane.com/hyperstat/z_table.html

Z-Table and Standard Normal Distribution, Retrieved November 26, 2012, from http://www.oswego.edu/~srp/stats/z.htm

Example of the normal distribution, Retrieved November 26, 2012, from http://www.ms.uky.edu/~mai/java/stat/GaltonMachine.html

#### Solution Summary

This posting helps analyze frequency distributions.