Levels of Significance, p-Values and Standard Deviations

I have four restaurants with a nearest neighbor analysis defining the restaurant pattern as clustered, random, or dispersed.

Italian: Observed mean distance/expected mean distance = 0.67. Less than 1% likelihood that this clustered pattern is the result of random chance.
Z score = 6.3 standard deviations
Significance Levels: 0.01; 0.05; 0.10; Random; 0.10; 0.05; 0.01
Critical Values: -2.58; -1.96; -1.65; 1.65; 1.96; 2.58

Japanese: Observed mean distance/expected mean distance = 0.25. Less than 1% likelihood that this clustered pattern is the result of random chance.
Z score = -9.4 SD
Sig and critical values are the same

Korean: OMD/EMD = 0.95. Neither clustered nor dispersed; it is right in the center, in the random band.
Z score = -0.3 SD
Sig And Critical are the same

French: Less than 1% likelihood that this clustered pattern is the result of random chance
OMD/EMD = 0.42
Z score = -4.9 SD
Sig and Critical are the same.

Solution

SIGNIFICANCE LEVEL (also sometimes called P-VALUE):

Suppose you have some random quantity x with some type of average xa (this average is usually taken to be the MEAN, but it can be another popular measure such as the MEDIAN, the MODE or something fancier still).
When in a particular instance you measure the value of x to be some xe which is different from xa, you can ask (and calculate, if you know the probability distribution) what the probability is for a random result to deviate from the average by as much as or more than it did in this particular instance, that is, by at least |xa-xe|.
Let us denote this probability as P(>|xe-xa|). This notation is a bit cumbersome but informative.
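
To make P(>|xe-xa|) concrete, here is a minimal sketch for a case where the probability distribution is known exactly. The two-dice example and the function name `tail_probability` are my own illustration, not part of the original question; the average is taken to be the mean.

```python
from fractions import Fraction
from itertools import product

def tail_probability(xe, outcomes):
    """P(>|xe-xa|): chance that an equally likely outcome deviates from the
    mean xa by at least as much as the observed value xe does."""
    n = len(outcomes)
    xa = Fraction(sum(outcomes), n)          # the mean plays the role of xa
    observed_dev = abs(xe - xa)
    hits = sum(1 for x in outcomes if abs(x - xa) >= observed_dev)
    return Fraction(hits, n)

# Hypothetical example: x is the sum of two fair dice, so xa = 7.
two_dice = [a + b for a, b in product(range(1, 7), repeat=2)]
p_11 = tail_probability(11, two_dice)   # deviation of 4: sums 2, 3, 11, 12
p_8 = tail_probability(8, two_dice)     # deviation of 1: every sum except 7
print(p_11, p_8)                        # prints: 1/6 5/6
```

A result of 11 is fairly unusual (probability 1/6 of deviating that much or more), while a result of 8 is not unusual at all.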

This probability is called the SIGNIFICANCE LEVEL or P-VALUE under the following circumstances.
When we want to test some hypothesis, usually called the NULL HYPOTHESIS, we run some experiments to verify it. We choose some numerical quantity xa to quantify a prediction of the null hypothesis, but in the testing we get the value xe instead. At this stage we ask ourselves how significant it is that we got xe rather than xa. Here the probability
P(>|xe-xa|), the p-value, comes to our help. The p-value is the probability that, if the null hypothesis is true, a discrepancy at least as large as the one between xa and xe arises by chance alone.
The smaller it is, the stronger the evidence against the null hypothesis.

It is more "natural" to use the words "confidence level" with respect to "1-P(>|xe-xa|)" rather than "P(>|xe-xa|)", and in fact some people use it this way.
However, other people, apparently including the authors of the text you uploaded, use these words to describe "P(>|xe-xa|)". It is usually possible to tell from the context which definition the authors use.

As regards the words "p-value", so far I have only seen it used to describe just "P(>|xe-xa|)".


In the exposition above the probability distribution plays a central role as it is needed to calculate how significant is a deviation from what is expected.
In many areas of real life the needed probability distribution is not in fact very well known, and so people make some usual assumptions. The most popular assumption is to take the probability distribution to be NORMAL and I shall define it in a moment. A point to remember is that there are some mathematical reasons why the normal distribution is often quite a good model for real life distributions.

The normal probability density for a real quantity x is
dp = exp[-(x-xa)^2/(2s^2) ] dx/[s*sqrt(2*pi)], (1)
where s is the standard deviation and xa is the mean, that is
<x> = xa (2)
<(x-xa)^2> = s^2 (3)
(you should easily find it in any book(s) you have on probabilities).
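
As a quick numerical check of equations (1) to (3), the sketch below integrates the density with a simple midpoint rule. The function names, the values xa = 2.0 and s = 1.5, and the integration settings are arbitrary choices of mine for illustration.

```python
import math

def normal_pdf(x, xa, s):
    """Density from equation (1): exp[-(x-xa)^2/(2s^2)] / [s*sqrt(2*pi)]."""
    return math.exp(-(x - xa) ** 2 / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))

def check_moments(xa, s, half_width=10.0, n=20_000):
    """Midpoint-rule estimates of the total probability, <x> and <(x-xa)^2>,
    integrating over xa +/- half_width standard deviations."""
    lo = xa - half_width * s
    dx = 2 * half_width * s / n
    total = mean = var = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        w = normal_pdf(x, xa, s) * dx   # probability in this thin slice
        total += w
        mean += x * w
        var += (x - xa) ** 2 * w
    return total, mean, var

total, mean, var = check_moments(xa=2.0, s=1.5)   # hypothetical values
print(total, mean, var)   # close to 1, xa = 2.0 and s^2 = 2.25
```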

The probability for x to be at least as far from the average xa as some value xe is

P(>|xe-xa|) = 2* int_|xe-xa|^infty exp[-t^2/(2s^2) ] dt/[s*sqrt(2*pi)] (4)

where the integration variable t is the deviation from the mean.

This integral can be calculated numerically and it is tabulated in some books.
One popular notation for the p-value of a normal distribution is

P(>|xe-xa|) = 1 - erf{ |xe-xa|/[s*sqrt(2)] } (5)

where "erf" is short for "error function".
Sometimes the error function is defined a bit differently so that the square root of 2 is missing from equation (5).
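
To check that the integral in equation (4) and the error-function form in equation (5) agree, here is a rough sketch. It assumes the convention used by Python's `math.erf`, which is the one with the square root of 2 in equation (5). The function names, the midpoint rule and the cut-off of the upper limit at 12 standard deviations are my own choices.

```python
import math

def p_value_integral(dev, s, upper=12.0, n=100_000):
    """Equation (4) by the midpoint rule; dev = |xe-xa|.
    The infinite upper limit is truncated at upper*s (requires dev < upper*s)."""
    a, b = dev, upper * s
    dx = (b - a) / n
    area = sum(math.exp(-((a + (i + 0.5) * dx) ** 2) / (2 * s ** 2))
               for i in range(n)) * dx
    return 2 * area / (s * math.sqrt(2 * math.pi))

def p_value_erf(dev, s):
    """Equation (5): 1 - erf{ |xe-xa| / [s*sqrt(2)] }."""
    return 1 - math.erf(dev / (s * math.sqrt(2)))

s = 3.0                 # hypothetical standard deviation
dev = 1.96 * s          # a deviation of 1.96 standard deviations
print(p_value_integral(dev, s), p_value_erf(dev, s))   # both close to 0.05
```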

As you can see, there are quite a few points of confusion in names and definitions, so if you want to do something further with the numbers you should check as carefully as possible which of the popular definitions are used in your texts.

From equation (5) you can see that the intuitive notion of confidence level (not the one in your text) is just the error function:

1 - P(>|xe-xa|) = erf{ |xe-xa|/[s*sqrt(2)] } (6)

Now we can see what people mean when they say things like "2.58 sigma significance" or
"2.58 standard deviations significance" (when people say "sigma" they mean "s" in equation (4), which is the standard deviation of the normal distribution).

They mean that in their measurement they got |xe-xa| = 2.58 s, and the probability of a deviation at least this large arising by chance, if their null hypothesis is true, is
1 - erf{ 2.58/sqrt(2) } = 0.0099

Conversely, if they say the p-value or significance of their deviation from the expected value is 0.05, we use the INVERSE ERROR FUNCTION "erfinv" to calculate the number of standard deviations:

erfinv(0.95)*sqrt(2) = 1.96
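
The two conversions above can be sketched in Python. The standard `math` module has `erf` but no `erfinv`, so this sketch uses `statistics.NormalDist().inv_cdf` (Python 3.8 or later) instead, relying on the identity erfinv(1-p)*sqrt(2) = inv_cdf(1-p/2) for the standard normal distribution. The function names are my own.

```python
import math
from statistics import NormalDist

def p_from_sigmas(z):
    """Two-sided p-value for a deviation of z standard deviations."""
    return 1 - math.erf(abs(z) / math.sqrt(2))

def sigmas_from_p(p):
    """Number of standard deviations for a two-sided p-value p.
    Equivalent to erfinv(1 - p) * sqrt(2)."""
    return NormalDist().inv_cdf(1 - p / 2)

print(round(p_from_sigmas(2.58), 4))        # prints 0.0099
for p in (0.10, 0.05, 0.01):
    print(p, round(sigmas_from_p(p), 3))
```

Applying one function and then the other returns the starting value, which is a handy sanity check when switching between definitions.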

In the attached plot you can see the relation between the p-value (your significance level) and the number of standard deviations. It is only good for an illustration. If you want to calculate yourself you should find a book with a table of these numbers or, better, some software.

Just in case here are some numbers for values mentioned in your text:

significance 0.10 means erfinv(0.90)*sqrt(2) = 1.65 standard deviations

9.4 standard deviations mean significance 1 - erf( 9.4/sqrt(2) ), which is too small for my software (MATLAB) to calculate this way: erf rounds to exactly 1 in double precision, so the difference comes out as 0

0.3 standard deviations mean significance 1 - erf( 0.3/sqrt(2) ) = 0.76 - not significant at all

4.9 standard deviations mean significance 1 - erf( 4.9/sqrt(2) ) = 9.6x10^{-7} = 0.00000096 - quite impressive significance
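
If you want to reproduce these numbers yourself, here is a small sketch. It uses the complementary error function erfc(y) = 1 - erf(y), which avoids the loss of precision that makes 1 - erf(...) come out as zero for very large deviations. The z scores are copied from the question; the function name and the output layout are my own.

```python
import math

def p_two_sided(z):
    """Same quantity as 1 - erf(|z|/sqrt(2)), written with erfc so that
    tiny p-values are not lost to rounding."""
    return math.erfc(abs(z) / math.sqrt(2))

# Z scores as given in the question.
z_scores = {"Italian": 6.3, "Japanese": -9.4, "Korean": -0.3, "French": -4.9}

for name, z in z_scores.items():
    print(f"{name:9s} z = {z:5.1f}  p = {p_two_sided(z):.2e}")

# The direct formula underflows for the largest deviation:
naive = 1 - math.erf(9.4 / math.sqrt(2))    # evaluates to 0.0 in double precision
```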

I hope it helps you to understand your text.