Whenever we perform a statistical test, we calculate a so-called p-value, which ranges from 0.00 to 1.00. In general terms, the value of p tells us the probability of drawing a sample WITH THESE CHARACTERISTICS (or even more extreme ones) if the null hypothesis is TRUE.
Here's a simple example.
H1: On average, ten-year-old boys are taller than ten-year-old girls.
H0: On average, ten-year-old boys and girls are the same height.
Suppose we sample five boys and five girls and find that the boys' average height is 0.5 inches greater than the girls'. The value of p would be fairly large (well above 0.05), because with such a small sample it's quite likely that we could have randomly drawn a sample with a difference this big, even if the overall average height of boys and girls is actually the same.
We do the test again, but this time we sample 500 boys and 500 girls. Again, the boys are 0.5 inches taller on average. In this case, the value of p would be close to zero, because if there's actually NO difference, the chance of finding a difference this large in such a large sample is remote. Think about this, and make sure you understand it.
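A rough way to see the effect of sample size is to compute the one-sided p-value for the same 0.5-inch difference at both sample sizes. This sketch assumes heights are normally distributed with a known population standard deviation of 2.5 inches; that SD is a hypothetical value, not something given above, and the normal (z) approximation stands in for whatever test would actually be used.

```python
import math

def one_sided_p(diff, sigma, n_per_group):
    """One-sided p-value P(difference >= diff | H0) for two equal-sized groups.

    Uses a normal approximation with a known population SD (an assumption).
    """
    se = sigma * math.sqrt(2.0 / n_per_group)  # standard error of the difference in means
    z = diff / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability of a standard normal

SIGMA = 2.5  # HYPOTHETICAL population SD of height, in inches

for n in (5, 500):
    print(f"n = {n} per group: p = {one_sided_p(0.5, SIGMA, n):.4f}")
```

Under these assumptions the five-per-group sample gives p of roughly 0.38, while the 500-per-group sample gives p below 0.001, even though the observed difference is identical.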
Before doing our test, we set a level of significance (LOS), which is a threshold value for p. If p is less than the LOS, we'll reject the null hypothesis; otherwise, we'll not reject it. The LOS is usually set at 0.05, but is that always necessary?
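The decision rule can be written as a tiny function. The name `decide` and the default of 0.05 are just illustrative choices; the last call shows that the same p-value can lead to a different decision if a different LOS is chosen in advance.

```python
def decide(p_value, los=0.05):
    """Compare the p-value with the level of significance (LOS) chosen before the test."""
    if p_value < los:
        return "reject H0"
    return "do not reject H0"

print(decide(0.01))           # reject H0
print(decide(0.23))           # do not reject H0
print(decide(0.23, los=0.25)) # reject H0 (only because a looser LOS was chosen)
```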
Suppose I'm tasked with evaluating a new textbook. One group of 50 students uses the new book; another group of 50 stays with the old book. On the final exam, the students who used the new book score 1 point higher on average (p = 0.23). Other information: both books are in stock in the university bookstore, the prices are roughly the same, and faculty members have no particular preference for one book over the other. Should I recommend that the new book be adopted? Why or why not?
In general terms, the p-value of 0.23 tells us that if the following null hypothesis is TRUE, there is a 23% probability of drawing two groups of 50 students in which the new-book group scores at least 1 point higher on average, purely by chance.
H1: On average, students who use the new book score higher on the final exam than students who use the old book.
H0: On average, students who use the new book score no higher on the final exam than students who use the old book.
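To make the meaning of p = 0.23 concrete, this sketch simulates many repetitions of the experiment in a world where H0 is true (both groups drawn from the same score distribution) and counts how often the new-book group comes out at least 1 point ahead. The exam-score spread is not given above, so the mean of 70 and standard deviation of 6.8 points are hypothetical; the SD was picked so the simulated fraction lands near the reported 0.23. The sketch also assumes normally distributed scores and a one-sided comparison.

```python
import random

def simulate_null(n_per_group=50, mean=70.0, sd=6.8, observed_diff=1.0,
                  trials=10_000, seed=1):
    """Fraction of simulated experiments under H0 with a difference >= observed_diff.

    mean and sd are HYPOTHETICAL exam-score parameters, not data from the study.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under H0 both groups come from the same score distribution.
        new_book = [rng.gauss(mean, sd) for _ in range(n_per_group)]
        old_book = [rng.gauss(mean, sd) for _ in range(n_per_group)]
        diff = sum(new_book) / n_per_group - sum(old_book) / n_per_group
        if diff >= observed_diff:  # one-sided: new book at least 1 point ahead
            hits += 1
    return hits / trials

print(f"simulated p = {simulate_null():.3f}")
```

The printed fraction is an estimate of p: the share of "no real difference" experiments that still produce a gap of 1 point or more.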
A p-value of 0.23 is well above the usual LOS of 0.05, so, in a strict sense, we cannot reject the null hypothesis, and the result is not strong enough to recommend the new book. The difference in the students' grades could be due to chance or to other factors, such as study habits. We could say there is not enough evidence to reject the old book in favor of the new one. A wider sample of more students, or a better test to get a more ...
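To illustrate why a wider sample could settle the question, this sketch computes the one-sided p-value for the same 1-point difference at several group sizes. It reuses the hypothetical exam-score SD of 6.8 points and a normal approximation; both are assumptions, and it also assumes the observed 1-point gap would stay the same in a bigger study, which it might not.

```python
import math

def one_sided_p(diff, sd, n_per_group):
    """One-sided p-value for a difference in group means (normal approximation)."""
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference in means
    return 0.5 * math.erfc((diff / se) / math.sqrt(2.0))

SD = 6.8  # HYPOTHETICAL exam-score standard deviation, in points

for n in (50, 200, 500):
    print(f"{n} students per group: p = {one_sided_p(1.0, SD, n):.3f}")
```

Under these assumptions p falls from about 0.23 at 50 per group to about 0.07 at 200 and about 0.01 at 500, which is why a larger study could turn an inconclusive result into a clear one.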
Detailed example of the rationale for rejecting or not rejecting the null hypothesis, based on this specific example.