# Probability and Type I and Type II Errors

Why might a data set suffer from missing data? Explain the techniques researchers may use to handle missing data during data analysis.

What are the four rules that guide the coding and categorization of a data set? Explain why each one is important for researchers.

If a researcher must use a non-probability sample because a list is not available, should convenience sampling or judgment sampling be used? Explain.

What is the difference between a probability sample and a non-probability sample? Which one is preferred by researchers? Explain.

What is the difference between a Type I error and a Type II error? How are the two errors related?

Define the null and alternative hypotheses. Discuss the relationship between the two hypotheses.

What advantages do stem-and-leaf displays provide over histograms?

What can a researcher determine through the use of cross-tabulations?

Provide some advice for a person writing a short research report.

What are the assumptions made by the regression model in estimating the parameters and in significance testing?

#### Solution Preview

1) Data might be missing from a data set if a subject does not answer a specific question. For example, respondents might be asked, "What is your income?" Some people are offended by this question and will not answer it, so the final data set will be missing values for that item.

A common way to handle this problem is to reduce the base size for the affected question in the analysis. Say we surveyed 200 people and asked for their income at the end, and only 140 people answered that question. The base size for the rest of the survey will be 200, but the base size for the income question will be 140.
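The base-size idea above can be sketched in a few lines of Python. This is only an illustration under assumed data: the `responses` list, the `age` and `income` field names, and the helper functions are hypothetical and are not part of the original solution.

```python
# Hypothetical survey: 200 respondents, 60 of whom skipped the income question.
responses = [
    {"age": 30 + (i % 40), "income": None if i < 60 else 40000 + 100 * i}
    for i in range(200)
]

def base_size(rows, question):
    """Count the respondents who actually answered the given question."""
    return sum(1 for row in rows if row.get(question) is not None)

def mean_of_answered(rows, question):
    """Average a question over its own base, ignoring missing answers."""
    answered = [row[question] for row in rows if row.get(question) is not None]
    return sum(answered) / len(answered) if answered else None

age_base = base_size(responses, "age")        # everyone answered: 200
income_base = base_size(responses, "income")  # only those who answered: 140
income_mean = mean_of_answered(responses, "income")

print(f"age base = {age_base}, income base = {income_base}")
```

Each question is summarized over its own base, so the missing income answers shrink only the income base and leave the other questions at the full 200.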

2) When coding a data set, you must make sure of the following:

1) The coding label is representative of the whole data set. This is important because looking at the data code should give you a general idea of the information it covers.

2) There are not too many categories of data; otherwise there is no point in categorizing at all. If you have 20 responses and 10 categories, the coding does not help the researcher.

3) There is a category for "don't know" or "no answer". This allows researchers to quickly weed out these answers and work with a smaller base size of completed answers, which gives a more realistic view of the data.

4) You never code an answer that you are unsure of. It is better either to clarify the data or to put it in the "don't know" category; otherwise it can skew the data.
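As a rough sketch of rules 3 and 4, the Python below codes raw answers with a small codebook and sends anything missing or unclear to a "don't know" code. The codebook, the code values, and the sample answers are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical codebook for a yes/no question; code 9 is the
# "don't know / no answer" category (rule 3).
CODEBOOK = {"yes": 1, "y": 1, "no": 2, "n": 2}
DONT_KNOW = 9

def code_response(raw):
    """Return the code for a raw answer; unclear answers are never guessed (rule 4)."""
    if raw is None:
        return DONT_KNOW
    return CODEBOOK.get(raw.strip().lower(), DONT_KNOW)

raw_answers = ["Yes", "no", None, " Y ", "maybe", "N", ""]
codes = [code_response(answer) for answer in raw_answers]
counts = Counter(codes)

# Base size of completed answers, with the "don't know" category weeded out.
completed_base = sum(n for code, n in counts.items() if code != DONT_KNOW)

print(codes, completed_base)
```

Keeping the "don't know" answers under their own code, rather than dropping them at entry time, means the full and reduced base sizes can both be reported.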

3)

Here are some definitions:

Convenience sampling - members of the population are chosen based on their relative ease of access. To sample friends, co-workers, or shoppers at a single mall, are ...

#### Solution Summary

This Solution contains over 1000 words to aid you in understanding the Solution to these questions.