# Statistics: observational studies, consistency, 4 rules of p

1) Describe why observational studies are good for surveys and polls but not for showing causality

2) Define consistency as it relates to standard deviation

3) List the 4 rules of probability

4) Name the type of event probability can be applied to

5) List the 4 rules to gaming and explain why the house always wins

© BrainMass Inc. brainmass.com October 25, 2018, 2:10 am https://brainmass.com/statistics/probability/statistics-observational-studies-consistency-4-rules-of-p-289585

#### Solution Preview

1) Describe why observational studies are good for surveys and polls but not for showing causality.

In statistics, an observational study draws inferences about the possible effect of a treatment on subjects, where the assignment of subjects into a treated group versus a control group is outside the control of the investigator. This is in contrast with controlled experiments, such as randomized controlled trials, where each subject is randomly assigned to a treated group or a control group before the start of the treatment.

Observational studies are good for finding associations between independent variables and an outcome of interest. An association does not imply causation, however, and other studies are needed to confirm a cause-and-effect relationship. The main difficulty is that the investigator cannot control who ends up in the treated and control groups, because that assignment is already determined in an observational study. The resulting non-random groups can differ systematically (overt bias), so differences between the groups could be due to these biases, or to other hidden biases, rather than to the supposed causal variable.
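The effect of non-random assignment can be illustrated with a small simulation. This is only a sketch under invented assumptions: the hidden factor `health`, the function name `mean_outcome_gap`, and all numbers are hypothetical and not part of the original solution. The treatment has no effect at all, yet the self-selected groups still differ.

```python
import random

def mean_outcome_gap(randomized, n=20_000, seed=0):
    """Return mean(outcome | treated) - mean(outcome | control).

    Hypothetical setup: 'health' is a hidden factor that drives the outcome.
    The treatment itself has no effect on the outcome.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        health = rng.random()  # hidden factor, uniform on [0, 1)
        if randomized:
            gets_treatment = rng.random() < 0.5     # investigator flips a coin
        else:
            gets_treatment = rng.random() < health  # healthier subjects self-select
        outcome = health + rng.gauss(0, 0.1)        # depends on health only
        (treated if gets_treatment else control).append(outcome)
    return sum(treated) / len(treated) - sum(control) / len(control)

observational_gap = mean_outcome_gap(randomized=False)
randomized_gap = mean_outcome_gap(randomized=True)
print(f"observational gap: {observational_gap:.3f}")
print(f"randomized gap:    {randomized_gap:.3f}")
```

Under these assumptions the observational gap should come out near 1/3 while the randomized gap should be close to 0, even though the treatment does nothing in either case.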

An observer of an uncontrolled experiment (or process) records potential factors and the data output: the goal is to determine the effects of the factors. Sometimes the recorded factors may not be directly causing the differences in the output. There may be more important factors which were not recorded but are, in fact, causal. Also, recorded or unrecorded factors may be correlated which may yield incorrect conclusions. Finally, as the number of recorded factors increases, the likelihood increases that at least one of the recorded factors will be highly correlated with the data output simply by chance.
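The last point, that recording more factors makes a strong chance correlation more likely, can be sketched as follows. The factors here are pure noise, unrelated to the output by construction; the function names and the sample size of 20 observations are assumptions chosen for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_chance_correlation(num_factors, n_obs=20, seed=0):
    """Largest |correlation| between the output and any of the noise factors."""
    rng = random.Random(seed)
    output = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(num_factors):
        factor = [rng.gauss(0, 1) for _ in range(n_obs)]  # unrelated to output
        best = max(best, abs(pearson(factor, output)))
    return best

for k in (1, 5, 50, 500):
    print(k, round(max_chance_correlation(k), 2))
```

Because the same seed is reused, each larger run contains the factors of the smaller runs, so the strongest chance correlation can only grow as more factors are recorded.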

Observational studies are good for surveys and polls because they give good summaries of the population from which the sample is drawn (for example, what fraction of people vote for the Democratic Party, or what percentage of people smoke). They are good at telling us what the population looks like rather than what is causing a particular outcome of interest.

2) Define consistency as it relates to standard deviation.

In statistics, a sequence of estimators for a parameter θ is said to be consistent (or asymptotically consistent) if the sequence converges in probability to θ. Intuitively, this means that estimators taken far enough along the sequence are more likely to be in the vicinity of the parameter being estimated, and in the limit the probability that they lie within any fixed distance of θ tends to one.

In practice one usually constructs a single estimator as a function of an available sample of size n, and then imagines being able to keep collecting data and expanding the sample ad infinitum. In this way one would obtain a sequence of estimators indexed by n and the notion of consistency will be understood as the sample size "tends to infinity". If this sequence converges in probability to the true value of the parameter being estimated, we call it a consistent estimator; otherwise the estimator is said to be inconsistent.
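Written out formally, convergence in probability means that for every ε > 0 the chance of missing θ by more than ε vanishes as n grows. The notation $\hat{\theta}_n$ for the estimator built from a sample of size n is introduced here for illustration and does not appear in the original solution:

$$\lim_{n \to \infty} P\left( \left| \hat{\theta}_n - \theta \right| > \varepsilon \right) = 0 \quad \text{for every } \varepsilon > 0$$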

In terms of standard deviation: for the usual estimators (for example the sample mean, whose standard deviation is σ/√n), the standard deviation of the estimator tends to 0 as the sample size tends to infinity.

That is, if you have very large samples, the estimates vary little from one sample to the next and are all very close to the true value.
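A quick simulation can make this concrete. The sketch below assumes standard normal data (σ = 1) and uses the sample mean as the estimator; the function name `sd_of_sample_mean` and the number of repetitions are choices made here for illustration. It draws many samples of size n and measures how much the sample means vary.

```python
import random
import statistics

def sd_of_sample_mean(n, reps=300, seed=0):
    """Standard deviation of the sample mean across `reps` samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(0, 1) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(means)

# Compare the simulated spread with the theoretical value 1 / sqrt(n)
for n in (10, 100, 1000):
    print(n, round(sd_of_sample_mean(n), 3), round(1 / n ** 0.5, 3))
```

The simulated spread should track σ/√n closely, shrinking toward 0 as n grows, which is the behavior described above.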

3) List the 4 rules of probability.

RULE 1 - Every probability is between 0 and 1 (that is, between 0% and 100%)

The two extreme cases are:

a. An impossible event

b. A certain event

a. When P(A) = 0, the event will not happen (an impossible event).

Example

Since Bill Clinton cannot run for a third Presidential term:

P(Clinton is sworn in for a third term in 2001) = 0

b. When P(A) = 1, the event will always happen (a certainty). A consequence of this rule is that if A is the event "something in the sample space will happen", then P(A) = 1.

RULE 2 - The Complement Rule

The complement of event A is the event (A does not occur). All simple events in the sample space must either be part of event A or part of the complement of event A.

In words:

The probabilities of an event and its complement add to 1.

In symbols: P(A) + P(not A) = 1

Example

What is the chance a Caucasian will not have type O blood?

If 37% of Caucasians have type O blood,

then 100% - 37% = 63% do not have type O blood.

Note:

The complement is defined by applying NOT to a group.

The complement of Republicans is NOT-Republicans as opposed to Democrats.

There may be Independents or another political party.
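Rules 1 and 2 can be sketched in a few lines of code. The function name `complement_probability` and the list of parties are hypothetical, chosen only to mirror the blood-type example and the note above.

```python
def complement_probability(p):
    """Rule 2: P(not A) = 1 - P(A). Rule 1 requires 0 <= p <= 1."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    return 1 - p

# Blood-type example: 37% have type O, so 63% do not
p_not_type_o = complement_probability(0.37)
print(f"{p_not_type_o:.2f}")  # prints 0.63

# The complement is "everything else", not just one alternative.
# Hypothetical sample space of party affiliations:
parties = {"Republican", "Democrat", "Independent", "Other"}
not_republican = parties - {"Republican"}
print(sorted(not_republican))
```

The set difference shows why NOT-Republican is larger than Democrat alone: every remaining outcome in the sample space belongs to the complement.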

RULE 3 - The Either/Or Rule

The union of events A and B are ...

#### Solution Summary

4 Rules of probability

Consistency of standard deviation

Pros and Cons of Observational studies

An Introduction to Descriptive Statistics and Probability

Please include all steps.

Need help with these 7 problems. I need to include all required steps and answer(s) for full credit. All answers need to be reduced to lowest terms where possible.

1) In a poll, respondents were asked if they have traveled to Europe. 68 respondents indicated that they have traveled to Europe and 124 respondents said that they have not traveled to Europe. If one of these respondents is randomly selected, what is the probability of getting someone who has traveled to Europe?

2) The data set represents the income levels of the members of a golf club. Find the probability that a randomly selected member earns at least $100,000.

INCOME (in thousands of dollars)

98 102 83 140 201 96 74 109 163 210

81 104 134 158 128 107 87 79 91 121

3) Of the 538 people who had an annual check-up at a doctor's office, 215 had high blood pressure. Estimate the probability that the next person who has a check-up will have high blood pressure.

4) Find the probability of correctly answering the first 4 questions on a multiple choice test using random guessing. Each question has 3 possible answers.

5) Explain the difference between an independent and a dependent variable.

6) Provide an example of experimental probability and explain why it is considered experimental.

7) The measure of how likely an event will occur is probability. Match the following probability with one of the statements. There is only one answer per statement.

0 0.25 0.60 1

a. This event is certain and will happen every time.

b. This event will happen more often than not.

c. This event will never happen.

d. This event is likely and will occur occasionally.

13) Describe the measures of central tendency. Under what condition(s) should each one be used?

14) Last year, 12 employees from a computer company retired. Their ages at retirement are listed below. First, create a stem plot for the data. Next, find the mean retirement age. Round to the nearest year.

55 77 64 77 69 63 62 64 85 64 56 59

15) A retail store manager kept track of the number of car magazines sold each week over a 10-week period. The results are shown below.

27 30 21 62 28 18 23 22 26 28

a. Find the mean, median, and mode of car magazines sold over the 10-week period.

b. Which measure(s) of central tendency best represent the data?

c. Name any outliers.

16) Joe wants to pass his statistics class with at least a 75%. His prior four test scores are 74%, 68%, 84% and 79%. What is the minimum score he needs on the final exam to pass the class with a 75% average?

17) Nancy participated in a summer reading program. The number of books read by the 23 participants are as follows:

10 9 6 2 5 3 9 1 6 3 10 4 7 6 3 5 6 2 6 5 3 7 2

| Number of books read | Frequency |
|---|---|
| 1-2 | |
| 3-4 | |
| 5-6 | |
| 7-8 | |
| 9-10 | |

a. Complete the frequency table.

b. Find the mean of the raw data.

c. Find the median of the raw data.

18) The chart below represents the number of inches of snow for a seven-day period.

| Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
|---|---|---|---|---|---|---|
| 2 | 5 | 3 | 10 | 0 | 4 | 2 |

a. Find the mean, median, and mode.

b. Which is the best measure of central tendency?

c. Remove Wednesday from the calculations. How does that impact the three measures of central tendency?

d. Describe the effect outliers have on the measures of central tendency.

19) A dealership sold 15 cars last month. The purchase price of the cars, rounded to the nearest thousand, is represented in the table.

| Purchase price | Number of cars sold |
|---|---|
| $15,000 | 3 |
| $20,000 | 4 |
| $23,000 | 5 |
| $25,000 | 2 |
| $45,000 | 1 |

a. Find the mean and median of the data.

b. Which measure best represents the data? Use the results to support your answer.

c. What is the outlier and how does it affect the data?

20) What do the letters represent on the box plot?

21) The test scores from a math final exam are as follows:

64 85 93 55 87 90 73 81 86 79

a. Create a box plot using the data.

b. Label the five points on the box plot and include numerical answers from part "a."

22) Using the data and results from Question 21), answer the following questions.

a. What is the median?

b. What is the range?

c. What is the interquartile range?

d. In a short paragraph, describe the data in the box plot.

23) The math grades on the final exam varied greatly. Using the scores below, how many scores were within one standard deviation of the mean? How many scores were within two standard deviations of the mean?

99 34 86 57 73 85 91 93 46 96 88 79 68 85 89

24) The scores for math test #3 were normally distributed. If 15 students had a mean score of 74.8% and a standard deviation of 7.57, how many students scored above an 85%?

25) If you know the standard deviation, how do you find the variance?

26) To get the best deal on a stereo system, Louis called eight appliance stores and asked for the cost of a specific model. The prices he was quoted are listed below:

$216 $135 $281 $189 $218 $193 $299 $235

Find the standard deviation.

27) A company has 70 employees whose salaries are summarized in the frequency distribution below.

| Salary | Number of Employees |
|---|---|
| 5,001-10,000 | 8 |
| 10,001-15,000 | 12 |
| 15,001-20,000 | 20 |
| 20,001-25,000 | 17 |
| 25,001-30,000 | 13 |

a. Find the standard deviation.

b. Find the variance.

6. Calculate the mean and variance of the data. Show and explain your steps. Round to the nearest tenth.

14, 16, 7, 9, 11, 13, 8, 10

28) Create a frequency distribution table for the number of times a number was rolled on a die. (It may be helpful to print or write out all of the numbers so none are excluded.)

3, 5, 1, 6, 1, 2, 2, 6, 3, 4, 5, 1, 1, 3, 4, 2, 1, 6, 5, 3, 4, 2, 1, 3, 2, 4, 6, 5, 3, 1

29) Answer the following questions using the frequency distribution table you created in No. 28.

a. Which number(s) had the highest frequency?

b. How many times did a number of 4 or greater get thrown?

c. How many times was an odd number thrown?

d. How many times did a number greater than or equal to 2 and less than or equal to 5 get thrown?

30) The wait times (in seconds) for fast food service at two burger companies were recorded for quality assurance. Using the data below, find the following for each sample.

a. Range

b. Standard deviation

c. Variance

Lastly, compare the two sets of results.

| Company | Wait times in seconds |
|---|---|
| Big Burger Company | 105 67 78 120 175 115 120 59 |
| The Cheesy Burger | 133 124 200 79 101 147 118 125 |

31) What does it mean if a graph is normally distributed? What percent of values fall within 1, 2, and 3 standard deviations from the mean?

32) Define the following terms in your own words.

• Population

• Sample

• Bias

• Design

• Response bias

33) Define and provide an example for each design method.

• Simple random sampling

• Systematic sampling

• Stratified sampling

• Cluster sampling

34) Choose one design method from the list above. Using your example, make a list of 2-3 advantages and 2-3 disadvantages for using the method.

35) The name of each student in a class is written on a separate card. The cards are placed in a bag. Three names are picked from the bag. Identify which type of sampling is used and why.

36) A phone company obtains an alphabetical list of names of homeowners in a city. They select every 25th person from the list until a sample of 100 is obtained. They then call these 100 people to advertise their services. Does this sampling plan result in a random sample? What about a simple random sample? Explain why or why not.

37) The manager of a company wants to investigate job satisfaction among its employees. One morning after a meeting, she talks to all 25 employees who attended. Does this sampling plan result in a random sample? What type of sample is it? Explain.

38) An education expert is researching teaching methods and wishes to interview teachers from a particular school district. She randomly selects 10 schools from the district and interviews all of the teachers at the selected schools. Does this sampling plan result in a random sample? What type of sample is it? Explain.

39) Fifty-one sophomore, 42 junior, and 55 senior students are selected from classes with 516, 428, and 551 students respectively. Identify which type of sampling is used and explain your reasoning.

40) You want to investigate the workplace attitudes concerning new policies that were put into effect. You have funding and support to contact at most 100 people. Choose a design method and discuss the following:

a. Describe the sample design method you will use and why.

b. Specify the population and sample group. Will you include everyone who works for the company, certain departments, full or part-time employees, etc.?

c. Discuss the bias, on the part of both the researcher and participants.

A local newspaper wanted to gather information about house sales in the area. It distributed 25,000 electronic surveys to its readers asking questions about house sales in the past 6 months. Of the surveys sent out, 3.2% were returned. The results found that 92% of people did not sell their house in the past 6 months and 85% of people would expect a loss if they sold their house. The writer wants to use these results to conclude that the housing market is declining, and we are headed for a recession.

Explain the bias and sampling error in this study.

a. Should the writer conclude that the housing market is declining based upon this data?

b. Why or why not?

41) Explain Type I and Type II errors. Use an example if needed.

42) Explain a one-tailed and two-tailed test. Use an example if needed.

43) Define the following terms in your own words.

• Null hypothesis

• P-value

• Critical value

• Statistically significant

44) A homeowner is getting carpet installed. The installer is charging her for 250 square feet. She thinks this is more than the actual space being carpeted. She asks a second installer to measure the space to confirm her doubt. Write the null hypothesis Ho and the alternative hypothesis Ha.

45) Drug A is the usual treatment for depression in graduate students. Pfizer has a new drug, Drug B, that it thinks may be more effective. You have been hired to design the test program. As part of your project briefing, you decide to explain the logic of statistical testing to the people who are going to be working for you.

• Write the research hypothesis and the null hypothesis.

• Then construct a table like the one below, displaying the outcomes that would constitute Type I and Type II error.

• Write a paragraph explaining which error would be more severe, and why.

46) Cough-a-Lot children's cough syrup is supposed to contain 6 ounces of medicine per bottle. However, since the filling machine is not always precise, there can be variation from bottle to bottle. The amounts in the bottles are normally distributed with σ = 0.3 ounces. A quality assurance inspector measures 10 bottles and finds the following (in ounces):

5.95 6.10 5.98 6.01 6.25 5.85 5.91 6.05 5.88 5.91

Are the results enough evidence to conclude that the bottles are not filled adequately at the labeled amount of 6 ounces per bottle?

a. State the hypothesis you will test.

b. Calculate the test statistic.

c. Find the P-value.

d. What is the conclusion?

47) Calculate a Z score when X = 20, μ = 17, and σ = 3.4.

48) Using a standard normal probabilities table, interpret the results for the Z score in Problem 47.

49) Your babysitter claims that she is underpaid given the current market. Her hourly wage is $12 per hour. You do some research and discover that the average wage in your area is $14 per hour with a standard deviation of 1.9. Calculate the Z score and use the table to find the standard normal probability. Based on your findings, should you give her a raise? Explain your reasoning as to why or why not.

50) Tutor O-rama claims that their services will raise student SAT math scores at least 50 points. The average score on the math portion of the SAT is μ = 350 and σ = 35. The 100 students who completed the tutoring program had an average score of 385 points. Is the average score of 385 points significant at the 5% level? Is it significant at the 1% level? Explain why or why not.