# Correlation, Z-score, hypothesis, probability

1. True or false.

a. The decision to use z-scores or Student's t-scores depends first on the size of the sample.

b. When there is correlation between two data sets, there is always an underlying cause.

c. The probability of an event plus the probability of the complementary event always equals 1.

d. The population parameter is always found within the margin of error of the sample statistic.

e. The median of a set of integers will always be either an integer or an integer + 1/2.

f. In modern usage, the null hypothesis is always described mathematically as an equality.

g. You can always find the mode of a set of data, whether categorical or numerical.

h. The confidence level used to determine the margin of error in polling data is typically 99%.

2. The following pairs of numbers (x, y) are the number of dead American soldiers in Iraq and the number of wounded for the months from January 2005 to December 2006.

107 498

58 415

35 371

52 596

80 575

78 511

54 477

85 541

49 545

96 605

84 400

68 412

62 287

55 342

31 498

76 432

69 442

61 458

43 523

65 586

72 790

106 767

70 543

112 690

a) Treat the numbers above as data points (x,y) and find the correlation coefficient r_x,y.

b) Find the highest threshold that r_x,y meets (95% or 99% or none) using table A-5 where n = 25.

c) Find the equation of the regression line, y-hat = b_1x + b_0. If doing the work by hand instead of using a computer or calculator package, round the intermediate steps to six places after the decimal; in the final answer, give the numbers rounded to three places after the decimal.

d) Find the data pair that is closest to the line and the data pair that is farthest away, where the measure of distance is |y - y-hat|.

e) Find the data pair that is closest to the line and the data pair that is farthest away, where the measure of distance is |1 - y/y-hat|.

f) Find the five-number summary for the first column and the five-number summary for the second column.

g) Rank each data set from highest (1st) to lowest (24th) and find the rank correlation; use Table A-6 where n = 24 to find if this correlation passes or fails to pass each of the thresholds (alpha = 10%, 5%, 2%, 1%).

h) Find the average and the standard deviation as a sample of set #1. (round to nearest tenth.)

i) Find the average and the standard deviation as a sample of set #2. (round to nearest tenth.)
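The correlation, regression and rank-correlation parts above can be checked with a short script. This is only a sketch using the textbook formulas: the function names are made up for illustration, tied values are assumed to share the average of their ranks, and the table lookups in parts b) and g) still have to be done by hand.

```python
from math import sqrt

# Data copied from the problem: x = deaths, y = wounded, Jan 2005 to Dec 2006.
x = [107, 58, 35, 52, 80, 78, 54, 85, 49, 96, 84, 68,
     62, 55, 31, 76, 69, 61, 43, 65, 72, 106, 70, 112]
y = [498, 415, 371, 596, 575, 511, 477, 541, 545, 605, 400, 412,
     287, 342, 498, 432, 442, 458, 523, 586, 790, 767, 543, 690]


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


def regression_line(xs, ys):
    """Return (b1, b0) for the least-squares line y-hat = b1*x + b0."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    b1 = sxy / sxx
    return b1, my - b1 * mx


def ranks_high_to_low(values):
    """Rank 1 = highest value; ties share the average of their ranks (an assumption)."""
    order = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = order.index(v) + 1   # best rank held by this value
        count = order.count(v)       # how many times the value occurs
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


def rank_correlation(xs, ys):
    """Rank correlation computed as the Pearson r of the two rank lists."""
    return pearson_r(ranks_high_to_low(xs), ranks_high_to_low(ys))


b1, b0 = regression_line(x, y)
print(f"r = {pearson_r(x, y):.3f}")
print(f"y-hat = {b1:.3f}x + {b0:.3f}")
print(f"rank correlation = {rank_correlation(x, y):.3f}")

# Residuals |y - y-hat| for part d), smallest first.
residuals = sorted((abs(b - (b1 * a + b0)), (a, b)) for a, b in zip(x, y))
print("closest:", residuals[0][1], "farthest:", residuals[-1][1])
```

For part e), the same sorting idea applies with `abs(1 - b / (b1 * a + b0))` as the distance.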

3. We have a standard deck of 52 cards. You draw two cards from the deck. Find these probabilities rounded to four places after the decimal.

(There are four aces in a deck. There are thirteen hearts in a deck.)

p(no aces in the two cards) =

p(exactly one ace in the two cards) =

p(two aces in the two cards) =

p(no hearts in the two cards) =

p(exactly one heart in the two cards) =

p(two hearts in the two cards) =
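Each of these is a count of favorable two-card hands divided by the count of all two-card hands. A minimal sketch, assuming the two cards are drawn without replacement (the helper name `prob_exactly` is hypothetical):

```python
from math import comb


def prob_exactly(k, special, drawn=2, deck=52):
    """P(exactly k 'special' cards among `drawn` cards dealt without replacement)."""
    favorable = comb(special, k) * comb(deck - special, drawn - k)
    return favorable / comb(deck, drawn)


for label, special in (("aces", 4), ("hearts", 13)):
    for k in range(3):
        print(f"p(exactly {k} {label}) = {prob_exactly(k, special):.4f}")
```

As a sanity check, the three probabilities for each card type should add up to 1.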

4. Here is a set of numbers given on a stem and leaf plot.

5|1

4|0223467778

3|14566899

2|36788

1|03459

0|06799

a. Find the average, median, mode and standard deviation of the set as a population.

b. Find both the frequency and relative frequency of z-scores in this set that meet the following criteria.

b1) z > 2

b2) 1 < z < 2

b3) 0 < z < 1

b4) -1 < z < 0

b5) -2 < z < -1

b6) z < -2
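One way to check the work is to expand the plot into a list and bin the z-scores. The sketch below assumes each stem is the tens digit and each leaf is the ones digit (so `5|1` is 51), and it uses the population standard deviation as part a. asks; the function names are illustrative only.

```python
from statistics import mean, median, multimode, pstdev

# Rows copied from the plot; assumption: stem = tens digit, leaf = ones digit.
plot = {5: "1", 4: "0223467778", 3: "14566899",
        2: "36788", 1: "03459", 0: "06799"}


def expand(stem_leaf):
    """Turn the stem-and-leaf rows into a sorted list of integers."""
    return sorted(10 * stem + int(leaf)
                  for stem, leaves in stem_leaf.items()
                  for leaf in leaves)


def z_score_bins(values):
    """Count values in each z-score band, using population mean and sigma."""
    mu, sigma = mean(values), pstdev(values)
    edges = [-2, -1, 0, 1, 2]
    labels = ["z < -2", "-2 < z < -1", "-1 < z < 0",
              "0 < z < 1", "1 < z < 2", "z > 2"]
    counts = dict.fromkeys(labels, 0)
    for v in values:
        z = (v - mu) / sigma
        # Number of edges strictly below z picks the band; a z exactly on an
        # edge would fall into the lower band, which the problem leaves open.
        counts[labels[sum(z > e for e in edges)]] += 1
    return counts


data = expand(plot)
print("n =", len(data))
print("mean =", mean(data), "median =", median(data),
      "mode =", multimode(data), "sigma =", pstdev(data))
for label, freq in z_score_bins(data).items():
    print(f"{label}: frequency {freq}, relative frequency {freq / len(data):.3f}")
```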

5. In a recent poll of 920 registered voters, 450 respondents said they would vote for Candidate Jones, 436 were going to vote for Candidate Chan and the rest were undecided. Give all answers with three significant digits, either .xxx or xx.x%

a. What are the percentages for each candidate and the undecided vote?

b. Using 450/920 as p, what is the standard deviation for this sample?

c. What is the margin of error for this sample, given a 95% level of confidence?

d. How large a sample size n would we need to get a margin of error of +/-2.2%? (use 450/920 as p.)
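Parts b. through d. all follow from the standard error of a proportion, sqrt(p(1 - p)/n). A small sketch, assuming the conventional critical value z = 1.96 for 95% confidence and rounding the required sample size up to the next whole respondent (the function names are hypothetical):

```python
from math import ceil, sqrt

Z_95 = 1.96  # assumption: conventional z critical value for 95% confidence


def proportion_se(p, n):
    """Standard deviation (standard error) of a sample proportion."""
    return sqrt(p * (1 - p) / n)


def margin_of_error(p, n, z=Z_95):
    """Half-width of the confidence interval for a proportion."""
    return z * proportion_se(p, n)


def required_n(p, margin, z=Z_95):
    """Smallest whole n whose margin of error is at most `margin`."""
    return ceil(p * (1 - p) * (z / margin) ** 2)


p = 450 / 920
print(f"standard deviation = {proportion_se(p, 920):.4f}")
print(f"margin of error    = {margin_of_error(p, 920):.2%}")
print(f"n for +/-2.2%      = {required_n(p, 0.022)}")
```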

6. The following pairs of numbers (x, y) are the closing prices of the 30 stocks that comprise the Dow Jones Industrial Average; the first number (x) is closing price on July 13, 2006 and the second number (y) is the closing price on December 13, 2006.

26.56 35.55

30.99 30.45

76.71 84.58

51.25 59.97

58.36 71.15

79.59 89.60

69.70 61.49

47.87 52.32

43.10 48.84

28.70 34.45

39.55 47.14

64.07 77.36

32.87 35.50

28.32 29.45

31.22 39.67

34.07 39.11

37.99 41.86

17.72 20.70

74.24 94.77

41.39 47.60

60.27 65.47

33.17 43.59

36.94 43.34

22.26 29.55

22.87 25.39

56.54 63.40

71.63 79.25

61.43 64.21

31.74 35.87

44.16 45.90

Treat these numbers as matched pairs and find the differences d = x-y. Find the average of the differences (d-bar), the standard deviation of the differences when the set is considered a sample (s_d-bar) and the 95% confidence interval for mu_d.

Given the interval, are we 95% confident the difference we see is significant?
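The interval has the form d-bar plus or minus t times s_d/sqrt(n). The sketch below takes the t critical value as an argument because Python's standard library has no t distribution; for all 30 pairs (29 degrees of freedom) a standard t table gives 2.045. To stay short, the demonstration uses only the first five pairs from the list, with the table value 2.776 for 4 degrees of freedom, so its output is not the answer to the problem.

```python
from math import sqrt
from statistics import mean, stdev


def paired_ci(xs, ys, t_crit):
    """Return (d_bar, s_d, (low, high)) for the differences d = x - y."""
    d = [a - b for a, b in zip(xs, ys)]
    d_bar = mean(d)
    s_d = stdev(d)                       # sample standard deviation (n - 1)
    half_width = t_crit * s_d / sqrt(len(d))
    return d_bar, s_d, (d_bar - half_width, d_bar + half_width)


def contains_zero(interval):
    """If zero lies inside the interval, the difference is not significant."""
    low, high = interval
    return low <= 0 <= high


# First five pairs from the problem, for illustration only.
july = [26.56, 30.99, 76.71, 51.25, 58.36]
december = [35.55, 30.45, 84.58, 59.97, 71.15]

d_bar, s_d, interval = paired_ci(july, december, t_crit=2.776)
print(f"d-bar = {d_bar:.3f}, s_d = {s_d:.3f}")
print(f"95% CI for mu_d: ({interval[0]:.3f}, {interval[1]:.3f})")
print("interval contains 0:", contains_zero(interval))
```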

7. Find the z-scores that most closely correspond to the following percentages, rounding to the nearest two-hundredth.

(Nearest two-hundredth means either two places after the decimal or three places after the decimal where the third digit is 5.

For example, counting up from zero by two-hundredths: .00, .005, .01, .015, .02, .025, ...)

a) z for the 83% cutoff point

b) z for the 23% cutoff point

c) z for the 63% cutoff point
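A table lookup can be cross-checked with the inverse normal CDF. This sketch assumes the "cutoff point" means the z-score with that percentage of the area below it, and it rounds to steps of .005 as described above; a printed table may occasionally land on a neighboring step. The function name is hypothetical.

```python
from statistics import NormalDist


def z_for_percentile(p, steps_per_unit=200):
    """z with area p to its left, rounded to the nearest 1/200 (= .005)."""
    z = NormalDist().inv_cdf(p)
    return round(round(z * steps_per_unit) / steps_per_unit, 3)


for pct in (0.83, 0.23, 0.63):
    print(f"{pct:.0%} cutoff: z = {z_for_percentile(pct)}")
```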

#### Solution Summary

Answers the true-or-false questions and the questions on the correlation coefficient, probabilities, the stem-and-leaf plot, average, median, mode and standard deviation, z-scores, margin of error, sample size, and hypothesis testing.