# Statistics: What is an outlier? Bias?

Suppose we look at real estate sales in an older neighborhood of similar houses, with prices ranging from $200k to $300k. In this case, the mean and the median will be similar. Now suppose that one house burns down and the fire spreads to the house next door. Before long, someone builds a large house across the two empty lots, and that house might be worth $750k. Now the mean will be much higher, but the median will change only slightly. Here is some data for practice:

Before data (in thousands of dollars):

$200, $250, $300

Mean = Median = $250

After data (in thousands of dollars):

$200, $250, $300, $750

Mean = $375, Median = $275
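The before and after figures can be checked with a few lines of Python. This is only a minimal sketch using the standard `statistics` module; the variable names are illustrative and the prices are taken to be in thousands of dollars.

```python
from statistics import mean, median

# Sale prices in thousands of dollars, taken from the example above.
before = [200, 250, 300]
after = before + [750]  # the large new house joins the data set

print("before:", mean(before), median(before))
print("after: ", mean(after), median(after))
```

The single large value pulls the mean up by 125, while the median moves by only 25.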

Class,

What are the statistical definitions for:

• Outlier

• Bias

• How do these terms apply to this example?

I look forward to your research,

https://brainmass.com/business/business-math/statistics-what-is-an-outlier-bias-382787

#### Solution Preview

An outlier is a data point that lies far from the rest of the data. Clearly the $750k home is an outlier in a neighborhood where the middle-of-the-pack home (median) and the average price (mean) were the same. Now the mean is ...

#### Solution Summary

Your response is 139 words in everyday language suitable for a novice. It explains outliers and how bias is not in play in this example.

An Introduction to Descriptive Statistics and Probability

Please include all steps.

1) In a poll, respondents were asked if they have traveled to Europe. 68 respondents indicated that they have traveled to Europe and 124 respondents said that they have not traveled to Europe. If one of these respondents is randomly selected, what is the probability of getting someone who has traveled to Europe?
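One way to approach problem 1 is to divide the number of favorable respondents by the total number of respondents and reduce the fraction. The sketch below assumes every respondent is equally likely to be picked; the variable names are illustrative.

```python
from fractions import Fraction

traveled = 68
not_traveled = 124

# Fraction reduces to lowest terms automatically.
p_traveled = Fraction(traveled, traveled + not_traveled)
print(p_traveled, float(p_traveled))
```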

2) The data set represents the income levels of the members of a golf club. Find the probability that a randomly selected member earns at least $100,000.

INCOME (in thousands of dollars)

98 102 83 140 201 96 74 109 163 210

81 104 134 158 128 107 87 79 91 121
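For problem 2, a possible approach is to count the members whose income is at least 100 (thousand dollars) and divide by the number of members. The sketch below assumes each member is equally likely to be selected.

```python
from fractions import Fraction

# Incomes in thousands of dollars, copied from the table above.
incomes = [98, 102, 83, 140, 201, 96, 74, 109, 163, 210,
           81, 104, 134, 158, 128, 107, 87, 79, 91, 121]

at_least_100 = sum(1 for income in incomes if income >= 100)
p_at_least_100 = Fraction(at_least_100, len(incomes))
print(at_least_100, len(incomes), p_at_least_100)
```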

3) Of the 538 people who had an annual check-up at a doctor's office, 215 had high blood pressure. Estimate the probability that the next person who has a check-up will have high blood pressure.

4) Find the probability of correctly answering the first 4 questions on a multiple choice test using random guessing. Each question has 3 possible answers.
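Problem 4 can be sketched with the multiplication rule, assuming each guess is independent and each of the 3 answers is equally likely to be chosen.

```python
from fractions import Fraction

choices_per_question = 3
questions = 4

# Independent guesses: multiply the single-question probability by itself.
p_all_correct = Fraction(1, choices_per_question) ** questions
print(p_all_correct, float(p_all_correct))
```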

5) Explain the difference between an independent and a dependent variable.

6) Provide an example of experimental probability and explain why it is considered experimental.

7) The measure of how likely an event will occur is probability. Match the following probability with one of the statements. There is only one answer per statement.

0 0.25 0.60 1

a. This event is certain and will happen every time.

b. This event will happen more often than not.

c. This event will never happen.

d. This event is likely and will occur occasionally.

I need help with these 7 problems. I need to include all required steps and answers for full credit. All answers need to be reduced to lowest terms where possible.

13) Describe the measures of central tendency. Under what condition(s) should each one be used?

14) Last year, 12 employees from a computer company retired. Their ages at retirement are listed below. First, create a stem plot for the data. Next, find the mean retirement age. Round to the nearest year.

55 77 64 77 69 63 62 64 85 64 56 59
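A text stem plot for problem 14 can be built by splitting each age into a tens digit (stem) and a units digit (leaf). This is only a sketch; the layout of the printed plot is an assumption, since the problem does not prescribe one.

```python
from collections import defaultdict
from statistics import mean

ages = [55, 77, 64, 77, 69, 63, 62, 64, 85, 64, 56, 59]

# Group the units digits (leaves) under their tens digit (stem).
stems = defaultdict(list)
for age in sorted(ages):
    stems[age // 10].append(age % 10)

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))

mean_age = mean(ages)
print("mean retirement age:", round(mean_age))
```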

15) A retail store manager kept track of the number of car magazines sold each week over a 10-week period. The results are shown below.

27 30 21 62 28 18 23 22 26 28

a. Find the mean, median, and mode of car magazines sold over the 10-week period.

b. Which measure(s) of central tendency best represent the data?

c. Name any outliers.
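The three measures in problem 15 can be computed with the `statistics` module. For the outlier check, the sketch below assumes the common 1.5 × IQR rule with quartiles taken as the medians of the lower and upper halves; the problem itself does not say which rule to use, and other quartile conventions give slightly different fences.

```python
from statistics import mean, median, multimode

sales = [27, 30, 21, 62, 28, 18, 23, 22, 26, 28]

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2  # leaves out the middle value when n is odd
    return median(ordered[:half]), median(ordered[-half:])

q1, q3 = quartiles(sales)
iqr = q3 - q1
outliers = [x for x in sales if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(mean(sales), median(sales), multimode(sales))
print("outliers:", outliers)
```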

16) Joe wants to pass his statistics class with at least a 75%. His prior four test scores are 74%, 68%, 84% and 79%. What is the minimum score he needs on the final exam to pass the class with a 75% average?
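Problem 16 reduces to solving for the missing score in an average. The sketch assumes the final exam counts the same as each of the four earlier tests, which the problem does not state explicitly.

```python
prior_scores = [74, 68, 84, 79]
target_average = 75
total_tests = len(prior_scores) + 1  # four prior tests plus the final

# The five scores must sum to target_average * total_tests.
needed = target_average * total_tests - sum(prior_scores)
print(needed)
```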

17) Nancy participated in a summer reading program. The number of books read by the 23 participants are as follows:

10 9 6 2 5 3 9 1 6 3 10 4 7 6 3 5 6 2 6 5 3 7 2

| Number of books read | Frequency |
| --- | --- |
| 1-2 | |
| 3-4 | |
| 5-6 | |
| 7-8 | |
| 9-10 | |

a. Complete the frequency table.

b. Find the mean of the raw data.

c. Find the median of the raw data.
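The frequency table, mean, and median for problem 17 can be tallied as follows. This is a sketch; the bin labels simply mirror the class intervals listed in the table.

```python
from statistics import mean, median

books = [10, 9, 6, 2, 5, 3, 9, 1, 6, 3, 10, 4,
         7, 6, 3, 5, 6, 2, 6, 5, 3, 7, 2]

bins = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
# Count how many participants fall in each class interval.
freq = {f"{lo}-{hi}": sum(lo <= b <= hi for b in books) for lo, hi in bins}

for label, count in freq.items():
    print(label, count)
print("mean:", round(mean(books), 2), "median:", median(books))
```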

18) The chart below represents the number of inches of snow for a seven-day period.

| Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 5 | 3 | 10 | 0 | 4 | 2 |

a. Find the mean, median, and mode.

b. Which is the best measure of central tendency?

c. Remove Wednesday from the calculations. How does that impact the three measures of central tendency?

d. Describe the effect outliers have on the measures of central tendency.
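To see the effect asked about in problem 18, the measures can be computed with and without Wednesday. A minimal sketch, with illustrative variable names:

```python
from statistics import mean, median, mode

snow = {"Sunday": 2, "Monday": 5, "Tuesday": 3, "Wednesday": 10,
        "Thursday": 0, "Friday": 4, "Saturday": 2}

all_days = list(snow.values())
without_wed = [inches for day, inches in snow.items() if day != "Wednesday"]

for label, data in [("all days", all_days), ("no Wednesday", without_wed)]:
    print(label, round(mean(data), 2), median(data), mode(data))
```

Dropping the largest value lowers the mean by about an inch, shifts the median slightly, and leaves the mode unchanged.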

19) A dealership sold 15 cars last month. The purchase price of the cars, rounded to the nearest thousand, is represented in the table.

| Purchase price | Number of cars sold |
| --- | --- |
| $15,000 | 3 |
| $20,000 | 4 |
| $23,000 | 5 |
| $25,000 | 2 |
| $45,000 | 1 |

a. Find the mean and median of the data.

b. Which measure best represents the data? Use the results to support your answer.

c. What is the outlier and how does it affect the data?
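For problem 19, one approach is to expand the frequency table into the 15 individual sale prices and then take the mean and median. A sketch:

```python
from statistics import mean, median

# Purchase price -> number of cars sold, from the table above.
sales = {15000: 3, 20000: 4, 23000: 5, 25000: 2, 45000: 1}

# Repeat each price once per car sold.
prices = [price for price, count in sales.items() for _ in range(count)]

print(len(prices), round(mean(prices), 2), median(prices))
```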

20) What do the letters represent on the box plot?

21) The test scores from a math final exam are as follows:

64 85 93 55 87 90 73 81 86 79

a. Create a box plot using the data.

b. Label the five points on the box plot and include numerical answers from part "a."

22) Using the data and results from Question 21), answer the following questions.

a. What is the median?

b. What is the range?

c. What is the interquartile range?

d. In a short paragraph, describe the data in the box plot.
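The five points of the box plot in problems 21 and 22 form the five-number summary. The sketch below assumes quartiles are the medians of the lower and upper halves of the sorted data; textbooks and software differ on this convention, so other methods may give slightly different quartiles.

```python
from statistics import median

scores = [64, 85, 93, 55, 87, 90, 73, 81, 86, 79]

ordered = sorted(scores)
half = len(ordered) // 2
q1 = median(ordered[:half])    # median of the lower half
q3 = median(ordered[-half:])   # median of the upper half

five_number = (ordered[0], q1, median(ordered), q3, ordered[-1])
data_range = ordered[-1] - ordered[0]
iqr = q3 - q1

print(five_number, data_range, iqr)
```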

23) The math grades on the final exam varied greatly. Using the scores below, how many scores were within one standard deviation of the mean? How many scores were within two standard deviations of the mean?

99 34 86 57 73 85 91 93 46 96 88 79 68 85 89
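Problem 23 can be checked by computing the mean and standard deviation and counting the scores inside each band. The sketch assumes the sample standard deviation (`statistics.stdev`); the problem does not say whether the sample or population formula is intended.

```python
from statistics import mean, stdev

scores = [99, 34, 86, 57, 73, 85, 91, 93, 46, 96, 88, 79, 68, 85, 89]

m = mean(scores)
s = stdev(scores)  # sample standard deviation (n - 1 in the denominator)

def count_within(k):
    """Number of scores within k standard deviations of the mean."""
    return sum(1 for x in scores if abs(x - m) <= k * s)

within_one = count_within(1)
within_two = count_within(2)
print(round(m, 2), round(s, 2), within_one, within_two)
```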

24) The scores for math test #3 were normally distributed. If 15 students had a mean score of 74.8% and a standard deviation of 7.57, how many students scored above an 85%?
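For problem 24, a possible approach is to find the area of the normal curve above 85 and multiply by the class size. The sketch uses `statistics.NormalDist` and treats the result as an expected count, which is an interpretation rather than something the problem spells out.

```python
from statistics import NormalDist

students = 15
dist = NormalDist(mu=74.8, sigma=7.57)

# Area to the right of 85 under the normal curve.
p_above_85 = 1 - dist.cdf(85)
expected_students = students * p_above_85

print(round(p_above_85, 4), round(expected_students, 2))
```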

25) If you know the standard deviation, how do you find the variance?

26) To get the best deal on a stereo system, Louis called eight appliance stores and asked for the cost of a specific model. The prices he was quoted are listed below:

$216 $135 $281 $189 $218 $193 $299 $235

Find the standard deviation.
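The standard deviation for problem 26 can be computed as below. The sketch assumes the eight quotes are a sample, so it uses the n - 1 formula; it also shows that the variance is the square of the standard deviation.

```python
from statistics import mean, stdev, variance

prices = [216, 135, 281, 189, 218, 193, 299, 235]

avg = mean(prices)
sd = stdev(prices)       # sample standard deviation
var = variance(prices)   # sample variance, equal to sd squared

print(avg, round(sd, 2), round(var, 2))
```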

27) A company has 70 employees whose salaries are summarized in the frequency distribution below.

| Salary | Number of Employees |
| --- | --- |
| 5,001-10,000 | 8 |
| 10,001-15,000 | 12 |
| 15,001-20,000 | 20 |
| 20,001-25,000 | 17 |
| 25,001-30,000 | 13 |

a. Find the standard deviation.

b. Find the variance.
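With grouped data as in problem 27, the usual approximation is to treat every salary in a class as equal to the class midpoint. The sketch below makes that assumption and uses the sample (n - 1) formula; both choices are assumptions, not something the problem specifies.

```python
from math import sqrt

# (lower limit, upper limit, number of employees) for each salary class.
classes = [
    (5001, 10000, 8),
    (10001, 15000, 12),
    (15001, 20000, 20),
    (20001, 25000, 17),
    (25001, 30000, 13),
]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
grouped_var = sum(f * (m - grouped_mean) ** 2
                  for m, f in zip(midpoints, freqs)) / (n - 1)
grouped_sd = sqrt(grouped_var)

print(n, round(grouped_mean, 2), round(grouped_sd, 2), round(grouped_var, 2))
```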

6. Calculate the mean and variance of the data. Show and explain your steps. Round to the nearest tenth.

14, 16, 7, 9, 11, 13, 8, 10
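The mean and variance of this small data set can be verified as follows. Since the problem does not say whether the data are a sample or a population, the sketch prints both variances.

```python
from statistics import mean, pvariance, variance

data = [14, 16, 7, 9, 11, 13, 8, 10]

avg = mean(data)
sample_var = variance(data)        # divides by n - 1
population_var = pvariance(data)   # divides by n

print(avg, round(sample_var, 1), round(population_var, 1))
```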

28) Create a frequency distribution table for the number of times a number was rolled on a die. (It may be helpful to print or write out all of the numbers so none are excluded.)

3, 5, 1, 6, 1, 2, 2, 6, 3, 4, 5, 1, 1, 3, 4, 2, 1, 6, 5, 3, 4, 2, 1, 3, 2, 4, 6, 5, 3, 1

29) Answer the following questions using the frequency distribution table you created in No. 28.

a. Which number(s) had the highest frequency?

b. How many times did a number of 4 or greater get thrown?

c. How many times was an odd number thrown?

d. How many times did a number greater than or equal to 2 and less than or equal to 5 get thrown?
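The frequency table for problems 28 and 29 can be built with `collections.Counter`, which also makes the follow-up counts easy to read off. A sketch:

```python
from collections import Counter

rolls = [3, 5, 1, 6, 1, 2, 2, 6, 3, 4, 5, 1, 1, 3, 4,
         2, 1, 6, 5, 3, 4, 2, 1, 3, 2, 4, 6, 5, 3, 1]

counts = Counter(rolls)
for face in range(1, 7):
    print(face, counts[face])

four_or_more = sum(c for face, c in counts.items() if face >= 4)
odd_faces = sum(c for face, c in counts.items() if face % 2 == 1)
two_to_five = sum(c for face, c in counts.items() if 2 <= face <= 5)
print(four_or_more, odd_faces, two_to_five)
```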

30) The wait times (in seconds) for fast food service at two burger companies were recorded for quality assurance. Using the data below, find the following for each sample.

a. Range

b. Standard deviation

c. Variance

Lastly, compare the two sets of results.

| Company | Wait times in seconds |
| --- | --- |
| Big Burger Company | 105 67 78 120 175 115 120 59 |
| The Cheesy Burger | 133 124 200 79 101 147 118 125 |
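The three measures for problem 30 can be computed for both companies in one loop. The sketch treats each list as a sample, as the problem wording suggests, and therefore uses the n - 1 formulas.

```python
from statistics import stdev, variance

waits = {
    "Big Burger Company": [105, 67, 78, 120, 175, 115, 120, 59],
    "The Cheesy Burger": [133, 124, 200, 79, 101, 147, 118, 125],
}

summary = {}
for company, times in waits.items():
    summary[company] = {
        "range": max(times) - min(times),
        "stdev": stdev(times),
        "variance": variance(times),
    }
    print(company, summary[company])
```

Comparing the two rows of output shows that a larger range does not necessarily mean a larger standard deviation.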

31) What does it mean if a graph is normally distributed? What percent of values fall within 1, 2, and 3 standard deviations from the mean?

32) Define the following terms in your own words.

• Population

• Sample

• Bias

• Design

• Response bias

33) Define and provide an example for each design method.

• Simple random sampling

• Systematic sampling

• Stratified sampling

• Cluster sampling

34) Choose one design method from the list above. Using your example, make a list of 2-3 advantages and 2-3 disadvantages for using the method.

35) The name of each student in a class is written on a separate card. The cards are placed in a bag. Three names are picked from the bag. Identify which type of sampling is used and why.

36) A phone company obtains an alphabetical list of names of homeowners in a city. They select every 25th person from the list until a sample of 100 is obtained. They then call these 100 people to advertise their services. Does this sampling plan result in a random sample? What about a simple random sample? Explain why or why not.

37) The manager of a company wants to investigate job satisfaction among its employees. One morning after a meeting, she talks to all 25 employees who attended. Does this sampling plan result in a random sample? What type of sample is it? Explain.

38) An education expert is researching teaching methods and wishes to interview teachers from a particular school district. She randomly selects 10 schools from the district and interviews all of the teachers at the selected schools. Does this sampling plan result in a random sample? What type of sample is it? Explain.

39) Fifty-one sophomore, 42 junior, and 55 senior students are selected from classes with 516, 428, and 551 students respectively. Identify which type of sampling is used and explain your reasoning.

40) You want to investigate the workplace attitudes concerning new policies that were put into effect. You have funding and support to contact at most 100 people. Choose a design method and discuss the following:

a. Describe the sample design method you will use and why.

b. Specify the population and sample group. Will you include everyone who works for the company, certain departments, full or part-time employees, etc.?

c. Discuss the bias, on the part of both the researcher and participants.

A local newspaper wanted to gather information about house sales in the area. It distributed 25,000 electronic surveys to its readers asking questions about house sales in the past 6 months. Of the surveys sent out, 3.2% were returned. The results found that 92% of people did not sell their house in the past 6 months and 85% of people would expect a loss if they sold their house. The writer wants to use these results to conclude that the housing market is declining, and we are headed for a recession.

a. Explain the bias and sampling error in this study.

b. Should the writer conclude that the housing market is declining based upon this data?

c. Why or why not?

41) Explain Type I and Type II errors. Use an example if needed.

42) Explain a one-tailed and two-tailed test. Use an example if needed.

43) Define the following terms in your own words.

• Null hypothesis

• P-value

• Critical value

• Statistically significant

44) A homeowner is getting carpet installed. The installer is charging her for 250 square feet. She thinks this is more than the actual space being carpeted. She asks a second installer to measure the space to confirm her doubt. Write the null hypothesis H0 and the alternative hypothesis Ha.

45) Drug A is the usual treatment for depression in graduate students. Pfizer has a new drug, Drug B, that it thinks may be more effective. You have been hired to design the test program. As part of your project briefing, you decide to explain the logic of statistical testing to the people who are going to be working for you.

• Write the research hypothesis and the null hypothesis.

• Then construct a table like the one below, displaying the outcomes that would constitute Type I and Type II error.

• Write a paragraph explaining which error would be more severe, and why.

46) Cough-a-Lot children's cough syrup is supposed to contain 6 ounces of medicine per bottle. However, since the filling machine is not always precise, there can be variation from bottle to bottle. The amounts in the bottles are normally distributed with σ = 0.3 ounces. A quality assurance inspector measures 10 bottles and finds the following (in ounces):

5.95 6.10 5.98 6.01 6.25 5.85 5.91 6.05 5.88 5.91

Are the results enough evidence to conclude that the bottles are not filled adequately at the labeled amount of 6 ounces per bottle?

a. State the hypothesis you will test.

b. Calculate the test statistic.

c. Find the P-value.

d. What is the conclusion?
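Because σ is given, problem 46 can be approached as a one-sample z-test. The sketch below assumes a two-sided alternative (mean fill differs from 6 ounces) and a 5% significance level; neither is stated in the problem, so adjust if your course uses a one-sided test.

```python
from math import sqrt
from statistics import NormalDist, mean

amounts = [5.95, 6.10, 5.98, 6.01, 6.25, 5.85, 5.91, 6.05, 5.88, 5.91]
mu0 = 6.0     # labeled amount under the null hypothesis
sigma = 0.3   # known population standard deviation

# Test statistic: how many standard errors the sample mean is from mu0.
z = (mean(amounts) - mu0) / (sigma / sqrt(len(amounts)))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided P-value

print(round(z, 3), round(p_value, 3))
```

With a P-value this large, the sample gives no evidence against the labeled amount at the assumed 5% level.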

47) Calculate a Z score when X = 20, μ = 17, and σ = 3.4.

48) Using a standard normal probabilities table, interpret the results for the Z score in Problem 47.
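The Z score formula used in problems 47 and 48 is z = (X − μ) / σ. A minimal sketch, using `statistics.NormalDist` in place of a printed table:

```python
from statistics import NormalDist

x, mu, sigma = 20, 17, 3.4

z = (x - mu) / sigma
# Area to the left of z under the standard normal curve.
p_below = NormalDist().cdf(z)

print(round(z, 2), round(p_below, 4))
```

The area to the left of z is the proportion of values expected to fall below X = 20.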

49) Your babysitter claims that she is underpaid given the current market. Her hourly wage is $12 per hour. You do some research and discover that the average wage in your area is $14 per hour with a standard deviation of 1.9. Calculate the Z score and use the table to find the standard normal probability. Based on your findings, should you give her a raise? Explain your reasoning as to why or why not.
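The same Z score calculation applies to problem 49. The sketch assumes local wages are roughly normally distributed, which the problem implies by asking for a standard normal probability.

```python
from statistics import NormalDist

wage, mean_wage, sd_wage = 12, 14, 1.9

z = (wage - mean_wage) / sd_wage
# Share of wages expected to fall below the babysitter's wage.
p_below = NormalDist().cdf(z)

print(round(z, 2), round(p_below, 4))
```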

50) Tutor O-rama claims that their services will raise student SAT math scores at least 50 points. The average score on the math portion of the SAT is μ = 350 and σ = 35. The 100 students who completed the tutoring program had an average score of 385 points. Is the average score of 385 points significant at the 5% level? Is it significant at the 1% level? Explain why or why not.
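One way to sketch problem 50 is a one-sided z-test of the tutored group's mean against μ = 350, using the standard error σ/√n. This framing is an assumption; it tests whether the group scored above the average, not whether the gain reached the advertised 50 points.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 350, 35
n, sample_mean = 100, 385

# Standard error of the mean for a sample of n students.
z = (sample_mean - mu) / (sigma / sqrt(n))

crit_5 = NormalDist().inv_cdf(0.95)  # one-sided critical value at 5%
crit_1 = NormalDist().inv_cdf(0.99)  # one-sided critical value at 1%

print(z, round(crit_5, 3), round(crit_1, 3))
print("significant at 5%:", z > crit_5, "| at 1%:", z > crit_1)
```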