# Basic Business Statistics - Hypothesis Testing and Regression

1. A few years ago, Pepsi invited consumers to take a "Pepsi Challenge." Consumers were asked to decide which of two sodas, Coke or Pepsi, they preferred in a blind taste test. Pepsi was interested in determining what factors played a role in people's taste preferences. One of the factors studied was the gender of the consumer. Below are the results of analyses comparing the taste preferences of men and women, with the proportions depicting preference for Pepsi.

Males: n = 109, pSM = 0.422018

Females: n = 52, pSF = 0.25

pSM - pSF = 0.172018

Z = 2.11825

Referring to the data above, suppose Pepsi wanted to test whether males preferred Pepsi more than females did. Using the test statistic given, compute the appropriate p-value for the test.

a. 0.0170

b. 0.0340

c. 0.2119

d. 0.4681

2. One criterion used to evaluate employees in the assembly section of a large factory is the number of defective pieces per 1,000 parts produced. The quality control department wants to find out whether there is a relationship between years of experience and defect rate. Since the job is repetitious, after the initial training period any improvement due to a learning effect might be offset by a loss of motivation. A defect rate is calculated for each worker in a yearly evaluation. The results for 100 workers are given in the table below.

Years Since Training Period

| Defect Rate | < 1 Year | 1 - 4 Years | 5 - 9 Years |
| --- | --- | --- | --- |
| High | 6 | 9 | 9 |
| Average | 9 | 19 | 23 |
| Low | 7 | 8 | 10 |

Referring to the table above, which test would be used to properly analyze the data in this experiment to determine whether there is a relationship between defect rate and years of experience?

a. ANOVA F test for main treatment effect

b. Z test for difference in two proportions

c. χ² test for equal proportions in a one-way table

d. χ² test for independence in a two-way contingency table

3. Recent studies have found that American children are more obese than in the past. The amount of time children spend watching television has received much of the blame. A survey of 100 ten-year-olds produced the data in the table below on weight and average number of hours a day spent watching television. We are interested in testing whether the average number of hours spent watching TV and the child's weight are independent at the 1% level of significance.

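| Weight | TV Hours: 0 - 3 | TV Hours: 3 - 6 | TV Hours: 6+ | Total |
| --- | --- | --- | --- | --- |
| More than 10 lbs. overweight | 1 | 9 | 20 | 30 |
| Within 10 lbs. of normal weight | 20 | 15 | 15 | 50 |
| More than 10 lbs. underweight | 10 | 5 | 5 | 20 |
| Total | 31 | 29 | 40 | 100 |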

Referring to the table above, if there is no connection between weight and average number of hours spent watching TV, how many children should we expect to be more than 10 lbs. underweight and to spend 3 - 6 hours a day, on average, watching TV?

a. 5

b. 5.8

c. 6.2

d. 8
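
Under independence, the expected count in a cell is (row total × column total) / grand total. The sketch below applies that rule to the survey table; the dictionary layout and the helper name `expected_count` are illustrative choices, not part of the original problem.

```python
# Observed counts from the survey table.
# Rows: weight group; columns: TV hours 0 - 3, 3 - 6, 6+.
observed = {
    "overweight": [1, 9, 20],
    "normal": [20, 15, 15],
    "underweight": [10, 5, 5],
}

def expected_count(table, row_key, col_index):
    """Expected cell count under independence: row total * column total / n."""
    row_total = sum(table[row_key])
    col_total = sum(row[col_index] for row in table.values())
    grand_total = sum(sum(row) for row in table.values())
    return row_total * col_total / grand_total

# Underweight children watching 3 - 6 hours: 20 * 29 / 100
underweight_3_to_6 = expected_count(observed, "underweight", 1)
print(underweight_3_to_6)
```

With a row total of 20 and a column total of 29 out of 100 children, the calculation points to choice b.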

4. If the Durbin-Watson statistic has a value close to 0, which assumption is violated?

a. Normality of the errors.

b. Independence of errors.

c. Homoscedasticity.

d. None of the above.

5. If the Durbin-Watson statistic has a value close to 4, which assumption is violated?

a. Normality of the errors.

b. Independence of errors.

c. Homoscedasticity.

d. None of the above.
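
For questions 4 and 5, it may help to see how the Durbin-Watson statistic behaves. It ranges from 0 to 4, with values near 2 suggesting no first-order autocorrelation, so values near either extreme point to a violation of the independence-of-errors assumption. The residual series below are made up purely for illustration.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals, listed in time order.
trending = [-3, -2, -1, 0, 1, 2, 3]          # neighbours are similar: positive autocorrelation
alternating = [1, -1, 1, -1, 1, -1, 1, -1]   # neighbours flip sign: negative autocorrelation

dw_trending = durbin_watson(trending)        # 6 / 28, close to 0
dw_alternating = durbin_watson(alternating)  # 28 / 8, close to 4
print(dw_trending, dw_alternating)
```

In both cases successive errors are related to each other, which is why the same assumption is at stake whether the statistic is close to 0 or close to 4.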

6. The following Excel tables are obtained when "Score received on an exam (measured in percentage points)" (Y) is regressed on "percentage attendance" (X) for 22 students in a Statistics for Business course.

Regression Statistics

| Statistic | Value |
| --- | --- |
| Multiple R | 0.142620229 |
| R Square | 0.02034053 |
| Adjusted R Square | -0.028642444 |
| Standard Error | 20.25979924 |
| Observations | 22 |

| | Coefficients | Standard Error | t Stat | p-value |
| --- | --- | --- | --- | --- |
| Intercept | 39.39027309 | 37.24347659 | 1.057642216 | 0.302826622 |
| Attendance | 0.340583573 | 0.52852452 | 0.644404489 | 0.526635689 |

Referring to the tables above, which of the following statements is true?

a. -2.86% of the total variability in score received can be explained by percentage attendance.

b. -2.86% of the total variability in percentage attendance can be explained by score received.

c. 2% of the total variability in score received can be explained by percentage attendance.

d. 2% of the total variability in percentage attendance can be explained by score received.
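
The share of variability in Y explained by X is R Square, not Adjusted R Square, and Y here is the exam score. As a rough check, the sketch below rebuilds both figures from Multiple R, the sample size, and the number of predictors; the variable names are illustrative.

```python
multiple_r = 0.142620229   # from the Regression Statistics table
n = 22                     # observations
k = 1                      # one predictor (percentage attendance)

# R Square is the squared multiple correlation.
r_square = multiple_r ** 2

# Adjusted R Square penalises for the number of predictors and can go negative.
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(round(r_square, 4), round(adjusted_r_square, 4))
```

Both values agree with the Excel output, and R Square of roughly 2% is the figure that describes the variability in score explained by attendance.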

7. A manager of a product sales group believes the number of sales made by an employee (Y) depends on how many years that employee has been with the company (X1) and how he/she scored on a business aptitude test (X2). A random sample of 8 employees provides the following:

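| Employee | Y | X1 | X2 |
| --- | --- | --- | --- |
| 1 | 100 | 10 | 7 |
| 2 | 90 | 3 | 10 |
| 3 | 80 | 8 | 9 |
| 4 | 70 | 5 | 4 |
| 5 | 60 | 5 | 8 |
| 6 | 50 | 7 | 5 |
| 7 | 40 | 1 | 4 |
| 8 | 30 | 1 | 1 |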

Referring to the table above, for these data, what is the value for the regression constant, b0?

a. 0.998

b. 3.103

c. 4.698

d. 21.293
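
One way to check the coefficients by hand is the deviation form of least squares for two predictors. The sketch below uses only the eight observations from the table; the helper names `mean` and `cross_dev` are illustrative, and a spreadsheet or statistics package would normally do this step.

```python
y = [100, 90, 80, 70, 60, 50, 40, 30]
x1 = [10, 3, 8, 5, 5, 7, 1, 1]
x2 = [7, 10, 9, 4, 8, 5, 4, 1]

def mean(values):
    return sum(values) / len(values)

def cross_dev(a, b):
    """Sum of products of deviations from the means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

s11, s22, s12 = cross_dev(x1, x1), cross_dev(x2, x2), cross_dev(x1, x2)
s1y, s2y = cross_dev(x1, y), cross_dev(x2, y)

# Solve the 2 x 2 system for the slopes, then back out the intercept.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

print(round(b0, 3), round(b1, 3), round(b2, 3))
```

The intercept rounds to choice d; choices b and c correspond to the two slope estimates, which is what makes them tempting distractors.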

8. An economist wants to see how consumption for an economy (in $ billions) is influenced by gross domestic product ($ billions) and aggregate price (consumer price index). The Microsoft Excel output of this regression is partially reproduced as follows:

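Summary Output

Regression Statistics

| Statistic | Value |
| --- | --- |
| Multiple R | 0.991 |
| R Square | 0.982 |
| Adjusted R Square | 0.976 |
| Standard Error | 0.299 |
| Observations | 10 |

ANOVA

| | df | SS | MS | F | Signif F |
| --- | --- | --- | --- | --- | --- |
| Regression | 2 | 33.4163 | 16.7082 | 186.325 | 0.0001 |
| Residual | 7 | 0.6277 | 0.0897 | | |
| Total | 9 | 34.0440 | | | |

| | Coeff | Std Error | t Stat | p-value |
| --- | --- | --- | --- | --- |
| Intercept | -0.0861 | 0.5674 | -0.152 | 0.8837 |
| GDP | 0.7654 | 0.0574 | 13.340 | 0.0001 |
| Price | -0.0006 | 0.0028 | -0.219 | 0.8330 |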

Referring to the table above, when the economist used a simple linear regression model with consumption as the dependent variable and GDP as the independent variable, he obtained an r² value of 0.971. What additional percentage of the total variation of consumption has been explained by including aggregate prices in the multiple regression?

a. 98.2

b. 11.1

c. 2.8

d. 1.1
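
The additional variation explained is the difference between the two coefficients of determination. A minimal sketch using the figures given above, with illustrative variable names; it also confirms that the reported R Square matches SSR / SST from the ANOVA table.

```python
r2_simple = 0.971     # consumption on GDP only (given in the question)
r2_multiple = 0.982   # consumption on GDP and price (R Square in the output)

# Cross-check R Square against the ANOVA sums of squares.
ss_regression, ss_total = 33.4163, 34.0440
r2_from_anova = ss_regression / ss_total

additional_pct = (r2_multiple - r2_simple) * 100
print(round(r2_from_anova, 3), round(additional_pct, 1))
```

The increase of about 1.1 percentage points corresponds to choice d.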

What is the predicted consumption level for an economy with GDP equal to $4 billion and an aggregate price index of 150?

a. $1.39 billion

b. $2.89 billion

c. $4.75 billion

d. $9.45 billion
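
The prediction comes from substituting the given values into the fitted equation. The sketch below uses the coefficients from the output; the variable names are illustrative.

```python
# Coefficients from the regression output.
intercept, b_gdp, b_price = -0.0861, 0.7654, -0.0006

gdp = 4        # $ billions
price = 150    # consumer price index

predicted = intercept + b_gdp * gdp + b_price * price
print(round(predicted, 2))   # in $ billions
```

That is -0.0861 + 3.0616 - 0.09, or about $2.89 billion (choice b).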

To test for the significance of the coefficient on aggregate price index, the value of the relevant t-statistic is:

a. 2.365

b. 0.143

c. -0.219

d. -1.960

To test whether gross domestic product has a positive impact on consumption, the p-value is:

a. 0.00005

b. 0.0001

c. 0.9999

d. 0.99995
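
For the last two parts, the t statistic is the coefficient divided by its standard error, and a one-sided p-value can be obtained from the two-sided p-value Excel reports. The sketch below assumes the printed p-values are two-tailed, which is Excel's usual convention for coefficient tests.

```python
# Price coefficient: t = estimate / standard error.
coef_price, se_price = -0.0006, 0.0028
t_price = coef_price / se_price   # about -0.214 from the rounded table values

# GDP coefficient: test H1 that the slope is positive (upper-tail test).
coef_gdp = 0.7654
two_tailed_p_gdp = 0.0001
if coef_gdp > 0:
    # Estimate lies in the direction of H1: halve the two-tailed p-value.
    one_tailed_p_gdp = two_tailed_p_gdp / 2
else:
    one_tailed_p_gdp = 1 - two_tailed_p_gdp / 2

print(round(t_price, 3), one_tailed_p_gdp)
```

The ratio of the rounded values differs slightly from the -0.219 printed in the output, presumably because Excel computes it from unrounded figures; the value to report is the one in the table.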

9. A professor of industrial relations believes that an individual's wage rate at a factory (Y) depends on his performance rating (X1) and the number of economics courses the employee successfully completed in college (X2). The professor randomly selects 6 workers and collects the following information:

| Employee | Y ($) | X1 | X2 |
| --- | --- | --- | --- |
| 1 | 10 | 3 | 0 |
| 2 | 12 | 1 | 5 |
| 3 | 15 | 8 | 1 |
| 4 | 17 | 5 | 8 |
| 5 | 20 | 7 | 12 |
| 6 | 25 | 10 | 9 |

Referring to the table above, for these data, what is the estimated coefficient for performance rating, b1?

a. 0.616

b. 1.054

c. 6.932

d. 9.103
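
The coefficients can also be found by solving the normal equations (X'X)b = X'y directly. The sketch below does this with a small Gauss-Jordan elimination in plain Python; it is one possible approach for checking the answer, not necessarily how the original solution was produced.

```python
y = [10, 12, 15, 17, 20, 25]
x1 = [3, 1, 8, 5, 7, 10]
x2 = [0, 5, 1, 8, 12, 9]

# Design matrix rows with a leading 1 for the intercept.
rows = [[1.0, a, b] for a, b in zip(x1, x2)]

# Normal equations (X'X) b = X'y as an augmented 3 x 4 matrix.
aug = [
    [sum(r[i] * r[j] for r in rows) for j in range(3)]
    + [sum(r[i] * yi for r, yi in zip(rows, y))]
    for i in range(3)
]

# Gauss-Jordan elimination with partial pivoting.
for col in range(3):
    pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
    aug[col], aug[pivot] = aug[pivot], aug[col]
    aug[col] = [v / aug[col][col] for v in aug[col]]
    for r in range(3):
        if r != col:
            factor = aug[r][col]
            aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]

# The last column now holds the solution.
b0, b1, b2 = (row[3] for row in aug)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

The estimate for performance rating, b1, rounds to choice b; choices a and c match the other slope and the intercept.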

10. You decide to predict gasoline prices in different cities and towns in the U.S. for your term project. Your dependent variable is the price of gasoline per gallon, and your explanatory variables are per capita income, the number of firms that manufacture automobile parts in and around the city, the number of new business starts in the last year, population density of the city, percentage of local taxes on gasoline, and the number of people using public transportation. You collected data on 32 cities and obtained a regression sum of squares of SSR = 122.8821 and an error sum of squares of SSE = 95.5361. What is the value of the coefficient of multiple determination?

a. 0.5626

b. 0.4576

c. 0.6472

d. 0.2225
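
The coefficient of multiple determination is SSR / SST, where SST = SSR + SSE. The sketch below also computes the adjusted value for the 32 cities and 6 explanatory variables, since the two are easy to confuse; the variable names are illustrative.

```python
ssr = 122.8821   # regression sum of squares
sse = 95.5361    # error sum of squares
n, k = 32, 6     # cities and explanatory variables

sst = ssr + sse
r_square = ssr / sst
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(round(r_square, 4), round(adjusted_r_square, 4))
```

The unadjusted value is choice a; the adjusted value coincides with choice b, which is not what the question asks for.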

11. A real estate builder wishes to determine how house size (House) is influenced by family income (Income), family size (Size), and education of the head of household (School). House size is measured in hundreds of square feet, income is measured in thousands of dollars, and education is measured in years. The builder randomly selected 50 families and ran the multiple regression. The business literature involving human capital shows that education influences an individual's annual income. Combined, these may influence family size. With this in mind, what should the real estate builder be particularly concerned with when analyzing the multiple regression model?

a. Randomness of error terms

b. Collinearity

c. Normality of residuals

d. Missing observations
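
Collinearity among explanatory variables is commonly screened with the variance inflation factor, VIF = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. The builder's data are not given, so the R²_j values below are purely hypothetical and serve only to show the calculation.

```python
def vif(r_squared_j):
    """Variance inflation factor for a predictor, given its auxiliary R^2."""
    return 1.0 / (1.0 - r_squared_j)

# Hypothetical auxiliary R^2 values (not from the problem).
hypothetical_r2 = {"Income": 0.85, "Size": 0.40, "School": 0.80}

for name, r2 in hypothetical_r2.items():
    print(name, round(vif(r2), 2))
```

A VIF of 1 means a predictor is uncorrelated with the others; a common rule of thumb treats values above about 5 as a sign that collinearity deserves attention.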

#### Solution Preview


1. A few years ago, Pepsi invited consumers to take a "Pepsi Challenge." Consumers were asked to decide which of two sodas, Coke or Pepsi, they preferred in a blind taste test. Pepsi was interested in determining what factors played a role in people's taste preferences. One of the factors studied was the gender of the consumer. Below are the results of analyses comparing the taste preferences of men and women, with the proportions depicting preference for Pepsi.

Males: n = 109, pSM = 0.422018

Females: n = 52, pSF = 0.25

pSM - pSF = 0.172018

Z = 2.11825

Referring to the data above, suppose Pepsi wanted to test whether males preferred Pepsi more than females did. Using the test statistic given, compute the appropriate p-value for the test.

a. 0.0170

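This is an upper-tail test (H1: the male proportion exceeds the female proportion), so the p-value is the area to the right of the test statistic: p-value = 1 - P(Z < 2.11825). From a standard normal table, P(Z < 2.12) ≈ 0.9830, so the p-value is about 0.0170.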
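
The calculation can be checked numerically. The sketch below recomputes Z with the pooled-proportion standard error and then takes the upper-tail normal probability with `math.erfc`; the success counts of 46 and 13 are inferred from the reported proportions and sample sizes rather than stated in the problem.

```python
import math

n_m, p_m = 109, 0.422018   # males
n_f, p_f = 52, 0.25        # females

# Success counts implied by the reported proportions (an inference, not given data).
x_m, x_f = round(n_m * p_m), round(n_f * p_f)

# Pooled proportion and standard error under H0: equal proportions.
pooled = (x_m + x_f) / (n_m + n_f)
std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_m + 1 / n_f))
z = (p_m - p_f) / std_err

# Upper-tail p-value: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
p_value = 0.5 * math.erfc(z / math.sqrt(2))
print(round(z, 3), round(p_value, 4))
```

The exact tail area differs from the table-based 0.0170 only in the fourth decimal place, because the table lookup rounds Z to 2.12.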

2. One criterion used to evaluate employees in the assembly section of a large factory is the number of defective pieces per 1,000 parts produced. The quality control department wants to find out whether there is a relationship between years of experience and defect rate. Since the job is repetitious, after the initial training period any improvement due to a learning effect might be offset by a loss of motivation. A defect rate is calculated for each worker in a yearly evaluation. The results for 100 workers are given in the table below.

Years Since Training Period

| Defect Rate | < 1 Year | 1 - 4 Years | 5 - 9 Years |
| --- | --- | --- | --- |
| High | 6 | 9 | 9 |
| Average | 9 | 19 | 23 |
| Low | 7 | 8 | 10 |

Referring to the table above, which test would be used to properly analyze the data in this experiment to determine whether there is a relationship between defect rate and years of experience?

d. χ² test for independence in a two-way contingency table
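
Both variables are categorical and every worker falls into exactly one cell of the 3 × 3 table, which is why a test of independence applies. As a sketch of how it would be carried out, the code below computes the χ² statistic and its degrees of freedom from the observed counts; the question itself only asks which test to use.

```python
# Observed counts: rows are defect rate, columns are years since training.
observed = [
    [6, 9, 9],     # High
    [9, 19, 23],   # Average
    [7, 8, 10],    # Low
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (observed - expected)^2 / expected over all cells.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_square, 3), df)
```

The statistic would then be compared with a χ² critical value on 4 degrees of freedom, for example 9.488 at the 5% level.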

3. Recent studies have found that American children are more obese than in the past. The amount of time children spend watching television has received much of the blame. A survey of 100 ten-year-olds produced the data in the table below on weight and average number of hours a day spent watching television. We are interested in testing whether the average number of hours spent watching TV and the child's weight are independent at the 1% level of significance.

Weight TV ...

#### Solution Summary

Answers 11 questions on basic business statistics. The concepts discussed are regression, chi-square, etc.