# Kruskal-Wallis test, scatter plot, regression equation

18.6 A business that is considering starting an on-line shopping service wants to find out whether there are differences in how women shop on-line. They are interested in capturing people who are already connected to the Internet, so they run a Web-based survey. They ask respondents how many purchases they have made on-line in the last three months. In addition, they ask demographic questions about gender, age, and level of education. The data for women respondents in three age categories are shown below:

| 21-30 | 30-45 | 45-60 |
| --- | --- | --- |
| 2 | 4 | 3 |
| 2 | 5 | 3 |
| 3 | 6 | 4 |
| 4 | 6 | 5 |
| 4 | 8 | 6 |
| 5 | 9 | 6 |
| 8 | 9 | 9 |

(b) Set up the hypotheses to see if there is a difference in the number of purchases made on-line by women for the three age groups.

(c) Find the rank for each data value, and find the rank sum and average rank for each sample.

(d) Perform the Kruskal-Wallis test at the 0.05 level of significance. What can you conclude?
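Parts (c) and (d) can be checked with a short script. The following is a minimal sketch, not a definitive solution: it pools the three samples, assigns average ranks to tied values, and computes the Kruskal-Wallis H statistic with and without the usual tie correction. The names `midranks` and `kruskal_wallis` are illustrative, and the closed-form p-value `exp(-H/2)` is valid only because there are 3 groups (2 degrees of freedom).

```python
import math

# Purchases in the last three months, by age group (from the table above).
groups = {
    "21-30": [2, 2, 3, 4, 4, 5, 8],
    "30-45": [4, 5, 6, 6, 8, 9, 9],
    "45-60": [3, 3, 4, 5, 6, 6, 9],
}


def midranks(values):
    """Map each distinct value to its rank, averaging ranks across ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # 0-based positions i..j-1 hold ranks i+1..j; their mean is (i+1+j)/2.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


def kruskal_wallis(samples):
    """Return (rank sums, H, tie-corrected H) for a dict of samples."""
    pooled = [v for sample in samples.values() for v in sample]
    n_total = len(pooled)
    rank_of = midranks(pooled)
    rank_sums = {
        name: sum(rank_of[v] for v in sample)
        for name, sample in samples.items()
    }
    h = 12 / (n_total * (n_total + 1)) * sum(
        rank_sums[name] ** 2 / len(sample) for name, sample in samples.items()
    ) - 3 * (n_total + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (
        n_total ** 3 - n_total
    )
    return rank_sums, h, h / correction


rank_sums, h, h_corrected = kruskal_wallis(groups)
avg_ranks = {name: rank_sums[name] / len(groups[name]) for name in groups}

# Chi-square upper-tail probability; this closed form holds only for df = 2.
p_value = math.exp(-h_corrected / 2)

CHI2_CRIT_DF2 = 5.991  # chi-square critical value, 2 df, alpha = 0.05 (table)
reject_h0 = h_corrected > CHI2_CRIT_DF2

print("rank sums:", rank_sums)
print("average ranks:", avg_ranks)
print("H:", round(h, 4), "tie-corrected H:", round(h_corrected, 4))
print("p-value:", round(p_value, 4), "reject H0:", reject_h0)
```

The decision rule compares the tie-corrected H with the chi-square critical value for 2 degrees of freedom; the null hypothesis of identical distributions across age groups is rejected only if H exceeds it.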

15.25 The British Bankers' Association wanted to look at the relationship between the amount of deposits made (in billions of £) and the number of customers that a bank had. Analysts collected data on six different large banks and found the following information:

| Bank Name | Deposits (£ billion) | Customers (million) |
| --- | --- | --- |
| Abbey National | 101.7 | 13.6 |
| Barclays | 108.2 | 10.0 |
| Lloyds | 96.9 | 15.0 |
| National Westminster | 113.8 | 7.5 |
| Woolrich | 27.5 | 4.0 |
| Halifax | 77.1 | 7.6 |

(a) Which variable is the independent variable? Which is the dependent variable?

(b) Create a scatter plot of the data. Does it appear that the amount of deposits is related to the number of customers?

(c) Find the equation of the regression line for the data.

(d) Plot the regression line on the same plot as the data. Do you think that the line does a good job of predicting the amount of deposits? Why or why not?

(e) Calculate the standard error of the estimate, $s_{y|x}$, for the regression line.

(f) At the 0.05 level, is the model significant?
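Parts (c), (e), and (f) can be worked numerically. The sketch below is one possible setup, not the posted solution: it assumes the number of customers is the independent variable and deposits the dependent variable, following the wording of part (d) about "predicting the amount of deposits". It fits the least squares line, computes the standard error of the estimate with n - 2 degrees of freedom, and tests the slope with a t statistic. The critical value 2.776 is the two-tailed t table value for 4 degrees of freedom at the 0.05 level; the variable names are illustrative.

```python
import math

# Data from the table above. Treating customers as x is an assumption.
customers = [13.6, 10.0, 15.0, 7.5, 4.0, 7.6]        # x, millions
deposits = [101.7, 108.2, 96.9, 113.8, 27.5, 77.1]   # y, £ billion

n = len(customers)
x_bar = sum(customers) / n
y_bar = sum(deposits) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in customers)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(customers, deposits))
s_yy = sum((y - y_bar) ** 2 for y in deposits)

# Least squares estimates for y = intercept + slope * x.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Standard error of the estimate: sqrt(SSE / (n - 2)).
residuals = [y - (intercept + slope * x) for x, y in zip(customers, deposits)]
sse = sum(r ** 2 for r in residuals)
std_error = math.sqrt(sse / (n - 2))

# t test of H0: slope = 0 against a two-sided alternative.
t_stat = slope / (std_error / math.sqrt(s_xx))
T_CRIT = 2.776  # t table value, 4 df, two-tailed alpha = 0.05
significant = abs(t_stat) > T_CRIT

r_squared = 1 - sse / s_yy

print(f"deposits = {intercept:.3f} + {slope:.3f} * customers")
print(f"standard error of the estimate = {std_error:.3f}")
print(f"t = {t_stat:.3f}, r^2 = {r_squared:.3f}, significant: {significant}")
```

The scatter plot in parts (b) and (d) is left out to keep the sketch free of plotting dependencies; the fitted `intercept` and `slope` give the two endpoints needed to draw the line over the data.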

© BrainMass Inc. brainmass.com October 24, 2018, 5:16 pm https://brainmass.com/statistics/regression-analysis/kruskal-wallis-test-scatter-plot-regression-equation-5816

#### Solution Summary

Answers questions on the Kruskal-Wallis test, scatter plots, and the regression equation. The hypotheses for testing whether there is a difference in the number of on-line purchases made by women across the three age groups are determined.

Regression and testing of hypothesis problems

Attached Data Files: FUNDS - This file contains 185 rows of data points

1st Column : Fund Name

2nd Column : Fund Type (Balanced = 1, Equity-Income = 2, Growth and Income = 3, Growth = 4, Aggressive-Growth = 5, Small-Company = 6, International and Global = 7)

3rd Column : Five Year investment performance

4th Column : 1992 Return (in percentage)

Calculate descriptive (or summary) statistics for the entire sample. What are the mean five-year performance and mean 1992 return?

Standard and Poor's 500 Stock Index (commonly called the S & P 500) is the widest general-market index of stocks produced by the U.S. credit-rating agency Standard and Poor's. The index is constructed from the prices of securities of 425 U.S. industrial companies and 75 railway and public-utility corporations. Investors regard the S & P 500 as representing the market, that is, the general level of the price of securities in the U.S. The success of an investment strategy is often measured by the degree to which the strategy beats the S & P 500. The reason for this is that by holding a portfolio that represents the S & P 500 (a Market Index Fund), one can do nearly as well as the S & P 500. The S & P 500 is therefore often used as a base level for comparison. The five-year performance and 1992 return for the S & P 500 are $15,440 and 8%, respectively.

1. Taking these as base level performance figures, what does that say about the investment abilities exhibited by the managed mutual funds in the study?

2. Next, look at the five-year performance first. Construct side-by-side boxplots, forming groups based on the different types of funds. What can you say about risk based on the boxplots? Also, examine descriptive (or summary) statistics separated by type of mutual fund. Are there differences in five-year performance relating to fund type? Do certain types of funds seem to have been more successful than others over this time period? Are the patterns surprising, given that the market did well during this time period?

3. Do you note any unusual funds in terms of performance from examination of the boxplots? Do you see any relationship between the variability of returns for a given fund type and the performance itself? Given the general investing axiom that higher performance goes with higher risk, what sort of relationship would you expect to exist?

4. Now shorten the time horizon from five-year to one-year (1992) performance. Repeat the analyses previously done on the five-year return variable. Are there differences related to fund type? Are any observed differences between fund types similar to ones observed for five-year performance? What does that imply about the connection between short and long-term performance? Another way to investigate this is by constructing a scatter plot of one-year versus five-year performance. Does this plot suggest that funds that do well (or poorly) on one measure necessarily do well (or poorly) on the other?

5. Based on your analysis from above, you identified that the 5 year performance and 1992 Return are different for the various funds.

6. Conduct a "t" test using the 5-step hypothesis testing procedure for the different funds (compare, say, "Balanced" with respect to the others). What do you conclude?

7. Repeat the analysis using either the Kruskal-Wallis test or the Wilcoxon rank sum test for the different funds. Is the conclusion the same as in the "t" test? Why?

8. Construct a scatter plot of the 1992 Return (vertical axis) versus the Five Year Return (horizontal axis). Does it appear that a linear regression model would be appropriate for these data?

9. Now, calculate/plot the least squares regression line for predicting the 1992 return from the five year return. Is there a significant relationship between the two variables? Prove or disprove it with an appropriate test. Do you notice any apparent systematic violations of the regression assumptions?

10. Construct and examine the regression diagnostics for this model.

(a) Do you notice any outlier? If yes, remove the outlier and rerun the regression.

(b) Is the regression model satisfactory?

(c) Do you notice any apparent systematic violations of the regression assumptions?

What will you advise an investor like me to do, given that I have access to these mutual funds?

Please show detailed steps towards solution.
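For items 6 and 7, the comparison of one fund type against the rest can be sketched with two small helpers. This is a minimal sketch under stated assumptions, not the posted solution: `welch_t` and `rank_sum_z` are hypothetical names, the t statistic uses Welch's unequal-variance form rather than the pooled-variance form (a swapped-in choice), and the rank sum test uses the large-sample normal approximation without a tie correction to the variance. The FUNDS file is not reproduced here, so the functions take plain lists of values.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t test."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def rank_sum_z(a, b):
    """Return (W, z, two-sided p) for the Wilcoxon rank sum test.

    W is the sum of the ranks of sample a in the pooled data, with tied
    values given their average rank. The p-value comes from the normal
    approximation and ignores the tie correction to the variance.
    """
    pooled = sorted(list(a) + list(b))
    first_pos = {}
    count = {}
    for pos, v in enumerate(pooled, start=1):
        first_pos.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    # Average rank of a run of ties starting at first_pos with count entries.
    rank = {v: first_pos[v] + (count[v] - 1) / 2 for v in first_pos}

    w = sum(rank[v] for v in a)
    n1, n2 = len(a), len(b)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p
```

To use it, pass the five-year performance values for the "Balanced" funds as `a` and the values for all other funds as `b`, then compare the t statistic with a t table value at the returned degrees of freedom and the rank sum p-value with 0.05. Agreement between the two tests is more likely when the boxplots show roughly symmetric groups without extreme outliers.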
