# Statistics

1. Use Excel in 1(a)

The following data give the times in seconds between the incoming telephone calls to a medical practice in the time period from opening at 8am to 9am.

2 10 6 8 21 18 11 17

10 9 14 17 6 13 11 19

7 11 4 12 13 9 6 15

9 16 5 10 7 11 14 10

(a) Produce a clearly labelled frequency histogram and associated frequency table for these data in Excel using bin values of 4, 8, 12, etc. Describe the shape of the distribution of times shown in the histogram.

(b) From the histogram (i.e. without calculation, so you will need to explain your reasoning), give an estimate of the mean for this distribution. Provide an interpretation in context for this statistic.

(c) The standard deviation for this set of data is approximately 5 seconds. What does this tell you about the distribution of the data?

(d) In the period from 8am to 9am tomorrow, what proportion of the times between phone calls at this practice would you expect to be less than 16 seconds?

(e) This sample of times would not be representative of all times between phone calls for this medical practice. Why? Indicate how the data may be biased.
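As a cross-check on the Excel output for part (a), the binning can be sketched in Python. This is a minimal sketch, assuming Excel's histogram convention that each bin value is an inclusive upper limit (so the bin labelled 8 counts times greater than 4 and up to 8). The function name `frequency_table` and the choice of 24 as the last bin value are illustrative and not part of the question.

```python
from bisect import bisect_left

# Times between calls (seconds), copied from the question.
times = [
    2, 10, 6, 8, 21, 18, 11, 17,
    10, 9, 14, 17, 6, 13, 11, 19,
    7, 11, 4, 12, 13, 9, 6, 15,
    9, 16, 5, 10, 7, 11, 14, 10,
]

# Bin values 4, 8, 12, ... extended far enough to cover the largest time (21).
bin_values = [4, 8, 12, 16, 20, 24]


def frequency_table(data, upper_limits):
    """Count values per bin, treating each bin value as an inclusive upper limit."""
    counts = [0] * len(upper_limits)
    for x in data:
        # Index of the first bin whose upper limit is >= x.
        i = bisect_left(upper_limits, x)
        if i == len(upper_limits):
            raise ValueError(f"{x} exceeds the largest bin value")
        counts[i] += 1
    return counts


counts = frequency_table(times, bin_values)
for upper, count in zip(bin_values, counts):
    # Text histogram: one asterisk per observation in the bin.
    print(f"<= {upper:2d}: {count:2d} {'*' * count}")
```

The printed rows can be compared bin by bin with the Excel frequency table; a mismatch usually points to a different convention for which bin receives a boundary value.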

2. Use Excel in 2(a) and (d)

A study was conducted on the age at which infants learn to crawl. Twelve babies born at each of two hospitals were monitored from birth by a child health nurse. The ages at which these infants learned to crawl are recorded below in weeks.

| Hospital A | Hospital B |
| --- | --- |
| 31.43 34.57 27.86 31.29 | 25 28.57 26.86 25.86 |
| 29.43 30.57 29.71 30.86 | 26.57 30.43 33.71 32.86 |
| 35.86 33.29 33.43 32.29 | 32.14 33.57 32.71 36 |

(a) Using the Excel boxplot macro from the Online Unit, construct informative side-by-side boxplots for these sets of data. Edit the axis scale to start at 20 weeks.

(b) Compare the two sets of data in terms of their distribution characteristics - location, shape and spread.

(c) Using the boxplots and explaining your reasoning, what would you estimate as the proportion of all babies born at Hospital B who

(i) didn't crawl until after the age of 33 weeks?

(ii) first crawled between 24 and 28 weeks?

(d) Calculate the coefficients of variation for the two sets of data using appropriate figures from Excel's Descriptive Statistics. Comment on the meaning of your results.
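For part (d), the coefficient of variation is the sample standard deviation expressed as a percentage of the sample mean. The sketch below is one way to check the figures taken from Excel's Descriptive Statistics. It assumes that the first four values in each printed row belong to Hospital A and the last four to Hospital B, which is a reading of the layout rather than something the question states outright; the function name `coefficient_of_variation` is illustrative.

```python
from statistics import mean, stdev

# ASSUMPTION: first four values of each printed row are Hospital A,
# last four are Hospital B (twelve babies per hospital).
hospital_a = [31.43, 34.57, 27.86, 31.29,
              29.43, 30.57, 29.71, 30.86,
              35.86, 33.29, 33.43, 32.29]
hospital_b = [25, 28.57, 26.86, 25.86,
              26.57, 30.43, 33.71, 32.86,
              32.14, 33.57, 32.71, 36]


def coefficient_of_variation(sample):
    """Sample standard deviation (n - 1 divisor) as a percentage of the mean."""
    return 100 * stdev(sample) / mean(sample)


for name, sample in (("Hospital A", hospital_a), ("Hospital B", hospital_b)):
    print(f"{name}: mean = {mean(sample):.2f} weeks, "
          f"s = {stdev(sample):.2f} weeks, "
          f"CV = {coefficient_of_variation(sample):.1f}%")
```

Because the coefficient of variation is unit-free, it allows the relative spread of the two hospitals to be compared even though their means differ.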

3. Use Excel in this question.

The file WineSales.csv contains data on monthly Australian sales of sweet white wine, in thousands of litres, from January 1984 to July 1993.

(Source: Australian Bureau of Statistics, quoted by Hyndman, R.J. (n.d.) Time Series Data Library, http://robjhyndman.com/TSDL. Accessed 28 Jan 2011)

(a) Provide a graphical display of the data that will be suitable for examining the changing trends in Australian sales of sweet white wine over this period.

(b) Outline any significant trends you observe in your graph, noting first the overall impression, and then providing more detailed descriptions.

(c) Describe any seasonal (short-term) variation in your graph and give a possible reason for it.

4. Use Excel in 4(b)

Consider the data SproutTemp.csv obtained on the growth process for bean sprouts, where the yield is related to the temperature. The rate at which the sprouts grow is measured by the time it takes a sprout to reach 4 cm. This time is recorded in the data file in days, together with the associated temperature in degrees Celsius.

(a) When performing a linear regression analysis on this data, which of the two variables should be treated as the independent variable? Why?

(b) Produce full regression statistics output from Excel, together with a well-presented, clearly labelled scatterplot of the data showing the least squares regression line.

(c) From your output, find the sample correlation coefficient between temperature and time taken for the sprouts to reach 4 cm, and use it to comment on the type and strength of the linear relationship between the variables.

(d) State clearly the equation of the regression line. Use it to estimate the time taken for sprouts to reach 4 cm when kept at a temperature of 15 °C. (Is your estimate close to the actual value observed in the data corresponding to a temperature of 15 °C?)

(e) Would this regression line be useful in predicting the growth time at 40 °C? Why?

(f) "Less than 5% of the variation in growing times in this data set was not explained by the variation in temperature". Comment on the accuracy of this statement with reference to the coefficient of determination for this regression analysis.
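The quantities asked for in parts (c), (d) and (f) are linked: the coefficient of determination is the square of the correlation coefficient, and the unexplained share of variation is one minus that value. The sketch below shows the arithmetic behind a least squares fit. The data values are placeholders invented for illustration and are not the contents of SproutTemp.csv; treating temperature as the x variable and the function name `least_squares` are likewise assumptions.

```python
from math import sqrt


def least_squares(x, y):
    """Return (slope, intercept, r) for the line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # sample correlation coefficient
    return slope, intercept, r


# PLACEHOLDER values for illustration only; NOT taken from SproutTemp.csv.
temperature_c = [10, 15, 20, 25, 30]
days_to_4cm = [9.0, 7.5, 6.5, 5.0, 4.0]

slope, intercept, r = least_squares(temperature_c, days_to_4cm)
r_squared = r ** 2           # share of variation explained by the line
unexplained = 1 - r_squared  # share of variation left unexplained

print(f"days = {intercept:.2f} + ({slope:.3f}) * temperature")
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}, unexplained = {unexplained:.1%}")
print(f"predicted days at 15 degrees C: {intercept + slope * 15:.2f}")
```

With the real file, the same three numbers should agree with the slope, intercept and Multiple R reported in Excel's regression output, apart from the sign of r, which Excel's Multiple R does not show.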

5. Government social services offered in a metropolitan area were classified in four divisions - aged care; family services; migrant assistance; and training facilities. Family services employed half of the social service workers in this area, 25% worked in aged care, and 10% with migrants. The remaining staff worked in training.

(a) For a survey, a consultant statistician wants to choose 120 social service staff to interview about workplace stress. Explain the type of sampling method you would recommend and how to implement it in this case.

(b) Last week 8% of the total social services staff in the area were absent, and 32% of these were from aged care. Use the information provided to answer the following:

(i) What is the probability that a social services staff member from this area does not work with migrants or the aged?

(ii) What is the probability that a social services staff member was absent last week and was from aged care?

(iii) What percentage of the aged care staff were absent last week?

(iv) What is the chance that four randomly selected social service workers, from this metropolitan area, would all work in training facilities? Justify your answer.
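The arithmetic behind parts (a) and (b) can be laid out with exact fractions. This is a sketch under stated assumptions: part (a) is read as proportional (stratified) allocation across the four divisions, and part (b)(iv) treats the four selections as independent, which is only reasonable if the workforce is large relative to a sample of four. All variable names are illustrative.

```python
from fractions import Fraction

# Workforce shares stated in the question; training is the remainder.
share = {
    "family services": Fraction(1, 2),
    "aged care": Fraction(1, 4),
    "migrant assistance": Fraction(1, 10),
}
share["training facilities"] = 1 - sum(share.values())

# (a) Proportional allocation of a sample of 120 across the divisions.
sample_size = 120
allocation = {division: int(sample_size * p) for division, p in share.items()}

# (b)(i) Works with neither migrants nor the aged.
p_not_migrant_or_aged = 1 - share["migrant assistance"] - share["aged care"]

# (b)(ii) Absent AND from aged care: 32% of the 8% who were absent.
p_absent = Fraction(8, 100)
p_aged_given_absent = Fraction(32, 100)
p_absent_and_aged = p_absent * p_aged_given_absent

# (b)(iii) Share of aged care staff who were absent (conditional probability).
p_absent_given_aged = p_absent_and_aged / share["aged care"]

# (b)(iv) All four in training, ASSUMING independent selections.
p_four_training = share["training facilities"] ** 4

print("allocation:", allocation)
print("(i)  ", float(p_not_migrant_or_aged))
print("(ii) ", float(p_absent_and_aged))
print("(iii)", f"{float(p_absent_given_aged):.2%}")
print("(iv) ", float(p_four_training))
```

Using `Fraction` keeps every intermediate value exact, so rounding happens only when the results are printed.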

#### Solution Summary

This solution comprises detailed step-by-step calculations and analysis for the statistics problems above, giving students a clear view of the underlying concepts.