A geyser is a hot spring that occasionally becomes unstable and erupts hot water and steam into the air. The "Old Faithful" geyser at Yellowstone National Park in Wyoming is probably the most famous geyser in the world. Visitors to the park try to arrive at the geyser site to see it erupt without having to wait too long; the name of this geyser comes from the fact that eruptions follow a relatively stable pattern. The National Park Service erects a sign at the geyser site predicting when the next eruption will occur. Thus, it is of interest to understand and predict the interval time (or interruption time) until the next eruption.
The following analysis is based on a sample of 222 interruption times taken during August 1978 and August 1979. The first step in any data analysis is simply to look at the data.
1. Draw a histogram
2. Draw a stem and leaf diagram
3. Draw a boxplot
4. Do a quick analysis of 1 through 3 (explain your findings in detail in your final analysis)
6. Now, draw a scatter (XY) plot of Interruption time and Duration time. What do you notice?
7. Is the data (all 222 data points) normally distributed?
8. Based on your analysis from above, can you identify two distinct groups in the scatter plot of "Duration" vs. "Interval"? Please explain whether this is true.
9. Sort the data into 2 groups for "Duration" such that:
The first group, called "Short Duration", has duration times <= 3 minutes
The second group, called "Long Duration", has duration times > 3 minutes
10. Now plot a boxplot stacked side by side for the two groups. What can you predict based on the graphical representation?
11. Is the data normally distributed in each group?
12. Compare the means of the interruption time and the duration time for the two groups using a parametric and a non-parametric test. Is there a difference in conclusions between the two tests? Why?
13. Calculate the 95% and 99% Confidence Intervals for the interruption time of the two groups.
14. Develop a regression model for the entire dataset AND for the two individual groups (total of 3 models). What can you infer from them?
15. Based on your total analysis, can you help the visitors to identify:
(1) how long they have to wait between eruptions?
(2) how long the eruptions last?
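Steps 9 and 13 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names `split_by_duration` and `mean_ci` are hypothetical, the confidence interval is taken to be for the group mean, and it uses the normal approximation (reasonable for groups of several dozen points; a t-based interval is more appropriate for small groups). The `demo` pairs are made up for illustration and are not values from the GEYSER file.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def split_by_duration(pairs, threshold=3.0):
    """Split (duration, interval) pairs into short (<= threshold) and long (> threshold)."""
    short = [(d, i) for d, i in pairs if d <= threshold]
    long_ = [(d, i) for d, i in pairs if d > threshold]
    return short, long_


def mean_ci(values, level=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)               # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return m - z * se, m + z * se


# Made-up illustrative pairs (duration in minutes, interval in minutes).
# These are NOT values from the GEYSER data file.
demo = [(1.8, 54), (2.0, 51), (1.9, 57), (4.3, 79), (4.5, 82), (4.1, 76)]

short, long_ = split_by_duration(demo)
for name, group in (("Short Duration", short), ("Long Duration", long_)):
    intervals = [i for _, i in group]
    for level in (0.95, 0.99):
        lo, hi = mean_ci(intervals, level)
        print(f"{name}: {level:.0%} CI for mean interval = ({lo:.1f}, {hi:.1f})")
```

With the real data, `demo` would be replaced by the 222 pairs read from the file; the 99% interval is always wider than the 95% interval for the same group.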
Attached Data File: GEYSER - This file contains 222 pairs of data points of the eruption duration (in minutes) and interruption times (or interval time until the next eruption, in minutes) for the "Old Faithful Geyser".
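Given pairs like those in the data file, the parametric versus non-parametric comparison in step 12 might be sketched as follows. This is a rough illustration, not a definitive method: `welch_z_test` and `mann_whitney_u` are hypothetical helper names, both p-values come from large-sample normal approximations (the standard library has no t distribution), and the Mann-Whitney variance omits the tie correction. A statistics package would normally be used instead. The sample values are made up and are not from the GEYSER file.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_z_test(a, b):
    """Welch statistic for a difference in means, with a two-sided
    p-value from the normal approximation (assumes large samples)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    stat = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p


def mann_whitney_u(a, b):
    """Mann-Whitney U for sample `a`, with a two-sided p-value from the
    normal approximation (no tie correction in the variance)."""
    combined = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    rank_sum_a = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    u = rank_sum_a - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Made-up interval times (minutes) for the two groups, NOT from the GEYSER file.
short_intervals = [54, 51, 57, 49, 55]
long_intervals = [79, 82, 76, 80, 77]

print("Welch:", welch_z_test(short_intervals, long_intervals))
print("Mann-Whitney:", mann_whitney_u(short_intervals, long_intervals))
```

If the two tests disagree on the real data, the usual suspects are skewness or outliers within a group, which affect the mean-based test more than the rank-based one.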
Please complete 7 through 15.
Analysis of the GEYSER data. The solution contains a scatter diagram, boxplot, histogram, and stem-and-leaf plot, along with the slope, intercept, correlation, residuals, R-squared (coefficient of determination), regression coefficients, and confidence intervals, with interpretations.