In 1854, John Snow, a medical doctor, hypothesized that there was a relationship between drinking water supply and cholera in London. The two major companies that supplied water to homes in London both got their water from the Thames River, which runs through the city. The Lambeth Company pumped from upstream of London. The Southwark and Vauxhall Company pumped from downstream. Snow collected data on cholera deaths in houses hooked up to each of the water sources, as well as total cholera deaths in houses in the rest of London. The attached table summarizes his results.
The first question often asked in analyzing epidemiological data is:
What was the relative risk of dying of cholera for those who got their water from Southwark and Vauxhall (S&V) vs those who got their water from Lambeth?
Relative risk (RR) is the ratio of the risk faced by one exposed group to that of another comparable group believed to be less exposed. In our case, the probability (or risk) of dying of cholera is equal to the number of cholera deaths divided by the number of houses.
So, if you got your water from S&V, your risk was 1,263/40,046 = 0.0315 (a little over 3%).
If you got your water from Lambeth, your risk was 98/26,107 = 0.00375 (less than 0.4%).
The relative risk of drinking S&V water vs Lambeth water was therefore:
RR = 0.0315/0.00375 = 8.4
In other words, if you were an S&V customer, your cholera risk was more than 8 times that of a Lambeth customer.
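The arithmetic above can be checked with a few lines of Python. This is only a sketch using the counts quoted in the text; the function and variable names are illustrative, not part of Snow's analysis.

```python
# Counts quoted in the text: cholera deaths and houses served by each company.
sv_deaths, sv_houses = 1263, 40046
lambeth_deaths, lambeth_houses = 98, 26107


def risk(deaths, houses):
    """Risk expressed as cholera deaths per house served."""
    return deaths / houses


sv_risk = risk(sv_deaths, sv_houses)
lambeth_risk = risk(lambeth_deaths, lambeth_houses)

# Relative risk: the more exposed group (S&V) over the less exposed group (Lambeth).
relative_risk = sv_risk / lambeth_risk

print(f"S&V risk:      {sv_risk:.4f}")
print(f"Lambeth risk:  {lambeth_risk:.5f}")
print(f"Relative risk: {relative_risk:.1f}")
```

Dividing the unrounded risks gives the same relative risk of about 8.4 as dividing the rounded figures.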
As a first approximation, epidemiologists consider a relative risk greater than 2 to be a cause for concern. They do not stop there, however. They also want to test whether the observed differences are "statistically significant" or whether they could have occurred by chance.
If the observed S&V death rate were within the 95% confidence interval expected from the Lambeth data, then the absolute value of the test statistic Z would be less than 1.96. We could then not reject, at the 95% confidence level, the hypothesis that the risk of death from cholera was the same in houses served by S&V and by Lambeth. If the difference in observed deaths were statistically significant, then the absolute value of Z would be greater than 1.96.
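The 1.96 threshold is the two-sided critical value of the standard normal distribution at 95% confidence. As a sketch, assuming a two-sided test, the threshold for any confidence level can be obtained from Python's standard library; the helper name critical_z is illustrative.

```python
from statistics import NormalDist


def critical_z(confidence):
    """Two-sided critical value of the standard normal for a confidence level."""
    alpha = 1.0 - confidence
    # Split alpha evenly between the two tails of the distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


z95 = critical_z(0.95)
z99 = critical_z(0.99)

print(f"95% critical value: {z95:.2f}")
print(f"99% critical value: {z99:.3f}")
```

The same helper gives a threshold of about 2.576 at the 99% confidence level, which is stricter than 1.96.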
Now we are ready to crunch the numbers.
The observed probability of cholera deaths among S&V customers was 0.0315. The expected probability, if the difference between S&V and Lambeth customers was not statistically significant, is 0.00375. The expected standard deviation for the S&V data is sqrt[p(1 - p)/n], where p = 0.00375 is the expected probability and n = 40,046 is the number of houses served by S&V. Z is the difference between the observed and expected probabilities divided by this standard deviation.
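Plug the numbers into equation (2) and you find that Z is about 91. At the 95% confidence level, the absolute value of Z would have to be less than 1.96 for the difference between the two sets of customers to be plausibly due to chance. Since Z is far larger than 1.96, the difference is statistically significant.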
Notice the numbers we put into the denominator. We are looking for the expected standard deviation if the S&V data were not statistically different from the Lambeth data. We therefore use the probability for Lambeth customers and the population of S&V customers.
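The calculation can be sketched in Python as follows. It assumes the test statistic is the observed probability minus the expected probability, divided by the expected standard deviation described above; the function name z_score and its parameters are illustrative.

```python
from math import sqrt


def z_score(observed_deaths, observed_n, reference_deaths, reference_n):
    """Z for an observed group measured against a reference group's risk."""
    p_observed = observed_deaths / observed_n
    p_expected = reference_deaths / reference_n
    # Denominator: the reference group's probability with the observed group's size.
    expected_sd = sqrt(p_expected * (1 - p_expected) / observed_n)
    return (p_observed - p_expected) / expected_sd


# S&V (observed) against Lambeth (reference), using the counts quoted in the text.
z = z_score(1263, 40046, 98, 26107)

print(f"Z = {z:.1f}")
print("statistically significant at 95%" if abs(z) > 1.96 else "not significant at 95%")
```

With unrounded probabilities the result comes out at about 91, far above the 1.96 threshold. The same function could be applied to any other pair of groups once their death and house counts are read from the table.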
Is the risk of drinking water provided by others to the rest of London the same as that of drinking water provided by Lambeth at the 99% confidence level? Show your calculations.
This solution states a null and an alternative hypothesis and uses a significance level of 0.01. It calculates the z-value using the standard error and uses the corresponding p-value to decide whether to reject or fail to reject the null hypothesis. All steps are shown in an Excel file.