The Problem
Example # 232: Estimate the difference in success rates between two populations using a confidence interval to indicate uncertainty. To estimate this difference, we collect data. The data are a series of “Success” and “Failure” values. For sample 1, the data are
“Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Success”, “Failure”, “Success”, “Success”, “Failure”, “Failure”, “Success”, “Failure”, “Failure”, “Failure”, “Success”, “Success”, “Failure”, “Success”, “Failure”, “Success”, “Failure”, “Success”, “Success”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Success”, “Failure”
For sample 2, the data are
“Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Success”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Success”, “Success”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Success”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Failure”, “Success”, “Failure”
With this information, calculate the endpoints of the symmetric 95% confidence interval.
Information given:
To summarize the above, the values of import are:
Summary statistics from the problem
| Statistic | Value |
|---|---|
| \( x_1 \) | 11 |
| \( x_2 \) | 5 |
| \( n_1 \) | 37 |
| \( n_2 \) | 31 |
| \( \hat{p}_1 \) | 0.2973 |
| \( \hat{p}_2 \) | 0.1613 |
| \( \alpha \) | 0.05 |
Note that there is no value given for the hypothesized difference p0. This is because confidence intervals are based solely on the data, and not on any hypothesized values.
It may be helpful to calculate these values yourself from the raw data, then check your answers against the summary statistics listed above.
Assistance
Solution
$$ \begin{align}
\text{Confidence Limits} &= \hat{p}_1 - \hat{p}_2 \pm Z(\alpha/2) \sqrt{ \frac{ \hat{p}_1 \left(1 - \hat{p}_1\right) }{n_1} + \frac{ \hat{p}_2 \left(1 - \hat{p}_2\right) }{n_2} } \\[3em]
&= 0.2973 - 0.1613 \pm Z(0.05/2) \sqrt{ \frac{ 0.2973 \left(1 - 0.2973\right) }{37} + \frac{ 0.1613 \left(1 - 0.1613 \right) }{31} } \\[1em]
&= 0.136 \pm Z(0.025)\ \sqrt{ \frac{ 0.2973 \left(0.7027\right) }{37} + \frac{ 0.1613 \left(0.8387 \right) }{31} } \\[1em]
&= 0.136 \pm 1.96\ \sqrt{ \frac{ 0.208912 }{37} + \frac{ 0.135276}{31} } \\[1em]
&= 0.136 \pm 1.96\ \sqrt{ 0.005646\ +\ 0.004364 } \\[1em]
&= 0.136 \pm 1.96\ \sqrt{ 0.01001 } \\[1em]
&= 0.136 \pm 1.96\ \left( 0.10005 \right) \\[1em]
&= 0.136 \pm 0.196098
\end{align}
$$
Thus, we are 95% confident that the difference in success rates between population 1 and population 2 is between -0.0601 and 0.3321.
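For completeness, the two endpoints come from subtracting and adding the margin of error in the last line of the calculation, then rounding to four decimal places:

$$ \begin{align}
\text{Lower limit} &= 0.136 - 0.196098 = -0.060098 \approx -0.0601 \\[1em]
\text{Upper limit} &= 0.136 + 0.196098 = 0.332098 \approx 0.3321
\end{align}
$$

Because this interval contains 0, the data are consistent with no difference between the two population success rates at this confidence level.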
Note that 0.196098 is the margin of error, which is usually symbolized as E. So, for sample sizes like these, polling companies would (should) report the results as “13.6% plus or minus 19.6 points.” As you may expect, larger sample sizes will produce smaller margins of error.
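To illustrate that last point, here is a minimal R sketch that holds the two sample proportions fixed at their observed values and scales both sample sizes by a common factor. The helper name `margin_of_error` and the scaling factors in `k` are illustrative assumptions, not part of the original example.

```r
# Hypothetical helper: margin of error for a difference of two proportions,
# using the same Normal-approximation formula as the solution above
margin_of_error <- function(p1, n1, p2, n2, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)   # critical value, about 1.96 when alpha = 0.05
  z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
}

# Observed sample proportions from the example
phat1 <- 11 / 37
phat2 <- 5 / 31

# Scale both sample sizes by the same factor k (illustrative values)
k <- c(1, 4, 16)
E <- margin_of_error(phat1, 37 * k, phat2, 31 * k)
round(E, 4)
```

Under this formula the margin of error shrinks in proportion to one over the square root of the scaling factor, so each fourfold increase in both sample sizes halves it.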
The R Code
As in the one-sample case, this formulation of the confidence interval is pedagogically simple to understand, which is why it is used in introductory textbooks. It is based on the Normal approximation to the Binomial distribution, and several improved intervals exist. For these reasons, R does not walk through these calculations in a dedicated function; its built-in `prop.test` is organized around a related chi-square test and applies a continuity correction by default. The following code echoes the above calculations to provide the endpoints of the confidence interval.
Copy and paste the following code into your R script window, then run it from there.
alpha = 0.05
samp1 = c("Failure", "Failure", "Failure", "Failure", "Failure", "Failure", "Failure", "Failure", "Failure", "Failure", "Success", "Failure", "Success", "Success", "Failure", "Failure", "Success", "Failure", "Failure", "Failure", "Success", "Success", "Failure", "Success", "Failure", "Success", "Failure", "Success", "Success", "Failure", "Failure", "Failure", "Failure", "Failure", "Failure", "Success", "Failure")
samp2 = c("Failure", "Failure", "Failure", "Failure", "Failure", "Failure", "Success", "Failure", "Failure", "Failure", "Failure", "Failure", "Failure", "Failure", "Success", "Success", "Failure", "Failure", "Failure", "Failure", "Failure", "Failure", "Success", "Failure", "Failure", "Failure", "Failure", "Failure", "Failure", "Success", "Failure")
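# Count the successes in each sample
x1 = sum(samp1=="Success")
x2 = sum(samp2=="Success")
# Sample sizes
n1 = length(samp1)
n2 = length(samp2)
# Sample proportions
phat1 = x1/n1
phat2 = x2/n2
# Squared standard error of the difference in proportions
se2 = phat1*(1-phat1)/n1 + phat2*(1-phat2)/n2
# Confidence limits: difference minus/plus critical value times standard error
lcl = (phat1 - phat2) - abs(qnorm(alpha/2)) * sqrt(se2)
ucl = (phat1 - phat2) + abs(qnorm(alpha/2)) * sqrt(se2)
# Print the lower limit, then the upper limit
lcl; ucl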
In the R output, the two numbers printed by the last line are the lower and upper bounds of the 95% confidence interval, in that order.
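As a cross-check, base R's `prop.test` also reports a confidence interval for the difference of two proportions. By default it applies a continuity correction, so its interval is somewhat wider than the one above; with `correct = FALSE` the interval should agree with the hand calculation. The sketch below assumes the counts from the summary statistics (11 of 37 and 5 of 31) and is meant only as a sanity check, not as a replacement for the step-by-step code.

```r
x <- c(11, 5)    # successes in sample 1 and sample 2
n <- c(37, 31)   # sample sizes

# Without the continuity correction, prop.test's interval for p1 - p2
# uses the same Normal-approximation formula as the hand calculation.
ci_plain <- prop.test(x, n, conf.level = 0.95, correct = FALSE)$conf.int

# The default (correct = TRUE) widens the interval by a continuity correction.
ci_corrected <- prop.test(x, n, conf.level = 0.95)$conf.int

ci_plain
ci_corrected
```

If the first interval does not match `lcl` and `ucl` to about four decimal places, that is a sign the data were entered differently in the two calculations.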
The Excel Code
This formulation of the confidence interval is pedagogically simple to understand, which is why it is used in introductory textbooks. It is based on the Normal approximation to the Binomial distribution, and several improved intervals exist. For such reasons, Excel does not have a built-in function to perform these calculations. The following code echoes the above calculations to provide the endpoints of the confidence interval.
Copy and paste the following code into your Excel window, making sure the value sample1 ends up in cell A1 after pasting.
How to calculate the confidence interval in Excel.
| sample1 | sample2 | | alpha: | 0.05 |
|---|---|---|---|---|
| Failure | Failure | | | |
| Failure | Failure | | samp 1 | samp 2 |
| Failure | Failure | x: | =COUNTIF(A:A,"Success") | =COUNTIF(B:B,"Success") |
| Failure | Failure | n: | =COUNTIF(A:A,"Success")+COUNTIF(A:A,"Failure") | =COUNTIF(B:B,"Success")+COUNTIF(B:B,"Failure") |
| Failure | Failure | p-hat: | =D4/D5 | =E4/E5 |
| Failure | Failure | | | |
| Failure | Success | lcl: | =(D6-E6)-ABS(NORM.S.INV(E1/2))*SQRT(D6*(1-D6)/D5+E6*(1-E6)/E5) | |
| Failure | Failure | ucl: | =(D6-E6)+ABS(NORM.S.INV(E1/2))*SQRT(D6*(1-D6)/D5+E6*(1-E6)/E5) | |
| Failure | Failure | | | |
| Failure | Failure | | | |
| Success | Failure | | | |
| Failure | Failure | | | |
| Success | Failure | | | |
| Success | Failure | | | |
| Failure | Success | | | |
| Failure | Success | | | |
| Success | Failure | | | |
| Failure | Failure | | | |
| Failure | Failure | | | |
| Failure | Failure | | | |
| Success | Failure | | | |
| Success | Failure | | | |
| Failure | Success | | | |
| Success | Failure | | | |
| Failure | Failure | | | |
| Success | Failure | | | |
| Failure | Failure | | | |
| Success | Failure | | | |
| Success | Failure | | | |
| Failure | Success | | | |
| Failure | Failure | | | |
| Failure | | | | |
| Failure | | | | |
| Failure | | | | |
| Failure | | | | |
| Success | | | | |
| Failure | | | | |
The limits of the 95% confidence interval are the numbers calculated in cells D8 and D9. Again, when you paste this code into Excel, make sure that you start the pasting in cell A1. To help with that, you may want to also copy this notice. It seems to help.