The Problem
Example #425: Let us test the null hypothesis that the success rate in population 1 is the same as in population 2. In symbols, this is:
H0 : π1 − π2 = 0
HA : π1 − π2 ≠ 0
To test this hypothesis, we collect data. The data are a series of “Success” and “Failure” values. For sample 1, the data are
“Failure”, “Failure”, “Success”, “Failure”, “Failure”, “Failure”, “Failure”, “Success”, “Success”, “Success”, “Failure”, “Success”, “Failure”, “Success”, “Success”, “Success”, “Failure”, “Success”, “Failure”, “Success”, “Success”, “Failure”, “Failure”, “Success”, “Success”, “Failure”, “Success”, “Failure”, “Success”, “Success”
For sample 2, the data are
“Success”, “Failure”, “Failure”, “Failure”, “Success”, “Failure”, “Success”, “Failure”, “Failure”, “Success”, “Success”, “Failure”, “Failure”, “Failure”, “Success”, “Success”, “Failure”, “Failure”, “Failure”, “Success”, “Failure”, “Failure”, “Success”, “Success”, “Success”, “Success”, “Success”, “Failure”, “Success”, “Failure”
With this information, calculate the z test statistic corresponding to the null hypothesis given above.
Information given:
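To summarize the above, the key values are:

Summary statistics from the problem

| Symbol | Meaning | Value |
|---|---|---|
| \( p_0 \) | Hypothesized difference \( \pi_1 - \pi_2 \) under \( H_0 \) | 0 |
| \( x_1 \) | Number of successes in sample 1 | 16 |
| \( x_2 \) | Number of successes in sample 2 | 14 |
| \( n_1 \) | Size of sample 1 | 30 |
| \( n_2 \) | Size of sample 2 | 30 |
| \( \hat{p}_1 \) | Sample proportion \( x_1 / n_1 \) | 0.5333 |
| \( \hat{p}_2 \) | Sample proportion \( x_2 / n_2 \) | 0.4667 |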
Try calculating these values yourself, then check your answers against the table.
Solution
$$ \begin{align}
z &= \frac{ \left( \hat{p}_1 - \hat{p}_2 \right) - p_0 }{ \sqrt{ \frac{ \hat{p}_1 \left(1 - \hat{p}_1\right) }{n_1} + \frac{ \hat{p}_2 \left(1 - \hat{p}_2\right) }{n_2} }} \\[3em]
&= \frac{ \left( 0.5333 - 0.4667 \right) - 0 }{ \sqrt{ \frac{ 0.5333 \left(1 - 0.5333\right) }{30} + \frac{ 0.4667 \left(1 - 0.4667\right) }{30} }} \\[1em]
&= \frac{ \left( 0.0667 \right) - 0 }{ \sqrt{ \frac{ 0.5333 \left(0.4667\right) }{30} + \frac{ 0.4667 \left(0.5333\right) }{30} }} \\[1em]
&= \frac{ 0.0667 }{ \sqrt{ \frac{ 0.2489 }{30} + \frac{ 0.2489 }{30} }} \\[1em]
&= \frac{ 0.0667 }{ \sqrt{ 0.008296 + 0.008296 }} \\[1em]
&= \frac{ 0.0667 }{ \sqrt{ 0.016593 }} \\[1em]
&= \frac{ 0.0667 }{ 0.128812 } \\[1em]
&= 0.5175 \\[1em]
\end{align}
$$
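Since the counts are whole numbers, the same derivation can also be carried out with exact fractions, which avoids rounding any intermediate value and confirms the result above:

$$ \begin{align}
\hat{p}_1 - \hat{p}_2 &= \frac{16}{30} - \frac{14}{30} = \frac{1}{15} \\[1em]
\frac{ \hat{p}_1 \left(1 - \hat{p}_1\right) }{n_1} + \frac{ \hat{p}_2 \left(1 - \hat{p}_2\right) }{n_2} &= 2 \cdot \frac{ 16 \cdot 14 }{ 30^3 } = \frac{448}{27000} \\[1em]
z &= \frac{ 1/15 }{ \sqrt{ 448/27000 } } = \sqrt{ \frac{27000}{225 \cdot 448} } = \sqrt{ \frac{15}{56} } \approx 0.5175
\end{align}
$$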
And so, the test statistic is z = 0.5175. To use it, compare it to the critical values from the z-table.
For instance, let us specify α = 0.05. This is a two-tailed test, because the alternative hypothesis uses ≠. Because of these two facts, the two critical values are -1.96 and +1.96.
Since the test statistic is between the two critical values, it is not in the rejection region. Thus, we fail to reject the null hypothesis in favor of the alternative.
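The critical values and the decision can also be checked in R. This is a minimal sketch that assumes α = 0.05 and plugs in the hand-calculated statistic 0.5175; qnorm() and pnorm() are base R's standard Normal quantile and distribution functions. The two-sided p-value at the end is an equivalent way to reach the same decision: reject H0 only when the p-value is below α.

```r
# Two-tailed test at significance level alpha = 0.05
alpha <- 0.05
z <- 0.5175  # test statistic from the hand calculation above

# Critical values: the alpha/2 and 1 - alpha/2 quantiles of the standard Normal
crit <- qnorm(c(alpha / 2, 1 - alpha / 2))
crit  # approximately -1.96 and +1.96

# Reject H0 only if z falls outside the interval between the critical values
reject <- z < crit[1] || z > crit[2]
reject  # FALSE

# Equivalent two-sided p-value; reject H0 only when it is below alpha
p_value <- 2 * pnorm(-abs(z))
p_value  # about 0.60, well above alpha
```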
R Code
As in the one-sample case, the Wald test is pedagogically simple, which is why introductory textbooks use it. It is based on the Normal approximation to the Binomial distribution, and several improved tests exist. For these reasons, R has no built-in function for the Wald test of two proportions; its prop.test() function instead performs a related chi-square test. The following code reproduces the calculations above to obtain the test statistic.
Copy and paste the following code into your R script window, then run it from there.
```r
# Raw data: one "Success" or "Failure" value per observation
samp1 <- c("Failure", "Failure", "Success", "Failure", "Failure", "Failure", "Failure", "Success", "Success", "Success", "Failure", "Success", "Failure", "Success", "Success", "Success", "Failure", "Success", "Failure", "Success", "Success", "Failure", "Failure", "Success", "Success", "Failure", "Success", "Failure", "Success", "Success")
samp2 <- c("Success", "Failure", "Failure", "Failure", "Success", "Failure", "Success", "Failure", "Failure", "Success", "Success", "Failure", "Failure", "Failure", "Success", "Success", "Failure", "Failure", "Failure", "Success", "Failure", "Failure", "Success", "Success", "Success", "Success", "Success", "Failure", "Success", "Failure")
# Number of successes and sample size for each sample
x1 <- sum(samp1 == "Success")
x2 <- sum(samp2 == "Success")
n1 <- length(samp1)
n2 <- length(samp2)
# Sample proportions
phat1 <- x1 / n1
phat2 <- x2 / n2
# Squared standard error of phat1 - phat2 (unpooled, as in the formula above)
se2 <- phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2
# Wald test statistic for H0: pi1 - pi2 = 0
ts <- (phat1 - phat2 - 0) / sqrt(se2)
ts
```
In the R output, the test statistic is the number printed after running the ts line. Because R does not round intermediate values, its result is more precise than a hand calculation and may differ slightly from the value you obtain by hand.
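For repeated use, the same calculation can be wrapped in a small function, here given the hypothetical name wald_2prop_z() (it is a sketch, not part of base R), and compared with R's built-in prop.test(). With correct = FALSE, which turns off the Yates continuity correction that prop.test() applies by default, the chi-square statistic from prop.test() equals the square of a z statistic that uses a pooled estimate of the common proportion. That differs from the unpooled Wald statistic used on this page, so the two values are close but not identical.

```r
# Hypothetical helper (not part of base R): Wald z statistic for
# H0: pi1 - pi2 = p0, using the unpooled standard error from the formula above
wald_2prop_z <- function(x1, n1, x2, n2, p0 = 0) {
  phat1 <- x1 / n1
  phat2 <- x2 / n2
  se <- sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)
  (phat1 - phat2 - p0) / se
}

# Counts from this example: 16 of 30 and 14 of 30 successes
z_wald <- wald_2prop_z(x1 = 16, n1 = 30, x2 = 14, n2 = 30)
round(z_wald, 4)  # 0.5175

# Built-in chi-square test of equal proportions, without continuity correction
res <- prop.test(x = c(16, 14), n = c(30, 30), correct = FALSE)
chisq <- unname(res$statistic)
chisq  # 4/15, about 0.2667

# For two groups, this chi-square equals the square of the *pooled* z statistic
p_pool <- (16 + 14) / (30 + 30)
z_pooled <- (16 / 30 - 14 / 30) / sqrt(p_pool * (1 - p_pool) * (1 / 30 + 1 / 30))
z_pooled^2   # also about 0.2667
sqrt(chisq)  # about 0.5164, close to but not equal to z_wald
```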
Excel Code
The Wald test is pedagogically simple, which is why introductory textbooks use it. It is based on the Normal approximation to the Binomial distribution, and several improved tests exist. For such reasons, Excel has no built-in function for the Wald test of proportions. The following formulas reproduce the calculations above to obtain the Wald test statistic.
Copy and paste the following table into your Excel worksheet, making sure that the value sample1 ends up in cell A1 after pasting.
How to calculate the test statistic in Excel.
| sample1 | sample2 | | p0: | 0 |
|---|---|---|---|---|
| Failure | Success | | | |
| Failure | Failure | | samp 1 | samp 2 |
| Success | Failure | x: | =COUNTIF(A:A,"Success") | =COUNTIF(B:B,"Success") |
| Failure | Failure | n: | =COUNTIF(A:A,"Success")+COUNTIF(A:A,"Failure") | =COUNTIF(B:B,"Success")+COUNTIF(B:B,"Failure") |
| Failure | Success | p-hat: | =D4/D5 | =E4/E5 |
| Failure | Failure | | | |
| Failure | Success | ts: | =(D6-E6-E1)/SQRT(D6*(1-D6)/D5+E6*(1-E6)/E5) | |
| Success | Failure | | | |
| Success | Failure | | | |
| Success | Success | | | |
| Failure | Success | | | |
| Success | Failure | | | |
| Failure | Failure | | | |
| Success | Failure | | | |
| Success | Success | | | |
| Success | Success | | | |
| Failure | Failure | | | |
| Success | Failure | | | |
| Failure | Failure | | | |
| Success | Success | | | |
| Success | Failure | | | |
| Failure | Failure | | | |
| Failure | Success | | | |
| Success | Success | | | |
| Success | Success | | | |
| Failure | Success | | | |
| Success | Success | | | |
| Failure | Failure | | | |
| Success | Success | | | |
| Success | Failure | | | |
The Wald test statistic is the value calculated in cell D8. Again, when you paste the table into Excel, make sure the pasting starts in cell A1. Including this paragraph in your selection when copying may help with that.