Solution
$$
\begin{aligned}
SSW &= \sum_{i=1}^{g} \sum_{j=1}^{n_i} (x_{i,j} - \bar{x}_i)^2 \\
&= \sum_{i=1}^{4} \sum_{j=1}^{n_i} (x_{i,j} - \bar{x}_i)^2 \\
&= \sum_{j=1}^{6} (x_{1,j} - 15.3333)^2 + \sum_{j=1}^{4} (x_{2,j} - 18.5)^2 + \sum_{j=1}^{11} (x_{3,j} - 15.4545)^2 + \sum_{j=1}^{6} (x_{4,j} - 16)^2 \\
&= (x_{1,1} - 15.3333)^2 + (x_{1,2} - 15.3333)^2 + (x_{1,3} - 15.3333)^2 + (x_{1,4} - 15.3333)^2 + (x_{1,5} - 15.3333)^2 + (x_{1,6} - 15.3333)^2 \\
&\quad + (x_{2,1} - 18.5)^2 + (x_{2,2} - 18.5)^2 + (x_{2,3} - 18.5)^2 + (x_{2,4} - 18.5)^2 \\
&\quad + (x_{3,1} - 15.4545)^2 + (x_{3,2} - 15.4545)^2 + (x_{3,3} - 15.4545)^2 + (x_{3,4} - 15.4545)^2 + (x_{3,5} - 15.4545)^2 + (x_{3,6} - 15.4545)^2 \\
&\quad + (x_{3,7} - 15.4545)^2 + (x_{3,8} - 15.4545)^2 + (x_{3,9} - 15.4545)^2 + (x_{3,10} - 15.4545)^2 + (x_{3,11} - 15.4545)^2 \\
&\quad + (x_{4,1} - 16)^2 + (x_{4,2} - 16)^2 + (x_{4,3} - 16)^2 + (x_{4,4} - 16)^2 + (x_{4,5} - 16)^2 + (x_{4,6} - 16)^2 \\
&= (20 - 15.3333)^2 + (12 - 15.3333)^2 + (18 - 15.3333)^2 + (18 - 15.3333)^2 + (12 - 15.3333)^2 + (12 - 15.3333)^2 \\
&\quad + (16 - 18.5)^2 + (17 - 18.5)^2 + (25 - 18.5)^2 + (16 - 18.5)^2 \\
&\quad + (26 - 15.4545)^2 + (33 - 15.4545)^2 + (-1 - 15.4545)^2 + (18 - 15.4545)^2 + (22 - 15.4545)^2 + (10 - 15.4545)^2 \\
&\quad + (9 - 15.4545)^2 + (8 - 15.4545)^2 + (6 - 15.4545)^2 + (22 - 15.4545)^2 + (17 - 15.4545)^2 \\
&\quad + (7 - 16)^2 + (17 - 16)^2 + (23 - 16)^2 + (18 - 16)^2 + (14 - 16)^2 + (17 - 16)^2 \\
&= (4.6667)^2 + (-3.3333)^2 + (2.6667)^2 + (2.6667)^2 + (-3.3333)^2 + (-3.3333)^2 \\
&\quad + (-2.5)^2 + (-1.5)^2 + (6.5)^2 + (-2.5)^2 \\
&\quad + (10.5455)^2 + (17.5455)^2 + (-16.4545)^2 + (2.5455)^2 + (6.5455)^2 + (-5.4545)^2 \\
&\quad + (-6.4545)^2 + (-7.4545)^2 + (-9.4545)^2 + (6.5455)^2 + (1.5455)^2 \\
&\quad + (-9)^2 + (1)^2 + (7)^2 + (2)^2 + (-2)^2 + (1)^2 \\
&= 21.7778 + 11.1111 + 7.1111 + 7.1111 + 11.1111 + 11.1111 \\
&\quad + 6.25 + 2.25 + 42.25 + 6.25 \\
&\quad + 111.2066 + 307.843 + 270.7521 + 6.4793 + 42.843 + 29.7521 \\
&\quad + 41.6612 + 55.5702 + 89.3884 + 42.843 + 2.3884 \\
&\quad + 81 + 1 + 49 + 4 + 4 + 1 \\
&= 69.3333 + 57 + 1000.7273 + 140 \\
&= 1267.0606
\end{aligned}
$$
From these calculations, the within sum of squares is SSW = 1267.0606.
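The hand calculation can be cross-checked with a few lines of base R. This is only a minimal sketch, not part of the original solution: it assumes the 27 measurements are entered in the order used above, and the names x, g, groupMean, ssByGroup, and SSW are chosen here purely for illustration.

```r
## Data from the example: all measurements, plus a group label for each one
x = c(20, 12, 18, 18, 12, 12,
      16, 17, 25, 16,
      26, 33, -1, 18, 22, 10, 9, 8, 6, 22, 17,
      7, 17, 23, 18, 14, 17)
g = rep(c("trt1", "trt2", "trt3", "trt4"), times = c(6, 4, 11, 6))

## Mean of each group, repeated for every observation in that group
groupMean = ave(x, g)

## Squared deviations from the group mean, totalled within each group
ssByGroup = tapply((x - groupMean)^2, g, sum)

## The within sum of squares is the total over all groups
SSW = sum(ssByGroup)

ssByGroup
round(SSW, 4)   # 1267.0606
```

Printing ssByGroup shows the four per-group totals that are added together in the second-to-last step of the calculation.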
R Code
There are two ways of performing these calculations in R. The method you select will depend on how your data are stored.
Method 1: Wide Format
Copy and paste the following code into your R script window, then run it from there.
## Import data
treatment1 = c(20, 12, 18, 18, 12, 12)
treatment2 = c(16, 17, 25, 16)
treatment3 = c(26, 33, -1, 18, 22, 10, 9, 8, 6, 22, 17)
treatment4 = c(7, 17, 23, 18, 14, 17)
## Change to long format: mmt holds all measurements, grp holds the group label of each one
mmt = c( treatment1, treatment2, treatment3, treatment4 )
grp = c( rep("trt1",6), rep("trt2",4), rep("trt3",11), rep("trt4",6) )
## Model the data
mod = aov(mmt~grp)
summary(mod)
In the R output, the sum of squares within is the number in the table under Sum Sq and to the right of Residuals. If you would like better precision for that value, or if you would like only that value, run the following code in addition to that above:
modSummary = summary(mod)
modSummary[[1]][2,2]
Here, the number returned is the sum of squares within. How did we get that number? The summary table (also known as an ANOVA table) is just a table. Thus, the first line saves the summary as the variable modSummary, and the last line looks inside that variable, selects the ANOVA table ([[1]]), and then selects the value in row 2 (Residuals), column 2 (Sum Sq).
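Because the row 2, column 2 entry is the residual sum of squares, it can also be obtained without indexing into the table. The following is a hedged sketch rather than the author's method: it refits the same model and compares the table entry with two base R alternatives, sum(residuals(mod)^2) and deviance(mod). The names fromTable, fromResiduals, and fromDeviance are illustrative only.

```r
## Same data and model as above
mmt = c(20, 12, 18, 18, 12, 12,
        16, 17, 25, 16,
        26, 33, -1, 18, 22, 10, 9, 8, 6, 22, 17,
        7, 17, 23, 18, 14, 17)
grp = rep(c("trt1", "trt2", "trt3", "trt4"), times = c(6, 4, 11, 6))
mod = aov(mmt ~ grp)

## Value taken from the ANOVA table: row 2 (Residuals), column 2 (Sum Sq)
fromTable = summary(mod)[[1]][2, 2]

## Same quantity computed directly from the model residuals
fromResiduals = sum(residuals(mod)^2)

## deviance() on a fitted linear model returns the residual sum of squares
fromDeviance = deviance(mod)

all.equal(fromTable, fromResiduals)
all.equal(fromTable, fromDeviance)
```

Both comparisons should report TRUE, which confirms that the indexed value is the within (residual) sum of squares and not the between-groups value in row 1.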
Method 2: Long Format
Copy and paste the following code into your R script window, then run it from there.
## Import data
yields = c(20, 12, 18, 18, 12, 12, 16, 17, 25, 16, 26, 33, -1, 18, 22, 10, 9, 8, 6, 22, 17, 7, 17, 23, 18, 14, 17)
grp = c('trt1', 'trt1', 'trt1', 'trt1', 'trt1', 'trt1', 'trt2', 'trt2', 'trt2', 'trt2', 'trt3', 'trt3', 'trt3', 'trt3', 'trt3', 'trt3', 'trt3', 'trt3', 'trt3', 'trt3', 'trt3', 'trt4', 'trt4', 'trt4', 'trt4', 'trt4', 'trt4')
## Model the data
mod = aov(yields~grp)
summary(mod)
As discussed above, in the R output, the sum of squares within is the number in the table under Sum Sq and to the right of Residuals. If you would like better precision for that value, or if you would like only that value, run the following code in addition to that above:
modSummary = summary(mod)
modSummary[[1]][2,2]
Here, the number returned is the sum of squares within. How did we get that number? The summary table (also known as an ANOVA table) is just a table. Thus, the first line saves the summary as the variable modSummary, and the last line looks inside that variable, selects the ANOVA table ([[1]]), and then selects the value in row 2 (Residuals), column 2 (Sum Sq).
Note: The difference between wide and long formats is this: In wide-format data, each group has its own variable. In long-format data, all of the measurements are stored in one variable, and a second variable records the group to which each measurement belongs.
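To make the distinction concrete, here is a small sketch of converting wide data to long data with the base R function stack(). It assumes the wide data are held in a named list (the name wide is hypothetical); stack() returns a data frame whose columns are named values (the measurements) and ind (the group labels).

```r
## Wide format: one named vector per treatment group
wide = list(
  trt1 = c(20, 12, 18, 18, 12, 12),
  trt2 = c(16, 17, 25, 16),
  trt3 = c(26, 33, -1, 18, 22, 10, 9, 8, 6, 22, 17),
  trt4 = c(7, 17, 23, 18, 14, 17)
)

## Long format: one row per measurement, with its group label alongside
long = stack(wide)
head(long)

## The long data frame can be passed directly to aov()
mod = aov(values ~ ind, data = long)
summary(mod)
```

This is an alternative to building the label vector by hand with rep(); the fitted model is the same either way.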