Solution
$$
\begin{aligned}
\text{SSW} &= \sum_{i=1}^{g}\sum_{j=1}^{n_i}\left(x_{i,j}-\bar{x}_i\right)^2
= \sum_{i=1}^{4}\sum_{j=1}^{n_i}\left(x_{i,j}-\bar{x}_i\right)^2 \\
&= \sum_{j=1}^{6}\left(x_{1,j}-4.5\right)^2 + \sum_{j=1}^{8}\left(x_{2,j}-3\right)^2 + \sum_{j=1}^{9}\left(x_{3,j}-6.6667\right)^2 + \sum_{j=1}^{9}\left(x_{4,j}-2\right)^2 \\
&= (x_{1,1}-4.5)^2 + (x_{1,2}-4.5)^2 + (x_{1,3}-4.5)^2 + (x_{1,4}-4.5)^2 + (x_{1,5}-4.5)^2 + (x_{1,6}-4.5)^2 \\
&\quad + (x_{2,1}-3)^2 + (x_{2,2}-3)^2 + (x_{2,3}-3)^2 + (x_{2,4}-3)^2 + (x_{2,5}-3)^2 + (x_{2,6}-3)^2 + (x_{2,7}-3)^2 + (x_{2,8}-3)^2 \\
&\quad + (x_{3,1}-6.6667)^2 + (x_{3,2}-6.6667)^2 + (x_{3,3}-6.6667)^2 + (x_{3,4}-6.6667)^2 + (x_{3,5}-6.6667)^2 \\
&\quad + (x_{3,6}-6.6667)^2 + (x_{3,7}-6.6667)^2 + (x_{3,8}-6.6667)^2 + (x_{3,9}-6.6667)^2 \\
&\quad + (x_{4,1}-2)^2 + (x_{4,2}-2)^2 + (x_{4,3}-2)^2 + (x_{4,4}-2)^2 + (x_{4,5}-2)^2 + (x_{4,6}-2)^2 + (x_{4,7}-2)^2 + (x_{4,8}-2)^2 + (x_{4,9}-2)^2 \\
&= (2-4.5)^2 + (0-4.5)^2 + (9-4.5)^2 + (3-4.5)^2 + (2-4.5)^2 + (11-4.5)^2 \\
&\quad + (-6-3)^2 + (11-3)^2 + (3-3)^2 + (6-3)^2 + (0-3)^2 + (-2-3)^2 + (9-3)^2 + (3-3)^2 \\
&\quad + (10-6.6667)^2 + (4-6.6667)^2 + (8-6.6667)^2 + (14-6.6667)^2 + (-1-6.6667)^2 \\
&\quad + (4-6.6667)^2 + (8-6.6667)^2 + (3-6.6667)^2 + (10-6.6667)^2 \\
&\quad + (-4-2)^2 + (4-2)^2 + (-1-2)^2 + (10-2)^2 + (-7-2)^2 + (-1-2)^2 + (11-2)^2 + (-1-2)^2 + (7-2)^2 \\
&= (-2.5)^2 + (-4.5)^2 + (4.5)^2 + (-1.5)^2 + (-2.5)^2 + (6.5)^2 \\
&\quad + (-9)^2 + (8)^2 + (0)^2 + (3)^2 + (-3)^2 + (-5)^2 + (6)^2 + (0)^2 \\
&\quad + (3.3333)^2 + (-2.6667)^2 + (1.3333)^2 + (7.3333)^2 + (-7.6667)^2 + (-2.6667)^2 + (1.3333)^2 + (-3.6667)^2 + (3.3333)^2 \\
&\quad + (-6)^2 + (2)^2 + (-3)^2 + (8)^2 + (-9)^2 + (-3)^2 + (9)^2 + (-3)^2 + (5)^2 \\
&= 6.25 + 20.25 + 20.25 + 2.25 + 6.25 + 42.25 \\
&\quad + 81 + 64 + 0 + 9 + 9 + 25 + 36 + 0 \\
&\quad + 11.1111 + 7.1111 + 1.7778 + 53.7778 + 58.7778 + 7.1111 + 1.7778 + 13.4444 + 11.1111 \\
&\quad + 36 + 4 + 9 + 64 + 81 + 9 + 81 + 9 + 25 \\
&= 97.5 + 224 + 166 + 318 \\
&= 805.5
\end{aligned}
$$
From these calculations, the within sum of squares is SSW = 805.5.
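The hand calculation above can be checked in a few lines of base R. This is a minimal sketch that assumes the four groups are stored as a named list (the name groups is chosen here for illustration). It applies the SSW formula group by group, so each group's contribution (97.5, 224, 166 and 318) can be compared with the worked solution. As a cross-check, it also uses the equivalent identity SSW = sum over groups of (n_i - 1) * s_i^2, where s_i^2 is the sample variance of group i.

```r
## Data for the four treatment groups (same values as in the worked solution)
groups = list(
  trt1 = c(2, 0, 9, 3, 2, 11),
  trt2 = c(-6, 11, 3, 6, 0, -2, 9, 3),
  trt3 = c(10, 4, 8, 14, -1, 4, 8, 3, 10),
  trt4 = c(-4, 4, -1, 10, -7, -1, 11, -1, 7)
)

## Group means: 4.5, 3, 6.6667, 2
groupMeans = sapply(groups, mean)

## Each group's contribution: sum of squared deviations from its own mean
ssPerGroup = sapply(groups, function(x) sum((x - mean(x))^2))
ssPerGroup          # trt1 97.5, trt2 224, trt3 166, trt4 318

## Sum of squares within: add up the group contributions
SSW = sum(ssPerGroup)
SSW                 # 805.5

## Cross-check: SSW equals the sum over groups of (n_i - 1) * s_i^2
SSW_var = sum(sapply(groups, function(x) (length(x) - 1) * var(x)))
all.equal(SSW, SSW_var)   # TRUE
```

Computing the per-group pieces separately makes it easy to spot an arithmetic slip in any single group before the totals are added.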
R Code
There are two ways of performing these calculations in R. The method you select will depend on how your data are stored.
Method 1: Wide Format
Copy and paste the following code into your R script window, then run it from there.
## Import data
treatment1 = c(2, 0, 9, 3, 2, 11)
treatment2 = c(-6, 11, 3, 6, 0, -2, 9, 3)
treatment3 = c(10, 4, 8, 14, -1, 4, 8, 3, 10)
treatment4 = c(-4, 4, -1, 10, -7, -1, 11, -1, 7)
## Change to Long Format
mmt = c( treatment1, treatment2, treatment3, treatment4 )
grp = c( rep("trt1",6), rep("trt2",8), rep("trt3",9), rep("trt4",9) )
## Model the data
mod = aov(mmt~grp)
summary(mod)
In the R output, the value of the sum of squares within is the number in the table under Sum Sq and to the right of Residuals. If you would like better precision for that value, or if you would like to have only that value, run the following code in addition to the code above:
modSummary = summary(mod)
modSummary[[1]][2,2]
Here, the number printed is the sum of squares within. How does this work? The summary (also known as an ANOVA table) is stored as a list whose first element is the table itself. The first line saves the summary as the variable modSummary. The second line looks inside that variable, selects the ANOVA table ([[1]]), and then selects the value in row 2 (Residuals), column 2 (Sum Sq).
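Selecting by position works because Residuals is the second row of the table and Sum Sq is the second column. As an alternative sketch, the same number can be read by column name, or taken directly from the fitted model. deviance() of an aov fit (which is also an lm fit) returns the residual sum of squares, and summing the squared residuals gives the same value. The names ssw1, ssw2 and ssw3 are illustrative only.

```r
## Refit the Method 1 model so this block runs on its own
treatment1 = c(2, 0, 9, 3, 2, 11)
treatment2 = c(-6, 11, 3, 6, 0, -2, 9, 3)
treatment3 = c(10, 4, 8, 14, -1, 4, 8, 3, 10)
treatment4 = c(-4, 4, -1, 10, -7, -1, 11, -1, 7)
mmt = c(treatment1, treatment2, treatment3, treatment4)
grp = c(rep("trt1", 6), rep("trt2", 8), rep("trt3", 9), rep("trt4", 9))
mod = aov(mmt ~ grp)

## 1. Row 2 (Residuals), with the column selected by name instead of position
ssw1 = summary(mod)[[1]][2, "Sum Sq"]

## 2. deviance() of an aov/lm fit is its residual (within) sum of squares
ssw2 = deviance(mod)

## 3. The same quantity computed from the residuals themselves
ssw3 = sum(residuals(mod)^2)

c(ssw1, ssw2, ssw3)   # all three are 805.5, up to floating-point rounding
```

Selecting the column by name guards against silently picking the wrong column. deviance() skips the summary table altogether.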
Method 2: Long Format
Copy and paste the following code into your R script window, then run it from there.
## Import data
yields = c(2, 0, 9, 3, 2, 11, -6, 11, 3, 6, 0, -2, 9, 3, 10, 4, 8, 14, -1, 4, 8, 3, 10, -4, 4, -1, 10, -7, -1, 11, -1, 7)
grp = c('trt1', 'trt1', 'trt1', 'trt1', 'trt1', 'trt1', 'trt2', 'trt2', 'trt2', 'trt2', 'trt2', 'trt2', 'trt2', 'trt2', 'trt3', 'trt3', 'trt3', 'trt3', 'trt3', 'trt3', 'trt3', 'trt3', 'trt3', 'trt4', 'trt4', 'trt4', 'trt4', 'trt4', 'trt4', 'trt4', 'trt4', 'trt4')
## Model the data
mod = aov(yields~grp)
summary(mod)
As discussed above, in the R output, the value of the sum of squares within is the number in the table under Sum Sq and to the right of Residuals. If you would like better precision for that value, or if you would like to have only that value, run the following code in addition to the code above:
modSummary = summary(mod)
modSummary[[1]][2,2]
Here, the number printed is the sum of squares within, selected exactly as in Method 1. The first line saves the summary (ANOVA) table as the variable modSummary. The second line selects the ANOVA table ([[1]]) and then the value in row 2 (Residuals), column 2 (Sum Sq).
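Typing out every group label, as in the grp line of Method 2, is easy to get wrong. A minimal sketch of a shorter route uses rep() with a times vector, which repeats each label by its group size (6, 8, 9 and 9 observations, taken from the data). This is the same idea as the grp line in Method 1, condensed into one call. The helper names grpNames and grpSizes are illustrative.

```r
## Data from Method 2
yields = c(2, 0, 9, 3, 2, 11, -6, 11, 3, 6, 0, -2, 9, 3, 10, 4, 8, 14, -1,
           4, 8, 3, 10, -4, 4, -1, 10, -7, -1, 11, -1, 7)

## Group labels and the number of observations in each group, in data order
grpNames = c("trt1", "trt2", "trt3", "trt4")
grpSizes = c(6, 8, 9, 9)

## rep() with 'times' repeats each label as often as its group size
grp = rep(grpNames, times = grpSizes)
table(grp)   # trt1 6, trt2 8, trt3 9, trt4 9

## Same model as before
mod = aov(yields ~ grp)
summary(mod)[[1]][2, 2]   # sum of squares within: 805.5
```

Because the labels are generated from the group sizes, a miscount shows up as a length mismatch between grp and yields, not as a silently mislabeled observation.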
Note: In wide-format data, each group has its own variable. In long-format data, all measurements are stored in one variable and the group label is stored in a second variable.
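To move from wide to long format without retyping the data, base R's stack() can combine a named list of group vectors into a data frame. The result has one column of measurements (values) and one column of group labels (ind). A minimal sketch, assuming the four treatment vectors from Method 1 (the name long is illustrative):

```r
## Wide format: each group has its own variable
treatment1 = c(2, 0, 9, 3, 2, 11)
treatment2 = c(-6, 11, 3, 6, 0, -2, 9, 3)
treatment3 = c(10, 4, 8, 14, -1, 4, 8, 3, 10)
treatment4 = c(-4, 4, -1, 10, -7, -1, 11, -1, 7)

## Long format: stack() returns a data frame with columns 'values' and 'ind'
long = stack(list(trt1 = treatment1, trt2 = treatment2,
                  trt3 = treatment3, trt4 = treatment4))
head(long)

## The group label is now a variable, so the model uses the data frame columns
mod = aov(values ~ ind, data = long)
summary(mod)[[1]][2, 2]   # sum of squares within: 805.5
```

The list names (trt1 to trt4) become the group labels in ind, so naming the list elements is what carries the group information into long format.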