The Problem
Pearson’s sample correlation measures the strength and direction of the linear relationship between two numeric variables, summarizing that bivariate relationship in a single number. To illustrate the calculation, consider the following example.
A professor would like to investigate the relationship between students’ scores on the midterm and on the final exam. To do this, the professor collected the exam scores from n = 5 students. Using the data below, what is that correlation?
Here is the data. Note that there are two measurements on each unit (student). We are interested in the relationship between those two measurements.
Table of the bivariate data generated
Student Identifier | Midterm Score | Final Score |
1 | 1 | 22 |
2 | 41 | 28 |
3 | 10 | 5 |
4 | 34 | 100 |
5 | 33 | 53 |
Calculate the correlation of this sample.
Assistance
Solution
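The calculation substitutes the sample mean and sample standard deviation of each set of scores. As a sketch of where those four numbers come from, assuming the usual n - 1 divisor for the sample standard deviation:

$$ \begin{align}
\bar{x} &= \frac{1 + 41 + 10 + 34 + 33}{5} = \frac{119}{5} = 23.8 \\[1em]
\bar{y} &= \frac{22 + 28 + 5 + 100 + 53}{5} = \frac{208}{5} = 41.6 \\[1em]
s_x &= \sqrt{\frac{1}{n-1}\ \sum_{i=1}^{n}\ \left(x_i - \bar{x}\right)^2} = \sqrt{\frac{1194.8}{4}} = \sqrt{298.7} \approx 17.28294 \\[1em]
s_y &= \sqrt{\frac{1}{n-1}\ \sum_{i=1}^{n}\ \left(y_i - \bar{y}\right)^2} = \sqrt{\frac{5449.2}{4}} = \sqrt{1362.3} \approx 36.909348
\end{align}
$$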
$$ \begin{align}
r &= \frac{1}{n-1}\ \sum_{i=1}^{n}\ \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right) \\[3em]
&= \frac{1}{5-1}\ \sum_{i=1}^{5}\ \left(\frac{x_i - 23.8}{17.28294}\right)\left(\frac{y_i - 41.6}{36.909348}\right) \\[1em]
&= \frac{1}{4}\ \left[ \left(\frac{x_1 - 23.8}{17.28294}\right)\left(\frac{y_1 - 41.6}{36.909348}\right) \right. \\
& \qquad + \left(\frac{x_2 - 23.8}{17.28294}\right)\left(\frac{y_2 - 41.6}{36.909348}\right) \\
& \qquad + \left(\frac{x_3 - 23.8}{17.28294}\right)\left(\frac{y_3 - 41.6}{36.909348}\right) \\
& \qquad + \left(\frac{x_4 - 23.8}{17.28294}\right)\left(\frac{y_4 - 41.6}{36.909348}\right) \\
& \qquad + \left. \left(\frac{x_5 - 23.8}{17.28294}\right)\left(\frac{y_5 - 41.6}{36.909348}\right) \right] \\[1em]
&= \frac{1}{4}\ \left[ \left(\frac{1 - 23.8}{17.28294}\right)\left(\frac{22 - 41.6}{36.909348}\right) \right. \\
& \qquad + \left(\frac{41 - 23.8}{17.28294}\right)\left(\frac{28 - 41.6}{36.909348}\right) \\
& \qquad + \left(\frac{10 - 23.8}{17.28294}\right)\left(\frac{5 - 41.6}{36.909348}\right) \\
& \qquad + \left(\frac{34 - 23.8}{17.28294}\right)\left(\frac{100 - 41.6}{36.909348}\right) \\
& \qquad + \left. \left(\frac{33 - 23.8}{17.28294}\right)\left(\frac{53 - 41.6}{36.909348}\right) \right] \\[1em]
&= \frac{1}{4}\ \left[ \left(\frac{-22.8}{17.28294}\right)\left(\frac{-19.6}{36.909348}\right) \right. \\
& \qquad + \left(\frac{17.2}{17.28294}\right)\left(\frac{-13.6}{36.909348}\right) \\
& \qquad + \left(\frac{-13.8}{17.28294}\right)\left(\frac{-36.6}{36.909348}\right) \\
& \qquad + \left(\frac{10.2}{17.28294}\right)\left(\frac{58.4}{36.909348}\right) \\
& \qquad + \left. \left(\frac{9.2}{17.28294}\right)\left(\frac{11.4}{36.909348}\right) \right] \\[1em]
&= \frac{1}{4}\ \Big[\ \left(-1.31922\right)\left(-0.531031\right) \Big. \\
& \qquad + \left(0.995201\right)\left(-0.36847\right) \\
& \qquad + \left(-0.798475\right)\left(-0.991619\right) \\
& \qquad + \left(0.590177\right)\left(1.582255\right) \\
& \qquad + \Big. \left(0.532317\right)\left(0.308865\right)\ \Big] \\[1em]
&= \frac{1}{4}\ \Big[\ \left(0.700546\right) \Big. \\
& \qquad + \left(-0.366702\right) \\
& \qquad + \left(0.791783\right) \\
& \qquad + \left(0.933811\right) \\
& \qquad + \Big. \left(0.164414\right)\ \Big] \\[1em]
&= \frac{1}{4}\ \Big[\ 2.2239\ \Big] \\[1em]
&\approx 0.556
\end{align}
$$
And so, the correlation between the midterm and final examination scores in this sample is r = 0.556.
As this value is positive, it indicates that, in this sample, students who scored higher on the midterm tended to score higher on the final as well.
Note, too, that we cannot (at this point) draw any conclusion about the relationship between these two variables in the population. This is only a measure of that relationship in this sample.
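As a cross-check on the arithmetic, the same correlation can be written without standardizing each score first. This is an algebraically equivalent form of the formula used in the solution, shown here only as a sketch; the three sums are computed from the deviations listed in the solution.

$$ \begin{align}
r &= \frac{\sum_{i=1}^{n}\ \left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\ \left(x_i - \bar{x}\right)^2\ \sum_{i=1}^{n}\ \left(y_i - \bar{y}\right)^2}} \\[1em]
&= \frac{1418.6}{\sqrt{\left(1194.8\right)\left(5449.2\right)}} \\[1em]
&\approx \frac{1418.6}{2551.608} \\[1em]
&\approx 0.556
\end{align}
$$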
R Code
Copy and paste the following code into your R script window, then run it from there.
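# Midterm (x) and final (y) scores for the five students
xvals <- c(1, 41, 10, 34, 33)
yvals <- c(22, 28, 5, 100, 53)
# Pearson's sample correlation (the default method of cor)
cor(xvals, yvals)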
The single number that R prints is the sample correlation; it should be approximately 0.556.
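The single call to cor hides the intermediate steps. The following is a minimal sketch that mirrors the hand calculation in the solution: standardize each variable, multiply the paired z-scores, sum, and divide by n - 1. The names zx, zy, and r_manual are illustrative choices and are not part of the original exercise.

```r
# Same data as in the exercise: midterm (x) and final (y) scores
xvals <- c(1, 41, 10, 34, 33)
yvals <- c(22, 28, 5, 100, 53)
n <- length(xvals)

# Standardize each variable with its sample mean and sample SD
# (sd() uses the n - 1 divisor, matching the solution)
zx <- (xvals - mean(xvals)) / sd(xvals)
zy <- (yvals - mean(yvals)) / sd(yvals)

# Sum the products of the paired z-scores and divide by n - 1
r_manual <- sum(zx * zy) / (n - 1)
r_manual

# Compare with the built-in function, allowing for floating-point error
all.equal(r_manual, cor(xvals, yvals))
```

Printing zx and zy before the final step shows the same standardized values that appear in the solution, which can help locate an arithmetic slip in a hand calculation.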
Excel Code
Copy and paste the following into your Excel spreadsheet. Place your cursor in cell A1 before pasting, so that the header x-vals ends up in A1.
How to calculate the correlation in Excel.
x-vals | y-vals | | correlation |
1 | 22 | r: | =CORREL(A:A,B:B) |
41 | 28 | | |
10 | 5 | | |
34 | 100 | | |
33 | 53 | | |
Make sure that you begin pasting in cell A1, because the CORREL formula reads the x-values from column A and the y-values from column B.