The Problem
Pearson’s sample correlation measures the strength and direction of the linear relationship between two numeric variables, summarizing the bivariate relationship in a single number. To illustrate the calculation, let us use the following example.
A professor would like to investigate the relationship between students' scores on the midterm and on the final exam. To do this, the professor collected the exam scores from n = 5 students. Using the data below, what is the sample correlation?
Here is the data. Note that there are two measurements on each unit (student). We are interested in the relationship between those two measurements.
Table of the bivariate data generated
Student Identifier  Midterm Score  Final Score
1                   51             77
2                   37             78
3                   21             89
4                   36             21
5                   42             66
Calculate the correlation of this sample.
Solution
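The calculation below uses the sample means and sample standard deviations of the two sets of scores without showing where they come from. As a sketch, assuming the usual sample formulas with the n - 1 divisor (consistent with the 1/(n - 1) factor in the correlation formula), they work out as follows:

$$ \begin{align}
\bar{x} &= \frac{51 + 37 + 21 + 36 + 42}{5} = \frac{187}{5} = 37.4 \\[1em]
\bar{y} &= \frac{77 + 78 + 89 + 21 + 66}{5} = \frac{331}{5} = 66.2 \\[1em]
s_x &= \sqrt{\frac{1}{n-1}\ \sum_{i=1}^{n}\ (x_i - \bar{x})^2} = \sqrt{\frac{477.2}{4}} = \sqrt{119.3} \approx 10.922454 \\[1em]
s_y &= \sqrt{\frac{1}{n-1}\ \sum_{i=1}^{n}\ (y_i - \bar{y})^2} = \sqrt{\frac{2818.8}{4}} = \sqrt{704.7} \approx 26.546186
\end{align}
$$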
$$ \begin{align}
r &= \frac{1}{n-1}\ \sum_{i=1}^{n}\ \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right) \\[3em]
&= \frac{1}{5-1}\ \sum_{i=1}^{5}\ \left(\frac{x_i - 37.4}{10.922454}\right)\left(\frac{y_i - 66.2}{26.546186}\right) \\[1em]
&= \frac{1}{4}\ \left[ \left(\frac{x_1 - 37.4}{10.922454}\right)\left(\frac{y_1 - 66.2}{26.546186}\right) \right. \\
& \qquad + \left(\frac{x_2 - 37.4}{10.922454}\right)\left(\frac{y_2 - 66.2}{26.546186}\right) \\
& \qquad + \left(\frac{x_3 - 37.4}{10.922454}\right)\left(\frac{y_3 - 66.2}{26.546186}\right) \\
& \qquad + \left(\frac{x_4 - 37.4}{10.922454}\right)\left(\frac{y_4 - 66.2}{26.546186}\right) \\
& \qquad + \left. \left(\frac{x_5 - 37.4}{10.922454}\right)\left(\frac{y_5 - 66.2}{26.546186}\right) \right] \\[1em]
&= \frac{1}{4}\ \left[ \left(\frac{51 - 37.4}{10.922454}\right)\left(\frac{77 - 66.2}{26.546186}\right) \right. \\
& \qquad + \left(\frac{37 - 37.4}{10.922454}\right)\left(\frac{78 - 66.2}{26.546186}\right) \\
& \qquad + \left(\frac{21 - 37.4}{10.922454}\right)\left(\frac{89 - 66.2}{26.546186}\right) \\
& \qquad + \left(\frac{36 - 37.4}{10.922454}\right)\left(\frac{21 - 66.2}{26.546186}\right) \\
& \qquad + \left. \left(\frac{42 - 37.4}{10.922454}\right)\left(\frac{66 - 66.2}{26.546186}\right) \right] \\[1em]
&= \frac{1}{4}\ \left[ \left(\frac{13.6}{10.922454}\right)\left(\frac{10.8}{26.546186}\right) \right. \\
& \qquad + \left(\frac{-0.4}{10.922454}\right)\left(\frac{11.8}{26.546186}\right) \\
& \qquad + \left(\frac{-16.4}{10.922454}\right)\left(\frac{22.8}{26.546186}\right) \\
& \qquad + \left(\frac{-1.4}{10.922454}\right)\left(\frac{-45.2}{26.546186}\right) \\
& \qquad + \left. \left(\frac{4.6}{10.922454}\right)\left(\frac{-0.2}{26.546186}\right) \right] \\[1em]
&= \frac{1}{4}\ \Big[\ \left(1.245141\right)\left(0.406838\right) \Big. \\
& \qquad + \left(-0.036622\right)\left(0.444508\right) \\
& \qquad + \left(-1.501494\right)\left(0.85888\right) \\
& \qquad + \left(-0.128176\right)\left(-1.702693\right) \\
& \qquad + \Big. \left(0.421151\right)\left(-0.007534\right)\ \Big] \\[1em]
&= \frac{1}{4}\ \Big[\ \left(0.506571\right) \Big. \\
& \qquad + \left(-0.016279\right) \\
& \qquad + \left(-1.289604\right) \\
& \qquad + \left(0.218245\right) \\
& \qquad + \Big. \left(-0.003173\right)\ \Big] \\[1em]
&= \frac{1}{4}\ \Big[\ -0.5842\ \Big] \\[1em]
&= -0.1461
\end{align}
$$
And so, the correlation between the midterm and final examination scores in this sample is r = -0.1461.
As this value is negative, it indicates that, in this sample, those who scored better on the midterm tended to score worse on the final.
This relationship, however, is weak.
Note, too, that we cannot (at this point) draw any conclusion about the relationship between these two variables in the population. This is only a measure of that relationship in this sample.
R Code
Copy and paste the following code into your R script window, then run it from there.
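# Midterm (xvals) and final (yvals) scores for the five students
xvals = c(51, 37, 21, 36, 42)
yvals = c(77, 78, 89, 21, 66)
# Pearson's sample correlation between the two sets of scores
cor(xvals, yvals)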
The script prints a single number, which is the sample correlation.
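To connect the built-in function to the hand calculation in the solution, here is a minimal sketch that standardizes each variable and averages the products of the z-scores with an n - 1 divisor. The names n, zx, zy and r_manual are illustrative choices, not part of the original script.

```r
# Exam scores from the example (xvals = midterm, yvals = final)
xvals = c(51, 37, 21, 36, 42)
yvals = c(77, 78, 89, 21, 66)
n = length(xvals)

# Standardize each variable with its sample mean and sample standard deviation
# (sd() uses the n - 1 divisor, matching the formula in the solution)
zx = (xvals - mean(xvals)) / sd(xvals)
zy = (yvals - mean(yvals)) / sd(yvals)

# Sum the products of the z-scores and divide by n - 1
r_manual = sum(zx * zy) / (n - 1)

round(r_manual, 4)           # step-by-step calculation
round(cor(xvals, yvals), 4)  # built-in function, for comparison
```

Both of the last two lines should print -0.1461, in agreement with the worked solution.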
Excel Code
Copy and paste the following into your Excel spreadsheet, making sure your cursor is in cell A1 when you paste so that the value xvals ends up in A1.
How to calculate the correlation in Excel.
xvals  yvals  correlation
51     77     r:           =CORREL(A:A,B:B)
37     78
21     89
36     21
42     66
Make sure that you begin pasting in cell A1.