The correlation coefficient is a widely used measure of the linear dependency between two variables \(X\) and \(Y\). It is defined as

\begin{equation*} r_{XY} = \frac{s_{XY}}{s_{X} \cdot s_{Y}} \in [-1;1] \end{equation*}
where \(s_{XY}\) denotes the covariance of the two variables and \(s_{X}, s_{Y}\) their standard deviations. A high magnitude \(\left| r_{XY} \right|\) indicates a strong linear relationship between the variables. Seen the other way round: starting from a strong linear relationship, \(\left| r_{XY} \right|\) gets smaller as the noise in the variables increases.
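The definition above can be evaluated directly on a sample. The following is a minimal sketch in plain Python, not the notebook's code; the function name `pearson_r` and the toy data are chosen here for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r_XY = s_XY / (s_X * s_Y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and standard deviations with n - 1 normalisation.
    # The normalisation cancels in the ratio, so dividing by n gives the same r.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / (s_x * s_y)

# Illustrative toy data: points on a rising and on a falling line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(xs, [2 * x for x in xs])
r_down = pearson_r(xs, [-3 * x + 1 for x in xs])
print(r_up, r_down)
```

Points on a rising line give a value at the upper end of the interval and points on a falling line a value at the lower end, up to floating-point rounding. The explicit check for a constant variable is a design choice of this sketch: the quotient is not defined when \(s_{X}\) or \(s_{Y}\) is zero.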

This is illustrated in the animation below. The values of \(X\) run from -4 to 4 in steps of 0.001, and the second variable is forced into an exact linear relationship with the first:

\begin{equation*} Y = 2X. \end{equation*}
Hence, without further changes, all points lie exactly on a line with positive slope, which gives \(r_{XY} = 1\). To analyse the influence of noise, Gaussian noise is added to each variable separately:

\begin{align*} \tilde{X} &= X + N(0, \sigma_x) \\ \tilde{Y} &= Y + N(0, \sigma_y). \end{align*}
The noise parameters \(\sigma_x\) and \(\sigma_y\) can be controlled independently in the animation.
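The experiment can also be reproduced outside the animation. The sketch below is a plain-Python approximation of the setup described above, not the code of the Mathematica notebook. It assumes that \(\sigma_x\) and \(\sigma_y\) are standard deviations (the notation \(N(0, \sigma)\) does not settle this), and the names `pearson_r` and `noisy_correlation`, the fixed seed, and the chosen noise levels are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred products; the common normalisation factor cancels in r.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def noisy_correlation(sigma_x, sigma_y, step=0.001, seed=0):
    """Correlation of X + noise and 2X + noise for X from -4 to 4.

    Assumption: sigma_x and sigma_y are standard deviations of the
    Gaussian noise terms.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    n = int(round(8 / step)) + 1
    xs = [-4 + i * step for i in range(n)]
    ys = [2 * x for x in xs]
    # Independent Gaussian noise on each variable.
    x_noisy = [x + rng.gauss(0, sigma_x) for x in xs]
    y_noisy = [y + rng.gauss(0, sigma_y) for y in ys]
    return pearson_r(x_noisy, y_noisy)

r_clean = noisy_correlation(0.0, 0.0)  # no noise: points stay on the line
r_some = noisy_correlation(0.5, 0.5)   # moderate noise on both variables
r_more = noisy_correlation(2.0, 2.0)   # stronger noise on both variables
print(r_clean, r_some, r_more)
```

Without noise the coefficient equals 1 up to rounding, and it decreases as the noise levels grow. Setting only one of the two parameters to a non-zero value shows the separate influence of noise in \(\tilde{X}\) or \(\tilde{Y}\).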

List of attached files:

- CorrelationCoefficient.nb [PDF] (Mathematica notebook used to create the visualization)
