Density estimation based on the nearest neighbours is another technique to estimate the unknown PDF \(\hat{p}(x)\) from observed data. It implements, in a sense, the opposite idea of the Parzen window estimator, where we place a kernel at each data point with a certain side length \(h\) which determines the local influence of the kernel. Using a large \(h\) results in wide kernels which cover more points. In nearest neighbour density estimation, we approach the problem from a different perspective: instead of fixing the side length \(h\) and letting the number of points \(k\) covered by each kernel vary, we now fix \(k\) and adjust the influence area accordingly.

The idea is to use a volume function \(V_k(x)\) which tells us how large the volume around a position \(x\) must be to cover \(k\) data points. This function takes low values at positions where many data points are located and high values when only a few data points are in the neighbourhood, since then we need more space to cover the same number of neighbours. This leads to the following approximation of the PDF:

\begin{equation} \label{eq:NearestNeighbourDensityEstimation_Approximation} \hat{p}(x) \approx \frac{k}{N} \cdot \frac{1}{V_k(x)}. \end{equation}

The volume function is inverted so that we have a high density at positions with many data points and a low density at locations where points are only sparsely located. Note that this is already a behaviour we naturally expect from a PDF. Additionally, the inverted volume is weighted by the fraction of points it reflects, i.e. the \(k\) considered points out of \(N\) total points.
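As a small illustration of how \eqref{eq:NearestNeighbourDensityEstimation_Approximation} can be computed, consider the following Python sketch (the function name knn_density is my own choice, and the volume is assumed to be a one-dimensional interval, matching the volume function introduced below):

```python
import numpy as np

def knn_density(x, data, k):
    """Estimate p(x) ≈ k / (N · V_k(x)) from one-dimensional data.

    V_k(x) is taken as the length of the interval centred at x
    which just covers the k nearest data points.
    """
    distances = np.sort(np.abs(data - x))  # distances from x to all N points
    volume = 2 * distances[k - 1]          # interval reaching the k-th nearest neighbour
    return k / (len(data) * volume)
```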

The following animation shows, for a one-dimensional example, how \(\hat{p}(x)\) is generated for the four points

\begin{equation} \label{eq:NearestNeighbourDensityEstimation_Data} X = \begin{pmatrix} 1 \\ 1.5 \\ 3 \\ 5 \\ \end{pmatrix} \end{equation}

with the volume function

\begin{equation} \label{eq:NearestNeighbourDensityEstimation_Volume} V_k(x) = 2 \cdot \left| x - \tilde{x}_k \right|. \end{equation}

\(\tilde{x}_k\) denotes the \(k\)-th nearest neighbour of \(x\). Note how the “volume” in the one-dimensional case is just the length of the interval centred at \(x\) which extends to the \(k\)-th neighbour. We will use \(k=2\) in the following, and the critical part is now to determine the second nearest neighbour for each position \(x\), since the distance to that neighbour determines the resulting density value. For this, you can use the following animation and slide over the current value \(x_c\) to see which point is the second nearest neighbour and how the resulting PDF evolves.


Figure 1: Animation which shows how \(\hat{p}(x)\) evolves when approximated by \eqref{eq:NearestNeighbourDensityEstimation_Approximation}. The volume function from \eqref{eq:NearestNeighbourDensityEstimation_Volume} with \(k=2\) is used on the one-dimensional example data \(X\) (\eqref{eq:NearestNeighbourDensityEstimation_Data}). The bottom graph shows which points are the first and second nearest neighbours of the current value \(x_c\). The important locations are those where the second nearest neighbour changes, since this also results in a change of the function \(\hat{p}(x)\) in the top graph. The coloured areas in the top graph correspond to the regions \(R_i\) in which the second nearest neighbour stays the same. E.g. for all \(x\) values in the red coloured region \(R_1\), the second nearest neighbour is the data point \(x_2\).
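Using the knn_density sketch from above, we can reproduce a few values of this example by hand (the evaluation points are arbitrarily chosen):

```python
X = np.array([1.0, 1.5, 3.0, 5.0])

# At x = 1.25, the second nearest neighbour is at distance 0.25, so
# V_2(1.25) = 2 * 0.25 = 0.5 and p(1.25) = 2 / (4 * 0.5) = 1
for x in (1.25, 2.0, 4.0):
    print(f"p({x}) = {knn_density(x, X, k=2)}")
# Output: p(1.25) = 1.0, p(2.0) = 0.25, p(4.0) = 0.25
```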

Note also that a PDF approximated by \eqref{eq:NearestNeighbourDensityEstimation_Approximation} is not actually a valid PDF. More precisely, it can be shown (page 4) that the required property

\begin{equation*} \int_{-\infty}^{\infty} \! \hat{p}(x) \, \mathrm{d}x = 1 \end{equation*}

is violated.
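To get an intuition for this, consider the special case of \(N = 1\) data point and \(k = 1\) (my own illustrative example, not the argument from the referenced derivation). The estimate then reduces to \(\hat{p}(x) = \frac{1}{2 \left| x - x_1 \right|}\) and the integral

\begin{equation*} \int_{-\infty}^{\infty} \! \frac{1}{2 \left| x - x_1 \right|} \, \mathrm{d}x \end{equation*}

diverges, both at \(x = x_1\) and in the tails, where the integrand decays only like \(1 / \left| x \right|\). The same tail behaviour occurs for any finite data set since \(V_k(x)\) grows only linearly in \(\left| x \right|\).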
