KNN Regression 2D

4.7. KNN Regression 2D

So far we’ve looked at KNN in 1D, where there was only one input variable. The same approach still works when you have multiple input variables. With two input variables, you can still visualise the results in 2D space.

Consider the following dataset containing the ages, weights and heights of several giraffes.

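| Age (years) | Weight (x100kg) | Height (m) |
|-------------|-----------------|------------|
| 1.0         | 4               | 2.7        |
| 1.7         | 6               | 3.1        |
| 3.0         | 5               | 3.2        |
| 2.4         | 6.5             | 3.4        |
| 4.7         | 10              | 4.1        |
| 5.2         | 7               | 3.7        |
| 4.5         | 6.5             | 3.6        |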

../../_images/giraffe_height_weight.png

We can plot this data as shown below, including our test sample.

../../_images/knn-2D.png

To make a prediction, we just look at the k closest neighbours. Let’s consider our test sample, which is 2.8 years old and weighs 700 kg (a value of 7 in the table’s units of 100 kg).

If we set k = 1, we’re just looking at our nearest neighbour. This neighbour has a height of 3.4, so the prediction we make is 3.4. We’ve drawn a circle here to represent the ‘neighbourhood’. Remember that the radius is the distance from the centre of the circle to its edge, so every point inside the circle is closer to the test sample than every point outside it.

../../_images/knn-2D-k1.png
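The k = 1 step can be sketched in a few lines of Python. This is only an illustration: it assumes ‘closest’ means straight-line (Euclidean) distance on the raw table values, and the names `giraffes`, `test_point` and `distance_to_test` are made up for this example.

```python
import math

# Rows copied from the table: (age in years, weight in units of 100 kg, height).
giraffes = [
    (1.0, 4.0, 2.7),
    (1.7, 6.0, 3.1),
    (3.0, 5.0, 3.2),
    (2.4, 6.5, 3.4),
    (4.7, 10.0, 4.1),
    (5.2, 7.0, 3.7),
    (4.5, 6.5, 3.6),
]

# Test sample from the text: age 2.8, weight value 7.
test_point = (2.8, 7.0)


def distance_to_test(row):
    """Euclidean distance from one table row to the test sample (assumed metric)."""
    age, weight, _height = row
    return math.dist((age, weight), test_point)


# With k = 1 the prediction is simply the height of the single closest giraffe.
nearest = min(giraffes, key=distance_to_test)
prediction_k1 = nearest[2]

print("nearest neighbour:", nearest)
print("prediction for k = 1:", prediction_k1)
```

The closest row is the 2.4-year-old giraffe, so the sketch reproduces the prediction of 3.4.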

Now let’s set k = 2. We draw our circle a little bigger so that it captures our two closest neighbours. This time our prediction is the average height of those two neighbours: (3.4 + 3.1) / 2 = 3.25.

../../_images/knn-2D-k2.png

Now let’s set k = 3. We draw our circle a little bigger again so that it captures our three closest neighbours. This time our prediction is the average height of those three neighbours: (3.4 + 3.1 + 3.2) / 3 ≈ 3.23.

../../_images/knn-2D-k3.png

We can keep going for different values of k, and it’s up to you to decide which value of k to use.
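To try different values of k, the procedure can be wrapped in a small function. This is a minimal sketch, not a reference implementation: `knn_predict` is a hypothetical helper, and it assumes Euclidean distance on the raw table values.

```python
import math

# Rows copied from the table: (age in years, weight in units of 100 kg, height).
giraffes = [
    (1.0, 4.0, 2.7),
    (1.7, 6.0, 3.1),
    (3.0, 5.0, 3.2),
    (2.4, 6.5, 3.4),
    (4.7, 10.0, 4.1),
    (5.2, 7.0, 3.7),
    (4.5, 6.5, 3.6),
]


def knn_predict(samples, query, k):
    """Average the heights of the k samples closest to `query`.

    Assumes each sample is (age, weight, height) and that distance is
    Euclidean on the first two values.
    """
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    # Sort all samples from closest to furthest, then keep the first k.
    ranked = sorted(samples, key=lambda row: math.dist(row[:2], query))
    neighbours = ranked[:k]
    return sum(row[2] for row in neighbours) / k


test_point = (2.8, 7.0)
for k in (1, 2):
    print(f"k = {k}: predicted height = {knn_predict(giraffes, test_point, k):.2f}")
```

For k = 1 and k = 2 this gives 3.4 and 3.25, matching the walkthrough. One caveat: on the raw table values the third-closest giraffe is the 4.5-year-old, so the sketch returns about 3.37 for k = 3 rather than 3.23. Presumably the circle in the figure is drawn on axes with a different scale, which changes which points count as closest. The sketch makes no attempt to reproduce that scaling.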

We’ve just seen how KNN regression works for two input variables. It works the same way for more input variables; it’s just harder to visualise. The computer can still calculate the distance between data points and make a prediction from the average value of the k nearest neighbours.
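For reference, here is one way to write this down for any number of input variables. It assumes the usual straight-line (Euclidean) distance, which this section does not name explicitly. The symbols are chosen for this sketch: n is the number of input variables, x and x' are two data points, N_k is the set of the k nearest neighbours of the test sample, and y_i are their known output values.

```latex
d(\mathbf{x}, \mathbf{x}') = \sqrt{\sum_{j=1}^{n} \left(x_j - x'_j\right)^2},
\qquad
\hat{y} = \frac{1}{k} \sum_{i \in N_k} y_i
```

With n = 2 the distance is the ordinary distance between two points on a 2D plot, which is why the neighbourhood can be drawn as a circle.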