Extension: Building a KNN Regression Model

4.8. Extension: Building a KNN Regression Model

Building a KNN model using sklearn is very similar to how we have built previous models.

Let’s first look at our dataset, giraffe.csv. For each giraffe it records the age in years, the weight in units of 100 kg, the sex (0: female, 1: male) and the height in metres.

import pandas as pd

data = pd.read_csv("giraffe.csv")
print(data)

We’ll start by predicting the height of a giraffe from its age alone. To build our KNN model, we first import the KNeighborsRegressor class.

from sklearn.neighbors import KNeighborsRegressor

Then we create the model and specify the number of neighbours we want, i.e. the value of k. For now we’ll set k=1.

knn = KNeighborsRegressor(n_neighbors=1)
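To make the role of k more concrete, here is a minimal sketch of the idea behind KNN regression: the prediction for a new point is the average target value of its k nearest training points. This is an illustration only, not how sklearn implements it internally. The function name knn_predict_1d and the small arrays are made up for this example and are not taken from giraffe.csv.

```python
import numpy as np


def knn_predict_1d(x_train, y_train, x_new, k=1):
    """Predict y at x_new as the mean y of the k nearest training points."""
    # Distance from the new point to every training point (1D case)
    distances = np.abs(x_train - x_new)
    # Indices of the k smallest distances
    nearest = np.argsort(distances, kind="stable")[:k]
    return y_train[nearest].mean()


# Made-up illustrative values (assumption, not real giraffe data)
ages = np.array([1.0, 3.0, 6.0, 10.0])
heights = np.array([2.0, 3.0, 4.0, 5.0])

# With k=1 the prediction copies the single nearest neighbour (age 3)
print(knn_predict_1d(ages, heights, 4.0, k=1))  # 3.0
# With k=2 it averages the two nearest neighbours (ages 3 and 6)
print(knn_predict_1d(ages, heights, 4.0, k=2))  # 3.5
```

With k=1 the model simply returns the height of the closest giraffe in age, which is why it follows the training data so closely.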

We fit our model to the data using

knn.fit(x, y)

Note that, just like with our linear regression model, x must be a 2D array with one row per giraffe and one column per feature, so we use .reshape(-1, 1) to get the right dimensions.
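As a quick check of what .reshape(-1, 1) does, the sketch below uses a small made-up array (an assumption, not the giraffe data) and prints the shape before and after reshaping.

```python
import numpy as np

# Made-up ages for illustration only
x = np.array([1.0, 3.0, 6.0, 10.0])

# -1 asks NumPy to work out the number of rows; 1 means a single column
x_2d = x.reshape(-1, 1)

print(x.shape)     # (4,)
print(x_2d.shape)  # (4, 1)
```

The reshaped array has one row per sample and a single feature column, which is the layout .fit() expects for x.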

We can make predictions using

knn.predict(x)

Again, x needs to be reshaped into a 2D array. To visualise our model, we can call .predict() on a set of evenly spaced ages and plot the predicted heights as a line.
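The sketch below shows .predict() being used on ages that are not in the training data. The training values are made up for illustration (an assumption, not taken from giraffe.csv), and the variable names new_ages and predicted are chosen just for this example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up training data for illustration only
ages = np.array([1.0, 3.0, 6.0, 10.0])
heights = np.array([2.0, 3.0, 4.0, 5.0])

knn = KNeighborsRegressor(n_neighbors=1)
knn.fit(ages.reshape(-1, 1), heights)

# New ages must also be passed as a 2D array
new_ages = np.array([4.0, 9.0])
predicted = knn.predict(new_ages.reshape(-1, 1))

# One predicted height per new age
print(predicted)
```

Because k=1 here, each prediction is the height of the training giraffe closest in age: age 4 is closest to age 3, and age 9 is closest to age 10.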

Here is a full example:

Note

Try experimenting with different values of k by changing n_neighbors in the example. You should notice that as you increase the number of neighbours the model looks smoother, because each prediction is averaged over more data points.

from sklearn.neighbors import KNeighborsRegressor
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Load data
data = pd.read_csv("giraffe.csv")
x = data["Age"].to_numpy()
y = data["Height"].to_numpy()

# Build KNN model
knn = KNeighborsRegressor(n_neighbors=1)
knn.fit(x.reshape(-1, 1), y)

# Create x and y values to visualise the model function
x_model = np.linspace(0, 25, 200)
y_model = knn.predict(x_model.reshape(-1, 1))

# Visualise the results
plt.figure(figsize=(4, 4))
plt.scatter(x, y)  # Data
plt.plot(x_model, y_model, color="red")  # Model
plt.xlabel("Age (years)")
plt.ylabel("Height (m)")
plt.tight_layout()
plt.show()  # Display the figure when running as a script
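To follow up on the note about changing k, here is a rough sketch that compares several values of n_neighbors without plotting. It uses made-up data (an assumption, not giraffe.csv) and measures the spread of the predictions, i.e. the difference between the largest and smallest predicted height. This is only one simple way to summarise the effect; the name spread is chosen just for this example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up training data for illustration only
ages = np.array([1.0, 3.0, 6.0, 10.0, 15.0, 20.0])
heights = np.array([2.0, 3.0, 3.8, 4.5, 4.9, 5.0])

# Evenly spaced ages at which to evaluate each model
grid = np.linspace(0, 25, 200).reshape(-1, 1)

spread = {}
for k in (1, 3, 6):
    knn = KNeighborsRegressor(n_neighbors=k)
    knn.fit(ages.reshape(-1, 1), heights)
    y_grid = knn.predict(grid)
    # Range of predicted heights across the grid
    spread[k] = y_grid.max() - y_grid.min()
    print(f"k={k}: spread of predictions = {spread[k]:.2f}")
```

With k=1 the predictions cover the full range of the training heights. When k equals the number of data points, every prediction is the mean of all heights, so the model is a flat line. Values in between trade detail for smoothness.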