Extension: Building a KNN Classification Model

4.12. Extension: Building a KNN Classification Model

Building a KNN classification model using sklearn is very similar to how we built our KNeighborsRegressor model, except that instead of a regressor we use a classifier, KNeighborsClassifier.

Let’s first look at our dataset, views.csv. It contains the social score and economic score of each voter, along with the party each voter voted for. There are 5 parties the voters can vote for:

0: The green party
1: The grey party
2: The red party
3: The blue party
4: The orange party

import pandas as pd

data = pd.read_csv("views.csv")
print(data)

4.12.1. KNN Classification 1D

We’ll start by predicting the party a voter votes for based only on their social score. To build our KNN model we first import the classifier.

from sklearn.neighbors import KNeighborsClassifier

Then we create the model and specify the number of neighbours we want, i.e. the value of k. For now we’ll set k=1.

knn = KNeighborsClassifier(n_neighbors=1)
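
To make concrete what k=1 means, here is a minimal sketch of a nearest-neighbour prediction written by hand with NumPy. The training values and the helper predict_1nn are made up for illustration; they are not taken from views.csv and are not part of sklearn.

```python
import numpy as np

# Made-up training data (NOT from views.csv): social scores and party labels
x_train = np.array([0.10, 0.35, 0.60, 0.90])
y_train = np.array([4, 3, 2, 0])

def predict_1nn(sample):
    # Hypothetical helper: distance from the sample to every training point
    distances = np.abs(x_train - sample)
    # With k=1, the prediction is the label of the single closest point
    return y_train[np.argmin(distances)]

print(predict_1nn(0.3))  # the closest training score is 0.35, which is labelled 3
```

With k=1 the prediction depends entirely on one training point, so a single unusual voter can change the result for nearby samples.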

We fit our model to the data using

knn.fit(x, y)

Note that just like with our linear regression model, x must be a 2D array, so we use .reshape(-1, 1) to get the right dimensions.
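
As a quick check of what .reshape(-1, 1) does, the sketch below uses a small made-up array in place of data["social"].to_numpy(). The -1 asks NumPy to work out the number of rows, and the 1 fixes the number of columns.

```python
import numpy as np

# Made-up scores standing in for data["social"].to_numpy()
x = np.array([0.2, 0.5, 0.8])
print(x.shape)     # (3,): a 1D array of 3 values

# One row per sample, one column per input variable
x_2d = x.reshape(-1, 1)
print(x_2d.shape)  # (3, 1): 3 samples with 1 feature each
```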

We can make predictions using

knn.predict(x)

Here is the complete example:

from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np

# Load data
data = pd.read_csv("views.csv")
x = data["social"].to_numpy()
y = data["vote"].to_numpy()

# Build KNN model
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(x.reshape(-1, 1), y)

# Make prediction
samples = np.array([0.1, 0.3, 0.4, 0.6, 0.7, 0.9])
print(knn.predict(samples.reshape(-1, 1)))

The model’s predictions are shown in the table below.

Social Score	Prediction
0.1	4
0.3	3
0.4	2
0.6	2
0.7	1
0.9	0

Here’s a visualisation of the predictions.

4.13. KNN 2D

We can give more than 1 input variable to our KNN classifier model. This is similar to how you would build a multiple linear regression model. When we read in the data, we read multiple columns into our variable x.

DataFrame[[column_1, column_2, ...]]

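The model expects an array with one row per sample and N columns, where N is the number of input variables. Selecting N columns and calling .to_numpy() already gives this shape, so .reshape(-1, N) leaves the training data unchanged. It matters for a test sample written as a flat array, which it turns into a single row of N values. In the 2D case, N is 2.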
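
The effect of .reshape(-1, 2) is easiest to see on a test sample. The sketch below uses only NumPy; the second array, several, is a made-up example of two samples written as one flat list.

```python
import numpy as np

# A single test sample written as a flat array of 2 values
sample = np.array([0.45, 0.7])
print(sample.shape)      # (2,)

# Reshaped into 1 row with 2 columns, the form predict() expects
sample_2d = sample.reshape(-1, 2)
print(sample_2d.shape)   # (1, 2)

# Made-up example: four values become 2 samples with 2 features each
several = np.array([0.45, 0.7, 0.2, 0.9]).reshape(-1, 2)
print(several.shape)     # (2, 2)
```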

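Here’s a complete example where we give both the economic score and the social score as inputs. We have also provided the test sample (0.45, 0.7). Its values follow the column order used in the code, so 0.45 is the economic score and 0.7 is the social score.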

from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np

# Load data
data = pd.read_csv("views.csv")
x = data[["economic", "social"]].to_numpy()
y = data["vote"].to_numpy()

# Build KNN model
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(x.reshape(-1, 2), y)

# Make prediction
samples = np.array([0.45, 0.7])
print(knn.predict(samples.reshape(-1, 2)))

Try changing the value of k from 1 to 4, and then to 14. You should see that

for k=1 the model predicts 1
for k=4 the model predicts 0
for k=14 the model predicts 2

Here’s a visualisation of these predictions.
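
The prediction changes with k because the classifier takes a majority vote among the k nearest training points. The sketch below illustrates this on a few made-up points (not the views.csv data), using the kneighbors method to look up which training points were consulted. The values of k here (1 and 3) were chosen for this toy data and are not the ones from the exercise above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up training points (NOT from views.csv): columns are [economic, social]
x = np.array([[0.40, 0.70],
              [0.55, 0.70],
              [0.45, 0.85],
              [0.20, 0.70],
              [0.90, 0.10]])
y = np.array([1, 0, 0, 0, 2])

sample = np.array([0.45, 0.7]).reshape(-1, 2)

predictions = {}
for k in (1, 3):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x, y)
    # kneighbors returns the distances and row indices of the k nearest points
    _, idx = knn.kneighbors(sample)
    predictions[k] = int(knn.predict(sample)[0])
    print("k =", k, "neighbour labels:", y[idx[0]], "prediction:", predictions[k])
```

In this toy data the single nearest point has label 1, but two of the three nearest points have label 0, so the vote flips once k grows.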
