2.4. Building a Polynomial Regression Model

To construct the 2D array required as input to our polynomial regression model, we can use PolynomialFeatures from sklearn.preprocessing. We import it as follows:

from sklearn.preprocessing import PolynomialFeatures

Then we create a PolynomialFeatures object, where degree is the degree of the polynomial we want to fit:

poly = PolynomialFeatures(degree, include_bias=False)

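We set include_bias to False. Bias is another word for intercept. If include_bias were True (the default), PolynomialFeatures would add an extra column of ones to represent it. We don't need that column because our linear regression model calculates the intercept for us automatically.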
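To see what include_bias changes, the following minimal sketch compares the two settings on a small array. The sample values are made up purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical sample values, already shaped as a single column
x = np.array([1, 2, 3]).reshape(-1, 1)

with_bias = PolynomialFeatures(2, include_bias=True).fit_transform(x)
without_bias = PolynomialFeatures(2, include_bias=False).fit_transform(x)

print(with_bias.shape)     # (3, 3): columns are 1, x, x^2
print(without_bias.shape)  # (3, 2): columns are x, x^2
```

Apart from the leading column of ones, the two arrays hold the same values.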

Finally, we pass the input variable we’re interested in to fit_transform, which returns the 2D array of polynomial features.

X = poly.fit_transform(x)

Note

x must be a 2D array with n rows (one for each sample in the dataset) and 1 column. An easy way to achieve this is to call .reshape(-1, 1) on a 1D array.
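As a small illustration of the reshape step, this sketch prints the shape of an array before and after reshaping. The values are arbitrary.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])  # 1D array with 5 samples
print(x.shape)  # (5,)

# -1 asks NumPy to infer the number of rows; 1 fixes the number of columns
x_2d = x.reshape(-1, 1)
print(x_2d.shape)  # (5, 1)
```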

Here’s a quick example of how it works on an array containing the values [1, 2, 3, 4, 5]. You’ll notice that the first column is \(x\), the second column is \(x^2\) and the third column is \(x^3\).

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

x = np.array([1, 2, 3, 4, 5])

poly = PolynomialFeatures(3, include_bias=False)
X = poly.fit_transform(x.reshape(-1, 1))

print(X)
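One way to confirm the column layout described above is to build the powers by hand and compare them with the transformed array. This is only a sanity check, not part of the modelling workflow.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([1, 2, 3, 4, 5])
X = PolynomialFeatures(3, include_bias=False).fit_transform(x.reshape(-1, 1))

# Build the columns x, x^2 and x^3 manually for comparison
expected = np.column_stack([x, x**2, x**3])
print(np.allclose(X, expected))  # True
```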

The following is a complete example of how we would fit a polynomial model of degree 3 to our data on heart rate during cool down, which is stored in cool_down.csv. You will notice that the code is very similar to the code we used to build our linear regression model.

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Load data
data = pd.read_csv("cool_down.csv")
x = data["Cool Down Time (mins)"].to_numpy()
y = data["Heart Rate (bpm)"].to_numpy()

# Build polynomial regression model
poly = PolynomialFeatures(3, include_bias=False)
X = poly.fit_transform(x.reshape(-1, 1))
linear_reg = LinearRegression()
linear_reg.fit(X, y)

# Create x and y values to visualise the model function
x_model = np.linspace(0, 10).reshape(-1, 1)
# Reuse the already-fitted transformer rather than fitting it again
X_model = poly.transform(x_model)
y_model = linear_reg.predict(X_model)

# Visualise the results
plt.figure(figsize=(4, 4))
plt.scatter(x, y)
plt.plot(x_model, y_model, color="red")
plt.xlabel("Cool Down Time (mins)")
plt.ylabel("Heart Rate (bpm)")
plt.tight_layout()
plt.savefig("plot.png")
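Once the model is fitted, its coefficients and intercept can be read from the LinearRegression object, and predictions for new inputs go through the same fitted PolynomialFeatures object. The sketch below illustrates this under an assumption: because cool_down.csv is not reproduced here, it uses made-up, noise-free data generated from a known cubic, so the numbers are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in data from a known cubic (not the cool_down.csv data)
x = np.linspace(0, 10, 20)
y = 150 - 12 * x + 1.5 * x**2 - 0.05 * x**3

poly = PolynomialFeatures(3, include_bias=False)
X = poly.fit_transform(x.reshape(-1, 1))
linear_reg = LinearRegression().fit(X, y)

# coef_ holds the weights for x, x^2 and x^3; intercept_ is the constant term
b1, b2, b3 = linear_reg.coef_
b0 = linear_reg.intercept_

# Predict at a new x value, reusing the fitted transformer via transform
x_new = np.array([[4.0]])
y_pred = linear_reg.predict(poly.transform(x_new))[0]

# The prediction matches evaluating the polynomial by hand
y_manual = b0 + b1 * 4.0 + b2 * 4.0**2 + b3 * 4.0**3
print(np.isclose(y_pred, y_manual))  # True
```

With real, noisy data the fitted coefficients would only approximate the underlying relationship, but the way they are read and used is the same.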