3.12. Extension: Building and Predicting With a Regression Tree
Building a regression tree is very similar to building a classification tree
using sklearn. Let's first look at our dataset, icecream.csv.
import pandas as pd
data = pd.read_csv("icecream.csv")
print(data)
Output
    Temperature  Rain  Sales
0            22     0   3700
1            -2     0     50
2            31     0   6200
3            18     1    900
4            16     0   1300
5            24     1   3100
6            22     1   2500
7            28     0   5100
8            18     0   4200
9            21     0   2800
10           26     0   4100
11           29     1   5400
A Rain value of 0 indicates no rain, and a Rain value of 1 indicates it did rain.
To build our regression tree, we import DecisionTreeRegressor instead of
DecisionTreeClassifier.
In this example, instead of limiting the depth of the tree, we've set
min_samples_split=6, which means a node must contain at least 6 samples
before it can be split by a further decision.
tree = DecisionTreeRegressor(min_samples_split=n_samples)
The other change we’ve made is that in export_graphviz, we no longer need
to provide the class_names, since we aren’t predicting classes. Here is a
complete example.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_graphviz
import graphviz
data = pd.read_csv("icecream.csv")
x = data[["Temperature", "Rain"]].to_numpy()
y = data["Sales"].to_numpy()
tree = DecisionTreeRegressor(min_samples_split=6)
tree.fit(x, y)
tree_data = export_graphviz(
    tree, feature_names=["Temperature", "Rain"], rounded=True, impurity=False
)
graph = graphviz.Source(tree_data)
graph.render("Tree", format="png")
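If you want to check that min_samples_split=6 is being respected, you can inspect the fitted tree's internal structure through its tree_ attribute. This is just a quick sketch (not part of the original example) that prints how many training samples reached each node and whether that node was split; every node that was split should contain at least 6 samples.
# inspect the structure of the tree fitted above
t = tree.tree_
for node in range(t.node_count):
    is_leaf = t.children_left[node] == -1  # -1 means no child, i.e. a leaf
    kind = "leaf" if is_leaf else "split on feature {}".format(t.feature[node])
    print("Node {}: {} samples, {}".format(node, t.n_node_samples[node], kind))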
Here is a graphic showing how our training set flowed through the regression tree.
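To see this flow programmatically, a small sketch (not part of the original example) can use the regressor's apply method, which returns the index of the leaf each sample ends up in. Each leaf's prediction is simply the mean of the Sales values of the training samples that reached it.
import numpy as np
# find the leaf each training sample lands in, using the tree fitted above
leaves = tree.apply(x)
for leaf in np.unique(leaves):
    in_leaf = leaves == leaf
    print("Leaf {}: rows {}, mean Sales = {}".format(
        leaf, np.where(in_leaf)[0].tolist(), y[in_leaf].mean()
    ))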
We can also use our model to predict ice cream sales on our test data using
.predict() and calculate the mean squared error using
mean_squared_error from sklearn.
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error as mse
data = pd.read_csv("icecream.csv")
x = data[["Temperature", "Rain"]].to_numpy()
y = data["Sales"].to_numpy()
tree = DecisionTreeRegressor(min_samples_split=6)
tree.fit(x, y)
# temp, rain
x_test = np.array([[21, 1], [26, 0], [13, 0], [18, 0]])
sales = np.array([2100, 4900, 1500, 4500])
prediction = tree.predict(x_test)
print("Predictions: {}".format(prediction))
print("MSE: {}".format(mse(sales, prediction)))
Output
Predictions: [2166.66666667 5200. 675. 3566.66666667]
MSE: 411545.13888888893
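The mean squared error is just the average of the squared differences between the true and predicted sales, so we can reproduce sklearn's figure by hand. A quick sketch, reusing the sales and prediction arrays from the example above:
# MSE = mean of (true - predicted)^2
errors = sales - prediction
print((errors ** 2).mean())  # should match the value reported by mean_squared_error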
We can verify our model's predictions by looking at how the test samples flow through our regression tree.
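If you prefer to trace that flow in code, a fitted sklearn tree also provides decision_path and apply. The sketch below prints the sequence of nodes each test sample visits and the leaf it lands in.
# trace each test sample through the tree fitted above
path = tree.decision_path(x_test)  # sparse matrix: one row per sample, one column per node
leaves = tree.apply(x_test)        # leaf reached by each sample
for i in range(len(x_test)):
    nodes = path.indices[path.indptr[i]:path.indptr[i + 1]]
    print("Sample {} visits nodes {} and lands in leaf {}".format(i, nodes.tolist(), leaves[i]))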
Code Challenge: Extension: Build a Regression Tree
You have been provided with a csv file called avocado.csv
with data from Kaggle. This data contains the following columns:
Month
TotalVolume
Type
Year
AveragePrice
TotalVolume is the volume of avocados sold that day. The Type column takes the values 0 or 1, where 0 means conventional and 1 means organic.
We will use this data to predict AveragePrice, which is the average price of an avocado on a given day.
Instructions
Using pandas, read the file avocado.csv into a DataFrame
Extract the 'Month', 'TotalVolume', 'Type', 'Year' columns into the variable x
Extract the 'AveragePrice' column into the variable y
Convert both x and y to numpy arrays
Using sklearn, create a DecisionTreeRegressor model to fit to the training data, set the max_depth to 3
Export the tree using export_graphviz and set rounded=True and impurity=False
Save the tree as a png file.
Your figure should look like this:
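One possible way to work through these steps, sketched here by following the pattern of the ice cream example above and assuming avocado.csv has exactly the columns listed:
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_graphviz
import graphviz

data = pd.read_csv("avocado.csv")
# features and target as described above
x = data[["Month", "TotalVolume", "Type", "Year"]].to_numpy()
y = data["AveragePrice"].to_numpy()

tree = DecisionTreeRegressor(max_depth=3)
tree.fit(x, y)

tree_data = export_graphviz(
    tree, feature_names=["Month", "TotalVolume", "Type", "Year"],
    rounded=True, impurity=False
)
graph = graphviz.Source(tree_data)
graph.render("Tree", format="png")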
Code Challenge: Extension: Predicting With a Regression Tree
Now let's use the regression tree we just built on our avocado data (avocado.csv) to predict the average price of the avocados in our test data.
| Month | Total Volume | Type | Year | Average Price |
|---|---|---|---|---|
| 7 | 190716 | 0 | 2015 | 1.05 |
| 3 | 1045450 | 1 | 2016 | 1.27 |
| 9 | 9883 | 1 | 2017 | 2.15 |
| 1 | 16205 | 1 | 2018 | 1.93 |
Instructions
Copy and paste in your code from 'Extension: Build a Regression Tree', just up to where you fit the regression tree
Create a numpy array containing the avocado data shown above
Use .predict to predict the average price for each sample
Print the predictions
Calculate the mean squared error of your predictions and print the results
Your output should look like this:
Predictions: [X.XXXXXXXX X.XXXXXXXX X.XXXXXXXX X.XXXXXXXX]
MSE: X.XXXXXXXXXXXXXXXXX
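One possible sketch, continuing on from the code above that fits the regression tree and again assuming the avocado columns listed earlier:
import numpy as np
from sklearn.metrics import mean_squared_error as mse

# month, total volume, type, year (the test data from the table above)
x_test = np.array([[7, 190716, 0, 2015],
                   [3, 1045450, 1, 2016],
                   [9, 9883, 1, 2017],
                   [1, 16205, 1, 2018]])
prices = np.array([1.05, 1.27, 2.15, 1.93])

prediction = tree.predict(x_test)
print("Predictions: {}".format(prediction))
print("MSE: {}".format(mse(prices, prediction)))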
