L04 Tree-Based Methods

We will learn

  • Algorithms
    • CART (regression, classification)
    • Ensemble learning (bagging, random forests, boosting)
  • Coding with Python

Classification and regression trees (CART)

  • CART / Decision Tree
    • recursively partition the input space
    • fit a local model in each resulting region of the input space

Model definition

  • regression trees: all inputs are real-valued.
    • The tree consists of a set of nested decision rules. At each node $i$, the feature dimension $d_i$ of the input vector is compared to a threshold value $t_i$, and the input is then passed down to the left or right branch, depending on whether it is above or below the threshold.
    • At the leaves of the tree, the model specifies the predicted output for any input that falls into that part of the input space.
  • An example:
    • the region of space associated with the first leaf: $R_1 = \{x : x_1 \le t_1, x_2 \le t_2\}$

  • the output for region 1 (mean response) can be estimated using

    $w_1 = \frac{\sum_{n=1}^{N} y_n \, \mathbb{I}(x_n \in R_1)}{\sum_{n=1}^{N} \mathbb{I}(x_n \in R_1)}$

  • Formally, a regression tree can be defined by

    $f(x; \theta) = \sum_{j=1}^{J} w_j \, \mathbb{I}(x \in R_j)$

  • $R_j$ is the region specified by the $j$'th leaf node, $w_j$ is the predicted output for that node, and $\theta = \{(R_j, w_j) : j = 1{:}J\}$
  • $J$ is the number of nodes.
  • The regions:
    • $R_1 = \{x : x_1 \le t_1, x_2 \le t_2\}$, $R_2 = \{x : x_1 \le t_1, x_2 > t_2\}$, etc.
  • For categorical inputs, we can define the splits based on comparing feature $d_i$ to each of the possible values for that feature, rather than comparing to a numeric threshold.
  • For classification problems, the leaves contain a distribution over the class labels, rather than just the mean response.
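
To make the piecewise-constant form concrete, here is a minimal Python sketch of evaluating $f(x; \theta) = \sum_j w_j \mathbb{I}(x \in R_j)$; the regions, thresholds, and leaf values below are made-up illustrative numbers, not taken from the notes.

import numpy as np

# Hypothetical tree: each leaf is (list of (feature, low, high) intervals, leaf output w_j)
regions = [
    ([(0, -np.inf, 0.5), (1, -np.inf, 1.0)], 2.0),   # R_1: x_1 <= 0.5 and x_2 <= 1.0
    ([(0, -np.inf, 0.5), (1, 1.0, np.inf)], -1.0),   # R_2: x_1 <= 0.5 and x_2 > 1.0
    ([(0, 0.5, np.inf)], 0.5),                       # R_3: x_1 > 0.5
]

def f(x):
    # exactly one indicator is 1 because the regions partition the input space
    for intervals, w in regions:
        if all(low < x[d] <= high for d, low, high in intervals):
            return w
    raise ValueError("x is not covered by any region")

print(f(np.array([0.2, 0.3])))   # falls in R_1, so the prediction is 2.0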

Model fitting

  • minimizing the following loss:

    $\mathcal{L}(\theta) = \sum_{n=1}^{N} \ell\big(y_n, f(x_n; \theta)\big) = \sum_{j=1}^{J} \sum_{x_n \in R_j} \ell(y_n, w_j)$
  • this loss is not differentiable, because of the discrete tree structure
  • finding the optimal partitioning of the data is NP-complete
  • the standard practice is to use a greedy procedure, in which we iteratively grow the tree one node at a time.
  • three popular implementations: CART, C4.5, and ID3.

The big idea of greedy algorithms

  • let $\mathcal{D}_i$ be the set of examples that reach node $i$.
  • If the $j$'th feature is a real-valued scalar
    • partition the data at node $i$ by comparing it to a threshold $t$
    • the set of possible thresholds $\mathcal{T}_j$ for feature $j$ can be obtained by sorting the unique values of $\{x_{nj}\}$
    • example: if feature 1 has the values $\{4.5, -12, 72, -12\}$, then we set $\mathcal{T}_1 = \{-12, 4.5, 72\}$. For each possible threshold $t \in \mathcal{T}_j$, we define the left and right splits, $\mathcal{D}_i^{L}(j,t) = \{(x_n, y_n) \in \mathcal{D}_i : x_{nj} \le t\}$ and $\mathcal{D}_i^{R}(j,t) = \{(x_n, y_n) \in \mathcal{D}_i : x_{nj} > t\}$.
  • If the $j$'th feature is categorical, with $K_j$ possible values
    • check if the feature is equal to each of those values or not.
    • this defines a set of $K_j$ possible binary splits: $\mathcal{D}_i^{L}(j,t) = \{(x_n, y_n) \in \mathcal{D}_i : x_{nj} = t\}$ and $\mathcal{D}_i^{R}(j,t) = \{(x_n, y_n) \in \mathcal{D}_i : x_{nj} \ne t\}$.
  • choose the best feature $j_i$ to split on, and the best value for that feature, $t_i$, as (see the sketch at the end of this subsection):

    $(j_i, t_i) = \arg\min_{j \in \{1, \dots, D\}} \min_{t \in \mathcal{T}_j} \frac{|\mathcal{D}_i^{L}(j,t)|}{|\mathcal{D}_i|} c\big(\mathcal{D}_i^{L}(j,t)\big) + \frac{|\mathcal{D}_i^{R}(j,t)|}{|\mathcal{D}_i|} c\big(\mathcal{D}_i^{R}(j,t)\big)$

  • $c(\mathcal{D}_i)$ is the cost function for node $i$.
  • For regression, we can use the mean squared error

    $c(\mathcal{D}_i) = \frac{1}{|\mathcal{D}_i|} \sum_{n \in \mathcal{D}_i} (y_n - \bar{y})^2$

where $\bar{y} = \frac{1}{|\mathcal{D}_i|} \sum_{n \in \mathcal{D}_i} y_n$ is the mean of the response variable for examples reaching node $i$.

  • For classification, we first compute the empirical distribution over class labels for this node:

    $\hat{\pi}_{ic} = \frac{1}{|\mathcal{D}_i|} \sum_{n \in \mathcal{D}_i} \mathbb{I}(y_n = c)$

Given this, we can then compute the Gini index

    $G_i = \sum_{c=1}^{C} \hat{\pi}_{ic} (1 - \hat{\pi}_{ic})$

This is the expected error rate. To see this, note that $\hat{\pi}_{ic}$ is the probability a random entry in the leaf belongs to class $c$, and $1 - \hat{\pi}_{ic}$ is the probability it would be misclassified.

  • Alternatively we can define cost as the entropy or deviance of the node:

    $H_i = \mathbb{H}(\hat{\pi}_i) = -\sum_{c=1}^{C} \hat{\pi}_{ic} \log \hat{\pi}_{ic}$
  • A node that is pure (i.e., only has examples of one class) will have 0 entropy.
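
The split search described above can be sketched in a few lines of NumPy. This is a simplified illustration (not the CART/C4.5/ID3 or sklearn implementation): it scores every candidate (feature, threshold) pair by the size-weighted cost of the two children, using MSE for regression or the Gini index for classification.

import numpy as np

def mse_cost(y):
    # regression cost: mean squared deviation from the node mean
    return np.mean((y - y.mean()) ** 2)

def gini_cost(y):
    # classification cost: Gini index sum_c pi_c (1 - pi_c) = 1 - sum_c pi_c^2
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, cost=mse_cost):
    n, d = X.shape
    best = (None, None, np.inf)               # (feature j, threshold t, weighted cost)
    for j in range(d):
        for t in np.unique(X[:, j]):          # candidate thresholds: unique values of feature j
            left = X[:, j] <= t
            right = ~left
            if left.sum() == 0 or right.sum() == 0:
                continue
            c = (left.sum() * cost(y[left]) + right.sum() * cost(y[right])) / n
            if c < best[2]:
                best = (j, t, c)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 1] > 0.2).astype(float) + 0.1 * rng.normal(size=200)
print(best_split(X, y))                       # picks feature 1 with a threshold near 0.2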

Regularization

  • the danger of overfitting: If we let the tree become deep enough, it can achieve 0 error on the training set (assuming no label noise), by partitioning the input space into sufficiently small regions where the output is constant.
  • two main approaches against overfitting:
    • The first is to stop the tree growing process according to some heuristic, such as having too few examples at a node, or reaching a maximum depth.
    • The second approach is to grow the tree to its maximum depth, where no more splits are possible, and then to prune it back, by merging split subtrees back into their parent.
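
Both strategies are available in sklearn's tree implementation, sketched below: early stopping via max_depth and min_samples_leaf, and grow-then-prune via cost-complexity pruning with the ccp_alpha parameter. The particular parameter values here are arbitrary.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Stop growing early with simple heuristics
shallow = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Grow fully, then prune back with cost-complexity pruning
path = DecisionTreeClassifier().cost_complexity_pruning_path(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2]).fit(X, y)
print(shallow.get_depth(), pruned.get_depth())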

Pros and cons

  • advantages:
    • They are easy to interpret.
    • They can easily handle mixed discrete and continuous inputs.
    • They are insensitive to monotone transformations of the inputs (because the split points are based on ranking the data points), so there is no need to standardize the data.
    • They perform automatic variable selection.
    • They are relatively robust to outliers.
    • They are fast to fit, and scale well to large data sets.
    • They can handle missing input features.
  • disadvantages:
    • they do not predict as accurately as other kinds of model, due to the greedy nature of the tree construction algorithm.
    • trees are unstable: small changes to the input data can have large effects on the structure of the tree, due to the hierarchical nature of the tree-growing process, causing errors at the top to affect the rest of the tree.

Coding: Classification


from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
X, y = iris.data, iris.target
# Fit a decision tree classifier on the iris data
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)
# tree.plot_tree(clf)  # quick matplotlib rendering of the fitted tree
# Export the tree in Graphviz dot format for a nicer visualization
import graphviz
dot_data = tree.export_graphviz(
  clf, out_file=None,
  feature_names=iris.feature_names,
  class_names=iris.target_names,
  filled=True, rounded=True,
  special_characters=True)
graph = graphviz.Source(dot_data)
graph  # in a notebook, this displays the rendered tree

Coding: Regression

# Import the necessary modules and libraries
import matplotlib.pyplot as plt
import numpy as np
from sklearn.tree import DecisionTreeRegressor
# Create a random dataset
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))
# Fit regression model
regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_1.fit(X, y)
regr_2.fit(X, y)
# Predict
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
# Plot the results
plt.figure()
plt.scatter(X, y, s=20, edgecolor="black", 
  c="darkorange", label="data")
plt.plot(X_test, y_1, color="cornflowerblue", 
  label="max_depth=2", linewidth=2)
plt.plot(X_test, y_2, color="yellowgreen", 
  label="max_depth=5", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()

Ensemble learning

  • Ensemble learning: reduce variance by averaging multiple (regression) models (or taking a majority vote for classifiers)

    $f(y|x) = \frac{1}{M} \sum_{m=1}^{M} f_m(y|x)$

  • $f_m$ is the $m$'th base model.
  • The ensemble will have similar bias to the base models, but lower variance, generally resulting in improved overall performance
  • For classifiers: take a majority vote of the outputs. (This is sometimes called a committee method.)
    • suppose each base model is a binary classifier with an accuracy of $\theta$, and suppose class 1 is the correct class.
    • Let $Y_m \in \{0, 1\}$ be the prediction of the $m$'th model
    • Let $S = \sum_{m=1}^{M} Y_m$ be the number of votes for class 1
    • We define the final predictor to be the majority vote, i.e., class 1 if $S > M/2$ and class 0 otherwise. The probability that the ensemble will pick class 1 is (see the numerical check below)

    $p = \Pr(S > M/2) = 1 - B(M/2, M, \theta)$

where $B(\cdot, M, \theta)$ is the cdf of the binomial distribution with $M$ trials and success probability $\theta$.
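
A quick numerical check of this argument, assuming the base classifiers' errors are independent (which real ensembles only approximate): with $M = 101$ models of accuracy $\theta = 0.55$, the majority vote is correct with probability $\Pr(S > M/2)$, computed here with scipy.

from scipy.stats import binom

theta, M = 0.55, 101                           # each base model barely better than chance
p_ensemble = 1 - binom.cdf(M // 2, M, theta)   # P(S > M/2) with S ~ Binomial(M, theta)
print(p_ensemble)                              # ~0.84, much higher than 0.55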

Stacking

  • Stacking (stacked generalization): combine the base models by using

    $f(y|x) = \sum_{m=1}^{M} w_m f_m(y|x)$

  • the combination weights $w_m$ used by stacking need to be trained on a separate (held-out) dataset, otherwise they would put all their mass on the best performing base model.
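
A minimal sketch using sklearn's StackingClassifier; the choice of base models and combiner here is arbitrary. Note that cv=5 makes the LogisticRegression combiner train on out-of-fold predictions, which is sklearn's way of meeting the "separate dataset" requirement above.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=3)),
                ("rf", RandomForestClassifier(n_estimators=100, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5)  # the combiner is trained on out-of-fold predictions of the base models
print(cross_val_score(stack, X, y, cv=5).mean())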

Ensembling is not Bayes model averaging

  • An ensemble considers a larger hypothesis class of the form

    $p(y|x) = \sum_{m \in \mathcal{M}} w_m \, p(y|x, m)$

  • the BMA uses

    $p(y|x, \mathcal{D}) = \sum_{m \in \mathcal{M}} p(m|\mathcal{D}) \, p(y|x, m, \mathcal{D})$

  • The key difference
    • in the case of BMA, the weights $p(m|\mathcal{D})$ sum to one, and in the limit of infinite data, only a single model will be chosen (namely the MAP model).
    • By contrast, the ensemble weights $w_m$ are arbitrary, and don't collapse in this way to a single model.

Bagging

  • bagging ("bootstrap aggregating")
    • This is a simple form of ensemble learning in which we fit different base models to different randomly sampled versions of the data
    • this encourages the different models to make diverse predictions
    • The datasets are sampled with replacement (a technique known as bootstrap sampling)
    • a given example may appear multiple times, until we have a total of $N$ examples per model (where $N$ is the number of original data points).
  • The disadvantage of bootstrap

    • each base model only sees, on average, about 63% of the unique input examples (the probability a given example is never drawn in $N$ samples with replacement is $(1 - 1/N)^N \to e^{-1} \approx 0.37$); see the sketch after this list.

    • The remaining 37% of the training instances that are not used by a given base model are called out-of-bag (OOB) instances.

    • We can use the predicted performance of the base model on these oob instances as an estimate of test set performance.

    • This provides a useful alternative to cross validation.

  • The main advantage of bootstrap is that it prevents the ensemble from relying too much on any individual training example, which enhances robustness and generalization.

  • Bagging does not always improve performance. In particular, it relies on the base models being unstable estimators (such as decision trees), so that omitting some of the data significantly changes the resulting model fit.
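
Two quick checks of the claims above, as a sketch: the $1 - (1 - 1/N)^N \approx 63\%$ calculation, and sklearn's BaggingClassifier with oob_score=True, which reports accuracy on each model's out-of-bag instances. The dataset and hyperparameters are arbitrary.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

N = 10_000
print(1 - (1 - 1 / N) ** N)        # ~0.632: expected fraction of unique examples per bootstrap

X, y = load_breast_cancer(return_X_y=True)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=200,
                        oob_score=True, random_state=0).fit(X, y)
print(bag.oob_score_)              # OOB estimate of generalization accuracy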

Random forests

Random forests: learning trees based on a randomly chosen subset of input variables (at each node of the tree), as well as a randomly chosen subset of data cases.

Empirically, random forests often work much better than bagged decision trees, because many input features are irrelevant and the random feature selection decorrelates the individual trees.
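
A minimal sklearn sketch; max_features="sqrt" gives the random subset of input variables considered at each split, which is what distinguishes a random forest from plain bagging of trees. The dataset choice is just for illustration.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=300, max_features="sqrt", random_state=0)
print(cross_val_score(rf, X, y, cv=5).mean())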

Boosting

  • Ensembles of trees, whether fit by bagging or the random forest algorithm, correspond to a model of the form

    $f(x; \theta) = \sum_{m=1}^{M} \beta_m F_m(x; \theta_m)$

  • $F_m$ is the $m$'th tree
  • $\beta_m$ is the corresponding weight, often set to $\beta_m = 1/M$.
  • additive model: generalize this by allowing the $F_m$ functions to be general function approximators, such as neural networks, not just trees.
  • We can think of this as a linear model with adaptive basis functions. The goal, as usual, is to minimize the empirical loss (with an optional regularizer):

    $\mathcal{L}(f) = \sum_{n=1}^{N} \ell\big(y_n, f(x_n)\big)$

  • Boosting is an algorithm for sequentially fitting additive models where each $F_m$ is a binary classifier that returns $F_m(x) \in \{-1, +1\}$.

    • first fit $F_1$ on the original data
    • weight the data samples by the errors made by $F_1$, so misclassified examples get more weight
    • fit $F_2$ to this weighted data set
    • keep repeating this process until we have fit the desired number $M$ of components.
  • as long as each $F_m$ has an accuracy that is better than chance (even on the weighted dataset), then the final ensemble of classifiers will have higher accuracy than any given component.

    • if $F_m$ is a weak learner (so its accuracy is only slightly better than 50%)
    • we can boost its performance using the above procedure so that the final $f$ becomes a strong learner.
  • boosting, bagging and RF
    • boosting reduces the bias of the strong learner, by fitting trees that depend on each other
    • bagging and RF reduce the variance by fitting independent trees
    • In many cases, boosting can work better

Forward stagewise additive modeling

forward stagewise additive modeling: sequentially optimize the objective for general (differentiable) loss functions

    $(\beta_m, \theta_m) = \arg\min_{\beta, \theta} \sum_{n=1}^{N} \ell\big(y_n, f_{m-1}(x_n) + \beta F(x_n; \theta)\big)$

We then set

    $f_m(x) = f_{m-1}(x) + \beta_m F(x; \theta_m)$

Quadratic loss and least squares boosting

  • squared error loss: $\ell(y, \hat{y}) = (y - \hat{y})^2$
  • the $n$'th term in the objective at step $m$ becomes

    $\ell\big(y_n, f_{m-1}(x_n) + \beta F(x_n; \theta)\big) = \big(r_{nm} - \beta F(x_n; \theta)\big)^2$

  • $r_{nm} = y_n - f_{m-1}(x_n)$ is the residual of the current model on the $n$'th observation.
  • We can minimize the above objective by simply setting $\beta = 1$, and fitting $F$ to the residual errors. This is called least squares boosting (see the sketch below).
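
A minimal sketch of least squares boosting with $\beta = 1$: each shallow regression tree is fit to the residuals of the current model, and the prediction is the sum over all trees. The synthetic data, tree depth, and number of rounds are arbitrary choices.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

trees, f = [], np.zeros_like(y)
for m in range(50):
    r = y - f                                    # residuals r_{nm} of the current model
    tree = DecisionTreeRegressor(max_depth=2).fit(X, r)
    trees.append(tree)
    f += tree.predict(X)                         # f_m = f_{m-1} + F_m (beta = 1)

def predict(X_new):
    return sum(t.predict(X_new) for t in trees)

print(np.mean((y - f) ** 2))                     # training MSE shrinks as rounds are added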

Exponential loss and AdaBoost

  • binary classification, i.e., predicting $\tilde{y}_n \in \{-1, +1\}$.
  • assuming the weak learner computes

    $p(y = 1 | x) = \frac{e^{F(x)}}{e^{-F(x)} + e^{F(x)}} = \frac{1}{1 + e^{-2 F(x)}}$

so $F(x)$ returns half the log odds.

  • the negative log likelihood is given by

    $\ell(\tilde{y}, F(x)) = \log\big(1 + e^{-2 \tilde{y} F(x)}\big)$

  • we can also use other loss functions. In this section, we consider the exponential loss

    $\ell(\tilde{y}, F(x)) = \exp(-\tilde{y} F(x))$

  • this is a smooth upper bound on the 0-1 loss.
  • In the population setting (with infinite sample size), the optimal solution to the exponential loss is the same as for the log loss. To see this, we can just set the derivative of the expected loss (for each $x$) to zero:

    $\frac{\partial}{\partial F(x)} \mathbb{E}\big[e^{-\tilde{y} F(x)} \mid x\big] = -p(\tilde{y}=1|x)\, e^{-F(x)} + p(\tilde{y}=-1|x)\, e^{F(x)} = 0 \;\Rightarrow\; F(x) = \frac{1}{2} \log \frac{p(\tilde{y}=1|x)}{p(\tilde{y}=-1|x)}$
  • the exponential loss is easier to optimize in the boosting setting.

discrete AdaBoost

  • At step $m$ we have to minimize

    $L_m = \sum_{n=1}^{N} \exp\big(-\tilde{y}_n (f_{m-1}(x_n) + \beta F(x_n))\big) = \sum_{n=1}^{N} \omega_{n,m} \exp\big(-\beta \tilde{y}_n F(x_n)\big)$

where $\omega_{n,m} \triangleq \exp(-\tilde{y}_n f_{m-1}(x_n))$ is a weight applied to datacase $n$, and $\tilde{y}_n \in \{-1, +1\}$. We can rewrite this objective as follows:

    $L_m = e^{-\beta} \sum_{\tilde{y}_n = F(x_n)} \omega_{n,m} + e^{\beta} \sum_{\tilde{y}_n \ne F(x_n)} \omega_{n,m} = (e^{\beta} - e^{-\beta}) \sum_{n=1}^{N} \omega_{n,m} \mathbb{I}\big(\tilde{y}_n \ne F(x_n)\big) + e^{-\beta} \sum_{n=1}^{N} \omega_{n,m}$

  • the optimal function to add is

    $F_m = \arg\min_{F} \sum_{n=1}^{N} \omega_{n,m} \mathbb{I}\big(\tilde{y}_n \ne F(x_n)\big)$

This can be found by applying the weak learner to a weighted version of the dataset, with weights $\omega_{n,m}$.

  • Substituting $F_m$ into $L_m$ and solving for $\beta$ we find

    $\beta_m = \frac{1}{2} \log \frac{1 - \mathrm{err}_m}{\mathrm{err}_m}$

where

    $\mathrm{err}_m = \frac{\sum_{n=1}^{N} \omega_{n,m} \mathbb{I}\big(\tilde{y}_n \ne F_m(x_n)\big)}{\sum_{n=1}^{N} \omega_{n,m}}$

  • Therefore the overall update becomes

    $f_m(x) = f_{m-1}(x) + \beta_m F_m(x)$
  • the weights for the next iteration are updated as follows:

    $\omega_{n,m+1} = \omega_{n,m} \, e^{-\beta_m \tilde{y}_n F_m(x_n)}$

  • If $\tilde{y}_n = F_m(x_n)$, then $\tilde{y}_n F_m(x_n) = 1$, and if $\tilde{y}_n \ne F_m(x_n)$, then $\tilde{y}_n F_m(x_n) = -1$. Hence $-\tilde{y}_n F_m(x_n) = 2\,\mathbb{I}\big(\tilde{y}_n \ne F_m(x_n)\big) - 1$, so the update becomes

    $\omega_{n,m+1} = \omega_{n,m} \, e^{2 \beta_m \mathbb{I}(\tilde{y}_n \ne F_m(x_n))} e^{-\beta_m}$

  • Since the $e^{-\beta_m}$ factor is constant across all examples, it can be dropped. If we then define $\alpha_m = 2\beta_m$, the update becomes

    $\omega_{n,m+1} = \omega_{n,m} \, e^{\alpha_m \mathbb{I}(\tilde{y}_n \ne F_m(x_n))}$

Thus we see that we exponentially increase the weights of misclassified examples. The resulting algorithm is shown in Algorithm 8, and is known as AdaBoost.
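
A minimal sketch in the spirit of the derivation above (not necessarily identical to Algorithm 8 in the source text): depth-1 sklearn trees as weak learners, the weighted error $\mathrm{err}_m$, the coefficient $\beta_m = \frac{1}{2}\log\frac{1-\mathrm{err}_m}{\mathrm{err}_m}$, and exponential reweighting. The dataset and number of rounds are arbitrary.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y01 = load_breast_cancer(return_X_y=True)
y = 2 * y01 - 1                                  # labels in {-1, +1}

N, M = len(y), 100
w = np.ones(N) / N                               # initial uniform weights
stumps, betas = [], []
for m in range(M):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = np.sum(w * (pred != y)) / np.sum(w)    # weighted error err_m
    beta = 0.5 * np.log((1 - err) / err)         # beta_m
    w = w * np.exp(-beta * y * pred)             # exponentially up-weight mistakes
    w = w / np.sum(w)                            # normalization does not change the argmin
    stumps.append(stump)
    betas.append(beta)

f = sum(b * s.predict(X) for b, s in zip(betas, stumps))
print(np.mean(np.sign(f) == y))                  # training accuracy of the boosted ensemble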

LogitBoost

  • exponential loss puts a lot of weight on misclassified examples
  • This makes the method very sensitive to outliers (mislabeled examples).
  • In addition, $e^{-\tilde{y} f(x)}$ is not the logarithm of any pmf for binary variables $\tilde{y} \in \{-1, +1\}$; consequently we cannot recover probability estimates from $f(x)$.
  • A natural alternative is to use the log loss. This only punishes mistakes linearly, as is clear from Figure 18.7.
  • Furthermore, it means that we will be able to extract probabilities from the final learned function, using

    $p(y = 1 | x) = \frac{e^{f(x)}}{e^{-f(x)} + e^{f(x)}} = \frac{1}{1 + e^{-2 f(x)}}$

  • The goal is to minimize the expected log-loss, given by

    $L_m = \sum_{n=1}^{N} \log\Big(1 + e^{-2 \tilde{y}_n (f_{m-1}(x_n) + \beta F(x_n))}\Big)$

Gradient boosting

  • to derive a generic version, known as gradient boosting. To explain this, imagine solving $\hat{\mathbf{f}} = \arg\min_{\mathbf{f}} \mathcal{L}(\mathbf{f})$ by performing gradient descent in the space of functions. Since functions are infinite dimensional objects, we will represent them by their values on the training set, $\mathbf{f} = (f(x_1), \dots, f(x_N))$. At step $m$, let $\mathbf{g}_m$ be the gradient of $\mathcal{L}(\mathbf{f})$ evaluated at $\mathbf{f} = \mathbf{f}_{m-1}$:

    $g_{nm} = \left[\frac{\partial \ell(y_n, f(x_n))}{\partial f(x_n)}\right]_{\mathbf{f} = \mathbf{f}_{m-1}}$

  • make the update

    $\mathbf{f}_m = \mathbf{f}_{m-1} - \beta_m \mathbf{g}_m$

where $\beta_m$ is the step length, chosen by

    $\beta_m = \arg\min_{\beta} \mathcal{L}(\mathbf{f}_{m-1} - \beta \mathbf{g}_m)$

  • fit a weak learner to approximate the negative gradient signal. That is, we use the update

    $F_m = \arg\min_{F} \sum_{n=1}^{N} \big(-g_{nm} - F(x_n)\big)^2$

Gradient tree boosting

  • In practice, gradient boosting nearly always assumes that the weak learner is a regression tree, which is a model of the form

    $F_m(x) = \sum_{j=1}^{J_m} w_{jm} \, \mathbb{I}(x \in R_{jm})$

  • To use this in gradient boosting, we first find good regions $R_{jm}$ for tree $m$ using standard regression tree learning on the residuals; we then (re)solve for the weights of each leaf by solving

    $\hat{w}_{jm} = \arg\min_{w} \sum_{x_n \in R_{jm}} \ell\big(y_n, f_{m-1}(x_n) + w\big)$
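
A short sklearn sketch of gradient tree boosting: GradientBoostingRegressor fits each new tree to the negative gradient of the loss (which for squared error is just the residual), and shrinks each tree's contribution by learning_rate. Data and hyperparameters here are arbitrary.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

gbr = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                learning_rate=0.1, loss="squared_error")
gbr.fit(X, y)
print(gbr.score(X, y))                     # R^2 on the training data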

XGBoost

  • XGBoost (https://github.com/dmlc/xgboost), which stands for "extreme gradient boosting", is a very efficient and widely used implementation of gradient boosted trees:
    • it adds a regularizer on the tree complexity
    • it uses a second order approximation of the loss instead of just a linear approximation
    • it samples features at internal nodes (as in random forests)
    • it uses various computer science methods (such as handling out-of-core computation for large datasets) to ensure scalability.
  • In more detail, XGBoost optimizes the following regularized objective

    $\mathcal{L}(f) = \sum_{n=1}^{N} \ell\big(y_n, f(x_n)\big) + \Omega(f)$

  • the regularizer

    $\Omega(f) = \gamma J + \frac{1}{2} \lambda \sum_{j=1}^{J} w_j^2$

where $J$ is the number of leaves, and $\gamma \ge 0$ and $\lambda \ge 0$ are regularization coefficients.
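
A minimal sketch of XGBoost's scikit-learn style interface (assuming the xgboost Python package is installed): reg_lambda and gamma correspond to the $\lambda$ and $\gamma$ terms of the regularizer above, while subsample and colsample_bynode give the row and feature subsampling mentioned earlier. The data and parameter values are arbitrary.

import numpy as np
from xgboost import XGBRegressor

rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

model = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.1,
                     reg_lambda=1.0, gamma=0.0,       # the regularizer terms above
                     subsample=0.8, colsample_bynode=0.8)
model.fit(X, y)
print(model.score(X, y))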