But First: A Theorem From 1785
All the way back in 1785, the Marquis de Condorcet published a political science
theorem about the probability that a group of individuals arrives at a correct majority
decision.
Despite its age, the theorem has been applied in a number of different fields, most notably
for us machine learning. To see how, let's start with one model: a decision tree with an accuracy
of 60%.
Condorcet's Jury Theorem says that if each voter is correct more than
50% of the time, then adding more voters increases the probability that the majority is correct.
True to form, if we ask two more decision trees, each also with 60% accuracy, and decide by majority vote,
then the probability of the vote being right rises to about 65%!
The theorem suggests that the probability of the majority vote being correct keeps climbing as
we add more and more models. With eleven models, the majority reaches about 75% accuracy!
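These numbers follow directly from the binomial distribution: with n independent models that are each right with probability p, the majority is correct whenever more than half of them are. A minimal sketch (the function name is ours, not from any library):

```python
from math import comb

def majority_vote_accuracy(n, p):
    """Probability that a majority of n independent models,
    each correct with probability p, votes correctly.
    Assumes n is odd, so there are no ties."""
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

print(f"{majority_vote_accuracy(3, 0.6):.3f}")   # three 60%-accurate trees -> 0.648
print(f"{majority_vote_accuracy(11, 0.6):.3f}")  # eleven 60%-accurate trees -> 0.753
```

The key assumption baked into this formula is independence: each model's error is a fresh coin flip, unrelated to the others.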
There is a
caveat. The accuracy may not improve if every model produces the same prediction: a mistake made by
one model would never be caught by the others. Therefore, we want to cast the majority vote using
models that differ from one another, so that their errors are as independent as possible.
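A quick simulation makes the caveat concrete. Below (a sketch under our own assumptions, not code from this article's demo), three perfectly correlated trees all copy one shared prediction, while three independent trees each make their own call:

```python
import random

random.seed(0)
TRIALS, P = 100_000, 0.6  # each tree is individually 60% accurate

correlated = independent = 0
for _ in range(TRIALS):
    # Perfectly correlated: all three trees share one prediction,
    # so the majority is correct exactly when that shared vote is.
    correlated += random.random() < P

    # Independent: each tree errs on its own; majority needs 2 of 3.
    votes = sum(random.random() < P for _ in range(3))
    independent += votes >= 2

print(f"correlated ensemble:  {correlated / TRIALS:.3f}")   # stays near 0.60
print(f"independent ensemble: {independent / TRIALS:.3f}")  # rises toward 0.65
```

Same individual accuracy, very different ensembles: the correlated trio gains nothing over a single tree, while the independent trio earns the jury-theorem boost.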
We can add more and more models.
Play with the sliders
yourself! The top one controls the number of trees in the ensemble; the bottom one controls each tree's
accuracy.
In machine learning, this idea of multiple models working together to arrive at an
aggregate prediction is called ensemble learning. It provides the basis for many
important machine learning models, including random forests.