But First: A Theorem From 1785
All the way back in 1785, the Marquis de Condorcet published a political science
theorem about the probability that a group of individuals arrives at a correct majority
decision.
Despite its age, the theorem has been applied in a number of different fields, most notably
for us machine learning. To see how, let's start with one model: a decision tree with an accuracy
of 60%.
Condorcet's Jury Theorem says that if each voter is correct more than
50% of the time, then adding more voters increases the probability that the majority is correct.
True to form, if we ask two more decision trees, each also with 60% accuracy, and decide by majority vote,
then the probability of the vote being right rises to about 65%!
The theorem suggests that the probability of the majority vote being correct keeps climbing as
we add more and more models. With eleven models, the majority reaches about 75% accuracy!
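These numbers follow directly from the binomial distribution: with n independent models that are each right with probability p, the majority is correct whenever more than half of them are. A minimal sketch (the function name is ours, not from any library):

```python
from math import comb

def majority_vote_accuracy(n, p):
    """Probability that a majority of n independent models,
    each correct with probability p, votes correctly.
    Assumes n is odd, so there are no ties."""
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

print(f"{majority_vote_accuracy(3, 0.6):.3f}")   # three 60%-accurate trees -> 0.648
print(f"{majority_vote_accuracy(11, 0.6):.3f}")  # eleven 60%-accurate trees -> 0.753
```

The key assumption baked into this formula is independence: each model's error is a fresh coin flip, unrelated to the others.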
There is a
caveat. The accuracy may not improve if every model produces the same prediction: a mistake made by
one model would never be caught by the others. Therefore, we want to cast the majority vote using
models that differ from one another, so that their errors are as independent as possible.
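A quick simulation makes the caveat concrete. Below (a sketch under our own assumptions, not code from this article's demo), three perfectly correlated trees all copy one shared prediction, while three independent trees each make their own call:

```python
import random

random.seed(0)
TRIALS, P = 100_000, 0.6  # each tree is individually 60% accurate

correlated = independent = 0
for _ in range(TRIALS):
    # Perfectly correlated: all three trees share one prediction,
    # so the majority is correct exactly when that shared vote is.
    correlated += random.random() < P

    # Independent: each tree errs on its own; majority needs 2 of 3.
    votes = sum(random.random() < P for _ in range(3))
    independent += votes >= 2

print(f"correlated ensemble:  {correlated / TRIALS:.3f}")   # stays near 0.60
print(f"independent ensemble: {independent / TRIALS:.3f}")  # rises toward 0.65
```

Same individual accuracy, very different ensembles: the correlated trio gains nothing over a single tree, while the independent trio earns the jury-theorem boost.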
We can add more and more models.
Play with the sliders
yourself! The top one controls the number of trees in the ensemble; the bottom one controls each tree's
accuracy.
In machine learning, this idea of multiple models working together to arrive at an
aggregate prediction is called ensemble learning. It provides the basis for many
important machine learning models, including random forests.