Ensemble Methods

Zachary Greenberg
Aug 26, 2021 · 4 min read

Teamwork makes the dream work is not only a great book by John Maxwell, it is a saying that holds true for machine learning models. What is being referred to here is ensemble methods. Ensemble methods describe a modeling process in which multiple models are combined to generate a single outcome. There is a method to this madness of multiple models: combining them serves a similar purpose to cross validation, helping to reduce overfitting and overall model error. There are three types of ensemble methods worth mentioning: Stacking, Bagging, and Boosting.

What is Stacking?

‘Stacking is an ensemble learning technique that uses predictions from multiple models (for example kNN, decision trees, or SVM) to build a new model. This final model is used for making predictions on the test dataset.’ — as defined by analyticsvidhya.

Stacking takes different, simpler models and uses their predictions to train a final model. With this technique, we can mix a variety of different models. So if, for example, we are performing a classification task, we can combine models like Logistic Regression, K Nearest Neighbors, and Decision Tree to create a more robust model. It is important to note that this is not a grid search over models: the outcome will not simply be the best model of the three, it works with the individual models’ outputs to produce a final prediction. With SciKit Learn, fitting that final estimator and scoring it is handled for us. It is good practice when implementing stacking to utilize pipelines like so:

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import StackingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

#it is important to scale your data!
pipe = Pipeline(steps=[
    ('scaler', StandardScaler())
])
X_train_scaled = pipe.fit_transform(X_train)
X_test_scaled = pipe.transform(X_test)

#the base estimators whose predictions feed the final model
estimators = [
    ('lr', LinearRegression()),
    ('knn', KNeighborsRegressor()),
    ('rt', DecisionTreeRegressor())
]
sr = StackingRegressor(estimators)
sr.fit(X_train_scaled, y_train)
sr.score(X_test_scaled, y_test)
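
For the classification scenario described above, a minimal sketch (assuming X_train, y_train, X_test, and y_test already exist and have been scaled) might look like this:

from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

#the same idea, with classifiers as the base estimators
estimators = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('knn', KNeighborsClassifier()),
    ('dt', DecisionTreeClassifier())
]
sc = StackingClassifier(estimators)
sc.fit(X_train, y_train)
sc.score(X_test, y_test)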

What is Bagging?

Bagging is the process of creating multiple instances of the same type of model. Each model is trained on a different random sample of the data, typically drawn with replacement, a technique known as bootstrapping. It is up to the data scientist to decide how many models they would like in the mix. Each of these models is independent of the others, and their predictions are aggregated (by voting or averaging) to produce the final result.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

#by default base_estimator will be a Decision Tree, and n_estimators
#is the number of models you'd like to see
bag = BaggingClassifier(base_estimator=DecisionTreeClassifier(),
                        n_estimators=100, verbose=1, random_state=1)
bag.fit(X_train, y_train)
bag.score(X_test, y_test)
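
To make the bootstrapping idea concrete, here is a tiny illustrative sketch of drawing one resample with replacement; this is not part of BaggingClassifier's API, and it assumes X_train and y_train are NumPy arrays:

import numpy as np

rng = np.random.default_rng(1)
#indices drawn with replacement, so some rows repeat and some are left out
indices = rng.integers(0, len(X_train), size=len(X_train))
X_boot, y_boot = X_train[indices], y_train[indices]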

What is Boosting?

Boosting is a technique that is also grounded in the concept of using multiple models, but here the models are built sequentially, with each new model trying to correct the errors of the ones before it. There are two specific types of boosting models: adaptive and gradient boosting.

What is adaptive boosting?

‘The method [adaptive boosting] automatically adjusts its parameters to the data based on the actual performance in the current iteration.’ — as explained by analyticsindiamag

In adaptive boosting, the weights on the data are iteratively adjusted as part of the model's mechanism. In the case of classification, an incorrectly identified data point is given a larger weight so that the next model in the sequence focuses on that point. Much like with neural networks, this process is automated for us in adaptive boosting.
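
To illustrate that weight update, here is a minimal sketch of the adaptive boosting loop for a binary problem with labels coded as -1 and +1; the function and its details are illustrative, not SciKit Learn's internals:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_sketch(X, y, n_rounds=10):
    #y is assumed to hold labels in {-1, +1}
    n = len(y)
    weights = np.full(n, 1 / n)              #start with uniform sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)
        pred = stump.predict(X)
        err = np.sum(weights * (pred != y)) / np.sum(weights)
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))   #this learner's say
        #misclassified points get heavier weights for the next round
        weights *= np.exp(-alpha * y * pred)
        weights /= weights.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas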

What is gradient boosting?

Gradient boosting, much like it sounds, relies on gradient descent to minimize error. In this algorithm, each new model is fit to the residuals (the remaining errors) of the current ensemble, so the largest residuals are what the model works hardest to shrink. Here each step of the iterative process is directly shaped by the previous model's mistakes.
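
A minimal sketch of that residual-fitting loop for squared error might look like the following; the function is illustrative and assumes X and y are numeric NumPy arrays:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_sketch(X, y, n_rounds=100, learning_rate=0.1):
    prediction = np.full(len(y), y.mean())    #start from a constant prediction
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction            #negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residuals)                #each tree targets the previous errors
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees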

What is the difference between them?

Both boosting methods work through an iterative process to minimize error. Gradient boosting, however, can prove more robust to outliers, because adaptive boosting's weighting scheme penalizes badly misclassified points very heavily.

from sklearn.ensemble import AdaBoostRegressor
from xgboost import XGBRegressor

#Adaptive Boosting
abr = AdaBoostRegressor(random_state=42)
abr.fit(X_train, y_train)
abr.score(X_test, y_test)

#Gradient Boosting
grad_boost = XGBRegressor(random_state=42, objective='reg:squarederror')
grad_boost.fit(X_train, y_train)
grad_boost.score(X_test, y_test)

As we can see, there are many methods in the ensemble modeling family. Each offers real benefits, as they can prove more robust than simpler models like a single Decision Tree or K Nearest Neighbors. They can really help to prevent overfitting, and the use of multiple models can also result in lower error. Ensemble methods are a powerful group of techniques, no pun intended.

References:

Adaptive Boosting — https://analyticsindiamag.com/adaboost-vs-gradient-boosting-a-comparison-of-leading-boosting-algorithms/

Stacking — https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/

SKLearn Ensemble Methods — https://scikit-learn.org/stable/modules/ensemble.html
