WEEK -10 and 11 ( 22/03/2021 - 2/04/2021 )
It’s week 10!,here we are gonna discuss ensemble models. Some briefing of the teachings are written below
What is an ensemble method?
Ensemble models in machine learning combine the decisions from multiple models to improve the overall performance. They operate on the similar idea as employed while for eg buying headphones.
The main causes of error in learning models are due to noise, bias and variance.
Ensemble methods help to minimize these factors. These methods are designed to improve the stability and the accuracy of Machine Learning algorithms.
Scenario 2: Let’s suppose that you have developed a health and fitness app. Before making it public, you wish to receive critical feedback to close down the potential loopholes, if any. You can resort to one of the following methods, read and decide which method is the best:
- You can take the opinion of your spouse or your closest friends.
- You can ask a bunch of your friends and office colleagues.
- You can launch a beta version of the app and receive feedback from the web development community and non-biased users.
Now, pause and think what you just did. You took multiple opinions from a large enough bunch of people and then made an informed decision based on them. This is what Ensemble methods also do.
Ensemble models in machine learning combine the decisions from multiple models to improve the overall performance.
Scenario 3: Take a look at the following picture; we can see a group of blindfolded children playing the game of “Touch and tell” while examining an elephant which none of them had ever seen before. Each of them will have a different version as to how does an elephant looks like because each of them is exposed to a different part of the elephant. Now, if we give them a task of submitting a report on elephant description, their individual reports will be able to describe only one part accurately as per their experience but collectively they can combine their observations to give a very accurate report on the description of an elephant.
Similarly, ensemble learning methods employ a group of models where the combined result out of them is almost always better in terms of prediction accuracy as compared to using a single model.
Ensembles are a divide and conquer approach used to improve performance.

Now, let’s dive into some of the important Ensemble techniques.

Simple Ensemble techniques
- Taking the mode of the results
MODE: The mode is a statistical term that refers to the most frequently occurring number found in a set of numbers.
In this technique, multiple models are used to make predictions for each data point. The predictions by each model are considered as a separate vote. The prediction which we get from the majority of the models is used as the final prediction.
For instance: We can understand this by referring back to Scenario 2 above. I have inserted a chart below to demonstrate the ratings that the beta version of our health and fitness app got from the user community. (Consider each person as a different model)
Output= MODE=3, as majority people voted this

2. Taking the average of the results
In this technique, we take an average of predictions from all the models and use it to make the final prediction.
AVERAGE= sum(Rating*Number of people)/Total number of people= (1*5)+(2*13)+(3*45)+(4*7)+(5*2)/72 = 2.833 =Rounded to nearest integer would be 3
3. Taking weighted average of the results
This is an extension of the averaging method. All models are assigned different weights defining the importance of each model for prediction. For instance, if about 25 of your responders are professional app developers, while others have no prior experience in this field, then the answers by these 25 people are given more importance as compared to the other people.
For example: For posterity, I am trimming down the scale of the example to 5 people
WEIGHTED AVERAGE= (0.3*3)+(0.3*2)+(0.3*2)+(0.15*4)+(0.15*3) =3.15 = rounded to nearest integer would give us 3

Advanced Ensemble techniques
We’ll learn about Bagging and Boosting techniques now. But, to use them you must select a base learner algorithm. For example, if we choose a classification tree, Bagging and Boosting would consist of a pool of trees as big as we want.
1. Bagging (Bootstrap AGGregatING)
Bootstrap Aggregating is an ensemble method. First, we create random samples of the training data set with replacment (sub sets of training data set). Then, we build a model (classifier or Decision tree) for each sample. Finally, results of these multiple models are combined using average or majority voting.
As each model is exposed to a different subset of data and we use their collective output at the end, so we are making sure that problem of overfitting is taken care of by not clinging too closely to our training data set. Thus, Bagging helps us to reduce the variance error.
Combinations of multiple models decreases variance, especially in the case of unstable models, and may produce a more reliable prediction than a single model.
Random forest technique actually uses this concept but it goes a step ahead to further reduce the variance by randomly choosing a subset of features as well for each bootstrapped sample to make the splits while training (My next post will detail all about Random forest technique)

2. Boosting

Boosting is an iterative technique which adjusts the weight of an observation based on the last classification. If an observation was classified incorrectly, it tries to increase the weight of this observation and vice versa.
Boosting in general decreases the bias error and builds strong predictive models. Boosting has shown better predictive accuracy than bagging, but it also tends to over-fit the training data as well.Thus, parameter tuning becomes a crucial part of boosting algorithms to make them avoid overfitting.
Boosting is a sequential technique in which, the first algorithm is trained on the entire data set and the subsequent algorithms are built by fitting the residuals of the first algorithm, thus giving higher weight to those observations that were poorly predicted by the previous model.

It relies on creating a series of weak learners each of which might not be good for the entire data set but is good for some part of the data set. Thus, each model actually boosts the performance of the ensemble.

Summary of differences between Bagging and Boosting

Advantages/Benefits of ensemble methods
Ensemble methods are used in almost all the ML hackathons to enhance the prediction abilities of the models.
Let’s take a look at the advantages of using ensemble methods:
- More accurate prediction results– We can compare the working of the ensemble methods to the Diversification of our financial portfolios. It is advised to keep a mixed portfolio across debt and equity to reduce the variability and hence, to minimize the risk. Similarly, the ensemble of models will give better performance on the test case scenarios (unseen data) as compared to the individual models in most of the cases.

2. Stable and more robust model– The aggregate result of multiple models is always less noisy than the individual models. This leads to model stability and robustness.

3. Ensemble models can be used to capture the linear as well as the non-linear relationships in the data.This can be accomplished by using 2 different models and forming an ensemble of the two.
Disadvantages of ensemble methods
- Reduction in model interpret-ability- Using ensemble methods reduces the model interpret-ability due to increased complexity and makes it very difficult to draw any crucial business insights at the end.
- Computation and design time is high- It is not good for real time applications.
- The selection of models for creating an ensemble is an art which is really hard to master.
Comments
Post a Comment