WEEK-4 ( 8/02/2021 - 12/02/2021 )
This week we learnt about the Central Limit Theorem, Confidence Intervals, and Lasso and Ridge Regression.
Central Limit Theorem: states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed, regardless of the shape of the population's distribution.
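As a quick check of this, here is a minimal simulation sketch (the post itself contains no code, so the use of NumPy and a skewed exponential population here are assumptions for illustration):

```python
# A minimal CLT simulation sketch, assuming NumPy; the exponential
# population is a made-up example chosen because it is clearly non-normal.
import numpy as np

rng = np.random.default_rng(0)

population = rng.exponential(scale=2.0, size=100_000)  # skewed, non-normal
mu, sigma = population.mean(), population.std()

n = 50              # size of each random sample
num_samples = 10_000

# Draw samples with replacement and record each sample's mean
sample_means = np.array([
    rng.choice(population, size=n, replace=True).mean()
    for _ in range(num_samples)
])

print(f"population:           mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"mean of sample means  = {sample_means.mean():.3f}")  # close to mu
print(f"std of sample means   = {sample_means.std():.3f}")   # close to sigma/sqrt(n)
print(f"sigma / sqrt(n)       = {sigma / np.sqrt(n):.3f}")
```

The sample means cluster around μ with spread σ/√n, even though the underlying population is skewed.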
Lasso Regression: "LASSO" stands for Least Absolute Shrinkage and Selection Operator. Lasso regression is a regularization technique, used on top of regression methods for more accurate prediction. The model uses shrinkage, where coefficient estimates are shrunk towards a central point, such as the mean. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters). This type of regression is well-suited for models showing high levels of multicollinearity, or when you want to automate certain parts of model selection, such as variable selection/parameter elimination. Lasso regression uses the L1 regularization technique, and is particularly useful when we have a large number of features, because it automatically performs feature selection.
Mathematical Form:
Min(||Y – Xθ||^2 + λ * Σ|θj|)
i.e. the residual sum of squares plus λ times the sum of the absolute values of the coefficients.
Where,
λ denotes the amount of shrinkage.
λ = 0 implies all features are considered; it is equivalent to linear regression, where only the residual sum of squares is used to build the predictive model
λ = ∞ implies no feature is considered, i.e. as λ approaches infinity, more and more features are eliminated
The bias increases with an increase in λ
Variance increases with a decrease in λ (the short sketch below shows the effect of λ in practice)
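To make the λ behaviour concrete, here is a short sketch using scikit-learn's Lasso (the library choice is an assumption; the post does not name one). Its alpha parameter plays the role of λ, and the synthetic dataset with only 3 informative features out of 10 is made up for illustration:

```python
# A minimal Lasso sketch, assuming scikit-learn; the synthetic dataset
# (10 features, only 3 informative) is a made-up example.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

for lam in [0.01, 1.0, 10.0]:           # alpha plays the role of lambda
    model = Lasso(alpha=lam).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"lambda = {lam:<5}  coefficients set to zero: {n_zero}/10")
```

Larger λ drives more coefficients to exactly zero (automatic feature selection), trading variance for bias as described above.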
Ridge Regression:
Ridge regression is a model tuning method used to analyse data that suffers from multicollinearity. This method performs L2 regularization. When multicollinearity occurs, the least-squares estimates are unbiased, but their variances are large, which can result in predicted values being far away from the actual values.
The cost function for ridge regression:
Min(||Y – Xθ||^2 + λ||θ||^2)
λ is the penalty term; in the ridge function of libraries such as scikit-learn, it is exposed as the alpha parameter. So, by changing the value of alpha, we control the penalty term: the higher the value of alpha, the bigger the penalty, and the more the magnitudes of the coefficients are reduced.
It shrinks the parameters, and is therefore used to mitigate the effects of multicollinearity.
It reduces model complexity through coefficient shrinkage, as the sketch below illustrates.
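As with lasso, here is a minimal sketch assuming scikit-learn's Ridge, with two deliberately multicollinear features (the dataset is made up for illustration):

```python
# A minimal ridge sketch, assuming scikit-learn; x2 is a near-duplicate
# of x1, so the two features are strongly multicollinear.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)   # almost identical to x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=100)

for alpha in [0.001, 1.0, 100.0]:            # alpha is the lambda penalty
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha = {alpha:<6} coefficients: {model.coef_.round(3)}")
```

With a tiny alpha the two coefficients are large and unstable, because they split the effect of the shared signal arbitrarily; increasing alpha shrinks them toward smaller, similar, stabler values.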