
WEEK-9 (15/03/2021 - 19/03/2021)

In week 9 we continued our learning on decision trees, covering pruning, and we were given coursework on the implementation of Logistic Regression.

Pruning: The performance of a tree can be further increased by pruning. It involves removing branches that use features of low importance. This reduces the complexity of the tree and increases its predictive power by reducing overfitting. Pruning can start at either the root or the leaves. The simplest method starts at the leaves and replaces each node with the most popular class in that leaf; the change is kept if it doesn't deteriorate accuracy. This is also called reduced error pruning. More sophisticated methods can be used, such as cost complexity pruning, where a learning parameter (alpha) is used to weigh whether nodes can be removed based on the size of the sub-tree. This is also known as weakest link ...
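As a minimal sketch of cost complexity pruning, scikit-learn exposes the alpha parameter directly as `ccp_alpha`; the synthetic dataset below is only there to make the example self-contained, not part of the coursework.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data just for illustration.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Fully grown (unpruned) tree.
unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)

# cost_complexity_pruning_path returns the effective alphas at which
# subtrees become the "weakest link" and get removed.
path = unpruned.cost_complexity_pruning_path(X, y)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # a moderate alpha

# Refitting with ccp_alpha > 0 prunes the tree: fewer leaves, less overfitting.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)

print(unpruned.get_n_leaves(), pruned.get_n_leaves())
```

Larger values of alpha remove more subtrees, so the pruned tree always has at most as many leaves as the unpruned one.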

WEEK-8 (8/03/2021 - 12/03/2021)

This week we were given coursework on Model Evaluation and Selection. As we have already covered linear and logistic regression, we now move on to decision trees. A tree has many analogies in real life, and it turns out to have influenced a wide area of machine learning, covering both classification and regression. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. As the name suggests, it uses a tree-like model of decisions. Though it is a commonly used tool in data mining for deriving a strategy to reach a particular goal, it is also widely used in machine learning, which will be the main focus of this article. How can an algorithm be represented as a tree? Consider a very basic example that uses the Titanic data set to predict whether a passenger will survive or not. The model below uses 3 features/attributes/columns from the data set, namely sex, age and sibsp (number of sp...
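A fitted decision tree is ultimately just a nested series of if/else tests on features, so the Titanic-style model above can be sketched as plain code. The thresholds below are hypothetical, chosen only to show the structure; they are not fitted on the real Titanic data.

```python
def predict_survival(sex, age, sibsp):
    """Toy decision tree over the three features: sex, age, sibsp.
    Thresholds are illustrative, not learned from data."""
    if sex == "female":          # root split on sex
        return "survived"
    # male branch: split on age, then on number of siblings/spouses
    if age <= 9 and sibsp <= 2:
        return "survived"
    return "died"

print(predict_survival("female", 30, 0))  # survived
print(predict_survival("male", 6, 1))     # survived
print(predict_survival("male", 40, 0))    # died
```

Reading a prediction off the tree is just following one root-to-leaf path of these tests.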

WEEK-7 (1/03/2021 - 5/03/2021)

This week we learnt some more model evaluation techniques, such as the kappa statistic and the ROC curve, and also had a discussion on the implementation of Linear Regression.

Kappa Statistic: It is a measure of the agreement between two raters, in this case a random rater and the model. It is a better measure of accuracy because it compensates for chance matches:

kappa = (ns - nR) / (nT - nR)

where ns is the number of observed correct classifications, nR is the number of correct classifications expected by chance, and nT is the size of the test data.

AUC-ROC: The ROC curve is a plot of the true positive rate (recall) against the false positive rate (FP / (FP + TN)). AUC-ROC stands for Area Under the Receiver Operating Characteristic curve, and the higher the area, the better the model's performance. If the curve lies near the 50% diagonal line, it suggests that the model predicts the output variable essentially at random.

AUC-ROC curve
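The kappa formula above can be computed directly from a confusion matrix: the chance-expected correct count nR comes from the row and column marginals. A minimal sketch (the 2x2 matrix below is made-up example data):

```python
def kappa(confusion):
    """Kappa = (ns - nR) / (nT - nR) for a square confusion matrix
    (rows = actual class, columns = predicted class)."""
    n_T = sum(sum(row) for row in confusion)               # test set size
    n_s = sum(confusion[i][i] for i in range(len(confusion)))  # observed correct
    row_totals = [sum(r) for r in confusion]
    col_totals = [sum(c) for c in zip(*confusion)]
    # Expected correct by chance: product of marginals, summed over classes.
    n_R = sum(r * c for r, c in zip(row_totals, col_totals)) / n_T
    return (n_s - n_R) / (n_T - n_R)

# 85/100 correct, but chance alone would get 50 right on these marginals.
cm = [[40, 10],
      [5, 45]]
print(kappa(cm))  # 0.7
```

Here plain accuracy is 0.85, but kappa is only 0.7 because 50 of the 100 matches would be expected by chance alone.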