This week we continued learning about recommender systems, made further progress on our project, and were introduced to text analysis.

Text Vectorization

Text vectorization is the process of converting text into a numerical representation. Two popular methods are TF-IDF (Term Frequency-Inverse Document Frequency) and Word2Vec.

TF-IDF

TF-IDF stands for Term Frequency-Inverse Document Frequency. It measures how important a word is to a document within a corpus or dataset. TF-IDF combines two concepts: Term Frequency (TF) and Inverse Document Frequency (IDF).

Term Frequency

Term frequency measures how often a word appears in a document. Because documents differ in length, a word is likely to occur more times in a long document than in a short one, so the raw count is usually divided by the length of the document. Term frequency can be defined as:

TF(t, d) = (number of times term t appears in document d) / (total number of terms in document d)

Inverse Document Frequency

Inverse document frequency is another concept which is used for finding ou...
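To make the TF-IDF idea above concrete, here is a minimal sketch in plain Python. It assumes whitespace tokenization and the common unsmoothed form IDF(t) = log(N / number of documents containing t); the function names and the sample sentences are made up for illustration, and libraries often use smoothed variants that give slightly different numbers.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """TF: count of each term divided by the number of tokens in the document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequency(tokenized_docs):
    """IDF (assumed unsmoothed form): log(N / documents containing the term)."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        # set() so a term is counted once per document
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(tokenized_docs):
    """Return one {term: score} dictionary per document."""
    idf = inverse_document_frequency(tokenized_docs)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
        for tokens in tokenized_docs
    ]


# Illustrative corpus (not from the course material)
docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
tokenized = [d.lower().split() for d in docs]
scores = tf_idf(tokenized)
```

In this toy corpus "the" occurs twice in the first sentence but also appears in two of the three documents, so it ends up with a lower score than "cat", which occurs only once but in a single document.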
In week 9 we continued with decision trees, learning about pruning, and we were given coursework on implementing logistic regression.

Pruning

The performance of a tree can be further improved by pruning, which removes branches that rely on features of low importance. This reduces the complexity of the tree and improves its predictive power by reducing overfitting. Pruning can start at either the root or the leaves. The simplest method starts at the leaves and replaces each node with the most popular class at that node; the change is kept if it doesn't reduce accuracy. This is called reduced error pruning. More sophisticated methods can also be used, such as cost complexity pruning, where a learning parameter (alpha) is used to weigh whether nodes can be removed based on the size of the sub-tree. This is also known as weakest link ...
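Reduced error pruning as described above can be sketched in a few lines of plain Python. This is only an illustration under several assumptions: the tree is a nested dictionary, every internal node already stores the most popular training class in a hypothetical `majority` field, and accuracy is measured on a small made-up validation set.

```python
def predict(node, x):
    """Walk from the given node down to a leaf and return its label."""
    while "label" not in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


def accuracy(tree, X, y):
    """Fraction of samples the tree classifies correctly."""
    return sum(predict(tree, x) == t for x, t in zip(X, y)) / len(y)


def reduced_error_prune(tree, node, X_val, y_val):
    """Bottom-up pruning: collapse a node into a leaf unless accuracy drops."""
    if "label" in node:
        return  # already a leaf
    reduced_error_prune(tree, node["left"], X_val, y_val)
    reduced_error_prune(tree, node["right"], X_val, y_val)

    before = accuracy(tree, X_val, y_val)
    saved = dict(node)  # shallow copy so the split can be restored
    node.clear()
    node["label"] = saved["majority"]  # replace node with its most popular class

    if accuracy(tree, X_val, y_val) < before:
        # Pruning hurt validation accuracy, so undo it
        node.clear()
        node.update(saved)


# Hypothetical tree on one feature; the split at 8 was fitted to noise
tree = {
    "feature": 0, "threshold": 5, "majority": "a",
    "left": {"label": "a"},
    "right": {
        "feature": 0, "threshold": 8, "majority": "b",
        "left": {"label": "b"},
        "right": {"label": "a"},
    },
}
X_val = [[1], [3], [6], [7], [9], [10]]
y_val = ["a", "a", "b", "b", "b", "b"]

reduced_error_prune(tree, tree, X_val, y_val)
```

On this example the noisy split at threshold 8 is collapsed into a single leaf because validation accuracy does not get worse, while the root split is kept because replacing it with one class would lower accuracy.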