Sabudh Data Science Internship (2021)

Posts

Showing posts from May, 2021

WEEK-16 ( 24/05/2021 - 28/05/2021 )

- May 29, 2021

Week 16 covered all aspects of recommender systems, and we were given a News Recommender Assignment to put into practice what we learned during the week. The assignment problem is discussed at the end of the blog.

Recommender Systems

Introduction

During the last few decades, with the rise of YouTube, Amazon, Netflix, and many other such web services, recommender systems have taken up an ever larger place in our lives. From e-commerce (suggesting articles that could interest buyers) to online advertising (suggesting the right content to users, matching their preferences), recommender systems are today unavoidable in our daily online journeys. In very general terms, recommender systems are algorithms aimed at suggesting relevant items to users (items being movies to watch, text to read, products to buy, or anything else, depending on the industry).

Outline

In the first section, we are going to give an overview of the two...

WEEK-15 ( 17/05/2021 - 21/05/2021 )

- May 22, 2021

This week, we were taught about kd trees and about similarity and distance metrics such as Euclidean distance and Pearson’s correlation. Let’s start with kd trees.

KD Tree Algorithm

The KD Tree Algorithm is one of the most commonly used nearest neighbor algorithms. The data points are split at each node into two sets. Like the previous algorithm, the KD Tree is a binary tree, so every node has at most two children. The split criterion is usually the median. In the image below, the right side shows the exact position of the data points and the left side shows their spatial position.

Data points and their position in a coordinate system.

The KD Tree Algorithm first uses the median of the first axis and then, in the second layer, the median of the second axis. We’ll start with the x-axis. Sorted in ascending order, the x-values are: 1, 2, 3, 4, 4, 6, 7, 8, 9, 9. With ten values, we take the upper of the two middle values (4 and 6) as the median, so the median is 6. The data points are then divided into those with an x-value smaller than 6 and those with an x-value greater than or equal to 6. T...
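The median split from this excerpt can be sketched in a few lines of Python. This is only a minimal illustration, not the code used in the course: the names `KDNode` and `build_kdtree` are our own, and we assume the common convention of taking the element at index `n // 2` of the sorted values as the median, which gives 6 for the x-values listed in the post.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KDNode(NamedTuple):
    """Hypothetical node type for this sketch."""
    point: Point                  # median point on `axis` among the points seen here
    axis: int                     # axis used for the split at this depth
    left: Optional["KDNode"]      # points sorted before the median on `axis`
    right: Optional["KDNode"]     # points sorted after the median on `axis`


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Recursively split on the median, cycling through the axes per layer."""
    if not points:
        return None
    axis = depth % len(points[0])             # layer 0 -> x, layer 1 -> y, ...
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                   # upper median for even counts
    # Note: points tied with the median value may land on either side here.
    return KDNode(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


# First split on the x-values given in the post.
x_values = [1, 2, 3, 4, 4, 6, 7, 8, 9, 9]
median_x = sorted(x_values)[len(x_values) // 2]
smaller = [x for x in x_values if x < median_x]
greater_or_equal = [x for x in x_values if x >= median_x]
```

Here `median_x` comes out as 6, with five x-values on each side of the split. Calling `build_kdtree` on a list of (x, y) tuples would repeat the same step on the y-axis in the second layer.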