WEEK-16 (24/05/2021 - 28/05/2021)
Week 16 covered all the major aspects of recommender systems. We were also given a News Recommender Assignment to implement everything we learned during the week; the assignment problem is discussed at the end of the blog.
Recommender Systems
Introduction
During the last few decades, with the rise of YouTube, Amazon, Netflix, and many other such web services, recommender systems have taken an ever-larger place in our lives. From e-commerce (suggesting to buyers articles that could interest them) to online advertisement (suggesting to users the right content, matching their preferences), recommender systems are today unavoidable in our daily online journeys.
In a very general way, recommender systems are algorithms aimed at suggesting relevant items to users (items being movies to watch, text to read, products to buy, or anything else depending on the industry).
Outline
In the first section, we are going to overview the two major paradigms of recommender systems: collaborative and content-based methods. The next two sections will then describe various collaborative filtering methods, such as user-user, item-item and matrix factorization. The following section will be dedicated to content-based methods and how they work. Finally, we will discuss how to evaluate a recommender system.
Collaborative versus content
The purpose of a recommender system is to suggest relevant items to users. To achieve this task, there exist two major categories of methods: collaborative filtering methods and content-based methods. Before digging more into the details of particular algorithms, let’s briefly discuss these two main paradigms.
Collaborative filtering methods
Collaborative filtering methods are based solely on the past interactions recorded between users and items in order to produce new recommendations. These interactions are stored in the so-called “user-item interactions matrix”.
Illustration of the user-item interactions matrix.
The main idea that rules collaborative methods is that these past user-item interactions are sufficient to detect similar users and/or similar items and to make predictions based on these estimated proximities.
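To make this concrete, here is a minimal sketch of how such a matrix could be built from raw interaction logs (the column names and values are made up for the example):

```python
import pandas as pd

# Hypothetical interaction log: one row per recorded user-item interaction.
interactions = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u3"],
    "item_id": ["i1", "i2", "i1", "i2", "i3"],
    "rating":  [5, 3, 4, 2, 5],
})

# Pivot into the user-item interactions matrix (NaN = no recorded interaction).
interaction_matrix = interactions.pivot_table(
    index="user_id", columns="item_id", values="rating"
)
print(interaction_matrix)
```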
The class of collaborative filtering algorithms is divided into two sub-categories that are generally called memory-based and model-based approaches. Memory-based approaches work directly with the values of recorded interactions, assuming no model, and are essentially based on nearest neighbours search (for example, find the users closest to a user of interest and suggest the most popular items among these neighbours). Model-based approaches assume an underlying “generative” model that explains the user-item interactions and try to discover it in order to make new predictions.
Overview of the collaborative filtering methods paradigm.
The main advantage of collaborative approaches is that they require no information about users or items and, so, can be used in many situations. Moreover, the more users interact with items, the more accurate new recommendations become: for a fixed set of users and items, new interactions recorded over time bring new information and make the system more and more effective.
However, as it only considers past interactions to make recommendations, collaborative filtering suffers from the “cold start problem”: it is impossible to recommend anything to new users or to recommend a new item to any user, and many users or items have too few interactions to be handled efficiently. This drawback can be addressed in different ways: recommending random items to new users or new items to random users (random strategy), recommending popular items to new users or new items to the most active users (maximum expectation strategy), recommending a set of various items to new users or a new item to a set of various users (exploratory strategy) or, finally, using a non-collaborative method for the early life of the user or the item.
In the following sections, we will mainly present three classical collaborative filtering approaches: two memory-based methods (user-user and item-item) and one model-based approach (matrix factorization).
Content-based methods
Unlike collaborative methods that rely only on user-item interactions, content-based approaches use additional information about users and/or items. If we consider the example of a movie recommender system, this additional information can be, for example, the age, sex, job or any other personal information for users, as well as the category, main actors, duration or other characteristics for the movies (items).
Then, the idea of content-based methods is to try to build a model, based on the available “features”, that explains the observed user-item interactions. Still considering users and movies, we will try, for example, to model the fact that young women tend to rate some movies higher, that young men tend to rate some other movies higher, and so on. If we manage to get such a model, making new predictions for a user is pretty easy: we just need to look at the profile (age, sex, …) of this user and, based on this information, determine relevant movies to suggest.
Overview of the content based methods paradigm.
Content-based methods suffer far less from the cold start problem than collaborative approaches: new users or items can be described by their content, so relevant suggestions can be made for them. Only new users or items with previously unseen features will logically suffer from this drawback, but once the system is old enough, this has little to no chance of happening.
Later in this post, we will further discuss content-based approaches and see that, depending on our problem, various classification or regression models can be used, ranging from very simple to much more complex models.
Models, bias and variance
Let’s focus a bit more on the main differences between the previously mentioned methods. More specifically, let’s see the implications that the modelling level has on the bias and the variance.
In memory-based collaborative methods, no latent model is assumed. The algorithms work directly with the user-item interactions: for example, users are represented by their interactions with items, and a nearest neighbours search on these representations is used to produce suggestions. As no latent model is assumed, these methods theoretically have a low bias but a high variance.
In model-based collaborative methods, some latent interaction model is assumed. The model is trained to reconstruct user-item interaction values from its own representations of users and items. New suggestions can then be made based on this model. The user and item latent representations extracted by the model have a mathematical meaning that can be hard to interpret for a human being. As a (pretty free) model for user-item interactions is assumed, these methods theoretically have a higher bias but a lower variance than methods assuming no latent model.
Finally, in content-based methods some latent interaction model is also assumed. However, here, the model is provided with content that defines the representation of users and/or items: for example, users are represented by given features and we try to model, for each item, the kind of user profile that likes this item or not. Here, as for model-based collaborative methods, a user-item interactions model is assumed. However, this model is more constrained (because the representations of users and/or items are given) and, so, the method tends to have the highest bias but the lowest variance.
Summary of the different types of recommender systems algorithms.
Memory based collaborative approaches
The main characteristic of the user-user and item-item approaches is that they use only information from the user-item interaction matrix and assume no model to produce new recommendations.
User-user
In order to make a new recommendation to a user, the user-user method roughly tries to identify the users with the most similar “interactions profile” (nearest neighbours) in order to suggest items that are the most popular among these neighbours (and that are “new” to our user). This method is said to be “user-centred” as it represents users based on their interactions with items and evaluates distances between users.
Assume that we want to make a recommendation for a given user. First, every user can be represented by their vector of interactions with the different items (“their row” in the interaction matrix). Then, we can compute some kind of “similarity” between our user of interest and every other user. That similarity measure is such that two users with similar interactions on the same items should be considered close. Once the similarities to every other user have been computed, we can keep the k nearest neighbours of our user and then suggest the most popular items among them (only looking at items that our reference user has not interacted with yet).
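As a rough illustration, here is a minimal sketch of the user-user method on a toy dense matrix (the matrix values, the cosine similarity choice and the popularity scoring are all assumptions made for the example; real interaction matrices are sparse):

```python
import numpy as np

def user_user_recommend(R, user_idx, k=2, n_items=2):
    """Suggest the n_items most popular unseen items among the k users
    most similar to R[user_idx]. R is a toy dense user-item matrix
    where 0 means 'no recorded interaction'."""
    # Cosine similarity between the target user and every other user.
    norms = np.linalg.norm(R, axis=1) * np.linalg.norm(R[user_idx]) + 1e-9
    sims = R @ R[user_idx] / norms
    sims[user_idx] = -np.inf                      # exclude the user itself
    neighbours = np.argsort(sims)[-k:]            # k nearest neighbours
    # Score items by their popularity among these neighbours...
    scores = R[neighbours].sum(axis=0)
    scores[R[user_idx] > 0] = -np.inf             # ...keeping only unseen items
    return np.argsort(scores)[-n_items:][::-1]

R = np.array([[5, 3, 0, 0],
              [4, 0, 4, 1],
              [1, 1, 0, 5],
              [5, 2, 3, 0]], dtype=float)
print(user_user_recommend(R, user_idx=0))         # indices of suggested items
```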
Illustration of the user-user method. The same colour code will be used in the rest of the post.
Item-item
To make a new recommendation to a user, the idea of the item-item method is to find items similar to the ones the user has already “positively” interacted with. Two items are considered similar if most of the users that have interacted with both of them did it in a similar way. This method is said to be “item-centred” as it represents items based on the interactions users had with them and evaluates distances between those items.
Assume that we want to make a recommendation for a given user. First, we consider the item this user liked the most and represent it (like all the other items) by its vector of interactions with every user (“its column” in the interaction matrix). Then, we can compute similarities between this “best item” and all the other items. Once the similarities have been computed, we can keep the k nearest neighbours of the selected “best item” that are new to our user of interest and recommend these items.
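A matching sketch for the item-item method, under the same toy assumptions (dense matrix, cosine similarity):

```python
import numpy as np

def item_item_recommend(R, user_idx, k=2):
    """Recommend the k items most similar to the user's best-rated item.
    R is a toy dense user-item matrix (0 = no recorded interaction)."""
    best_item = np.argmax(R[user_idx])            # item the user liked the most
    items = R.T                                   # items as vectors of user interactions
    norms = np.linalg.norm(items, axis=1) * np.linalg.norm(items[best_item]) + 1e-9
    sims = items @ items[best_item] / norms       # cosine similarity between items
    sims[R[user_idx] > 0] = -np.inf               # keep only items new to the user
    return np.argsort(sims)[-k:][::-1]

R = np.array([[5, 3, 0, 0],
              [4, 0, 4, 1],
              [1, 1, 0, 5],
              [5, 2, 3, 0]], dtype=float)
print(item_item_recommend(R, user_idx=0))         # the 2 items closest to item 0
```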
Illustration of the item-item method.
Comparing user-user and item-item
The user-user method is based on the search for similar users in terms of interactions with items. As, in general, every user has only interacted with a few items, the method is pretty sensitive to any recorded interaction (high variance). On the other hand, as the final recommendation is only based on interactions recorded for users similar to our user of interest, we obtain more personalized results (low bias).
Conversely, the item-item method is based on the search for similar items in terms of user-item interactions. As, in general, a lot of users have interacted with each item, the neighbourhood search is far less sensitive to single interactions (lower variance). As a counterpart, interactions coming from every kind of user (even users very different from our reference user) are then considered in the recommendation, making the method less personalized (more biased). Thus, this approach is less personalized than the user-user approach but more robust.

Model based collaborative approaches
Model-based collaborative approaches rely only on user-item interaction information and assume a latent model supposed to explain these interactions. For example, matrix factorization algorithms consist of decomposing the huge and sparse user-item interaction matrix into a product of two smaller and dense matrices: a user-factor matrix (containing user representations) that multiplies a factor-item matrix (containing item representations).
Matrix factorization
The main assumption behind matrix factorization is that there exists a pretty low-dimensional latent space of features in which we can represent both users and items, such that the interaction between a user and an item can be obtained by computing the dot product of the corresponding dense vectors in that space.
For example, consider that we have a user-movie rating matrix. In order to model the interactions between users and movies, we can assume that:
there exist some features that describe (and tell apart) movies pretty well;
these features can also be used to describe user preferences (high values for features the user likes, low values otherwise).
However, we don’t want to explicitly give these features to our model (as could be done for the content-based approaches described later). Instead, we prefer to let the system discover these useful features by itself and build its own representations of both users and items. As they are learned and not given, the extracted features taken individually have a mathematical meaning but no intuitive interpretation (and, so, are difficult, if not impossible, for a human to understand). However, it is not unusual to end up with structures emerging from this type of algorithm that are extremely close to an intuitive decomposition a human could think of. Indeed, a consequence of such a factorization is that users who are close in terms of preferences, as well as items that are close in terms of characteristics, end up having close representations in the latent space.
Illustration of the matrix factorization method.
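To make this more tangible, here is a minimal sketch of matrix factorization trained by stochastic gradient descent on the observed entries of a toy rating matrix (the factor count, learning rate, regularization and ratings are all arbitrary choices for the example; no biases or early stopping):

```python
import numpy as np

def matrix_factorization(R, n_factors=2, lr=0.01, reg=0.02, n_epochs=200):
    """Factorize R ~ U @ V.T on observed entries only, by gradient descent."""
    rng = np.random.default_rng(0)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, n_factors))   # user latent factors
    V = rng.normal(scale=0.1, size=(n_items, n_factors))   # item latent factors
    observed = np.argwhere(R > 0)                # treat 0 as 'missing'
    for _ in range(n_epochs):
        for u, i in observed:
            err = R[u, i] - U[u] @ V[i]          # prediction error on this entry
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * U[u] - reg * V[i])
    return U, V

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
U, V = matrix_factorization(R)
print(np.round(U @ V.T, 1))   # reconstructed ratings, including the missing ones
```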
Content based approaches
In the previous two sections we mainly discussed user-user, item-item and matrix factorization approaches. These methods only consider the user-item interaction matrix and, so, belong to the collaborative filtering paradigm. Let’s now describe the content-based paradigm.
Concept of content-based methods
In content-based methods, the recommendation problem is cast into either a classification problem (predict whether a user “likes” an item or not) or a regression problem (predict the rating given by a user to an item). In both cases, we are going to set up a model based on the user and/or item features at our disposal (the “content” of our “content-based” method).
If we are working with item features, the method is user-centred: modelling, optimization and computations can be done “per user”. We then train one model per user, based on item features, that tries to answer the question “what is the probability that this user likes each item?” (or “what rating does this user give to each item?”, for regression). We can then attach to each user a model trained on their own data: the model obtained is, so, more personalized than its item-centred counterpart as it only takes into account interactions from the considered user. However, most of the time a user has interacted with relatively few items and, so, the model we obtain is far less robust than an item-centred one.
Illustration of the difference between item-centred and user-centred content based methods.
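As an illustration of the user-centred variant, here is a minimal sketch that trains one classifier for one user on hypothetical item features (the genre indicators and feedback are made up, and scikit-learn's LogisticRegression is just one possible model choice):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical item features (genre indicators) and one user's past feedback.
item_features = np.array([[1, 0, 0],   # action
                          [0, 1, 0],   # comedy
                          [0, 0, 1],   # drama
                          [1, 1, 0],   # action-comedy
                          [1, 0, 1]])  # action-drama
liked = np.array([1, 0, 0, 1, 1])      # 1 = this user liked the item

# One classifier per user: "what is the probability this user likes an item?"
user_model = LogisticRegression().fit(item_features, liked)

new_item = np.array([[0, 1, 1]])        # an unseen comedy-drama
print(user_model.predict_proba(new_item)[0, 1])
```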
Metrics based evaluation
If our recommender system is based on a model that outputs numeric values such as ratings predictions or matching probabilities, we can assess the quality of these outputs in a very classical manner using an error measurement metric such as, for example, mean square error (MSE). In this case, the model is trained only on a part of the available interactions and is tested on the remaining ones.
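A minimal sketch of this train/test evaluation, with a placeholder `predict` standing in for whatever model was trained (the triples here are made up):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical (user, item, rating) triples.
triples = [(0, 1, 5.0), (0, 2, 3.0), (1, 1, 4.0), (2, 3, 2.0), (2, 0, 5.0)]
train, test = train_test_split(triples, test_size=0.4, random_state=0)

def predict(user, item):
    return 3.5   # placeholder: a real model would return its estimated rating

# MSE over the held-out interactions only.
mse = np.mean([(r - predict(u, i)) ** 2 for u, i, r in test])
print(f"test MSE: {mse:.2f}")
```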
Still, if our recommender system is based on a model that predicts numeric values, we can also binarize these values with a classical thresholding approach (values above the threshold are positive and values below are negative) and evaluate the model in a more “classification-like” way. Indeed, as the dataset of past user-item interactions is also binary (or can be binarized by thresholding), we can then evaluate the accuracy (as well as the precision and the recall) of the binarized outputs of the model on a test dataset of interactions not used for training.
Finally, if we now consider a recommender system not based on numeric values that only returns a list of recommendations (such as user-user or item-item, which are based on a kNN approach), we can still define a precision-like metric by estimating the proportion of recommended items that really suit our user. To estimate this precision, we cannot take into account recommended items that our user has not interacted with; we should only consider items from the test dataset for which we have user feedback.
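For instance, a small sketch of such a precision-like metric, counting hits only among recommended items that appear in the test feedback (item ids and feedback are hypothetical):

```python
def precision_on_test(recommended, test_feedback):
    """Precision computed only on recommended items present in the test set,
    since we have no feedback for the others.
    `test_feedback` maps item -> True/False (liked or not)."""
    judged = [item for item in recommended if item in test_feedback]
    if not judged:
        return None   # no overlap with the test set: precision is undefined
    return sum(test_feedback[item] for item in judged) / len(judged)

recommended = ["i1", "i4", "i7", "i9"]
test_feedback = {"i1": True, "i7": False, "i2": True}
print(precision_on_test(recommended, test_feedback))   # 1 hit out of 2 judged
```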
Human based evaluation
When designing a recommender system, we are interested not only in obtaining a model that produces recommendations we are very sure about; we can also expect some other good properties, such as diversity and explainability of the recommendations.
As mentioned in the collaborative section, we absolutely want to avoid having a user get stuck in what we called earlier an information confinement area. The notion of “serendipity” is often used to express the tendency of a model to create (or not) such a confinement area (diversity of recommendations). Serendipity, which can be estimated by computing the distance between recommended items, should not be too low, as it would create confinement areas, but should also not be too high, as it would mean that we do not take our user’s interests sufficiently into account when making recommendations (exploration vs exploitation). Thus, in order to bring diversity into the suggested choices, we want to recommend items that both suit our user very well and are not too similar to each other. For example, instead of recommending a user “Star Wars” 1, 2 and 3, it seems better to recommend “Star Wars 1”, “Star Trek Into Darkness” and “Indiana Jones and the Raiders of the Lost Ark”: the two latter may be seen by our system as having less chance to interest our user, but recommending 3 items that look too similar is not a good option.
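As a rough proxy, the diversity of a recommendation list can be estimated by the average pairwise distance between the recommended items; a minimal sketch (the 2-D item embeddings are made up for the example):

```python
import numpy as np

def intra_list_distance(item_vectors):
    """Average pairwise distance between recommended items: one rough proxy
    for the diversity ('serendipity') of a recommendation list."""
    n = len(item_vectors)
    dists = [np.linalg.norm(item_vectors[i] - item_vectors[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

# Three near-identical sequels versus a more varied list.
sequels = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]])
varied  = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(intra_list_distance(sequels), intra_list_distance(varied))
```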
News Recommender Assignment
Problem Statement
You have been recruited as data scientists by a start-up, JhakaasNewsVala, based out of Mumbai.
The company is developing an app that promises to deliver a unique news experience to its app users. The company has identified its target market as working professionals in the age group 21-40. Recognizing the fact that retention (defined here as a visit after the first visit) is a huge issue for apps, they understand the need to make an impact on the first visit itself. The problem, however, is that they know nothing about the users’ interests or demographics at that time with which to personalize the news feed.
The company has acquired a corpus of news stories. The real estate available for displaying news stories (the mobile phone’s screen) is limited, so without a scroll, only 10 stories can be displayed. Statistics show that the number of users scrolling beyond the first set of stories drops off very quickly unless a story on the first page catches the user’s eye (that is, results in a clickthrough).
You have been tasked with the job of building two intelligent bots.
The article recommender: This bot selects articles to serve a user. Inputs to the bot are the corpus of news articles and a user profile, if available.
The user profiler: Once the user starts consuming news stories, (s)he leaves behind a clickstream of the form below:

The bot must extract user interests from such data, which can then be used for further personalization of his/her news feed.
The ultimate objective is to increase clickthrough and the frequency with which the user opens the app to consume stories.
However, the objective in the first visit is to:
Reduce bias in data collection (example bias: stories that get served often and are ranked higher have a higher likelihood of being consumed, i.e. obtaining a clickthrough)
Learn as much as possible about the users on their first visit
Maximize coverage of the news corpus
Having learned about the trade-off between Exploiting what you know and Exploring the space for what you don’t, how would you implement a strategy for the same within the project?
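One simple way to balance the two is an epsilon-greedy strategy (a sketch of one possible answer, not the expected solution; story ids and scores are hypothetical): fill most slots with the highest-scoring stories while reserving a random fraction of slots for unexplored stories, which improves corpus coverage and reduces serving bias:

```python
import random

def pick_stories(scored_stories, n_slots=10, epsilon=0.3):
    """Epsilon-greedy page fill: most slots exploit the current scores,
    a fraction of slots explore the rest of the corpus at random.
    `scored_stories` maps story_id -> estimated clickthrough score."""
    ranked = sorted(scored_stories, key=scored_stories.get, reverse=True)
    page = []
    for _ in range(n_slots):
        remaining = [s for s in ranked if s not in page]
        if not remaining:
            break
        if random.random() < epsilon:
            page.append(random.choice(remaining))   # explore: coverage, less bias
        else:
            page.append(next(s for s in ranked if s not in page))  # exploit
    return page

scores = {f"story_{i}": random.random() for i in range(50)}  # hypothetical scores
print(pick_stories(scores))
```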