Udacity IBM Recommendations Project

Aseem Narula
4 min read · Jul 6, 2021

My name is Aseem Narula. I am an RPA consultant (in simple words, we build software bots) and an aspiring Data Scientist, currently pursuing my learning journey with the Udacity Data Science Nanodegree.

This blog explains an exciting ML project on IBM Recommendations: we analyze the interactions that users have with articles on the IBM Watson Studio platform and recommend new articles we think they will like. Below you can see an example of what the dashboard could look like displaying articles on the IBM Watson Platform.

This project is divided into the following tasks:

I. Exploratory Data Analysis

Before making recommendations of any kind, we will need to explore the data we are working with for the project.

Here we are given two datasets in CSV format: one holds the user-item interactions and the other is the articles community file.

The dataset has a unique article id, followed by the title and the email for each article.

Our EDA shows that the distribution of user-article interactions ranges from 0 to roughly 5000.
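One common way to look at such a distribution is to count interactions per user and summarize the counts. The snippet below is only a minimal sketch on made-up rows; the column names `email` and `article_id` are assumptions based on the dataset description above, not the project's actual code.

```python
import pandas as pd

# Made-up sample rows; 'email' and 'article_id' are assumed column names.
df = pd.DataFrame({
    "email": ["a@x", "a@x", "a@x", "b@x", "c@x", "c@x"],
    "article_id": [10, 20, 30, 10, 20, 20],
})

# Number of interactions recorded for each user.
interactions_per_user = df.groupby("email")["article_id"].count()

# Summary statistics (count, mean, min, quartiles, max) of that distribution.
summary = interactions_per_user.describe()
median_interactions = interactions_per_user.median()

print(summary)
```

Swapping `email` for `article_id` in the `groupby` would give the same view per article instead of per user.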

Heatmap of the user interactions data frame

II. Rank Based Recommendations

To get started in building recommendations, we first find the most popular articles based simply on the number of interactions. Since there are no ratings for any of the articles, it is reasonable to assume that the articles with the most interactions are the most popular. These are then the articles we might recommend to new users (or to anyone, depending on what we know about them).

The top articles data frame is created by grouping on 'title', counting the 'user_id' column, and sorting by that count. The screenshot below gives a quick glimpse of the top articles.
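The ranking step can be sketched roughly as follows. This is an illustrative example on made-up rows, and `get_top_articles` is a hypothetical helper name; only the 'title' and 'user_id' columns come from the description above.

```python
import pandas as pd

# Made-up interactions; only the column names follow the post.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "title": ["A", "B", "A", "B", "A", "B", "A", "C"],
})

def get_top_articles(n, df=df):
    """Return the n titles with the most interactions (hypothetical helper)."""
    # Count interactions per title, then sort from most to least popular.
    counts = df.groupby("title")["user_id"].count()
    return counts.sort_values(ascending=False).head(n).index.tolist()

top_two = get_top_articles(2)
print(top_two)
```

Because this ranking needs no history for the person receiving the recommendation, it can serve any user, including brand-new ones.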

III. User-User Based Collaborative Filtering

In order to build better recommendations for the users of IBM’s platform, we could look at users that are similar in terms of the items they have interacted with. These items could then be recommended to the similar users. This would be a step in the right direction towards more personal recommendations for the users.

I created the find_similar_users function, which takes the dot product of the user_item matrix and user_item.loc[user_id] to score how similar every other user is to the given user.

If we were given a new user, we would have no interactions or ratings available on which to base recommendations. In cases where you are introduced to a new user or a new article, collaborative filtering is not helpful as a technique for making predictions. This is known as the Cold Start Problem, and Knowledge Based or Content Based Recommendations are better ways to handle it for new users.
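A minimal sketch of this idea is shown below. It assumes a binary user-item matrix (1 if the user interacted with the article, 0 otherwise) built from made-up rows; the body of `find_similar_users` here is my own illustration of the dot-product approach, not necessarily the exact project code.

```python
import pandas as pd

# Made-up interactions; 'user_id' and 'article_id' are assumed column names.
df = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 3, 4, 4],
    "article_id": [10, 20, 10, 20, 30, 10, 30],
})

# Binary matrix: rows are users, columns are articles, 1 means "interacted".
user_item = (
    df.drop_duplicates()
      .assign(seen=1)
      .pivot(index="user_id", columns="article_id", values="seen")
      .fillna(0)
      .astype(int)
)

def find_similar_users(user_id, user_item=user_item):
    """Return other user ids ordered from most to least similar."""
    # The dot product counts how many articles each user shares with user_id.
    similarity = user_item.dot(user_item.loc[user_id])
    # Drop the user themselves, then sort by shared-article count.
    similarity = similarity.drop(user_id).sort_values(ascending=False)
    return similarity.index.tolist()

print(find_similar_users(1))
```

Articles seen by the most similar users, but not yet by the given user, are then natural candidates to recommend.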

IV. Matrix Factorization

Finally, I completed a machine learning approach to building recommendations. Using the user-item interactions, I built a matrix decomposition. The decomposition gave me an idea of how well we can predict new articles an individual might interact with, which, as it turns out, isn't great.
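To illustrate the decomposition, the sketch below runs NumPy's SVD on a small made-up binary user-item matrix and measures how well the matrix is reconstructed as more latent features are kept. The matrix values and the `accuracy_for_k` helper are assumptions for illustration, and the accuracy here is measured on the training matrix only.

```python
import numpy as np

# Made-up binary user-item matrix (rows: users, columns: articles).
user_item = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
], dtype=float)

# Decompose into user factors, singular values, and article factors.
u, s, vt = np.linalg.svd(user_item, full_matrices=False)

def accuracy_for_k(k):
    """Share of entries recovered when keeping only k latent features."""
    pred = np.around(u[:, :k] @ np.diag(s[:k]) @ vt[:k, :])
    errors = np.sum(np.abs(user_item - pred))
    return 1 - errors / user_item.size

accuracies = [accuracy_for_k(k) for k in range(1, len(s) + 1)]
print(accuracies)
```

Keeping every latent feature reproduces the training matrix exactly, which is why training accuracy alone says little about how good the recommendations are for unseen interactions.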

Looking at the test vs. train data, we can conclude that as the number of latent features increases, the accuracy of the SVD model increases on both the test and training data sets, but this is not an efficient way to make recommendations. As in my earlier analysis, we still do not know whether our recommendations are good enough, and new users bring back the Cold Start Problem. FunkSVD is an ML-model-based recommendation method that predicts ratings for user-article pairs that have not been observed, while brand-new users still need knowledge based or content based recommendations.

Acknowledgement

All the User-Item-Interactions and Articles Community datasets used in this data science project are provided by IBM in collaboration with Udacity and are used for my project in the Udacity Data Scientist Nanodegree.
