Movie Recommendation System in R Jupyter Notebook
localhost:8888/notebooks/movie-recommendation-system-in-r.ipynb 1/18
11/18/23, 2:37 PM movie-recommendation-system-in-r - Jupyter Notebook
In [3]: summary(movie_data)
In [4]: head(movie_data)
A data.frame: 6 × 3
In [5]: summary(rating_data)
In [6]: head(rating_data)
A data.frame: 6 × 4
  userId movieId rating  timestamp
1      1      16    4.0 1217897793
2      1      24    1.5 1217895807
3      1      32    4.0 1217896246
4      1      47    4.0 1217896556
5      1      50    4.0 1217896523
Data Pre-processing
Action Adventure Animation Children Comedy Crime Documentary Drama Fantasy Film-Noir
 <int>     <int>     <int>    <int>  <int> <int>       <int> <int>   <int>     <int>
     0         1         1        1      1     0           0     0       1         0
     0         1         0        1      0     0           0     0       1         0
     0         0         0        0      1     0           0     0       0         0
     0         0         0        0      1     0           0     1       0         0
     0         0         0        0      1     0           0     0       0         0
     1         0         0        0      0     1           0     0       0         0
(further genre columns are truncated in this output)
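A 0/1 genre matrix like the one above can be built by splitting each film's genre string and testing membership against a fixed list of genres. The sketch below assumes the raw genre field stores pipe-separated labels such as "Adventure|Animation|Children" (an assumption, since the input file is not shown here); the sample data frame and the shortened `genre_list` are hypothetical.

```r
# Hypothetical sample: genres stored as pipe-separated strings (assumed format)
movies <- data.frame(
  movieId = c(1, 2, 6),
  genres  = c("Adventure|Animation|Children|Comedy|Fantasy",
              "Adventure|Children|Fantasy",
              "Action|Crime|Thriller"),
  stringsAsFactors = FALSE
)
# Illustrative subset of genres, not the notebook's full list
genre_list <- c("Action", "Adventure", "Animation", "Children", "Comedy",
                "Crime", "Fantasy", "Thriller")

split_genres <- strsplit(movies$genres, "|", fixed = TRUE)

# vapply gives one column per movie; t() turns it into one row per movie
genre_mat <- t(vapply(split_genres,
                      function(g) as.integer(genre_list %in% g),
                      integer(length(genre_list))))
colnames(genre_mat) <- genre_list

genre_df <- data.frame(movieId = movies$movieId, genre_mat)
print(genre_df)
```

Note that `data.frame()` rewrites names such as "Film-Noir" or "Sci-Fi" into syntactic names unless `check.names = FALSE` is passed.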
$HYBRID_realRatingMatrix
'Hybrid recommender that aggegates several recommendation strategies using weighted
averages.'
$ALS_realRatingMatrix
'Recommender for explicit ratings based on latent factors, calculated by alternating least
squares algorithm.'
$ALS_implicit_realRatingMatrix
'Recommender for implicit data based on latent factors, calculated by alternating least
squares algorithm.'
$IBCF_realRatingMatrix
'Recommender based on item-based collaborative filtering.'
$LIBMF_realRatingMatrix
'Matrix factorization with LIBMF via package recosystem (https://cran.r-project.org/web/packages/recosystem/vignettes/introduction.html).'
$POPULAR_realRatingMatrix
'Recommender based on item popularity.'
$RANDOM_realRatingMatrix
'Produce random recommendations (real ratings).'
$RERECOMMEND_realRatingMatrix
'Re-recommends highly rated items (real ratings).'
$SVD_realRatingMatrix
'Recommender based on SVD approximation with column-mean imputation.'
$SVDF_realRatingMatrix
'Recommender based on Funk SVD with gradient descend
(https://sifter.org/~simon/journal/20061211.html).'
$UBCF_realRatingMatrix
'Recommender based on user-based collaborative filtering.'
We will implement a single model in this R project: Item-Based Collaborative Filtering (IBCF).
In [13]: recommendation_model$IBCF_realRatingMatrix$parameters
$k
30
$method
'Cosine'
$normalize
'center'
$normalize_sim_matrix
FALSE
$alpha
0.5
$na_as_zero
FALSE
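The parameters above indicate that the IBCF model measures item-to-item similarity with cosine similarity and keeps only the `k = 30` most similar items for each film. The base-R sketch below illustrates those two steps on a small made-up rating matrix, treating unrated cells as 0 and using `k = 2` because there are only four films. `cosine_item_sim()` and `keep_top_k()` are hypothetical helpers, not recommenderlab functions, and the sketch leaves out the `normalize = 'center'` setting.

```r
# Hypothetical toy data: rows are users, columns are movies, NA = not rated
ratings <- matrix(c(5,  3, NA, 1,
                    4, NA, NA, 1,
                    1,  1, NA, 5,
                    1, NA,  4, 4),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("u", 1:4), paste0("m", 1:4)))

# Cosine similarity between item columns; unrated cells are treated as 0
cosine_item_sim <- function(r) {
  r[is.na(r)] <- 0
  norms <- sqrt(colSums(r^2))
  sim <- crossprod(r) / outer(norms, norms)   # t(r) %*% r, scaled by column norms
  diag(sim) <- 0                              # an item is not its own neighbour
  sim
}

# Keep only the k largest similarities in each row and zero out the rest
keep_top_k <- function(sim, k) {
  for (i in seq_len(nrow(sim))) {
    drop <- order(sim[i, ], decreasing = TRUE)[-seq_len(k)]
    sim[i, drop] <- 0
  }
  sim
}

item_sim <- cosine_item_sim(ratings)
pruned <- keep_top_k(item_sim, k = 2)
print(round(pruned, 3))
```

After pruning, each row keeps at most `k` non-zero entries, and the pruned matrix is generally no longer symmetric.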
(Output: a 4 × 4 similarity matrix with rows and columns labelled 1 to 4; the values are not preserved.)
In the matrix above, each row and each column represents a user. We have taken four users, and each cell holds the similarity between the corresponding pair of users.
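A user-user similarity matrix of this shape can be sketched in base R by comparing rows of the rating matrix. The four-user ratings below are made up, unrated cells are stored as 0, and cosine similarity is an assumed choice of measure; the notebook's actual call and values are not reproduced here.

```r
# Hypothetical ratings for four users (rows) on five movies (columns); 0 = not rated
ratings <- matrix(c(4, 0, 5, 3, 0,
                    5, 1, 4, 0, 0,
                    0, 5, 0, 1, 4,
                    4, 0, 4, 3, 1),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(as.character(1:4), NULL))

# Cosine similarity between every pair of users (rows)
norms <- sqrt(rowSums(ratings^2))
user_sim <- tcrossprod(ratings) / outer(norms, norms)   # ratings %*% t(ratings), scaled
print(round(user_sim, 2))
```

The diagonal is 1 (each user compared with themselves), and values near 1 mark users whose rating patterns point in nearly the same direction.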
(Output: a second 4 × 4 similarity matrix labelled 1 to 4; the values are not preserved.)
Now, we will create a table that lists each distinct rating value and how many times it occurs.
rating_values
      0     0.5       1     1.5       2     2.5       3     3.5       4     4.5       5
6791761    1198    3258    1567    7943    5484   21729   12237   28880    8187   14856
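The very large count for 0 is consistent with unrated cells being stored as zeros: 668 users × 10325 films gives 6,897,100 cells, and 6,897,100 - 6,791,761 = 105,339, which equals the sum of the counts for the ratings 0.5 to 5. A minimal sketch of building such a table and then dropping the placeholder zeros, on a made-up vector (the variable names are illustrative):

```r
# Hypothetical flattened rating matrix: 0 marks a cell the user did not rate
rating_vector <- c(0, 0, 4, 3.5, 0, 5, 4, 0, 0.5, 4)

rating_values <- table(rating_vector)   # counts per distinct value, including 0
print(rating_values)

# Exclude the placeholder zeros before looking at the real rating distribution
observed <- rating_vector[rating_vector != 0]
print(table(observed))
```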
In [19]: library(ggplot2)
movie_views <- colCounts(ratingMatrix) # count views for each movie
table_views <- data.frame(movie = names(movie_views),
                          views = movie_views) # create dataframe of views
table_views <- table_views[order(table_views$views,
                                 decreasing = TRUE), ] # sort by number of views
# look up each movie's title by matching its movieId in movie_data
table_views$title <- as.character(movie_data$title[match(table_views$movie,
                                                         movie_data$movieId)])
table_views[1:6,]
A data.frame: 6 × 3
Now, we will draw a bar plot of the total number of views of the top films, using ggplot2.
The bar plot shows that Pulp Fiction is the most-watched film, followed by Forrest Gump.
The output of ‘movie_ratings’ shows 420 users and 447 films, compared with the previous 668 users and 10325 films. We can now delineate this matrix of relevant users.
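The code that produced ‘movie_ratings’ is not shown above. One common way to obtain such a subset is to keep only users and films with more than a minimum number of ratings; the base-R sketch below does this on a made-up matrix with a hypothetical threshold `min_n`, and is not necessarily the rule the notebook used.

```r
# Hypothetical rating matrix: rows = users, columns = films, NA = not rated
ratings <- matrix(c( 5, NA,  3, NA,
                     4,  2, NA, NA,
                    NA, NA, NA,  1,
                     3,  4,  5, NA,
                     2, NA,  4, NA),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(paste0("u", 1:5), paste0("f", 1:4)))

min_n <- 1  # hypothetical threshold: keep rows/columns with MORE than min_n ratings

user_counts <- rowSums(!is.na(ratings))   # number of ratings per user
film_counts <- colSums(!is.na(ratings))   # number of ratings per film
relevant <- ratings[user_counts > min_n, film_counts > min_n, drop = FALSE]
print(dim(relevant))
```

Both counts are taken on the unfiltered matrix, so after users are dropped a film can end up with fewer than `min_n` ratings; repeating the filter until nothing changes avoids that if it matters.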
Now, we will visualize the distribution of the average ratings per user.
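The average rating per user should be taken over the films that user actually rated. A base-R sketch with `NA` marking unrated cells, on hypothetical data:

```r
# Hypothetical ratings: rows = users, columns = films, NA = not rated
ratings <- matrix(c( 4, NA,  5,   3,
                     2,  2, NA,  NA,
                    NA,  5,  4, 4.5),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("u", 1:3), NULL))

# Average over each user's observed ratings only
avg_per_user <- rowMeans(ratings, na.rm = TRUE)
print(avg_per_user)
print(summary(avg_per_user))
```

Without `na.rm = TRUE`, any user with a missing rating would get an `NA` average; treating missing cells as 0 instead would drag every average down.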
Data Normalization
In [26]: normalized_ratings <- normalize(movie_ratings)
sum(rowMeans(normalized_ratings) > 0.00001)
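Assuming `normalize()` is run with its default `method = "center"`, each user's mean rating is subtracted from that user's ratings, so every row mean should become (numerically) zero; the check above counts the rows where that fails. A base-R sketch of row centering on hypothetical data:

```r
# Hypothetical ratings: rows = users, columns = films, NA = not rated
ratings <- matrix(c( 4, NA,  5,   3,
                     2,  2, NA,  NA,
                    NA,  5,  4, 4.5),
                  nrow = 3, byrow = TRUE)

# Row centering: subtract each user's mean over observed ratings
user_means <- rowMeans(ratings, na.rm = TRUE)
centered <- ratings - user_means   # user_means is recycled down each column, i.e. per row

# Same check as in the notebook: count users whose mean is not ~0 after centering
sum(rowMeans(centered, na.rm = TRUE) > 0.00001)
```

Wrapping the row means in `abs()` would also catch rows that end up slightly negative.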
$k
30
$method
'Cosine'
$normalize
'center'
$normalize_sim_matrix
FALSE
$alpha
0.5
$na_as_zero
FALSE
In [33]: class(recommen_model)
'Recommender'
'dgCMatrix'
In [35]: dim(model_info$sim)
447 · 447
sum_rows
30
447
'Get Shorty (1995)' · 'Casper (1995)' · 'Ed Wood (1994)' · 'Quiz Show (1994)' ·
'Santa Clause, The (1994)' · 'What\'s Eating Gilbert Grape (1993)' · 'Dave (1993)' ·
'In the Line of Fire (1993)' · 'Beauty and the Beast (1991)' · 'Kingpin (1996)'
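A list like this comes from scoring every film the user has not rated and keeping the highest scores. The sketch below shows one common item-based scoring rule, a similarity-weighted average of the user's own ratings on the neighbouring films, in base R with a made-up pruned similarity matrix and a single hypothetical user. It illustrates the technique only; recommenderlab's `predict()` may compute its scores differently.

```r
# Hypothetical pruned item-item similarities (symmetric here for simplicity)
sim <- matrix(c(0,   0.8, 0,   0.5,
                0.8, 0,   0.3, 0,
                0,   0.3, 0,   0.9,
                0.5, 0,   0.9, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("m", 1:4), paste0("m", 1:4)))

# One hypothetical user's ratings: NA = not yet rated
user <- c(m1 = 5, m2 = NA, m3 = 2, m4 = NA)
rated <- !is.na(user)

# Score each item: sum(sim * rating) / sum(|sim|) over the items the user rated
weights <- sim[, rated, drop = FALSE]
scores <- as.vector(weights %*% user[rated]) / rowSums(abs(weights))
names(scores) <- rownames(sim)
scores[rated] <- NA   # never recommend films the user has already rated

# sort() drops NA/NaN scores (items with no rated neighbours), then take the top n
top_n <- 2
recommended <- head(names(sort(scores, decreasing = TRUE)), top_n)
print(recommended)
```

Dividing by the sum of absolute similarities keeps each score on the original rating scale, so scores from films with many neighbours and films with few remain comparable.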
21 48516 1 913
300 50 21 68157