Volume 7, Issue 4, April – 2022 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Movies Recommendation System using Cosine Similarity
Shubham Pawar, Pritesh Patne, Priya Ratanghayra, Simran Dadhich, Shree Jaswal
IT dept., SFIT, Mumbai, India

Abstract:- A Recommendation System is a filtering program whose primary goal is to predict the “rating” or “preference” of a user towards a domain-specific item. In our case, this domain-specific item is a movie. Hence the main focus of our recommendation system is to provide a total of ten movie recommendations to users who searched for a movie that they like. These results are based on traits similar to those of the movie that has been searched. Content-based filtering is the technique used to recommend movies. Apart from providing recommendations, the system also provides information about the searched movie. The additional details include the movie rating, its release date, cast, and genres. The system also provides additional information about the cast. To help the user save time on reading reviews, the system also performs sentiment analysis on the movie’s reviews, grading them into two categories, ‘Good’ and ‘Bad’.

I. INTRODUCTION

A recommendation system is a type of suggestion system that makes suggestions based on the user’s liking. These systems can be applied to various kinds of data. They can retrieve and filter data based on users’ preferences to give suggestions or recommendations in the upcoming period.

To watch a movie, the first step is to select a movie that matches the user’s liking. Users often waste a lot of time selecting a movie to watch. Here comes the need for a recommendation system. It can recommend popular movies based on their rating, but what makes the system useful is its ability to recommend movies based on users’ liking and preferences. The purpose of this system is to search for content that would be interesting to an individual.

Since the number of users and movies is increasing day by day, computing the list of recommended movies on a single-node machine takes a very long time. When we deal with huge volumes of data coming from various sources and in a variety of formats, as in the case of movies, where a huge amount of data has to be processed before a recommendation is made, many aspects have to be taken into consideration while recommending movies to the user.

Our recommendation system uses cosine similarity, a similarity measure commonly used in content-based filtering, to recommend similar movies to the user. Additional information about the searched movie will also be provided. The additional information includes a Movie Poster, an Overview of the movie, a Rating of the movie, Genres, the Run time of the movie, and its status, which can either be released or unreleased.

This system will also provide the user with sentiment analysis on the reviews of the movie.

These functions of the system will prove to be very useful to the user and consequently save a lot of time, which the user can invest in actually watching the movie he/she likes.

II. LITERATURE REVIEW

By using a graph database, Ningning Yi constructs a data model that is simpler and more expressive for organizing data than a NoSQL database or a traditional relational database. A graph database can model and manage data applications in a simple and intuitive manner, and it can also make data units smaller and more standardized [1]. It can also realize rich relational links.

Ashrita Kashyap et al. introduced MOVREC, a recommender system for movie recommendation, which used Blender and CAD tools [2].

Meenu Gupta et al. used KNN algorithms and collaborative filtering in order to increase the accuracy of results as compared to content-based filtering [3]. Their collaborative filtering technique combines cosine similarity with the k-nearest-neighbor approach, which alleviates many of the drawbacks associated with content-based filtering. However, it cannot handle fresh items since it has not seen them during training.

Rahul Katarya et al. [4] use a hybrid cluster and optimization approach to improve movie prediction accuracy. Such a hybrid approach has been used to overcome the limitations of typical content-based and collaborative recommendation systems. For clustering, the k-means algorithm is applied, and for optimization, cuckoo search optimization is implemented.

The Android application developed by Nimish Kapoor et al. displays multiple movie categories [5]. Users can add ratings and reviews, create a favorite list of movies, and watch movie trailers. The application’s main purpose is to rate movies based on an SVM model that categorizes the ratings into positive and negative emotions.

Bagher Rahimpour Cami et al. propose a content-based movie recommendation system that predicts movie preferences based on temporal user preferences [6]. In the proposed method, the content attributes of rated movies (for each user) are incorporated into a Dirichlet Process Mixture Model to infer user preferences and provide a proper recommendation list.

IJISRT22APR1053 www.ijisrt.com 342


Mostafa Khalaji et al. designed a system that combines collaborative filtering and content-based filtering to solve the cold-start problem for new items [7]. Their system, HMRS-RA, reduces the cold-start problem for new movies by considering contextual information such as genre. By using clustering to reduce the dimensionality of the data, the proposed method also addresses the scalability problem.

Our lives are greatly improved by movie recommendation systems because they reduce the amount of time and effort required to determine the value of a film. Nayan Verma et al. have used methods like swarm-based collaborative filtering, KNN with S-BERT, and universal sentence encoder [8]. The paper also discusses how to handle the challenges that such systems face. The results of the experiment indicate that the system is effective at predicting high-quality films.

Hrisav Bhowmick et al. [9] have implemented eight different methods for recommending movies. In their genre-based recommendation technique, for example, movies associated with a particular genre are checked first and then recommended based on their scores. In genre-based recommendation, however, there remains a high chance that the recommended movies may not be liked by the target user, since the recommendation is based only on genre and not on user profile similarity. With a recommender system based on the Pearson Correlation Coefficient, the similarity between users can be easily determined, but it is a long formula-based method that requires a lot of computation and memory.

Parth Kotak et al. developed a technique for filtering a database of movies that starts with data collection [10]. It collects ratings from the user and then pre-processes them. The next step is to clean the data, then train the machine learning model, and finally generate predictions. The user enters a movie name and the year in the search bar, and the program recommends four movies based on the likability and user ratings of each movie in that particular year. With a better data set, the model becomes more accurate.

III. METHODOLOGY

The project aims to build a platform that will recommend movies to users, provide a detailed description of the searched movies, and perform sentiment analysis on the movie reviews. The information provided is intended to cut down the time spent in selecting a movie to watch.

The main purpose of developing a movie recommendation system is to provide users with recommendations that are not based on popularity or purely on rating but on the movies that the user likes. This will lead to highly personalized recommendations, which will increase the accuracy of the recommendation system. The sentiment analysis and the additional information about the searched movie will help the user make an informed decision while selecting a movie. The user won’t have to surf the internet to find a movie that he/she likes, as all the information needed will be provided on a single platform. The user won’t have to rely on friends for a movie suggestion, as the recommendation system will provide the user with the top ten movies that are most similar to the searched movie.

Fig. 1: Cosine Similarity

The movies are recommended based on a simple algorithm called Cosine Similarity. Cosine similarity is a measure used to determine the similarity between two items [14]. Mathematically, it is the cosine of the angle between two vectors in a multi-dimensional space. The Euclidean distance between the two vectors can also be checked to determine how different or similar they are. In our case, one of the vectors is the movie that is searched, and each of the remaining movies in the database is checked in turn as the second vector. The top ten movies that are closest to the searched movie, that is, those with the highest cosine similarity, are shown as recommendations.

Cosine similarity is used here as part of a content-based filtering approach, one of the most popular techniques used in recommendation systems. The attributes of a thing are termed its “content”. Based on these attributes we are able to decide whether two things are similar or not. The attributes can be words specified in the database such as genre, cast names, director names, description, and so on. If the attributes match or have a high similarity, then the two movies can be classified as similar movies. The intuition behind this sort of recommendation system is that if a user liked a particular movie or show, he/she might like a movie or a show similar to it.

Fig. 2: Content Based Filtering
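The content-based comparison described above can be illustrated with a minimal sketch. It assumes each movie is represented by a single string of attribute words (genre, cast names, and so on), turns that string into a word-count vector, and ranks the other movies by cos(θ) = (A · B) / (‖A‖ ‖B‖). The catalogue, the movie titles, and the helper names count_vector, cosine_similarity, and recommend are all hypothetical, and only the Python standard library is used; this is not the authors' code.

```python
import math
from collections import Counter


def count_vector(text):
    """Bag-of-words counts for one movie's attribute text (lower-cased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty attribute string is similar to nothing
    return dot / (norm_a * norm_b)


def recommend(title, movies, top_n=10):
    """Rank every other movie by cosine similarity to the searched one."""
    query = count_vector(movies[title])
    scores = [
        (other, cosine_similarity(query, count_vector(text)))
        for other, text in movies.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical toy catalogue: title -> genre and cast words.
MOVIES = {
    "Space Quest": "scifi adventure alice_ray bob_stone",
    "Star Drift": "scifi adventure alice_ray carl_moon",
    "Love in Paris": "romance drama dana_lee erin_fox",
    "Galaxy Run": "scifi action bob_stone frank_hill",
}

for other, score in recommend("Space Quest", MOVIES, top_n=2):
    print(f"{other}: {score:.2f}")
```

Ranking by highest cosine similarity and ranking by smallest Euclidean distance give the same order only when the vectors are first scaled to unit length, which is why the sketch ranks on the cosine value directly.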

IV. WORKING

The proposed solution mainly uses Python to work on data sets and to apply various Machine Learning algorithms to get the desired output. AJAX, HTML, and JSON are also used to create dynamic web pages and an easy-to-understand GUI for a better user experience.

The implementation mainly consists of six steps. They are as follows:

• Step 1: Finding and loading suitable data
Appropriate data sets are shortlisted and downloaded from Kaggle [11]. To keep the data up to date, the data for the previous three years is fetched from Wikipedia. The reviews of various movies are fetched from IMDb (Internet Movie Database) to perform Sentiment Analysis [12]. Additionally, the TMDB (The Movie Database) API is used to fetch other data and images such as cast photos and movie posters [13].

• Step 2: Data Cleaning
The data sets taken from Kaggle and the data fetched from Wikipedia are both processed in Jupyter Notebooks to clean the data. The cleaned data is then loaded into the main CSV file, which is used whenever the data needs to be accessed.

• Step 3: Creating an API key
API stands for “Application Programming Interface.” An API is a software intermediary that allows two applications to talk to each other. In other words, an API is a messenger that delivers your request to the provider that you are requesting it from and then delivers the response back to you. In our case the provider is TMDB. TMDB has a huge collection of movie data, from which the system can fetch the information that it needs. To use the TMDB API, an API key has to be generated after creating an account on TMDB.

• Step 4: Performing Sentiment Analysis
The NLTK (Natural Language Toolkit) library is imported in Python to perform various functions on the reviews data. The NLTK corpus is imported to go through all kinds of Natural Language data sets. The Tfidf vectorizer tokenizes the data and learns the vocabulary and the inverse document frequency weightings. The data is divided into training and testing data in the ratio of 4:1, and the Multinomial Naive Bayes algorithm is used for classification and analysis of the data.

• Step 5: Creating files for Web pages
For the system to be useful and easy to use, the GUI must be good. This helps the user to communicate with the software. Two main HTML pages are created: one where the user searches for a movie, and another where the user gets all the details of the movie, a sentiment analysis of the movie reviews, and the top ten recommended movies. AJAX is used to get data from the server, and JSON is the data format used to send data to the server.

• Step 6: Applying ML Algorithms for Recommendation
The Count Vectorizer function is applied on the main data set to form a count matrix. It transforms the data into vectors based on the frequency (count) of each word that occurs in the data set. Cosine similarity is then computed between those vectors to recommend the top ten movies that are most similar to the searched movie.

V. RESULTS

The movie recommendation system that is created is user-friendly and easy to use. When the website is loaded, the user sees a screen in which he/she can enter a movie name to get its details and recommendations. When the movie name is typed, a suggestion drop-down list appears to perform auto-complete. When a movie is selected, the enter button gets enabled.

After clicking on the enter button, a new page is loaded, which is made up of four sections: Movie Details, Top Cast, Sentiment Analysis on movie reviews, and the top ten recommendations.

With the data split into training and testing sets in the ratio of 4:1, the Multinomial Naive Bayes algorithm is used to perform Sentiment Analysis. The resulting accuracy of the Sentiment Analysis is 98.77167 percent.
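The sentiment analysis step can be illustrated with a minimal sketch of a Multinomial Naive Bayes classifier evaluated on a 4:1 train/test split. It is not the authors' implementation: it uses raw word counts with Laplace smoothing instead of TF-IDF weights, relies only on the Python standard library rather than NLTK or a machine learning library, and the ten reviews, their ‘Good’/‘Bad’ labels, and the function names are hypothetical. It makes no attempt to reproduce the reported accuracy.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors and Laplace-smoothed word log-likelihoods."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label, n_docs in class_docs.items():
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_like": {
                w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
            },
        }
    return model


def predict(model, text):
    """Return the label with the highest posterior log-probability.

    Words never seen in training are ignored for every class.
    """
    words = text.lower().split()

    def score(label):
        params = model[label]
        return params["log_prior"] + sum(params["log_like"].get(w, 0.0) for w in words)

    return max(model, key=score)


def accuracy(model, samples):
    """Fraction of (text, label) samples that the model labels correctly."""
    correct = sum(predict(model, text) == label for text, label in samples)
    return correct / len(samples)


# Hypothetical toy reviews, labelled with the two categories used in the paper.
REVIEWS = [
    ("a wonderful film with a great cast", "Good"),
    ("boring plot and terrible acting", "Bad"),
    ("great story and wonderful music", "Good"),
    ("terrible pacing and a boring script", "Bad"),
    ("loved the great performances", "Good"),
    ("awful dialogue and boring scenes", "Bad"),
    ("wonderful visuals and a great ending", "Good"),
    ("terrible and awful from start to finish", "Bad"),
    ("a great and wonderful experience", "Good"),
    ("boring and awful", "Bad"),
]

# 4:1 split: the first four fifths train the model, the last fifth tests it.
split = len(REVIEWS) * 4 // 5
train_set, test_set = REVIEWS[:split], REVIEWS[split:]
model = train_multinomial_nb(
    [text for text, _ in train_set], [label for _, label in train_set]
)
print("test accuracy:", accuracy(model, test_set))
```

A real data set would normally be shuffled before splitting; the fixed split here only keeps the toy example deterministic.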

Fig. 3: Home Page


Fig. 4: Movie Details

Fig. 5: Top Cast

Fig. 6: Recommendations

VI. CONCLUSION

When the user searches for a movie that he/she has already watched, the Movies Recommendation System will recommend the top ten movies that are most similar to the searched movie. Moreover, the system will show additional details of the movie and provide sentiment analysis on the reviews of that movie. All these features will save the user’s time, which otherwise would have been wasted on finding a movie that he/she may or may not like.

Every month several movies are released, so the movie database keeps getting bigger. This would help the system to provide a more accurate recommendation to the user and in turn increase customer satisfaction.

REFERENCES

[1.] Implementation of Movie Recommender System Based on Graph Database [2017]; Ningning Yi; School of Computer Science, Communication University of China, Beijing, China
[2.] Movie Recommender System: MOVREC using Machine Learning Techniques [2020]; Ashrita Kashyap; Sunita B; Sneh Srivastava; Aishwarya PH; Anup Jung Shah; Department of Computer Science Engineering, SAIT, Bengaluru, Karnataka, India.
[3.] Movie Recommender System Using Collaborative Filtering [2020]; Meenu Gupta; Aditya Thakkar; Aashish; Vishal Gupta; Dhruv Pratap Singh Rathore; Department of Computer Science Engineering, Chandigarh University, Punjab.

[4.] An effective collaborative movie recommender system with cuckoo search [2017]; Rahul Katarya; Om Prakash Verma; Department of Computer Science Engineering, Delhi Technological University, Delhi, India.
[5.] Movie Recommendation System Using NLP Tools [2020]; Nimish Kapoor; Saurav Vishal; Krishnaveni K S; Department of Computer Science and Engineering, Amrita School of Engineering, Amritapuri, Amrita Vishwa Vidyapeetham, Amrita University, India.
[6.] A Content-based Movie Recommender System based on Temporal User Preferences [2017]; Bagher Rahimpour Cami; Hamid Hassanpour; Hoda Mashayekhi; Faculty of Computer Engineering IT, Shahrood University of Technology, Shahrood, Iran
[7.] Hybrid Movie Recommender System based on Resource Allocation [2020]; Mostafa Khalaji; Chitra Dadkhah; Joobin Gharibshah; Faculty of Computer Engineering, K.N. Toosi University of Technology, Tehran, Iran
[8.] Movie Recommender System using critic consensus [2020]; A Nayan Verma; Kedaresh Petluri; Department of CSE, PES University, Bangalore, India
[9.] Comprehensive Movie Recommendation System [2020]; Hrisav Bhowmick; Ananda Chatterjee; Jaydip Sen; Dept. of Data Science, Praxis Business School, Kolkata, India
[10.] Movies Recommendation System using Filtering Approach [2021]; Parth Kotak; Prem Kotak; Department of Computer Engineering, Vidyalankar Institute of Technology, Mumbai, India
[11.] https://www.kaggle.com/rounakbanik/the-movies-dataset
[12.] https://www.imdb.com
[13.] https://www.themoviedb.org/login
[14.] https://www.machinelearningplus.com/nlp/cosine-similarity
