MovieRecomendation

Naive Bayes Classifier based Movie
Recommendation System
Dr. Yeresime Suresh¹, Rohit Kumar BR², Jahnavi Reddy C³, P Sai Rohini⁴, Sharath K⁵
¹Associate Professor, ²,³,⁴,⁵Final Year Students
Department of Computer Science and Engineering,
Ballari Institute of Technology & Management, Ballari - 583 104, India.
{1 suresh.vec04, 2 bpscrohit,3 jahnavireddy2891, 4 prohini2906, 5 sharathkori0526}@gmail.com
Abstract—A search engine is a customized method of locating the necessary material on the internet. The search engine used by the movie suggestion system was created especially for Hollywood productions. Additionally, it gives the user a foundational understanding of how movie evaluations are organized and recommended. It modifies the current method by employing content-based filtering. The viewer is given recommendations for comparable movies based on criteria like the movie's title, director, top three actors (cast), and genre after a cosine-similarity analysis has determined how similar the movies are. A multinomial naive Bayes probabilistic classifier is used to categorize movie reviews. The information from the movie database is gathered using the TMDB3 API. AJAX is used in the suggested method to guess movie names. This work deals with providing the movie details, the reviews available on the internet and their sentiment analysis, and then recommending movie(s) to the user based on similarity score.

Index Terms—Accuracy, cosine, confusion matrix, document, similarity

I. Introduction
In the information age of today, almost all material is accessible online. All we have to do is conduct an online inquiry. Internet users spend more time looking up information about movies. Therefore, we reasoned that it would be easier to make all the necessary information available on a single website, which would enable users to find movies more rapidly. This was a key factor in our decision to launch this initiative. In daily life, recommendation algorithms are increasingly important. Because of the many demands of daily life, people are constantly pressed for time. Therefore, suggestion systems are crucial because they recommend without too much human effort. A recommendation system essentially seeks out material that would be interesting for a particular person. These recommendation engines save people time in making decisions. They use artificial intelligence to predict what a person wants to view.
The article has different sections. Section II provides details about the state-of-the-art literature review carried out. Section III briefs about the proposed approach for movie recommendation to the user. Section IV provides insight into the obtained results and their analysis. Section V concludes the work with scope for future work.

II. Literature Review
This section highlights the literature review carried out for movie recommendation systems. Generally, there are three different types of recommendation systems [1]. The categorization of the systems is shown in Figure 1.

Fig. 1. Categorization of recommendation systems.

Lina Chen et al. (2017) make use of collaborative filtering ideas in their study. It concentrates on how to offer an effective and efficient programme for movies. The suggested approach handles huge datasets using the Java MapReduce framework. High efficiency and dependability can be attained for the suggested model by utilising the MapReduce architecture [2].
Unnathi et al. (2018) proposed collaborative filtering with Apache Mahout. The user's tastes and actions are taken into consideration in the collaborative filtering method. On the basis of user similarities, predictions are made. Apache Mahout is used to integrate and use machine learning tools. The combined use of Apache Mahout and collaborative filtering is required throughout the entire system [3].
Shreya Agrawal et al. argued that the techniques of the past are now obsolete. So, the most recent iteration of the hybrid technique was created with the express purpose of raising the standard of the movie recommendation system. The collaborative filtering technique
and content-based filtering both have advantages and disadvantages that the suggested system incorporates. By utilizing one another's good traits, negative characteristics can be surmounted. The Support Vector Machine method is used as its predictor in the implementation process. The combined approach has allowed for the identification of a positive increase in the overall system's performance [4].
Mihhail Matskin et al. (2016) proposed a specialized recommender system for a movie website. The textual metadata is collected and analyzed, and it is discovered that they are unique and varied, which sets them apart from other movie recommender systems. Following analysis, similarity is found and movies are suggested. This recommended model also suggests an extra feature for adjusting weights in the textual information to discover similarity [5].
Jeffrey Lund and Yiu-Kai Ng suggested a system with a combination of collaborative filtering and encoders. This method trains the suggested model using the MovieLens data set. Based on ratings from various users, movie ratings for a specific person are predicted. The suggested system employs collaborative filtering as well as a neural network model to suggest movies to the user. The method also employs regularisation to lower suggestion error rates [6].
Vallari Manaci et al. (2020) proposed a hybrid technique, which raises the system's quality and is the primary idea at play in this suggestion system. A hybrid strategy combines a content-based approach and a collaborative approach. It starts with one-hot encoding and then generates a similarity matrix. The produced matrix is then passed to a deep neural network with the softmax activation function to produce the suggested selection of films [7].
Patrick Adolf et al. (2019) presented a study that examines the operation of text-based classification methods. Analyzing the effectiveness and applications of different classification and regression methods is helpful. It provides a short explanation of how the methods operate in various situations [8].
Angshuman Paul et al. (2018) proposed an approach that conducts an advanced study on the Random Forest algorithm, explaining how performance is enhanced and assessed based on internal parameters. It helps to enhance classification performance by reducing the number of trees and repeatedly removing undesirable features [9].

III. Proposed Methodology
The proposed system mainly concentrates on the following features.
• Providing basic information about the movie
• Sentiment analysis of the movie reviews
• Recommending similar kinds of movies
Figure 2 represents the functionalities of the proposed system through a systematic use case diagram.
• Step 1: Input Movie Name
The system lets the user input a movie name.
• Step 2: Extracting Movie Details
The system extracts the basic information about the movie and its cast.
• Step 3: Recommending Similar Movies
The system recommends similar movies to the user, based on cosine similarity.
• Step 4: Sentiment Analysis of Reviews
In this step, the system scrapes the content from the Internet (by parsing the HTML web pages) and does sentiment analysis of reviews using the naive Bayes classification algorithm.

Fig. 2. Use case diagram for proposed system functionalities

Figure 3 depicts the flow chart for the proposed movie recommendation system based on content filtering.
Pre-Processing:
The selected data set cannot be used immediately for similarity calculations. To do this, we format the dataset in a manner that makes it possible to quickly determine the similarity value. We explain the idea of ranking. Finding the most pertinent document is made easier by ranking.
Term Frequency - Inverse Document Frequency (TF-IDF)
The mathematical tool TF-IDF assesses a word's relevance to a document within a group of documents. It is utilized for information extraction and document search. It counts the number of times a word appears in the documents. Most frequently used terms like "if", "where", "and", etc. are given least priority because they don't matter a lot to that particular document. The easiest way to calculate frequency is to count the total number of times a word appears in the given document.

$$ TF(i, j) = \frac{\text{term } i \text{ frequency in document } j}{\text{Total words in document } j} \tag{1} $$

Equation 1 gives the number of times a word is repeated in a document, relative to the total number of words in that document. Words that are frequent across the documents end up with TF-IDF values closer to 0. By considering all the
documents, dividing by the total words in that document, and then computing the logarithm, TF-IDF can be computed. Consequently, if a word is repeated many times, the value will be closer to 0; otherwise it will be closer to 1.

Fig. 3. Flow chart for movie recommendation system

A. Recommending Similar Movies:
Figure 4 represents the sequence diagram for the proposed system based on the use case diagram (Figure 2).

Fig. 4. Sequence diagram for movie recommendation system

Cosine similarity is used to find the similarity between two documents. It is used in our recommendation system to find and filter similar movies based on the user's interest.
The angle θ between two vectors describes how closely their directions agree. This angle can be calculated by using Equation 2.
• When θ = 0°, the 'x' and 'y' vectors overlap and prove to be similar.
• When θ = 90°, the 'x' and 'y' vectors are dissimilar.
A graphical representation of cosine similarity between two vectors is shown in Figure 6. The cosine similarity value ranges between zero and one. The smaller the angle, the greater the similarity; the larger the angle, the lesser the similarity. The cosine similarity of two vectors is determined using Equation 2.

Fig. 6. Cosine similarity angle representation

$$ \cos(x, y) = \frac{x \cdot y}{|x||y|} \tag{2} $$

B. Movie Reviews:
Analyzing movie reviews with sentiment analysis based on naive Bayes classification is a common technique. Naive Bayes is a probabilistic algorithm that determines the likelihood that each phrase in a review falls under a specific sentiment group (positive or negative). Figure 5 shows sample reviews obtained through sentiment analysis. The algorithm then classifies the review as favorable or negative using these probabilities. The Mean Square Error (MSE) is a parameter used to measure how far the values returned by the model deviate from the actual values. Equation 3 represents the MSE formula used in our approach.

$$ MSE = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2 \tag{3} $$

The MSE value obtained for the proposed model is 0.0072.
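The similarity computation of Equations 1 and 2 can be sketched as follows. This is a minimal illustration and not the implementation used in this work: it assumes the common IDF form log(N/df), and the catalogue `MOVIES`, its tokens, and the helper names `tf`, `idf`, `tfidf_vectors`, `cosine` and `recommend` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy catalogue: each movie is one "document" built by joining
# its metadata fields (director, cast, genre) into a string of tokens.
MOVIES = {
    "Movie A": "space adventure dir_a actor_a actor_b scifi",
    "Movie B": "space survival dir_b actor_c actor_d scifi",
    "Movie C": "romance comedy dir_c actor_e actor_f",
}


def tf(tokens):
    """Equation 1: term count divided by the total words in the document."""
    counts = Counter(tokens)
    return {term: c / len(tokens) for term, c in counts.items()}


def idf(all_docs):
    """Assumed IDF form: log(N / number of documents containing the term)."""
    n = len(all_docs)
    df = Counter(term for doc in all_docs for term in set(doc))
    return {term: math.log(n / d) for term, d in df.items()}


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict) for every document."""
    tokens = {name: text.split() for name, text in docs.items()}
    weights = idf(list(tokens.values()))
    return {
        name: {t: v * weights[t] for t, v in tf(toks).items()}
        for name, toks in tokens.items()
    }


def cosine(x, y):
    """Equation 2: dot product divided by the product of the vector norms."""
    dot = sum(x[t] * y.get(t, 0.0) for t in x)
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def recommend(title, docs, top_n=2):
    """Rank the other movies by cosine similarity to the given title."""
    vecs = tfidf_vectors(docs)
    scores = [(other, cosine(vecs[title], vecs[other]))
              for other in docs if other != title]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    for name, score in recommend("Movie A", MOVIES):
        print(f"{name}: {score:.3f}")
```

In this toy catalogue "Movie A" shares the tokens `space` and `scifi` with "Movie B" and nothing with "Movie C", so "Movie B" is ranked first and "Movie C" receives a similarity of zero.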
Fig. 5. Sentiment analysis of user reviews.
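The review classification step can be illustrated with a small multinomial naive Bayes classifier. The sketch below is an example under stated assumptions rather than the system's code: the training reviews are invented, add-one (Laplace) smoothing is assumed, and the class name `TinyNaiveBayes` and the variable `model` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        """Return log prior + sum of log word likelihoods for each class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented training reviews, for illustration only.
TRAIN = [
    ("a great movie with brilliant acting", "positive"),
    ("loved the story and the cast", "positive"),
    ("boring plot and terrible acting", "negative"),
    ("a dull and disappointing movie", "negative"),
]

model = TinyNaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])

if __name__ == "__main__":
    print(model.predict("brilliant story great cast"))
    print(model.predict("terrible boring plot"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing keeps a word that was seen in only one class from forcing the other class's probability to zero.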

In Equation 3, N indicates the number of points, x_i denotes the value returned by the model and y_i is the actual value for the data point i.

IV. Results and Analysis
Confusion Matrix: Before creating a confusion matrix for a movie recommendation system, it is necessary to establish the classification categories and evaluate how well the recommendation system performed in relation to these categories.
Assume we have two classification categories for movie recommendations: "Recommended" and "Not Recommended". The confusion matrix in our approach is represented as shown in Table I.

TABLE I
Confusion matrix for movie recommendation system

| | Recommended (Predicted) | Not-Recommended (Predicted) |
|---|---|---|
| Recommended (Actual) | True Positive (TP): The user was recommended a movie and liked it. | False Positive (FP): The user was recommended a movie and did not like it. |
| Not-Recommended (Actual) | False Negative (FN): The user was not recommended a movie and would have liked it. | True Negative (TN): The user was not recommended a movie and did not like it. |

A. Performance Evaluation Parameters
This subsection deals with various performance evaluation parameters considered for evaluating the trained models against a public domain data set.
1) Precision: Precision is a performance measure used to assess the exactness of a classifier's positive predictions. It is the ratio of the system's accurate positive predictions to all of its positive predictions, i.e., the proportion of movie recommendations that are good recommendations. Equation 4 represents precision.

$$ Precision = \frac{TP}{TP + FP} \tag{4} $$

2) Recall: Recall is a performance indicator that measures the proportion of accurate positive predictions to all positive occurrences in the data set, and it is used to assess how comprehensive a classifier's predictions are. Equation 5 is the mathematical representation of recall, which is defined as the proportion of good movie recommendations that appear in top recommendations.

$$ Recall = \frac{TP}{TP + FN} \tag{5} $$

3) Accuracy: Accuracy is a performance measure used to assess the overall correctness of a classifier's predictions, such as those made by a movie suggestion system. It calculates the system's percentage of accurate predictions among all the occurrences in the data set, using Equation 6.

$$ Accuracy = \frac{TP + TN}{TP + FP + FN + TN} \tag{6} $$

Accuracy for the proposed system is 98.77%, i.e., (56600 + 80100)/(56600 + 1400 + 300 + 80100) = 136700/138400 = 98.77%.
4) F1 Score: The F1 score is a performance measure used to assess how well a classifier balances recall and precision. It is the harmonic mean of recall and precision and assigns each measure identical weight. Equation 7 represents the F1 score.

$$ F1\ Score = \frac{2 \cdot precision \cdot recall}{precision + recall} \tag{7} $$

where Precision is the ratio of true positives to all positive predictions, and Recall is the ratio of true positives to all positive occurrences in the data set.
The F1 score ranges from 0 to 1. A score of 1 indicates perfect precision and recall, while a score of 0 shows no precision or recall. A high F1 score means the classifier is producing reliable positive predictions while minimizing false negatives and false positives. Table II represents the confusion matrix values for the trained model, and the values of the performance evaluation parameters are tabulated in Table III.

TABLE II
Confusion matrix value for movie recommendation system

| | Recommended (Predicted) | Not-Recommended (Predicted) |
|---|---|---|
| Recommended (Actual) | 56600 | 1400 |
| Not-Recommended (Actual) | 300 | 80100 |

Table III lists the values for various performance evaluation parameters used in this approach to validate as well as test the model. From parameters such as Precision, Recall, F1 Score and Accuracy, it can be found that the proposed approach performs better than the existing systems. Table IV shows a sample of test cases written for the proposed system.

TABLE III
Performance evaluation & comparison with existing approach

| Performance Parameter | Value (%) | Existing Approach | Proposed Approach |
|---|---|---|---|
| Precision | 98.28 | Average | High |
| Recall | 99.62 | Average | High |
| Accuracy | 98.77 | Low | High |
| F1 Score | 98.94 | Average | High |
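As a worked check of Equations 4 to 7, the sketch below computes the four measures from confusion-matrix counts. It is an illustrative helper and not part of the proposed system: the function name `classification_metrics` is hypothetical, the counts are those of Table II in the order used by the accuracy computation above, and F1 is computed as the harmonic mean, 2PR/(P + R).

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, accuracy and F1 as fractions in [0, 1]."""
    precision = tp / (tp + fp)                    # Equation 4
    recall = tp / (tp + fn)                       # Equation 5
    accuracy = (tp + tn) / (tp + fp + fn + tn)    # Equation 6
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Counts taken from Table II, read as TP, FP, FN, TN.
metrics = classification_metrics(tp=56600, fp=1400, fn=300, tn=80100)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value * 100:.2f}%")
```

Accuracy evaluates to 136700/138400, about 98.77%, regardless of which class is treated as positive. Precision, recall and F1 do depend on that choice: treating the 80100 cell as the true positives, `classification_metrics(tp=80100, fp=1400, fn=300, tn=56600)`, gives a precision of about 98.28% and a recall of about 99.63%.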
TABLE IV
Test cases for the proposed approach

| Test Case ID | Description | Action | Result |
|---|---|---|---|
| 1 | Try to enter special characters in movie name | Not able to enter | Pass |
| 2 | Try to click on search with empty movie name | Not able to search with empty name | Pass |
| 3 | Search movie that is not present in the database | Displays "Movie not available" message | Pass |
| 4 | Type wrong movie spelling | Displays "Check movie spelling" | Pass |

V. Conclusion
The user's preference is the most crucial thing to remember. It is tedious to have to take the time to look for every bit of information on the internet. With the help of this method, users can view useful details about a movie all at once. The search engine, recommendation engine, and classification are at the heart of this endeavor. Metadata like cast, category, story line, year, and performers are used by the system. Our system can provide knowledge about movies, but it is only available for Hollywood productions. Additionally, based on user search history, it is capable of suggesting related movies. After performing a number of analyses and tests, it was discovered that most people save time by using movie suggestion websites rather than searching the internet for every single piece of information about a Hollywood film. From the obtained results, it can be concluded that the proposed approach performs better when compared to existing systems. In future, the work can be extended to regional languages in India to overcome the linguistic barrier, and a hybrid approach using optimization techniques can also be applied.

References
[1] Sameer Jain, "Movie Recommendation System using Machine Learning", Shiksha Online, https://www.shiksha.com/online-courses/articles/movie-recommendation-system-using-machine-learning/, 20 March 2023.
[2] Lina Chen, Tianqi Zhou, Jian Shen, "Movie Recommendation System Employing the User-Based CF in Cloud Computing", in proceedings of 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), pp. 46-50, 21-24 July 2017.
[3] Unnathi Bhandary, Deepti Garg, Ching-Seh (Mike) Wu, "Movie Recommendation Using Collaborative Filtering", in proceedings of 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), pp. 11-15, 23-25 November 2018.
[4] Shreya Agrawal, Pooja Jain, "An Improved Approach for Movie Recommendation System", in proceedings of 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 336-342, 10-11 February 2017.
[5] Mihhail Matskin, Chang Gao, "Content-Based Recommendation System for Movie Website", Master Thesis, 2016.
[6] Jeffrey Lund, Yiu-Kai Ng, Dept. of Computer Science, "Movie Recommendation using Deep Learning Approach", in proceedings of 2018 IEEE International Conference on Information Reuse and Integration for Data Science, pp. 47-54, 6-9 July 2018.
[7] Vallari Manaci, Anjali Diwate, Priyanka Korade, Anita Seanthi, "MoView Engine: An Open-Source Movie Recommender", in proceedings of International Conference on Automation, Computing and Communication (ICACC-2020), pp. 1-6, 29 July 2020.
[8] Patrick Adolf Telnoni, Reza Budiaawan, Mutia Qana'a, "Comparison of Machine Learning Classification Method on Text-based Case in Twitter", in proceedings of International Conference on ICT for Smart Society (ICISS), pp. 1-5, 19-20 November 2020.
[9] Angshuman Paul, Dipti Prasad Mukherjee, Prasun Das, Adhinandhan Gangopadhyay, Appa Rao Chintha and Saurach Kundu, "Improved Random Forest Classifier", IEEE Transactions on Image Processing, pp. 1-13, vol. 27, no. 8, August 2018.
