189 - 187 - Research Paper
Thakur Institute of Management Studies, Career Development & Research (TIMSCDR)
yachikayadav1054@gmail.com

Thakur Institute of Management Studies, Career Development & Research (TIMSCDR)
vikasryadav1741@gmail.com
Abstract— This research paper presents an improved approach for movie recommendation using content-based filtering and cosine similarity. The proposed system takes into account the movies' content information and genre during item similarity calculations to recommend a top 10 list of similar movies based on user preferences. Data for the movies, including title, genre, runtime, rating, and cast, is obtained from the TMDB website using an API key, while reviews are web-scraped from the IMDB website and subjected to sentiment analysis using NLP to predict whether each review is positive or negative. The recommendation system is designed to minimize transaction costs and improve the quality of the decision-making process for users. The paper concludes that recommendation systems play an essential role in the modern era and are used by many prestigious applications.

Keywords— Content-based filtering, Cosine similarity, API key, Web scraping, Sentiment analysis

I. INTRODUCTION
In this research paper, we delve into the topic of recommendation systems and their current use in various industries. Recommendation systems are prediction-based analysis systems that recommend similar content on various platforms, including e-commerce, movies, books, and food. They are used to recommend similar products or services to users based on consumer feedback and past experiences.

The use of recommendation systems has become increasingly prevalent in recent years, with organizations recognizing the benefits of using them to improve consumer loyalty and experience. These systems use machine learning or AI algorithms to predict similar types of products or services. To evaluate this system, we have used a Hollywood movie dataset.

Our focus in this paper is on movie recommendation systems, which use content-based filtering algorithms to recommend movies of similar content or genre to users. The proposed system takes into account the movies' content information and genre during item similarity calculations to recommend a top 10 list of similar movies based on user preferences. We have also used the multinomial naive Bayes algorithm for sentiment analysis on the reviews of the movies. The details of the movies, such as title, genre, runtime, rating, poster, and cast, are fetched from TMDB using an API key.

Overall, this research highlights the crucial role that recommendation systems play in many prestigious applications in the modern era. It also explores the specific application of movie recommendation systems and the use of content-based filtering algorithms and the multinomial naive Bayes algorithm for sentiment analysis.

II. CONTENT BASED FILTERING
Content-based filtering is a method, used mostly in recommendation systems, of recommending items to users based on their previous preferences and behavior. Unlike collaborative filtering, it focuses on the characteristics of the items being recommended, rather than the preferences and behavior of other
users. The idea behind content-based filtering is that if a user likes a certain item, they are likely to enjoy other items with similar characteristics.

In the context of movie recommendation systems, content-based filtering can be used to recommend movies to users based on the content or genre of the movies they have previously searched for, watched, or liked. This can include characteristics such as genre, director, actors, and plot. The system would use these characteristics to identify similar movies and recommend them to the user.

One way to implement content-based filtering in a movie recommendation system is to use an algorithm that calculates the similarity between movies from their characteristics. One commonly used measure is cosine similarity, which is the cosine of the angle between two vectors representing the characteristics of the movies. The algorithm calculates the similarity score between two movies, and the movies with the highest similarity scores are recommended to the user.

Cosine similarity is a mathematical computation with which we can find the similarity between two vectors. In effect, we calculate the cosine of the angle between them: if the angle is small, i.e. the cosine is close to 1, the vectors are similar, and if the cosine is close to 0, there is little similarity between the vectors.

Fig. 1 Cosine Similarity

In conclusion, content-based filtering is a powerful method for recommending items to users, particularly in the context of movie recommendation systems. It focuses on the characteristics of the items being recommended, rather than the preferences and behavior of other users. This allows the system to recommend items that are similar to those that the user has previously enjoyed, increasing the likelihood that the user will enjoy the recommended items.

III. API KEY INTEGRATION AND WORKING
API (Application Programming Interface) keys are a form of security measure used to grant access to specific functionalities of an application or service. In the context of movie recommendation systems, an API key is used to grant access to the movie data from a specific database, such as the TMDB (The Movie Database) website. The API key acts as an identification card for the client making the request, allowing the server to assign the proper access permissions and track how the data is being used.

To integrate an API key into a movie recommendation system, the first step is to register for an API key from the relevant database. This typically involves creating an account and agreeing to the terms of service. Once the API key is obtained, it can be added to the code of the movie recommendation system.

The process of working with an API key in a movie recommendation system involves making requests to the database using the API key and receiving the movie data in return. These requests can be made using various programming languages, such as Python, and can include parameters such as the movie title, genre, and release date to narrow down the results. The received data can then be used to recommend movies based on the preferences and behavior of users.
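
The request flow described above can be illustrated with a minimal Python sketch. It assumes the TMDB v3 movie-search endpoint with the key passed as the `api_key` query parameter. The function names, the selected response fields, and the placeholder key are illustrative assumptions and are not taken from the system described in this paper.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# TMDB v3 movie-search endpoint (assumed here; the paper does not name the endpoint it uses).
TMDB_SEARCH_URL = "https://api.themoviedb.org/3/search/movie"


def build_search_url(api_key, title, year=None):
    """Build a search URL; the API key identifies the client to the server."""
    params = {"api_key": api_key, "query": title}
    if year is not None:
        params["year"] = year  # optional parameter to narrow down the results
    return f"{TMDB_SEARCH_URL}?{urlencode(params)}"


def extract_movie_fields(payload):
    """Keep only a few fields from a decoded search response (illustrative choice)."""
    return [
        {
            "id": movie.get("id"),
            "title": movie.get("title"),
            "release_date": movie.get("release_date"),
            "rating": movie.get("vote_average"),
        }
        for movie in payload.get("results", [])
    ]


def search_movies(api_key, title, year=None):
    """Send the request; needs network access and a valid key."""
    with urlopen(build_search_url(api_key, title, year), timeout=10) as response:
        return extract_movie_fields(json.load(response))


# Building the URL needs no network access; "YOUR_API_KEY" is a placeholder.
print(build_search_url("YOUR_API_KEY", "Avengers: Infinity War", 2018))
```

Keeping URL construction and response parsing separate from the network call makes it possible to check both without spending any of the key's request quota.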
API keys also have limits and usage costs, so it is important to manage the usage of the keys to avoid overuse and prevent the system from breaking.

IV. SENTIMENT ANALYSIS USING NLP
Sentiment analysis is the process of determining the emotional tone behind a piece of text, such as a movie review. In the context of movie recommendation systems, sentiment analysis can be used to determine whether a review is positive or negative and to use that information to recommend movies to users.

One popular method for performing sentiment analysis is the multinomial naive Bayes algorithm. This algorithm is a type of probabilistic classifier that is commonly used in natural language processing (NLP) tasks. The algorithm uses Bayes' theorem to calculate the probability that a review is positive or negative based on the words used in the review.

The multinomial naive Bayes algorithm is trained on a labelled dataset of reviews, where the reviews are labelled as positive or negative. The algorithm then uses this training data to learn the probability of each word appearing in a positive or negative review. This information is then used to classify new reviews as positive or negative.

In the movie recommendation system, the algorithm is used to classify the reviews of each movie as positive or negative. The system then uses this information to recommend movies with mostly positive reviews to the users.

The advantage of using the multinomial naive Bayes algorithm is that it is relatively simple and easy to implement, yet it still performs well in many NLP tasks. The algorithm also requires a relatively small amount of training data to achieve good results.

In summary, using sentiment analysis, specifically the multinomial naive Bayes algorithm, in a movie recommendation system can provide

In recent years, there have been many research studies on the topic of recommendation systems, particularly in the field of movies. Many of these studies have focused on improving the accuracy and effectiveness of movie recommendation systems.

One common approach is the use of collaborative filtering, which is a method of making recommendations based on the preferences and behavior of other similar users. This approach can be used to recommend movies to users based on the movies that similar users have watched or liked. Studies such as Katarya et al. [1] and Kumar M et al. [2] have proposed movie recommendation systems using collaborative filtering.

Another popular approach is content-based filtering, which is a method of making recommendations based on the characteristics of the movies being recommended. This approach can be used to recommend movies to users based on the movies they have previously watched or liked, and can include characteristics such as genre, director, actors, and plot. Studies such as Mohammed F et al. [3] and S. Halder et al. [4] have proposed movie recommendation systems using content-based filtering.

Many studies have also focused on using natural language processing (NLP) techniques to extract features from movie reviews and using those features to recommend similar movies. Some studies have used sentiment analysis techniques to classify movie reviews as positive or negative and use that information to recommend movies to users. Studies such as Yueshen et al.[] and J.S. Breese et al.[] have proposed movie recommendation systems using NLP and sentiment analysis.

Some studies have used hybrid methods, which combine collaborative filtering, content-based filtering, or other techniques to improve the accuracy and effectiveness of movie recommendation systems.
Movies that are not present in the database, such as those released after 2016, are extracted from Wikipedia using web scraping with the beautifulsoup4 Python library, and their features are trained for use in the system.

Fig. 3 Code indicating Web Scraping

The above code performs web scraping to get user reviews from the IMDB website. Web scraping is the practice of extracting data from web pages so that it can be collected, analysed, and, in our case, displayed in our application. To accomplish this task, we utilized the beautifulsoup4 Python library, which makes it easy to scrape information from web pages. The code extracts the reviews of movies from the HTML content: the soup object contains the tags of the HTML file, and the code extracts the relevant tags and adds their contents to a Python list.

a. Csv file
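
The review-extraction step described above can be sketched as follows. To keep the example dependency-free, this sketch substitutes the Python standard library's `html.parser` for beautifulsoup4, so it is not the code shown in Fig. 3. The class name `review-text` and the sample HTML are hypothetical and do not reflect the actual markup of the IMDB website.

```python
from html.parser import HTMLParser


class ReviewExtractor(HTMLParser):
    """Collect the text of every <div> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.reviews = []   # extracted review texts, in document order
        self._depth = 0     # greater than 0 while inside a matching <div>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth or self.target_class in classes:
            self._depth += 1  # also track <div>s nested inside a match

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the text fragments and normalize whitespace.
                self.reviews.append(" ".join("".join(self._buffer).split()))
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_reviews(html, target_class="review-text"):
    """Return the review texts found in an HTML string as a Python list."""
    parser = ReviewExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.reviews


# Hypothetical page fragment used only to exercise the parser.
SAMPLE_HTML = """
<html><body>
  <div class="review-text">A thrilling ride from start to finish.</div>
  <div class="sidebar">Related titles</div>
  <div class="review-text">Too long, and the <b>ending</b> falls flat.</div>
</body></html>
"""

print(extract_reviews(SAMPLE_HTML))
```

The resulting list of strings is the kind of input a sentiment classifier would then label as positive or negative.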
VI. SCREENSHOTS
d. Cast Biography (web scraped)

Fig. 11 Recommendations for movies

VII. RESULTS
In our proposed system, we utilized a dataset of almost 45,000 movies, which included information such as movie title, director name, status, image URLs, language, country, and budget. From this dataset, we used a sample of approximately 5000 movies for training and testing. To recommend similar movies to the user, we employed the content-based filtering method with the help of cosine similarity. This method recommended movies of the same genre to the user. When our system recommended the top 10 movies, we found that 9 out of 10 movies were of the same genre, with only a small difference in the tenth movie. For example, when searching for the movie "Avengers: Infinity War" (whose genres are Adventure, Action, and Science Fiction), the system recommended 9 out of 10 movies with the same genres, while the tenth movie (Sherlock Holmes) had only the genres Adventure and Action, without Science Fiction. This demonstrates that our system was able to recommend movies to users with an accuracy of 96.33%, and the accuracy of sentiment analysis was found to be 98.77%.

VIII. CONCLUSION
In conclusion, the proposed recommendation system has been developed using a combination of various techniques such as web scraping, NLP, and machine learning. The system utilizes a dataset containing movie information up to 2016, which was pre-processed and used to train the system to perform recommendations. Additionally, the system uses web scraping to extract from the Wikipedia website the names of movies released after 2016 (2017, 2018, and so on). User reviews were scraped from the IMDB website, and sentiment analysis was performed on the reviews. The system recommends the top 10 related movies of the same genres to the user using a content-based filtering method with the help of cosine similarity. The system was able to provide relevant movie recommendations to the users and received positive feedback from the users. Overall, the proposed recommendation system is effective in providing personalized movie recommendations to users.
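
The recommendation step summarized above (movies represented as feature vectors, compared with cosine similarity, and the most similar titles returned) can be sketched in plain Python. The sketch is illustrative only: the genre list, the placeholder film titles, and the binary genre encoding are assumptions, and the actual system may build its vectors from additional content features. For two vectors u and v, the score is their dot product divided by the product of their lengths.

```python
import math

# Assumed feature space: one dimension per genre.
GENRES = ["Action", "Adventure", "Science Fiction", "Mystery", "Romance"]

# Hypothetical mini-catalogue with placeholder titles.
CATALOGUE = {
    "Query Film": ["Action", "Adventure", "Science Fiction"],
    "Film B": ["Action", "Adventure", "Science Fiction"],
    "Film C": ["Action", "Adventure"],
    "Film D": ["Mystery", "Romance"],
}


def genre_vector(genres):
    """Binary vector: 1.0 where the movie has the genre, 0.0 otherwise."""
    return [1.0 if genre in genres else 0.0 for genre in GENRES]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(title, catalogue, top_n=10):
    """Rank every other movie by similarity to `title` and keep the top_n."""
    query = genre_vector(catalogue[title])
    scored = [
        (other, cosine_similarity(query, genre_vector(genres)))
        for other, genres in catalogue.items()
        if other != title
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


for name, score in recommend("Query Film", CATALOGUE, top_n=2):
    print(f"{name}: {score:.3f}")
```

In this toy catalogue, a film sharing all three genres with the query ranks above one sharing only Action and Adventure, which mirrors the behavior reported for the tenth recommendation in the results.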
IX. FUTURE WORK
As future work, we can expand our system to include more features and data sources to improve the accuracy of our recommendations. We can also incorporate other machine learning techniques such as collaborative filtering and deep learning to enhance the performance of our system. Additionally, we can integrate more advanced natural language processing techniques to improve the sentiment analysis aspect of our system. Furthermore, it can be integrated with other platforms like streaming platforms to provide an end-to-end recommendation system.

References
[1] R. Katarya and O. P. Verma, "Effective collaborative movie recommender system using asymmetric user similarity and matrix factorization," 2016 International Conference on Computing, Communication and Automation (ICCCA), Noida, 2016, pp. 71-75, doi: 10.1109/CCAA.2016.7813692.
[...] Movie Swarm," 2012 Second International Conference on Cloud and Green Computing, Xiangtan, 2012, pp. 804-809, doi: [...]
[...] Recommendation
[7] Zhang, J.; Wang, Y.; Yuan, Z.; Jin, Q.; “Personalized Real-Time Movie Recommendation