

A Machine Learning Approach to Personalized Movie Recommendations using Content-based Filtering, Web Scraping and NLP-based Sentiment Analysis

Yachika S Yadav
Research Scholar, MCA
Thakur Institute of Management Studies, Career Development & Research (TIMSCDR)
Mumbai, India
yachikayadav1054@gmail.com

Vikas R Yadav
Research Scholar, MCA
Thakur Institute of Management Studies, Career Development & Research (TIMSCDR)
Mumbai, India
vikasryadav1741@gmail.com

Abstract— This research paper presents an improved approach for movie recommendation using content-based filtering and cosine similarity. The proposed system takes into account the movies' content information and genre during item similarity calculations to recommend a top 10 list of similar movies based on user preferences. Data for the movies, including title, genre, runtime, rating, and cast, is obtained from the TMDB website using an API key, while reviews are web-scraped from the IMDB website and subjected to sentiment analysis using NLP to predict whether each review is positive or negative. The recommendation system is designed to minimize transaction costs and improve the quality of the decision-making process for users. The paper concludes that recommendation systems play an essential role in the modern era and are used by many prestigious applications.

Keywords— Content-based filtering, Cosine similarity, API key, Web scraping, Sentiment analysis

I. INTRODUCTION
In this research paper, we delve into the topic of recommendation systems and their current use in various industries. Recommendation systems are prediction-based analysis systems that recommend similar content on various platforms, including e-commerce, movies, books, and food. They are used to recommend similar products or services to users based on consumer feedback and past experiences. The use of recommendation systems has become increasingly prevalent in recent years, with organizations recognizing the benefits of using them to improve consumer loyalty and experience. These systems use machine learning or AI algorithms to predict similar types of products or services. To evaluate this system, we have used Hollywood movie datasets.

Our focus in this paper is on movie recommendation systems, which use content-based filtering algorithms to recommend movies of similar content or genre to users. The proposed system takes into account the movies' content information and genre during item similarity calculations to recommend a top 10 list of similar movies based on user preferences. We have also used the multinomial naive Bayes algorithm for sentiment analysis on the reviews of the movies. The details of the movies, such as title, genre, runtime, rating, poster, and cast, are fetched from TMDB using an API key.

Overall, this research highlights the crucial role that recommendation systems play in many prestigious applications in the modern era. It also explores the specific application of movie recommendation systems and the use of content-based filtering algorithms and the multinomial naive Bayes algorithm for sentiment analysis.

II. CONTENT BASED FILTERING
Content-based filtering is a method, used mostly in recommendation systems, that recommends items to users based on their previous preferences and behavior. Unlike collaborative filtering, it focuses on the characteristics of the items being recommended, rather than the preferences and behavior of other
users. The idea behind content-based filtering is that if a user likes a certain item, they are likely to enjoy other items with similar characteristics.

In the context of movie recommendation systems, content-based filtering can be used to recommend movies to users based on the content or genre of movies they have previously searched for, watched, or liked. This can include characteristics such as genre, director, actors, and plot. The system uses these characteristics to identify similar movies and recommend them to the user.

One way to implement content-based filtering in a movie recommendation system is to compute a similarity measure over the characteristics of the movies. One commonly used measure is cosine similarity, which is the cosine of the angle between two vectors representing the characteristics of the movies. The algorithm calculates the similarity score between two movies, and the movies with the highest similarity scores are recommended to the user.

Cosine similarity is a mathematical measure of the similarity between two vectors. In effect, we calculate the cosine of the angle between them: if the angle is small, i.e. the cosine is close to 1, the vectors are similar, and if the cosine is close to 0, the vectors have little similarity.

Fig. 1 Cosine Similarity

In conclusion, content-based filtering is a powerful method for recommending items to users, particularly in the context of movie recommendation systems. It focuses on the characteristics of the items being recommended, rather than the preferences and behavior of other users. This allows the system to recommend items that are similar to those that the user has previously enjoyed, increasing the likelihood that the user will enjoy the recommended items.

III. API KEY INTEGRATION AND WORKING
API (Application Programming Interface) keys are a form of security measure used to grant access to specific functionalities of an application or service. In the context of movie recommendation systems, an API key is used to grant access to the movie data from a specific database, such as the TMDB (The Movie Database) website. The API key acts as an identification card for the client making the request, allowing the server to assign the proper access permissions and track how the data is being used.

To integrate an API key into a movie recommendation system, the first step is to register for an API key from the relevant database. This typically involves creating an account and agreeing to the terms of service. Once the API key is obtained, it can be added to the code of the movie recommendation system.

The process of working with an API key in a
movie recommendation system involves making
requests to the database using the API key and
receiving the movie data in return. These requests
can be made using various programming
languages, such as Python, and can include
parameters such as the movie title, genre, and
release date to narrow down the results. The
received data can then be used to recommend
movies based on the preferences and behavior of
users.
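
The request step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the TMDB API v3 base URL and its movie search endpoint with `api_key` and `query` parameters, and the names `build_search_url` and `search_movie` are hypothetical helpers introduced here. The key shown is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Assumption: TMDB API v3 base URL and the /search/movie endpoint.
TMDB_BASE_URL = "https://api.themoviedb.org/3"


def build_search_url(api_key: str, title: str) -> str:
    """Build a movie-search URL that carries the API key and the query title."""
    params = urllib.parse.urlencode({"api_key": api_key, "query": title})
    return f"{TMDB_BASE_URL}/search/movie?{params}"


def search_movie(api_key: str, title: str) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    url = build_search_url(api_key, title)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


# "YOUR_API_KEY" is a placeholder, not a working key.
print(build_search_url("YOUR_API_KEY", "Avengers: Infinity War"))
```

Keeping URL construction separate from the network call makes the key handling easy to check without spending any of the key's request quota.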

In addition to providing movie data, the API key can also be used to access other information such as reviews, posters, trailers and more. This can be beneficial in the sentiment analysis process and provides more complete information about the movie.

API keys are an essential component of movie recommendation systems as they provide access to the necessary movie data for the system to function. They also ensure that the data is being used in an authorized manner, and provide a way for the database to track and limit the usage of the data.

Note that API keys also come with usage limits and usage costs, so key usage must be managed to avoid overuse and prevent the system from breaking.

IV. SENTIMENT ANALYSIS USING NLP
Sentiment analysis is the process of determining the emotional tone behind a piece of text, such as a movie review. In the context of movie recommendation systems, sentiment analysis can be used to determine whether a review is positive or negative and to use that information to recommend movies to users.

One popular method for performing sentiment analysis is the multinomial naive Bayes algorithm. This algorithm is a type of probabilistic classifier that is commonly used in natural language processing (NLP) tasks. The algorithm uses Bayes' theorem to calculate the probability that a review is positive or negative based on the words used in the review.

The multinomial naive Bayes algorithm is trained on a labelled dataset of reviews, where the reviews are labelled as positive or negative. The algorithm then uses this training data to learn the probability of each word appearing in a positive or negative review. This information is then used to classify new reviews as positive or negative.

In the movie recommendation system, the algorithm is used to classify the reviews of each movie as positive or negative. The system then uses this information to recommend movies with mostly positive reviews to the users.

The advantage of the multinomial naive Bayes algorithm is that it is relatively simple and easy to implement, yet it still performs well in many NLP tasks. The algorithm also requires a relatively small amount of training data to achieve good results.

In summary, using sentiment analysis, specifically the multinomial naive Bayes algorithm, in a movie recommendation system can provide valuable information about the reviews of each movie and help the system recommend movies with mostly positive reviews to users. This can improve the user experience by recommending movies that are more likely to be enjoyed by the user.

IV. RELATED WORK
In recent years, there have been many research studies on the topic of recommendation systems, particularly in the field of movies. Many of these studies have focused on improving the accuracy and effectiveness of movie recommendation systems.

One common approach is the use of collaborative filtering, which is a method of making recommendations based on the preferences and behavior of other similar users. This approach can be used to recommend movies to users based on the movies that similar users have watched or liked. Studies such as Katarya et al.[1] and Kumar M et al.[2] have proposed movie recommendation systems using collaborative filtering.

Another popular approach is content-based filtering, which is a method of making recommendations based on the characteristics of the movies being recommended. This approach can be used to recommend movies to users based on the movies they have previously watched or liked, and can include characteristics such as genre, director, actors, and plot. Studies such as Mohammed F et al.[3] and S. Halder et al.[4] have proposed movie recommendation systems using content-based filtering.

Many studies have also focused on using natural language processing (NLP) techniques to extract features from movie reviews and using those features to recommend similar movies. Some studies have used sentiment analysis techniques to classify movie reviews as positive or negative and use that information to recommend movies to users. Studies such as Yueshen et al.[] and J.S. Breese et al.[] have proposed movie recommendation systems using NLP and sentiment analysis.

Some studies have used hybrid methods, which combine collaborative filtering and content-based filtering or other techniques, to improve the accuracy and effectiveness of movie recommendation systems.

V. PROPOSED METHODOLOGY

Fig. 2 Architecture of the proposed System

Fig. 2 illustrates the system architecture of our proposed movie recommendation system, outlining the structure, behaviour, and view of the system. This visual representation captures the fundamental design and implementation of our model. To begin, we created an account on the TMDB website and obtained an API key. After that, we downloaded three datasets from the Kaggle website to be used in our system.

The proposed system for movie recommendation is based on a combination of content-based filtering, sentiment analysis, and web scraping. In our proposed system, we utilized three datasets to construct our movie database: "movie_metadata.csv", containing data on 5000 movies; "movies_metadata.csv", with 45000 movie entries; and "credit.csv", which holds cast and crew information for all our movies. These datasets were downloaded from the Kaggle website and pre-processed for training, allowing us to perform recommendations based on movie attributes such as the title, genre, author, cast, popularity, revenue, release date, runtime, and imdb_id.

This dataset is pre-processed and trained to be used in the recommendation system. The system performs recommendations based on the same content or genre of the movie.

Movies that are not present in the database, such as those released after 2016, are extracted from Wikipedia by web scraping with the beautifulsoup4 Python library, and their features are trained for use in the system.

Fig. 3 Code indicating Web Scraping

We utilized an API key from the TMDB website to access movie, image, and actor data. The tmdb-python library, which is written in Python, was used as an asynchronous wrapper for The Movie Database (TMDb) API v3. The Movie Database (TMDB) is a community-based movie database that offers an API service for those looking to use its movie, TV show, or actor images and data in their applications.

We utilized the TMDB website as a key source for obtaining important information about movies, including the overview, images of actors, and details such as runtime and release date. This website played a significant role in providing the necessary data for our movie recommendation system. The use of an API key from the TMDB website allows for dynamic searching capabilities. This eliminates any restriction on movie names, as we are able to access information and images of any movie listed on the TMDB website, in addition to using a static dataset containing names and information of movies. Furthermore, through web scraping techniques, we can gather top user reviews from the IMDB website. These reviews are then subject to sentiment analysis using the naive Bayes algorithm to determine whether they are "good" or "bad." Based on this analysis, the system recommends the top 10 related movies of similar content or genres through the use of content-based filtering.

A. Code for Web scraping of movie reviews

Fig. 4 Code for Web scraping movie reviews

The above code performs web scraping to get user reviews from the IMDB website. Scraping is simply the process of extracting data from a web page and displaying it in our application. Web scraping, the extraction of data from the web, is the practice of collecting and analysing data from the
internet. To accomplish this task, we utilized the beautifulsoup4 Python library, which makes it easy to scrape information from web pages. The code extracts movie reviews from the HTML content: the soup object contains the tags of the HTML file, and the code extracts the relevant tags and adds their contents to a Python list.

a. Csv file
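
The tag-extraction step described above can be sketched as follows. The paper uses beautifulsoup4; to keep this example free of third-party dependencies it swaps in Python's standard-library `HTMLParser` instead, so it illustrates the idea rather than reproducing the authors' code. The HTML fragment and the `review-text` class name are hypothetical and do not reflect IMDB's actual markup.

```python
from html.parser import HTMLParser


class ReviewExtractor(HTMLParser):
    """Collect the text of every <div> whose class list contains target_class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.reviews = []   # extracted review texts
        self._depth = 0     # greater than 0 while inside a matching <div>
        self._chunks = []   # text pieces of the review currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth > 0:
            self._depth += 1            # nested <div> inside a review
        elif self.target_class in classes:
            self._depth = 1             # entering a review block
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:        # leaving the review block
                text = " ".join("".join(self._chunks).split())
                self.reviews.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


# Hypothetical page fragment; real IMDB markup differs and changes over time.
SAMPLE_HTML = """
<div class="review-text">A gripping story with <b>great</b> acting.</div>
<div class="sidebar">Not a review.</div>
<div class="review-text">Too long and rather dull.</div>
"""

parser = ReviewExtractor("review-text")
parser.feed(SAMPLE_HTML)
print(parser.reviews)
```

The resulting list of review strings is the kind of input that the sentiment analysis step consumes.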

The proposed system employs natural language processing techniques to perform sentiment analysis on reviews scraped from the IMDB website. The code for this analysis takes the reviews as input and uses a pre-trained model to determine whether each review is "Good" or "Bad".

Fig. 6 Data in csv file

b. Movie Information

Below is the code for sentiment analysis, where we take the reviews and pass them to our model to determine whether each review is "Good" or "Bad":

Fig. 5 Code performing Sentiment Analysis
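
Since the original code figure is not reproduced here, the classification step can be illustrated with a small from-scratch sketch of multinomial naive Bayes with add-one (Laplace) smoothing. This is not the authors' implementation, which relies on sklearn's `naive_bayes` module; the class name `TinyMultinomialNB` and the four training reviews are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # number of reviews per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        words = [w for w in text.lower().split() if w in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(class) + sum over words of log P(word | class)
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example.
TRAIN_TEXTS = [
    "great movie loved it",
    "wonderful acting great story",
    "boring plot bad acting",
    "terrible movie bad ending",
]
TRAIN_LABELS = ["Good", "Good", "Bad", "Bad"]

model = TinyMultinomialNB().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("great story"))
print(model.predict("bad boring ending"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing keeps a word that appears in only one class from forcing the other class's probability to zero.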


After performing sentiment analysis on user reviews, the system recommends the top 10 related movies of the same genres to the user, using content-based filtering with cosine similarity.

Fig. 7 Searched Movie

c. Top cast Image and Information

Fig. 8 Data on actors is obtained

Content-based filtering is a method used in today's recommendation systems to propose products of the same type, based on the characteristics of the product. Unlike collaborative filtering, it focuses on the characteristics and metadata of a particular product to suggest other products of the same type and characteristics. For example, a movie recommender can analyse a movie's genre and director to recommend additional movies with similar content and properties. The idea behind content-based filtering is that if a user likes a certain item, they are likely to enjoy other items with similar characteristics. The algorithm calculates the similarity score between two movies, and the movies with the highest similarity scores are recommended to the user.
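
The similarity-and-ranking step described above can be sketched as follows. For two feature vectors A and B, cosine similarity is (A · B) / (|A| |B|). This is a minimal illustration under stated assumptions: each movie is represented only by a bag of genre tokens, and the catalogue, movie names, and the `recommend` helper are hypothetical, not taken from the authors' system.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(title, catalogue, top_n=10):
    """Rank every other movie by similarity of its feature tokens to `title`."""
    vectors = {t: Counter(feats.lower().split()) for t, feats in catalogue.items()}
    query = vectors[title]
    scored = [(cosine_similarity(query, vec), t)
              for t, vec in vectors.items() if t != title]
    # Highest similarity first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:top_n]]


# Hypothetical catalogue: each value holds a movie's genre tokens only.
CATALOGUE = {
    "Movie A": "adventure action sciencefiction",
    "Movie B": "adventure action sciencefiction",
    "Movie C": "adventure action",
    "Movie D": "romance drama",
}

print(recommend("Movie A", CATALOGUE, top_n=2))
```

A movie sharing all three genres with the query ranks above one sharing only two, which mirrors the paper's observation that a recommendation can match on Adventure and Action while lacking Science Fiction.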

VI. SCREENSHOTS
d. Cast Biography (web scraped)

Fig. 9 Web-scraped data of actors

e. Sentiment analysis on user reviews

Fig. 10 Sentiment analysis using NLP

f. Movie Recommendations

Fig. 11 Recommendations for movies

VII. RESULTS
In our proposed system, we utilized a dataset of almost 45,000 movies, which included information such as movie title, director name, status, image URLs, language, country, and budget. For training and testing, we used a sample of approximately 5000 movies. To recommend similar movies to the user, we employed the content-based filtering method with the help of cosine similarity. This method recommended movies of the same genre to the user. When our system recommended the top 10 movies of the same genre, we found that 9 out of 10 movies were of the same genre, with only a small difference in the tenth movie. For example, when searching for the movie "Avengers: Infinity War" (whose genres are Adventure, Action, Science Fiction), the system recommended 9 out of 10 movies of the same genres, while the tenth movie (Sherlock Holmes) had only the genres Adventure and Action, without Science Fiction. This demonstrates that our system was able to recommend movies to users with an accuracy of 96.33%, and the accuracy of sentiment analysis was found to be 98.77%.

Fig. 12 Accuracy score of sentiment analysis

Additionally, to perform sentiment analysis on user reviews, the proposed system utilizes the Naive Bayes algorithm, which is a commonly used machine learning algorithm in natural language processing (NLP) tasks such as text classification, sentiment analysis, and spam detection. The sklearn library was imported in the system to implement the Naive Bayes algorithm and check the accuracy of sentiment analysis. The code used to import the library is "from sklearn import naive_bayes". Through the implementation of this algorithm, the system was able to classify the sentiment of reviews with an accuracy of around 98.77%.

VIII. CONCLUSION
In conclusion, the proposed recommendation system has been developed using a combination of various techniques such as web scraping, NLP, and machine learning. The system utilizes a dataset containing movie information up to 2016, which was pre-processed and trained to perform recommendations based on the dataset. Additionally, the system uses web scraping to extract from the Wikipedia website the names of movies released after 2016 (i.e. 2017, 2018, and so on). User reviews were scraped from the IMDB website, and sentiment analysis was performed on the reviews. The system recommends the top 10 related movies of the same genres to the user using a content-based filtering method with the help of cosine similarity. The system was able to provide relevant movie recommendations to the users and received positive feedback from the users. Overall, the proposed recommendation system is effective in providing personalized movie recommendations to users.

IX. FUTURE WORK
As future work, we can expand our system to include more features and data sources to improve the accuracy of our recommendations. We can also incorporate other machine learning techniques such as collaborative filtering and deep learning to enhance the performance of our system. Additionally, we can integrate more advanced natural language processing techniques to improve the sentiment analysis aspect of our system. Furthermore, it can be integrated with other platforms like streaming platforms to provide an end-to-end recommendation system.

References
[1] R. Katarya and O. P. Verma, "Effective collaborative movie recommender system using asymmetric user similarity and matrix factorization," 2016 International Conference on Computing, Communication and Automation (ICCCA), Noida, 2016, pp. 71-75, doi: 10.1109/CCAA.2016.7813692.

[2] M. Kumar, D. Yadav, Ashutosh Kumar Singh and V. K. Gupta, "A Movie Recommender System: MOVREC," International Journal of Computer Applications, Volume 124, 2015, pp. 7-11.

[3] M. F. Alhamid, M. Rawashdeh, M. A. Hossain, et al., "Towards context-aware media recommendation based on social tagging," J Intell Inf Syst 46, 499–516 (2016), doi: 10.1007/s10844-015-0364-5.

[4] S. Halder, A. M. J. Sarkar and Y. Lee, "Movie Recommendation System Based on Movie Swarm," 2012 Second International Conference on Cloud and Green Computing, Xiangtan, 2012, pp. 804-809, doi: 10.1109/CGC.2012.121.

[5] L. Yang, Y. Li, and R. S. Sherratt, "Sentiment Analysis for E-Commerce Product Reviews in Chinese Based on Sentiment Lexicon and Deep Learning," IEEE Access, vol. 8, pp. 23522-23530, 2020.

[6] Y. Lin, J. Li, L. Yang, and H. Lin, "Sentiment Analysis with Comparison Enhanced Deep Neural Network," IEEE Access, vol. 8, pp. 78378-78384, 2020.

[7] J. Zhang, Y. Wang, Z. Yuan, and Q. Jin, "Personalized Real-Time Movie Recommendation System: Practical Prototype and Evaluation," Tsinghua Science and Technology, vol. 25, 2020, pp. 180-191.

[8] J. Bollen, H. Mao, and A. Pepe, "Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena," ICWSM, 2011.
