Title: Movie Recommendation System Documentation

Introduction
With the ever-increasing number of movies available, it can be challenging for individuals to
discover new films that align with their interests. The Movie Recommendation System presented
in this code addresses this by suggesting movies to users based on their preferences and on
similarities between movies.

Data set:
The Movie Recommendation System is a Python program that works on a dataset containing
information about movies, such as cast, crew, genres, budget, popularity, and more. It uses two
different approaches, Demographic Filtering and Content-Based Filtering, to provide movie
recommendations based on user preferences and movie similarities.

1. Demographic Filtering:
• The Demographic Filtering approach offers generalized recommendations to
every user based on movie popularity and genre.

• It recommends the same movies to users with similar demographic features,
assuming that popular and critically acclaimed movies have a higher probability
of being liked by the average audience.

• The system calculates a score for each movie using IMDB's weighted rating
formula, which takes into account the movie's number of votes, its average
rating, the minimum vote count threshold, and the mean rating across all movies.

• The movies are filtered based on a minimum vote count threshold (90th
percentile) to ensure only movies with a sufficient number of votes are included.

• The qualifying movies are then sorted by score, and the highest-scoring ones
are recommended to the users.
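As a rough illustration of the weighted rating described above, the sketch below writes the formula as WR = (v / (v + m)) * R + (m / (v + m)) * C and evaluates it for two made-up movies. The positional function signature and all numbers are assumptions chosen for illustration; they are not taken from the dataset.

```python
def weighted_rating(v, R, m, C):
    """IMDB-style weighted rating.

    v: number of votes for the movie
    R: average rating of the movie
    m: minimum number of votes required to be listed
    C: mean rating across all movies
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical values, for illustration only.
C = 6.0    # mean rating across the whole catalogue
m = 1000   # minimum vote count threshold

# A highly rated movie with few votes vs. a slightly lower-rated one with many votes.
few_votes = weighted_rating(v=1000, R=9.0, m=m, C=C)
many_votes = weighted_rating(v=9000, R=8.0, m=m, C=C)

print(round(few_votes, 2), round(many_votes, 2))
```

With these values, the movie rated 8.0 on 9,000 votes scores higher than the one rated 9.0 on 1,000 votes, because a small vote count pulls the score toward the overall mean C.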

2. Content-Based Filtering:
• The Content-Based Filtering approach recommends movies to users based on
the similarity of their content, such as plot descriptions, cast, crew, keywords,
and taglines.

• To implement this, the program uses the Term Frequency-Inverse Document
Frequency (TF-IDF) vectorization technique to convert the movie overviews into
numerical vectors.

• TF-IDF represents the relative importance of each word in the overviews by
weighing how often the word appears in one movie's overview against how often it
appears across all movies.

• The cosine similarity score is computed between all pairs of movies using their
TF-IDF vectors. Cosine similarity measures the angle between two vectors, so it
is independent of their magnitudes.

• Given a movie title, the program retrieves the index of the movie in the dataset
and finds the top 10 most similar movies based on cosine similarity scores.

• The recommended movies are looked up by their indices and displayed to
the user.
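The steps above can be sketched on a small made-up table. This is a minimal sketch, not the program's actual code: the titles, overviews, the `get_recommendations` helper, and its `top_n` parameter are all hypothetical. It also shows why `linear_kernel` can stand in for cosine similarity: `TfidfVectorizer` L2-normalises each row by default, so the plain dot product of two rows already equals their cosine similarity.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Made-up titles and overviews, for illustration only.
movies = pd.DataFrame({
    "title": ["Space Quest", "Star Voyage", "Love in Paris", "Paris Romance"],
    "overview": [
        "astronauts travel through space to explore a distant planet",
        "a crew of astronauts explore space and a distant star",
        "two strangers fall in love during a summer in paris",
        "a romance blossoms between strangers in paris",
    ],
})

# Convert the overviews into TF-IDF vectors, ignoring common English words.
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(movies["overview"].fillna(""))

# Rows are unit-length, so the dot product is the cosine similarity.
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
print(np.allclose(cosine_sim, cosine_similarity(tfidf_matrix)))

# Map each title to its row index.
indices = pd.Series(movies.index, index=movies["title"])


def get_recommendations(title, top_n=10):
    """Return the titles of the top_n movies most similar to `title`."""
    idx = indices[title]
    scores = sorted(enumerate(cosine_sim[idx]),
                    key=lambda pair: pair[1], reverse=True)
    # Leave out the queried movie itself.
    top = [i for i, _ in scores if i != idx][:top_n]
    return movies["title"].iloc[top].tolist()


print(get_recommendations("Space Quest", top_n=1))
```

The helper assumes titles are unique; with duplicate titles, `indices[title]` would return several rows and the lookup would need to pick one.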

Requirements:
• Python 3.x
• pandas library
• numpy library
• scikit-learn library
• matplotlib library

Usage:
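1. Import the necessary libraries:

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
import matplotlib.pyplot as plt

2. Load the movie dataset:

df1 = pd.read_csv('tmdb_5000_credits.csv')
df2 = pd.read_csv('tmdb_5000_movies.csv')

3. Join the two datasets on the 'id' column:

# Rename the credits columns so the key matches the 'id' column of df2
df1.columns = ['id', 'title', 'cast', 'crew']
# Drop the title from df1 so the merge does not produce title_x / title_y
# (this assumes df2 already has its own 'title' column)
df2 = df2.merge(df1.drop(columns='title'), on='id')

4. Perform Demographic Filtering:

Calculate the mean rating for all movies:

C = df2['vote_average'].mean()

Filter the movies that qualify for the chart:

# Minimum vote count: the 90th percentile of vote_count
m = df2['vote_count'].quantile(0.9)
q_movies = df2.loc[df2['vote_count'] >= m].copy()

Calculate the weighted rating score for each qualified movie:

def weighted_rating(x, m=m, C=C):
    v = x['vote_count']    # number of votes for the movie
    R = x['vote_average']  # average rating of the movie
    return (v / (v + m) * R) + (m / (m + v) * C)

# Apply the function to every row of the qualified movies
q_movies['score'] = q_movies.apply(weighted_rating, axis=1)
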
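The scoring step can also be written with column-wise arithmetic in place of a row-by-row function call, followed by the sort that produces the chart. The sketch below does this on a small made-up table standing in for the merged DataFrame; the titles, the values, and the 0.5 quantile (used here only so that the toy table keeps a few rows) are assumptions for illustration.

```python
import pandas as pd

# Made-up table standing in for the merged DataFrame; values are illustrative only.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "vote_count": [50, 400, 1200, 3000, 8000],
    "vote_average": [9.5, 7.0, 8.4, 6.1, 7.9],
})

C = movies["vote_average"].mean()        # mean rating across all movies
m = movies["vote_count"].quantile(0.5)   # toy threshold; the text above uses 0.9

# Keep only movies with at least m votes.
q_movies = movies.loc[movies["vote_count"] >= m].copy()

# Weighted rating computed on whole columns at once.
v = q_movies["vote_count"]
R = q_movies["vote_average"]
q_movies["score"] = (v / (v + m)) * R + (m / (v + m)) * C

# Sort by score, highest first, and show the top of the chart.
top = q_movies.sort_values("score", ascending=False)
print(top[["title", "vote_count", "vote_average", "score"]].head(10))
```

Movies A and B fall below the vote threshold and never enter the chart, however high their average rating is.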
Summary:
The provided code implements a Movie Recommendation System using two
different approaches: Demographic Filtering and Content-Based Filtering. In the
Demographic Filtering approach, the system recommends popular and critically
acclaimed movies to users based on movie popularity and genre. It calculates a
score for each movie using IMDB's weighted rating formula, considering factors
such as the number of votes, average rating, and overall mean rating. The movies
are filtered based on a minimum vote count threshold to ensure only movies with
a sufficient number of votes are included. The qualifying movies are then sorted
by score, and the highest-scoring ones are recommended to the users.

The Content-Based Filtering approach recommends movies to users based on the
similarity of their content, such as plot descriptions, cast, crew, keywords, and
taglines. It utilizes the Term Frequency-Inverse Document Frequency (TF-IDF)
vectorization technique to convert the movie overviews into numerical vectors.
The cosine similarity score is computed between all pairs of movies using their
TF-IDF vectors. This score represents the similarity between two movies
regardless of the magnitudes of their vectors. Given a movie title, the system
retrieves the index of the movie in the dataset and finds the top 10 most similar
movies based on cosine similarity scores. These recommended movies are then
displayed to the user.

The code requires Python 3.x and the libraries pandas, numpy, scikit-learn, and
matplotlib. It loads a movie dataset, performs data preprocessing and feature
engineering, and implements the two recommendation approaches. The
Demographic Filtering is based on movie ratings and popularity, while the
Content-Based Filtering relies on textual data and similarity calculations.

Overall, this Movie Recommendation System provides personalized movie
recommendations to users based on their preferences and movie similarities,
enhancing the movie-watching experience and helping users discover new movies
of interest.
