Title: Movie Recommendation System Documentation: 1. Demographic Filtering
Introduction
The Movie Recommendation System presented in this code suggests movies to users based on
their preferences and on similarities between movies. With the ever-increasing number of
movies available, it can be challenging for individuals to discover new films that align
with their interests.
Data set:
The Movie Recommendation System is a Python program that uses two different approaches,
Demographic Filtering and Content-Based Filtering, to provide movie recommendations based
on user preferences and movie similarities. The system utilizes a dataset containing information
about movies, such as cast, crew, genres, budget, popularity, and more.
1. Demographic Filtering:
The Demographic Filtering approach offers generalized recommendations to
every user based on movie popularity and genre.
The system calculates a score for each movie using IMDB's weighted rating
formula, which takes into account the number of votes, average rating, and
mean rating across all movies.
The movies are filtered based on a minimum vote count threshold (90th
percentile) to ensure only movies with a sufficient number of votes are included.
The top-rated movies are then sorted based on their score and recommended to
the users.
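Written out with the same symbols the code in the Usage section uses (v = number of votes
for the movie, m = minimum votes required, R = the movie's average rating, C = mean rating
across all movies), the weighted rating can be expressed as follows. The symbol W is
introduced here only for illustration.

```latex
W = \frac{v}{v + m}\,R + \frac{m}{v + m}\,C
```

As a made-up example, a movie with v = 1000 votes and R = 8.0, with m = 1000 and C = 6.0,
gets W = 0.5 * 8.0 + 0.5 * 6.0 = 7.0. The more votes a movie has relative to m, the closer
its score stays to its own average rating; with few votes it is pulled toward C.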
2. Content-Based Filtering:
The Content-Based Filtering approach recommends movies to users based on
the similarity of their content, such as plot descriptions, cast, crew, keywords,
and taglines.
To implement this, the program uses the Term Frequency-Inverse Document
Frequency (TF-IDF) vectorization technique to convert the movie overviews into
numerical vectors.
The cosine similarity score is computed between all pairs of movies using their
TF-IDF vectors. Cosine similarity is a measure of similarity between two vectors
that is independent of their magnitudes.
Given a movie title, the program retrieves the index of the movie in the dataset
and finds the top 10 most similar movies based on cosine similarity scores.
The recommended movies are returned based on their indices and displayed to
the user.
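The steps above can be sketched as follows. This is a minimal illustration, not the
project's actual code: the helper names build_recommender and recommend, the column names
"title" and "overview", and the four-row toy table are all assumptions made for the
example. It relies on scikit-learn's TfidfVectorizer and linear_kernel.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel


def build_recommender(movies, text_col="overview", title_col="title"):
    """Return a function that maps a title to its most similar titles.

    Hypothetical helper; column names are assumptions for this sketch.
    """
    # Missing overviews become empty strings so the vectorizer accepts them.
    texts = movies[text_col].fillna("")
    # TfidfVectorizer L2-normalises each row by default, so the plain dot
    # product computed by linear_kernel equals the cosine similarity.
    tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(texts)
    cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

    # Map each title to its row position (first occurrence wins on duplicates).
    indices = {}
    for pos, title in enumerate(movies[title_col]):
        indices.setdefault(title, pos)

    def recommend(title, top_n=10):
        idx = indices[title]
        # Pair every other movie with its similarity to the query movie.
        scores = [(i, s) for i, s in enumerate(cosine_sim[idx]) if i != idx]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return [movies[title_col].iloc[i] for i, _ in scores[:top_n]]

    return recommend


# Made-up data standing in for the real movie overviews.
toy = pd.DataFrame({
    "title": ["Space War", "Galaxy Battle", "Love in Paris", "Paris Romance"],
    "overview": [
        "A pilot fights an alien fleet in a space battle",
        "Rebels fight an alien empire in a galaxy space battle",
        "Two strangers fall in love during a romantic trip to Paris",
        "A romantic love story set in Paris",
    ],
})

recommend = build_recommender(toy)
print(recommend("Space War", top_n=1))
print(recommend("Love in Paris", top_n=1))
```

In this toy table the two science-fiction overviews share words such as "alien", "space",
and "battle", so each is the other's closest match, and likewise for the two Paris
romances. With the real dataset the same call with top_n=10 would return the ten nearest
titles.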
Requirements:
Python 3.x
pandas library
numpy library
scikit-learn library
matplotlib library
Usage:
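1. Import the necessary libraries, load the data, and compute the demographic score:
import pandas as pd
import numpy as np

# Load the TMDB credits and movies datasets
df1 = pd.read_csv('tmdb_5000_credits.csv')
df2 = pd.read_csv('tmdb_5000_movies.csv')

# C: mean rating across all movies; m: minimum vote count (90th percentile)
C = df2['vote_average'].mean()
m = df2['vote_count'].quantile(0.9)

# Keep only movies with at least m votes
q_movies = df2.loc[df2['vote_count'] >= m].copy()

# IMDB weighted rating for one movie row (the function name is illustrative)
def weighted_rating(x, m=m, C=C):
    v = x['vote_count']
    R = x['vote_average']
    return (v / (v + m) * R) + (m / (m + v) * C)

q_movies['score'] = q_movies.apply(weighted_rating, axis=1)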
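The remaining step described under Demographic Filtering, sorting by score and showing the
top entries, might look like the sketch below. It uses a small made-up table in place of
the scored TMDB data, and the helper name top_by_score is hypothetical.

```python
import pandas as pd


def top_by_score(scored, n=10):
    """Return the n highest-scoring rows, best first (hypothetical helper)."""
    return scored.sort_values("score", ascending=False).head(n)


# Made-up rows standing in for q_movies after the score column is added.
toy_scored = pd.DataFrame({
    "title": ["A", "B", "C"],
    "vote_count": [1200, 5000, 900],
    "vote_average": [7.1, 8.3, 6.4],
    "score": [6.8, 8.0, 6.3],
})

print(top_by_score(toy_scored, n=2)[["title", "vote_count", "vote_average", "score"]])
```

With the real data, the same call on q_movies with n=10 would list the ten movies
recommended to every user.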
Summary:
The provided code implements a Movie Recommendation System using two
different approaches: Demographic Filtering and Content-Based Filtering. In the
Demographic Filtering approach, the system recommends popular and critically
acclaimed movies to every user based on movie popularity and genre. It calculates a
score for each movie using IMDB's weighted rating formula, which combines the
movie's number of votes, its average rating, and the mean rating across all movies.
Movies below a minimum vote count threshold (the 90th percentile) are excluded so
that only movies with a sufficient number of votes are scored. The remaining movies
are then sorted by score, and the top-rated ones are recommended to users.