Project 2 - Movielens Case Study

Uploaded by

a4amittewari007
Movielens Case Study

DESCRIPTION

Background of Problem Statement :

The GroupLens Research Project is a research group in the Department of Computer Science and
Engineering at the University of Minnesota. Members of the GroupLens Research Project are
involved in many research projects related to the fields of information filtering, collaborative filtering,
and recommender systems. The project is led by professors John Riedl and Joseph Konstan. The
project began to explore automated collaborative filtering in 1992 but is best known for its
worldwide trial of an automated collaborative filtering system for Usenet news in 1996. Since then,
the project has expanded its scope to research overall information filtering solutions, integrating
content-based methods as well as improving current collaborative filtering technology.

Problem Objective:

Here, we ask you to perform the analysis using the Exploratory Data Analysis technique. You need to
find features affecting the ratings of any particular movie and build a model to predict the movie
ratings.

Domain: Entertainment

Analysis Tasks to be performed:

Import the three datasets

Create a new dataset [Master_Data] with the following columns: MovieID, Title, UserID, Age, Gender,
Occupation, Rating. (Hint: (i) Merge two tables at a time. (ii) Merge the tables using the two primary keys
MovieID & UserID.)
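A minimal sketch of the two-step merge from the hint, using only the Python standard library. The sample lines are illustrative rows written in the documented "::"-separated formats, not the full dataset, and the helper name parse is an assumption made for this example. In a Jupyter Notebook the same join would more commonly be done with a dataframe library.

```python
# Illustrative rows in the documented "::"-separated formats (not the full dataset).
ratings_lines = ["1::1::5::978824268", "2::1::4::978300760", "1::2::3::978302109"]
users_lines = ["1::F::1::10::48067", "2::M::56::16::70072"]
movies_lines = [
    "1::Toy Story (1995)::Animation|Children's|Comedy",
    "2::Jumanji (1995)::Adventure|Children's|Fantasy",
]

def parse(lines, fields):
    """Split each '::'-separated line into a dict keyed by the given field names."""
    return [dict(zip(fields, line.strip().split("::"))) for line in lines]

ratings = parse(ratings_lines, ["UserID", "MovieID", "Rating", "Timestamp"])
users = parse(users_lines, ["UserID", "Gender", "Age", "Occupation", "Zip-code"])
movies = parse(movies_lines, ["MovieID", "Title", "Genres"])

# Step 1: merge ratings with movies on MovieID (inner join).
movies_by_id = {m["MovieID"]: m for m in movies}
ratings_movies = [
    {**r, **movies_by_id[r["MovieID"]]}
    for r in ratings
    if r["MovieID"] in movies_by_id
]

# Step 2: merge the result with users on UserID, keeping only the requested columns.
users_by_id = {u["UserID"]: u for u in users}
COLUMNS = ["MovieID", "Title", "UserID", "Age", "Gender", "Occupation", "Rating"]
master_data = []
for row in ratings_movies:
    user = users_by_id.get(row["UserID"])
    if user is None:
        continue  # rating from a user with no demographic record
    merged = {**row, **user}
    master_data.append({col: merged[col] for col in COLUMNS})

# All values are still strings here; convert to int where needed for analysis.
print(master_data[0])
```

Merging two tables at a time keeps each join keyed on a single column, which makes unmatched IDs easy to spot.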

Explore the datasets using visual representations (graphs or tables), also include your comments on
the following:

User Age Distribution

User rating of the movie “Toy Story”

Top 25 movies by viewership rating

Find the ratings for all the movies reviewed by a particular user, with UserID = 2696
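The exploration tasks above could be sketched as follows in plain Python. The rows and function names are hypothetical, and "viewership rating" is read here as the number of ratings a movie received; ranking by mean rating would be another reasonable reading.

```python
from collections import Counter

# Hypothetical Master_Data rows: (MovieID, Title, UserID, Rating).
master_data = [
    (1, "Toy Story (1995)", 10, 5),
    (1, "Toy Story (1995)", 11, 4),
    (1, "Toy Story (1995)", 2696, 3),
    (2, "Jumanji (1995)", 10, 3),
    (2, "Jumanji (1995)", 2696, 2),
    (3, "Heat (1995)", 11, 4),
]

def rating_counts_for_title(rows, title_prefix):
    """Count how often each star value was given to titles starting with title_prefix.

    Note: a prefix such as "Toy Story" also matches sequels; pass the full
    title including the year to restrict the match to one movie.
    """
    return Counter(rating for _, title, _, rating in rows if title.startswith(title_prefix))

def top_movies_by_viewership(rows, n=25):
    """Return the n most-rated titles as (title, number_of_ratings) pairs."""
    return Counter(title for _, title, _, _ in rows).most_common(n)

def ratings_by_user(rows, user_id):
    """Return (title, rating) pairs for every movie rated by user_id."""
    return [(title, rating) for _, title, uid, rating in rows if uid == user_id]

toy_story = rating_counts_for_title(master_data, "Toy Story (1995)")
top = top_movies_by_viewership(master_data, n=2)
user_2696 = ratings_by_user(master_data, 2696)
print(toy_story, top, user_2696)
```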

Feature Engineering:

Use column genres:

Find out all the unique genres (Hint: split the data in column genre making a list and then process the
data to find out only the unique categories of genres)

Create a separate column for each genre category with a one-hot encoding ( 1 and 0) whether or not
the movie belongs to that genre.

Determine the features affecting the ratings of any particular movie.

Develop an appropriate model to predict the movie ratings
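The brief does not prescribe a model. As one possible starting point, the sketch below fits a simple baseline that predicts each movie's mean training rating and falls back to the global mean for unseen movies. The function names and the tiny training set are assumptions for illustration; a real solution would train on Master_Data and would likely use richer features such as age, gender, occupation, and the genre columns.

```python
import math
from collections import defaultdict

def fit_movie_mean_baseline(train):
    """train: iterable of (movie_id, rating). Returns (global_mean, per-movie means)."""
    totals = defaultdict(lambda: [0.0, 0])  # movie_id -> [sum of ratings, count]
    for movie_id, rating in train:
        totals[movie_id][0] += rating
        totals[movie_id][1] += 1
    n = sum(count for _, count in totals.values())
    global_mean = sum(total for total, _ in totals.values()) / n
    movie_means = {m: total / count for m, (total, count) in totals.items()}
    return global_mean, movie_means

def predict(model, movie_id):
    """Predict the movie's mean rating, or the global mean if the movie is unseen."""
    global_mean, movie_means = model
    return movie_means.get(movie_id, global_mean)

def rmse(model, test):
    """Root mean squared error of the baseline on (movie_id, rating) pairs."""
    errors = [(predict(model, m) - r) ** 2 for m, r in test]
    return math.sqrt(sum(errors) / len(errors))

# Hypothetical training pairs (movie_id, rating).
model = fit_movie_mean_baseline([(1, 5), (1, 4), (2, 3), (2, 2), (3, 4)])
print(predict(model, 1), predict(model, 99), rmse(model, [(1, 4), (99, 3)]))
```

A baseline like this gives a reference error that any feature-based model should improve on.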

Dataset Description:

These files contain 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040
MovieLens users who joined MovieLens in 2000.

Ratings.dat

Format - UserID::MovieID::Rating::Timestamp

Field Description

UserID Unique identification for each user

MovieID Unique identification for each movie

Rating User rating for each movie

Timestamp Timestamp generated while adding user review

UserIDs range between 1 and 6040

The MovieIDs range between 1 and 3952

Ratings are made on a 5-star scale (whole-star ratings only)

Timestamp is represented in seconds since the epoch, as returned by time(2)

Each user has at least 20 ratings
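A short sketch of reading one Ratings.dat line according to the format above. The function name parse_rating and the sample line are illustrative assumptions; the timestamp is converted from epoch seconds to a UTC datetime.

```python
from datetime import datetime, timezone

def parse_rating(line):
    """Parse one 'UserID::MovieID::Rating::Timestamp' line into typed fields."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    rating = int(rating)
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of the 1-5 star range: {rating}")
    return {
        "UserID": int(user_id),
        "MovieID": int(movie_id),
        "Rating": rating,
        # Seconds since the epoch, interpreted as UTC.
        "Timestamp": datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
    }

sample = parse_rating("1::1193::5::978300760")
print(sample)
```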

Users.dat

Format - UserID::Gender::Age::Occupation::Zip-code

Field Description

UserID Unique identification for each user

Gender Gender of the user

Age User’s age

Occupation User’s Occupation

Zip-code Zip Code for the user’s location

All demographic information is provided voluntarily by the users and is not checked for accuracy.
Only users who have provided demographic information are included in this data set.

Gender is denoted by an "M" for male and "F" for female

Age is chosen from the following ranges:

Value Description

1 "Under 18"

18 "18-24"

25 "25-34"

35 "35-44"

45 "45-49"

50 "50-55"

56 "56+"
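Because Age is stored as a range code rather than an exact age, a lookup table is useful when plotting the user age distribution. The sketch below uses the codes listed above; the sample user lines and the function name are illustrative assumptions.

```python
from collections import Counter

# Age codes and their ranges, as listed in the dataset description.
AGE_LABELS = {
    1: "Under 18",
    18: "18-24",
    25: "25-34",
    35: "35-44",
    45: "45-49",
    50: "50-55",
    56: "56+",
}

def age_distribution(user_lines):
    """Count users per age range from 'UserID::Gender::Age::Occupation::Zip-code' lines."""
    counts = Counter()
    for line in user_lines:
        _, _, age_code, _, _ = line.strip().split("::")
        counts[AGE_LABELS[int(age_code)]] += 1
    return counts

# Illustrative rows in the Users.dat format.
sample_users = [
    "1::F::1::10::48067",
    "2::M::56::16::70072",
    "3::M::25::15::55117",
    "4::M::25::7::02460",
]
dist = age_distribution(sample_users)
print(dist)
```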

Occupation is chosen from the following choices:

Value Description

0 "other" or not specified

1 "academic/educator"

2 "artist"

3 "clerical/admin"

4 "college/grad student"

5 "customer service"

6 "doctor/health care"

7 "executive/managerial"

8 "farmer"

9 "homemaker"

10 "K-12 student"

11 "lawyer"

12 "programmer"

13 "retired"

14 "sales/marketing"

15 "scientist"

16 "self-employed"

17 "technician/engineer"

18 "tradesman/craftsman"

19 "unemployed"

20 "writer"

Movies.dat

Format - MovieID::Title::Genres

Field Description

MovieID Unique identification for each movie

Title A title for each movie

Genres Category of each movie

Titles are identical to titles provided by the IMDB (including year of release)

Genres are pipe-separated and are selected from the following genres:
• Action
• Adventure
• Animation
• Children's
• Comedy
• Crime
• Documentary
• Drama
• Fantasy
• Film-Noir
• Horror
• Musical
• Mystery
• Romance
• Sci-Fi
• Thriller
• War
• Western

Some MovieIDs do not correspond to a movie due to accidental duplicate entries and/or test entries

Movies are mostly entered by hand, so errors and inconsistencies may exist
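Since Genres is a pipe-separated string, the feature engineering step (unique genres, then one 0/1 column per genre) can be sketched as below. The function names and the two sample movies are assumptions for illustration.

```python
def unique_genres(movies):
    """Collect the distinct genre names from pipe-separated genre strings."""
    return sorted({g for _, _, genres in movies for g in genres.split("|")})

def one_hot(movies):
    """Return (genre_columns, rows) with one 0/1 entry per genre for each movie."""
    columns = unique_genres(movies)
    rows = []
    for movie_id, title, genres in movies:
        present = set(genres.split("|"))
        row = {"MovieID": movie_id, "Title": title}
        row.update({g: int(g in present) for g in columns})
        rows.append(row)
    return columns, rows

# Illustrative (MovieID, Title, Genres) rows in the Movies.dat layout.
sample_movies = [
    (1, "Toy Story (1995)", "Animation|Children's|Comedy"),
    (2, "Jumanji (1995)", "Adventure|Children's|Fantasy"),
]
genre_columns, encoded = one_hot(sample_movies)
print(genre_columns)
print(encoded[0])
```

Deriving the columns from the data rather than from a fixed list also surfaces any hand-entered genre spellings that differ from the documented ones.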

Author: Hermon Masih

Tools Used:

• Programming Language: Python

• IDE: Jupyter Notebook

Dataset: https://drive.google.com/file/d/1kMySbxVf7kmXU07AVi7MZKwDfcRLfh7_/view?usp=drive_link
