F24_Proj4

The document outlines the requirements for a Movie Recommender System project due in Fall 2024, utilizing the MovieLens dataset with approximately 1 million ratings for 3,706 movies. Students must submit an HTML file containing code for a recommendation system based on popularity and item-based collaborative filtering (IBCF), along with a web link to their application. The project emphasizes the importance of defining popularity and implementing the IBCF algorithm without using existing recommender packages.


Project 4: Movie Recommender System

Fall 2024

Contents
MovieLens Dataset
Submission Requirements
HTML File (4 points)
The App (3.5 points)
Resources

MovieLens Dataset

The dataset comprises approximately 1 million anonymous ratings for 3,706 movies, provided by 6,040
MovieLens users who joined the platform in 2000.
For some insights, see our exploratory data analysis: [Rcode__W13__Movie__EDA.html] and
[Python__W13__Movie__RS.html].
You can download a copy of the 6040-by-3706 rating matrix in CSV format, complete with column names
(“m” + MovieID) and row names (“u” + UserID), from Coursera/Canvas.
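
As a minimal sketch of loading a matrix in this layout, the snippet below reads a tiny in-memory stand-in with pandas. The toy data is made up for illustration; the real file name and location are not given here, so substitute the path of the CSV you downloaded.

```python
import io
import pandas as pd

# Tiny stand-in for the real CSV (hypothetical data, same layout:
# column names "m" + MovieID, row names "u" + UserID, blanks for missing ratings).
toy_csv = io.StringIO(
    ",m1,m2,m3\n"
    "u1,5,,3\n"
    "u2,,4,\n"
)

# index_col=0 turns the first column into row names; blanks become NaN.
R = pd.read_csv(toy_csv, index_col=0)

print(R.shape)                 # (2, 3) here; (6040, 3706) for the full matrix
print(int(R.isna().sum().sum()))  # number of missing ratings
```

With the full matrix, most entries will be missing, since 6,040 × 3,706 cells hold only about 1 million ratings.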

Submission Requirements

To complete this assignment, please provide the following:

1. An R Markdown or Python Jupyter Notebook saved in HTML format, or a link to such a file. This
file should contain all the necessary code to replicate the reported results. There is no page limit for
this part.

2. A web link to your movie recommendation application built by your team. You may share the source
code link or submit the code as a zip file on Coursera/Canvas.
Note that you may not use any recommender packages from R or Python. You are, however, free to use
other packages as needed.

HTML File (4 points)

The HTML file should contain two key components:

System I: Recommendation Based on Popularity

Recommend the top ten most popular movies. Please clearly define what you mean by “most popular.” For
example, are you considering movies with a high number of reviews as popular, or are you using additional
criteria, such as counting only reviews above a specific threshold, or focusing on movies whose average or
median rating exceeds a certain threshold (i.e., excluding movies with a significant number of low ratings
from being classified as popular)?

There is no single correct answer. We are primarily interested in ensuring that your implementation aligns
consistently with your definition.

Provide the code for implementing your recommendation scheme and display the top ten movies,
including their MovieID (or “m” + MovieID), title, and poster images.
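
One possible definition, shown only as a sketch: a movie is eligible if its average rating is at least some threshold, and eligible movies are ranked by their number of ratings. The function name `rank_by_popularity`, the threshold `min_avg`, and the toy matrix are assumptions made for illustration; any definition is acceptable as long as the code matches it. Titles and posters come from separate movie metadata, which is not shown here.

```python
import numpy as np
import pandas as pd

def rank_by_popularity(R: pd.DataFrame, min_avg: float = 3.5) -> list:
    """Rank all movies: eligible ones (average >= min_avg) first,
    then by number of ratings, then by average rating."""
    n_ratings = R.notna().sum(axis=0)   # ratings per movie
    avg_rating = R.mean(axis=0)         # NaN entries are skipped
    table = pd.DataFrame({
        "eligible": avg_rating >= min_avg,
        "n": n_ratings,
        "avg": avg_rating,
    })
    table = table.sort_values(["eligible", "n", "avg"], ascending=False)
    return list(table.index)

# Hypothetical toy rating matrix (4 users, 4 movies).
R_toy = pd.DataFrame(
    {
        "m1": [5, 4, 5, np.nan],
        "m2": [2, 1, 2, 3],
        "m3": [4, np.nan, 5, np.nan],
        "m4": [np.nan, np.nan, np.nan, 5],
    },
    index=["u1", "u2", "u3", "u4"],
)

ranking = rank_by_popularity(R_toy)
print(ranking[:3])  # ['m1', 'm3', 'm4']; m2 has the most ratings but a low average
```

Returning the full ranking rather than only the top ten makes it easy to save the ranking once and reuse it later.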
System II: Recommendation Based on IBCF

For this system, follow these steps. Let R denote the 6040-by-3706 rating matrix.

1. Normalize the rating matrix by centering each row. This means subtracting row means from each row
of the rating matrix R. Row means should be computed based on non-NA entries. For instance, the
mean of a vector like (2, 4, NA, NA) should be 3.

2. Compute the (transformed) cosine similarity among the 3,706 movies. For movies i and j, let I_ij
denote the set of users who rated both movies i and j. We ignore similarities computed from
fewer than three user ratings. Thus, when the cardinality of I_ij is greater than two, define the
similarity between movie i and movie j as

$$S_{ij} = \frac{1}{2} + \frac{1}{2} \cdot \frac{\sum_{l \in I_{ij}} R_{li} R_{lj}}{\sqrt{\sum_{l \in I_{ij}} R_{li}^2} \, \sqrt{\sum_{l \in I_{ij}} R_{lj}^2}}$$

where R_li is the (row-centered) rating of user l for movie i.

The transformation (1 + cos)/2 ensures that similarity values lie between 0 and 1. NA values may
occur when:

1) the set I_ij has a cardinality less than or equal to two (i.e., this pair of movies has been rated by
only zero, one, or two users in common);
2) one of the denominators is zero.

3. Let S denote the 3706-by-3706 similarity matrix computed in the previous step. For each row, sort the
non-NA similarity measures and keep the top 30, setting the rest to NA. This new similarity matrix,
still denoted as S, is no longer symmetric. Save this matrix. Note that some rows of the S matrix may
contain fewer than 30 non-NA values.

Display the pairwise similarity values from the S matrix (obtained in Step 3) for the following
movies: “m1”, “m10”, “m100”, “m1510”, “m260”, “m3212”. Please round the results to
7 decimal places.
4. Create a function named myIBCF:
• Input: newuser, a 3706-by-1 vector (denoted as w) containing ratings for the 3,706 movies from a new
user. Many entries in this vector will be NA. Non-NA values should be integers 1, 2, 3, 4, or 5, as
ratings are based on a 5-star scale (whole star ratings only). The order of the movies in this vector
should match the rating matrix R. (Should we center w? For IBCF, centering the new user ratings is
not necessary.)

• Inside the function: Upon receiving this input, your function should load the similarity matrix and
use it to compute predictions for movies that have not been rated by this new user yet. Use the following
formula to compute the prediction for movie i:

$$\frac{1}{\sum_{j \in S(i) \cap \{l : w_l \neq \mathrm{NA}\}} S_{ij}} \sum_{j \in S(i) \cap \{l : w_l \neq \mathrm{NA}\}} S_{ij} w_j$$

where S(i) = {l : S_il ≠ NA}. The formula above is identical to the one on page 10 of
[lec__W13__RecommenderSystem.pdf], where the rating for the j-th movie for this new user is denoted
as w_j here, but as r_aj in lec__W13__RecommenderSystem.pdf. Note that NA values may arise when the
denominator equals zero.

• Output: Based on your predictions, recommend the top ten movies to this new user, using the column
names of the rating matrix R (i.e., “m” +MovieID).

If fewer than 10 predictions are non-NA, select the remaining movies based on the popularity defined in
System I, prioritizing the most popular ones and excluding those already rated by the user. Save the
ranking of all movies (based on popularity) as a separate file to avoid recomputing the ranking each
time.
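
Steps 1 to 3 above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not a reference solution: the function name `similarity_matrix` and its parameters `min_common` and `top_k` are hypothetical, the matrix products are dense, and setting each movie's similarity with itself to NA is an assumption of this sketch that the steps above do not spell out.

```python
import numpy as np

def similarity_matrix(R, min_common=3, top_k=30):
    """Transformed cosine similarity between columns (movies) of R."""
    R = np.asarray(R, dtype=float)

    # Step 1: center each row using only its non-NA entries.
    C = R - np.nanmean(R, axis=1, keepdims=True)

    # Step 2: sums restricted to users who rated both movies i and j.
    rated = ~np.isnan(C)
    Z = np.where(rated, C, 0.0)          # NA -> 0 so it drops out of sums
    M = rated.astype(float)
    num = Z.T @ Z                        # sum over I_ij of C_li * C_lj
    sq_i = (Z ** 2).T @ M                # sum over I_ij of C_li^2
    sq_j = sq_i.T                        # sum over I_ij of C_lj^2
    n_common = M.T @ M                   # cardinality of I_ij
    den = np.sqrt(sq_i) * np.sqrt(sq_j)
    with np.errstate(divide="ignore", invalid="ignore"):
        S = 0.5 + 0.5 * num / den
    S[(n_common < min_common) | (den == 0)] = np.nan
    np.fill_diagonal(S, np.nan)          # assumption: no self-similarity

    # Step 3: keep the top_k largest non-NA values in each row.
    for i in range(S.shape[0]):
        row = S[i].copy()
        valid = np.flatnonzero(~np.isnan(row))
        keep = valid[np.argsort(-row[valid], kind="stable")[:top_k]]
        pruned = np.full_like(row, np.nan)
        pruned[keep] = row[keep]
        S[i] = pruned
    return S

# Hypothetical toy matrix: 4 users, 3 movies; every row mean is 3.
R_toy = np.array([
    [5, 3, 1],
    [4, 2, np.nan],
    [1, 5, 3],
    [2, 4, np.nan],
])
S_toy = similarity_matrix(R_toy)
print(np.round(S_toy, 7))
```

In the toy matrix, the first two movies share four raters and get a similarity of 1/2 + 1/2 · (−6/√60), while any pair involving the third movie shares only two raters and is NA. For the full 6040-by-3706 matrix, the dense products are feasible but memory-hungry, so the pruned matrix should be saved rather than recomputed.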
Test your function

For your function myIBCF, print the top 10 recommendations for the following two users:
• User “u1181” from the rating matrix R
• A hypothetical user who rates movie “m1613” with 5 and movie “m1755” with 4.
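
A minimal sketch of myIBCF and of how a hypothetical user vector might be built is given below. To keep the example self-contained, the similarity matrix, the movie IDs, and the popularity ranking are passed in as arguments (`S`, `movie_ids`, `popularity_rank`, all hypothetical names) instead of being loaded from saved files inside the function as the specification asks; the toy values are made up for illustration.

```python
import numpy as np

def myIBCF(newuser, S, movie_ids, popularity_rank, n_rec=10):
    """Recommend n_rec movie IDs for a user with ratings `newuser` (NaN = unrated)."""
    w = np.asarray(newuser, dtype=float)
    rated = ~np.isnan(w)

    # Weights S_ij restricted to rated movies j; NA similarities contribute 0.
    W = np.where(np.isnan(S), 0.0, S)[:, rated]
    num = W @ w[rated]
    den = W.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        pred = num / den
    pred[den == 0] = np.nan      # no usable neighbors
    pred[rated] = np.nan         # only predict movies not yet rated

    candidates = np.flatnonzero(~np.isnan(pred))
    order = candidates[np.argsort(-pred[candidates], kind="stable")][:n_rec]
    recs = [movie_ids[i] for i in order]

    # Fallback: fill with the most popular movies not rated and not yet chosen.
    rated_ids = {movie_ids[i] for i in np.flatnonzero(rated)}
    for m in popularity_rank:
        if len(recs) >= n_rec:
            break
        if m not in recs and m not in rated_ids:
            recs.append(m)
    return recs

# Hypothetical toy inputs (4 movies).
movie_ids = ["m1", "m2", "m3", "m4"]
nan = np.nan
S = np.array([
    [nan, 0.9, 0.2, nan],
    [0.9, nan, nan, nan],
    [0.4, 0.8, nan, nan],
    [nan, nan, nan, nan],
])
popularity_rank = ["m2", "m4", "m1", "m3"]

# Build a user who rates "m1" with 5 and "m2" with 2; everything else is NaN.
w = np.full(len(movie_ids), np.nan)
w[movie_ids.index("m1")] = 5
w[movie_ids.index("m2")] = 2

print(myIBCF(w, S, movie_ids, popularity_rank, n_rec=2))  # ['m3', 'm4']
```

Here only "m3" has a non-NA prediction, so the second slot is filled from the popularity ranking, skipping the already rated "m2". The hypothetical user in the test above can be built the same way, using the positions of “m1613” and “m1755” in the column names of R.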

The App (3.5 points)

Build an application for System II. Here are the requirements:


• Present users with a set of sample movies and ask them to rate them.
• Use the ratings provided by the user as input for your myIBCF function.
• Display 10 movie recommendations returned by your myIBCF function.
To save space and/or memory, you can choose to display up to 100 movies. This means you can modify
your myIBCF function slightly so that it only requires 100 columns of the S matrix. You can decide which
100 movies to display. Mention these additional implementation details at the end of your HTML file if
applicable, but there’s no need to include the modified version of the myIBCF function in the HTML file.
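
The reason 100 columns suffice can be sketched as follows: if users can only rate the displayed movies, the prediction formula only ever touches the columns of S that belong to those movies. The names `predict_from_displayed`, `S_sub`, and `displayed`, and the toy values, are assumptions made for illustration.

```python
import numpy as np

def predict_from_displayed(S_sub, w_sub):
    """Predict every movie from ratings on the displayed movies only.

    S_sub: (n_movies, n_displayed) columns of S for the displayed movies.
    w_sub: ratings for the displayed movies, NaN where the user skipped one.
    """
    w_sub = np.asarray(w_sub, dtype=float)
    rated = ~np.isnan(w_sub)
    W = np.where(np.isnan(S_sub), 0.0, S_sub)[:, rated]
    num = W @ w_sub[rated]
    den = W.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        pred = num / den
    pred[den == 0] = np.nan
    return pred

# Hypothetical toy similarity matrix (4 movies); only the first two are displayed.
nan = np.nan
S = np.array([
    [nan, 0.9, 0.2, nan],
    [0.9, nan, nan, nan],
    [0.4, 0.8, nan, nan],
    [nan, nan, nan, nan],
])
displayed = np.array([0, 1])
S_sub = S[:, displayed]          # the only part of S the app needs to store
w_sub = np.array([5.0, 2.0])     # ratings collected in the app

pred = predict_from_displayed(S_sub, w_sub)
pred[displayed[~np.isnan(w_sub)]] = np.nan   # do not recommend rated movies
print(np.round(pred, 3))         # the third movie gets a prediction of about 3.0
```

Storing only `S_sub` reduces the saved matrix from 3706 × 3706 to 3706 × 100 entries.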
We developed an imitation app inspired by a [book recommendation system]. Although it appears to collect
user ratings, the 10 recommendations it provides are fixed: it always shows the same 10 movies.
https://fengliang.shinyapps.io/MovieRecommend/

Resources

You are welcome to use any existing code, provided you cite the source. For example, you can check how two
packages implement IBCF:
• R code for package recommenderlab [https://github.com/mhahsler/recommenderlab/tree/master/R]
• Python code for package Surprise [https://github.com/NicolasHug/Surprise]

• The GitHub repository for the Book Recommender System mentioned above:
[https://github.com/pspachtholz/BookRecommender].
For the App, you can use Shiny if using R. Python users can consider frameworks such as [Shiny], [Dash],
[Flask], or [Streamlit].
