Niranjan, Karthik, Rahul - Rajkumar R - Scope: Movie Recommendation System Based On Script Analysis, Cosine Similarity
Motivation
We often hear recommendations from other people, such as “I like the food in
that restaurant, you should try it”, “I saw this movie, you’ll like it” or
“Don’t go see that movie!”. Instead of asking other people for
recommendations, we want to build a movie recommendation system based on
content filtering and script analysis.

SCOPE of the Project
This system provides a personalized service that helps users find their
favourite items among the huge number of movies available online on the
World Wide Web. We identify the temporal preferences of an individual based
on their interests, such as romance or horror, and personalize the
recommendations for each user.

Methodology
The dataset we are working with is MovieLens, one of the most common
datasets available on the internet for building a recommender system. The
version we are working with (1M) contains 1,000,209 anonymous ratings of
approximately 3,900 movies made by 6,040 MovieLens users who joined
MovieLens in 2000.

After processing the data and doing some exploratory analysis, here are the
most interesting features of this dataset.

Here’s a word-cloud visualization of the movie titles:
[Figure: word cloud of movie titles]
Beautiful, isn’t it? There are clearly a lot of movie franchises in this
dataset, as evidenced by words like II and III. In addition, Day, Love,
Life, Time, Night, Man, Dead and American are among the most commonly
occurring words.

Here’s a distribution of the user ratings:
[Figure: distribution of user ratings]
It appears that users are quite generous in their ratings. The mean rating
is 3.58 on a scale of 5. Half the movies have a rating of 4 or 5. In our
view, a 5-level rating scale is not a good indicator, as people can have
different rating styles (e.g. person A could always use 4 for an average
movie, whereas person B only gives a 4 to their favorites). Each user rated
at least 20 movies, so we doubt the distribution could be caused just by
chance variance in the quality of movies.

Evaluation results of the recommender:

##           TP        FP       FN       TN precision     recall        TPR        FPR
## 10  2.438095  7.180952 66.63810 365.7429 0.2534653 0.03684398 0.03684398 0.01909268
## 20  4.857143 14.380952 64.21905 358.5429 0.2524752 0.08070848 0.08070848 0.03845362
## 30  6.952381 21.904762 62.12381 351.0190 0.2409241 0.11743856 0.11743856 0.05865530
## 40  9.104762 29.371429 59.97143 343.5524 0.2366337 0.15409146 0.15409146 0.07869669
## 50 11.152381 36.942857 57.92381 335.9810 0.2318812 0.19071582 0.19071582 0.09897217
## 60 13.695238 43.923810 55.38095 329.0000 0.2374257 0.23003707 0.23003707 0.11752674

To look at all the splits at the same time, we sum the TP, FP, FN and TN
columns across the splits:

##          TP       FP       FN       TN
## 10 10.98095 28.16190 291.2571 1437.600
## 20 22.20000 56.08571 280.0381 1409.676
## 30 32.47619 84.94286 269.7619 1380.819

Conclusion
In this project, we have developed and evaluated a collaborative filtering
recommender (CFR) system for recommending movies. An online app was created
to demonstrate the user-based collaborative filtering approach to the
recommendation model.

Strengths: User-based collaborative filtering gives recommendations that can
complement the item the user was interacting with.

Weaknesses: User-based collaborative filtering is a type of memory-based
collaborative filtering, which uses all user data in the database to create
recommendations, so it can become costly as the amount of user data grows.

References
1. P. Resnick, H. R. Varian, "Recommender Systems", Communications of the ACM, vol. 40, pp. 56-58, 1997.
2. H. Lieberman, "Autonomous Interface Agents", Proceedings of CHI'97 (Atlanta, GA, March 1997), pp. 67-74.
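
The title names cosine similarity and the conclusion names user-based collaborative filtering, but neither is spelled out on the poster. As a minimal sketch, assuming ratings are stored as per-user dictionaries (a made-up toy structure with hypothetical names, not the MovieLens loader or the authors' app), cosine similarity between two users and a similarity-weighted rating prediction could look like this:

```python
import math

# Toy ratings (hypothetical data, not taken from MovieLens): user -> {movie: rating}
ratings = {
    "ann": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 4, "m2": 5, "m4": 2},
    "cat": {"m1": 1, "m3": 5, "m4": 4},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse rating vectors (unrated = 0)."""
    dot = sum(a[m] * b[m] for m in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(user, movie):
    """Similarity-weighted average of the other users' ratings for `movie`."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        num += sim * their_ratings[movie]
        den += abs(sim)
    return num / den if den else None  # None when nobody else rated the movie

print(cosine_similarity(ratings["ann"], ratings["bob"]))
print(predict("ann", "m4"))
```

Note that `predict` scans every other user for each request, which is the memory-based behaviour described under Weaknesses.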
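
The precision, recall, TPR and FPR columns in the evaluation output above are ratios of the TP, FP, FN and TN counts. The sketch below applies the standard definitions to row 10 of that output. The helper name `rates_from_counts` is hypothetical. Because the counts in the output are non-integer, they appear to be averages, so ratios of these averaged counts may only approximate the reported recall and FPR values.

```python
def rates_from_counts(tp, fp, fn, tn):
    """Standard confusion-matrix ratios (hypothetical helper, not from the poster)."""
    precision = tp / (tp + fp)  # share of recommended items that were relevant
    recall = tp / (tp + fn)     # share of relevant items that were recommended
    fpr = fp / (fp + tn)        # share of non-relevant items that were recommended
    # TPR is another name for recall, which is why the two columns are identical.
    return {"precision": precision, "recall": recall, "TPR": recall, "FPR": fpr}

# Counts copied from row 10 of the evaluation output.
row_10 = rates_from_counts(tp=2.438095, fp=7.180952, fn=66.63810, tn=365.7429)
for name, value in row_10.items():
    print(f"{name}: {value:.4f}")
```

The helper does no zero-division handling, so it assumes each denominator is positive, as it is for every row in the output.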