Movie Recommender System - Mushkan Keshri
PROJECT REPORT
In partial fulfilment of the requirements for the award of the degree
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Under the guidance of
Sourav Goswami
(Note: All entries of the proforma of approval should be filled up with appropriate and
complete information. Incomplete proforma of approval in any respect will be summarily
rejected.)
Title of the Project – Sentiment Analysis
Project Members – Rudraneel Paul
-- Sayan Debnath
-- Ayandeep Dutta
-- Simanta Saha
-- Vinayak Sharma
Guide Name – Mr Sourav Goswami
Signature of Candidates-
Signature of Approver –
Date: 28TH FEB, 2024
We hereby declare that the project work being presented in the project
proposal entitled “SENTIMENT ANALYSIS”, in partial
fulfilment of the requirements for the award of the degree
of BACHELOR OF TECHNOLOGY at SILIGURI INSTITUTE OF
TECHNOLOGY, is an authentic work carried out under the
guidance of MR. SOURAV GOSWAMI. The matter embodied in this
project work has not been submitted elsewhere for the award of any
degree, to the best of our knowledge and belief.
Guide / Supervisor
Overview
History of Python
Environment Setup
Basic Syntax
Variable Types
Functions
Modules
Packages
Artificial Intelligence
o Machine Learning
o Natural Language Processing
Machine Learning
o Supervised and Unsupervised Learning
o NumPy
o Scikit-learn
o Pandas
MOVIE RECOMMENDER SYSTEM
1. Introduction
2. Problem Statement
3. Advantages & Disadvantages
4. Future Scope
OVERVIEW
Python was developed by Guido van Rossum in the late eighties and
early nineties at the National Research Institute for Mathematics and
Computer Science (CWI) in the Netherlands. Python is derived from
many other languages, including ABC, Modula-3, C, C++, Algol-68,
Smalltalk, the UNIX shell, and other scripting languages. Python
is copyrighted, but its source code is freely available under the
Python Software Foundation License, an open-source license that is
compatible with the GNU General Public License (GPL). Python is now
maintained by a team of core developers under the Python Software
Foundation; Guido van Rossum, who directed its progress for many
years, stepped down as project leader in 2018.
FEATURES OF PYTHON
Easy-to-learn: Python has few keywords, a simple structure and a clearly defined syntax. This
allows a student to pick up the language quickly.
Easy-to-read: Python code is clearly laid out and easy on the eyes.
Easy-to-maintain: Python's source code is fairly easy to maintain.
A broad standard library: The bulk of Python's library is portable and cross-platform
compatible on UNIX, Windows, and Macintosh.
Interactive mode: Python supports an interactive mode which allows interactive testing
and debugging of snippets of code.
Portable: Python can run on a wide variety of hardware platforms and has the same
interface on all of them.
Extendable: You can add low-level modules to the Python interpreter. These modules enable
programmers to add to or customize their tools to be more efficient.
Scalable: Python provides a better structure and support for large programs than shell
scripting.
Apart from the above-mentioned features, Python has many other good features; a few
are listed below:
It supports functional and structured programming methods as well as OOP.
It can be used as a scripting language or can be compiled to byte code for
building large applications.
It provides very high-level dynamic data types and supports dynamic type checking.
It supports automatic garbage collection.
It can be easily integrated with C, C++, COM, ActiveX, CORBA and Java.
ENVIRONMENT SETUP
Win 9x/NT/2000
OS/2
PalmOS
Windows CE
Acorn/RISC OS
BASIC SYNTAX OF PYTHON PROGRAM
Type a print statement at the Python prompt and press Enter. In current
versions of Python (Python 3), print is a function and needs parentheses, as in
print("Hello, Python!"). Older Python 2 releases such as version 2.4.3 also
accepted the statement form print "Hello, Python!". Either way, this produces
the following result –
Hello, Python!
Python Identifiers
A Python identifier is a name used to identify a variable, function, class,
module or other object. An identifier starts with a letter A to Z or a to z or an
underscore (_) followed by zero or more letters, underscores and digits (0 to 9).
Python does not allow punctuation characters such as @, $, and % within
identifiers. Python is a case-sensitive programming language.
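The identifier rules above can be checked from Python itself. The sketch below is illustrative only: the candidate names are made up, and it relies on the built-in str.isidentifier() method (which in Python 3 also accepts non-ASCII letters).

```python
# Made-up candidate names, used only to illustrate the rules.
candidates = ["total", "_count", "value2", "2value", "my$var", "user@name"]

# str.isidentifier() is True only for names built from letters, digits and
# underscores that do not start with a digit.
valid = [name for name in candidates if name.isidentifier()]
invalid = [name for name in candidates if not name.isidentifier()]

print("valid:", valid)
print("invalid:", invalid)

# Identifiers are case-sensitive: these are two different variables.
Manpower = 1
manpower = 2
print(Manpower != manpower)
```

Names beginning with a digit, or containing @ or $, end up in the invalid list.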
Python Keywords
The Python keywords are reserved words and you cannot use them as constant
or variable or any other identifier names. Almost all Python keywords
contain lowercase letters only; in Python 3 the exceptions are True, False
and None.
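The keyword list for the interpreter in use can be read from the standard keyword module. This is a small sketch; the exact number of keywords depends on the Python version, so no count is assumed here.

```python
import keyword

# keyword.kwlist holds every reserved word of the running interpreter.
print(len(keyword.kwlist), "keywords")
print(keyword.kwlist)

# keyword.iskeyword() tells whether a name is reserved.
print(keyword.iskeyword("while"))   # a reserved word
print(keyword.iskeyword("total"))   # an ordinary identifier

# Keywords that are not written entirely in lowercase.
capitalised = [kw for kw in keyword.kwlist if not kw.islower()]
print(capitalised)
```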
Many programs can be run with an option that prints basic information about
how they should be run. Python enables you to do this with -h −
$ python -h
usage: python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Options and arguments (and corresponding environment variables):
Variables are nothing but reserved memory locations to store values. This means
that when you create a variable you reserve some space in memory.
Python variables do not need explicit declaration to reserve memory space. The
declaration happens automatically when you assign a value to a variable. The
equal sign (=) is used to assign values to variables.
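As a minimal illustration of assignment without declaration, the sketch below uses made-up variable names and values.

```python
counter = 100      # an integer
miles = 1000.0     # a floating-point number
name = "John"      # a string

# No declaration is needed; each name simply refers to the assigned value.
print(counter, miles, name)
print(type(counter).__name__, type(miles).__name__, type(name).__name__)

# The same name can later be bound to a value of a different type.
counter = "one hundred"
print(type(counter).__name__)
```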
Multiple Assignment
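Python allows one value to be assigned to several variables at once, and several values to be assigned to several variables in a single statement. A short sketch with made-up names:

```python
# One value bound to three names.
a = b = c = 1

# Three values unpacked into three names, left to right.
x, y, label = 1, 2, "john"

# The same mechanism swaps two variables without a temporary.
x, y = y, x

print(a, b, c)
print(x, y, label)
```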
The data stored in memory can be of many types. For example, a person's age is
stored as a numeric value and his or her address is stored as alphanumeric
characters. Python has five standard data types −
String
List
Tuple
Dictionary
Number
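One value of each of the five types listed above is sketched below. The values are illustrative only.

```python
text = "Hello, Python!"                    # String
items = ["abcd", 786, 2.23]                # List: ordered and mutable
point = ("abcd", 786, 2.23)                # Tuple: ordered and immutable
record = {"name": "john", "code": 6734}    # Dictionary: key-value pairs
number = 10                                # Number

# A list element can be replaced in place...
items[1] = 1000

# ...but a tuple rejects item assignment with a TypeError.
try:
    point[1] = 1000
except TypeError as err:
    tuple_error = str(err)

print(text[0:5], items, point, record["name"], number)
print(tuple_error)
```

The main practical difference between a list and a tuple is that only the list can be changed after creation.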
Data Type Conversion
Sometimes, you may need to perform conversions between the built-in types.
To convert between types, you simply use the type name as a function.
There are several built-in functions to perform conversion from one data type to
another.
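A few of the built-in conversion functions are sketched below. The input values are made up for illustration.

```python
as_int = int("42")                         # string to integer
as_float = float("3.5")                    # string to float
as_text = str(2024)                        # number to string
truncated = int(3.99)                      # float to integer drops the fraction
letters = list("abc")                      # string to list of characters
frozen = tuple([1, 2, 3])                  # list to tuple
mapping = dict([("a", 1), ("b", 2)])       # list of pairs to dictionary

print(as_int + 1, as_float * 2, as_text + " AD")
print(truncated, letters, frozen, mapping)
```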
Defining a Function
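def changeme(mylist):  # assumed definition; the original is missing from this report
    mylist.append([1, 2, 3, 4])
mylist = [10, 20, 30]
changeme(mylist)
print("Values outside the function: ", mylist)
Here, we are maintaining a reference to the passed object and appending values to
the same object, so the appended values are still visible outside the function.
A name assigned inside a function is local to that function; the global total
below is left unchanged by the call:
total = 0  # global variable (assumed; its definition is missing from this report)
def sum(arg1, arg2):
    total = arg1 + arg2  # local to sum(); the global total is not changed
sum(10, 20)
print("Outside the function global total: ", total)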
Introduction
Goals of AI
MACHINE LEARNING
Machine learning is a field of computer science that gives computers the ability
to learn without being explicitly programmed.
Machine learning tasks are typically classified into two broad categories,
depending on whether there is a learning "signal" or "feedback" available to a
learning system:-
SUPERVISED LEARNING
UNSUPERVISED LEARNING
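The difference between the two categories can be sketched on a toy example. Everything below is hypothetical: the data, the labels and the helper names fit_centroids, predict and two_means are made up, and the methods (nearest mean and a tiny 1-D k-means) are chosen only for brevity. The supervised part learns from labelled examples; the unsupervised part groups the same values without any labels.

```python
from statistics import mean

# Supervised: each example carries a label (the "feedback" signal).
labelled = [(1.0, "casual"), (2.0, "casual"), (8.0, "fan"), (9.0, "fan")]

def fit_centroids(examples):
    """Learn one mean value per label from labelled data."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, value):
    """Return the label whose mean is closest to the new value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

centroids = fit_centroids(labelled)
print(predict(centroids, 7.5))

# Unsupervised: the same values without labels; the algorithm finds groups.
unlabelled = [1.0, 2.0, 8.0, 9.0]

def two_means(values, iterations=10):
    """Split values into two clusters (assumes at least two distinct values)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(group_low), mean(group_high)
    return group_low, group_high

print(two_means(unlabelled))
```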
NUMPY ARRAY
NumPy’s main object is the homogeneous multidimensional array. It is a
table of elements (usually numbers), all of the same type, indexed by a tuple of
non-negative integers. In NumPy, dimensions are called axes. The number of axes
is called the rank.
For example, the coordinates of a point in 3D space, [1, 2, 1], form an array of
rank 1, because it has one axis. That axis has a length of 3. A two-dimensional
array such as [[1, 0, 0], [0, 1, 2]] has rank 2: the first dimension (axis)
has a length of 2, and the second dimension has a length of 3.
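The rank and axis lengths described above can be read from an array's ndim and shape attributes. A minimal sketch, assuming NumPy is installed; the variable names are illustrative.

```python
import numpy as np

# Rank 1: a single axis of length 3.
point = np.array([1, 2, 1])

# Rank 2: the first axis has length 2, the second has length 3.
grid = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 2.0]])

print(point.ndim, point.shape)
print(grid.ndim, grid.shape)

# All elements share one type, reported by dtype.
print(grid.dtype)

# Elements are indexed by a tuple of integers: row 1, column 2.
print(grid[1, 2])
```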
SCIKIT-LEARN
PANDAS
Data Collection
Data Formatting
Model Selection
Training
Testing
Training: The data set was divided into a training part and a testing
part. x_train and the corresponding y_train values were used to train
the model, while x_test and y_test were kept in reserve for testing.
Testing: The model was tested on x_test and its predictions were
stored in y_predict. y_test and y_predict were then compared.
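The split, train, predict and compare cycle can be sketched with the standard library alone. This is not the report's model: the data is synthetic, the 80/20 split ratio is an assumption, and fit_line is a hypothetical helper that fits a straight line by least squares.

```python
import random

# Synthetic data (made up): y depends linearly on x.
random.seed(0)
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]

# Shuffle the row indices and keep 80% for training, 20% for testing.
indices = list(range(len(x)))
random.shuffle(indices)
split = int(0.8 * len(indices))
train_idx, test_idx = indices[:split], indices[split:]

x_train = [x[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
x_test = [x[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

def fit_line(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    variance = sum((a - mean_x) ** 2 for a in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Train on the training part only.
slope, intercept = fit_line(x_train, y_train)

# Predict on the held-out inputs, then compare with the held-out targets.
y_predict = [slope * xi + intercept for xi in x_test]
mean_abs_error = sum(abs(p - t) for p, t in zip(y_predict, y_test)) / len(y_test)
print(mean_abs_error)
```

Keeping x_test and y_test out of training is what makes the final comparison a fair estimate of how the model behaves on unseen data.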
MOVIE RECOMMENDER SYSTEM
INTRODUCTION
PROBLEM STATEMENT
ADVANTAGES –
1) Easy recommendations mean fewer searches and sometimes lead to good deals
DISADVANTAGES –
1) If the system recommends products with bias, the customer may end up with
the wrong deals
2) Some websites may suggest the wrong products when their analysis is based on
too little information
FUTURE SCOPE –
[Jupyter notebook screenshots (movie-recommender-system): output of movies.head() in cells In [26] and In [28] and one further cell. The tables show columns including movie_id and keywords for the first rows: Avatar, Pirates of the Caribbean: At World's End, Spectre, The Dark Knight Rises and John Carter.]
[Jupyter notebook screenshots: building the new_df table and cleaning its tags column. The recoverable cell contents are:]
new_df = movies[['movie_id', 'title', 'tags']]
new_df['tags'] = new_df['tags'].apply(lambda x: " ".join(x))
In [33]: new_df
new_df['tags'] = new_df['tags'].apply(lambda x: x.lower())
In [35]: new_df.head()
[Both assignments to new_df['tags'] raise a pandas warning that points to the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy. The outputs list movie_id, title and tags for films such as Avatar (19995) and Pirates of the Caribbean: At World's End (285); after the second step the tags are in lowercase.]
In [42]: cv.get_feature_names()
Out[42]: [screenshot of the vocabulary list, truncated; the legible entries are numeric tokens such as '000', '17th', '1920' and '1930']
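The cv.get_feature_names() call above suggests that the tags were turned into word counts with a scikit-learn CountVectorizer. In scikit-learn 1.2 and later that method is named get_feature_names_out(). The sketch below is a pure-Python stand-in for the idea, not the notebook's code: the three tag strings and the helper to_vector are made up for illustration.

```python
from collections import Counter

# Hypothetical, already lowercased tag strings (one per movie).
tags = [
    "space war alien planet",
    "pirate ship sea war",
    "spy secret agent mission",
]

# The vocabulary is the sorted set of all words seen in any tag string.
vocabulary = sorted({word for doc in tags for word in doc.split()})

def to_vector(doc):
    """Count how often each vocabulary word occurs in one tag string."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]

vectors = [to_vector(doc) for doc in tags]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Each movie becomes a vector with one position per vocabulary word, which is the form needed for a similarity computation.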
In [45]: similarity
Out[45]: [screenshot of the similarity array, truncated: a square matrix with 1 on the diagonal and small off-diagonal values such as 0.03184649, 0.02592379 and 0.02680281]
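A matrix with ones on the diagonal and small values elsewhere is what cosine similarity between count vectors produces: the dot product of two vectors divided by the product of their lengths. The sketch below assumes that measure, which the report does not state explicitly, and uses three made-up count vectors.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector shares nothing with any other vector
    return dot / (norm_u * norm_v)

# Hypothetical word-count vectors for three movies.
vectors = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
]

# similarity[i][j] compares movie i with movie j.
similarity = [[cosine_similarity(u, v) for v in vectors] for u in vectors]

for row in similarity:
    print([round(value, 4) for value in row])
```

The matrix is symmetric, and every movie has similarity 1 with itself, up to floating-point rounding.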
In [47]: recommend('Avatar')
[output not legible in the screenshot]
In [51]: new_df.to_dict()
[screenshot of the resulting dictionary; output truncated]
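A recommend function like the one called above could rank the other titles by their similarity to the chosen movie. The sketch below is a guess at that logic, not the notebook's actual code: the titles are taken from the screenshots, but the similarity scores and the top_n parameter are made up for illustration.

```python
# Titles as seen in the screenshots; the scores below are invented.
titles = ["Avatar", "John Carter", "Spectre", "The Dark Knight Rises"]
similarity = [
    [1.0, 0.6, 0.1, 0.2],
    [0.6, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.3],
    [0.2, 0.1, 0.3, 1.0],
]

def recommend(title, top_n=2):
    """Return the top_n titles most similar to the given one."""
    index = titles.index(title)
    # Pair every movie position with its similarity to the chosen movie.
    scores = list(enumerate(similarity[index]))
    # Highest similarity first.
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    # Skip the movie itself and keep the best matches.
    return [titles[i] for i, _ in ranked if i != index][:top_n]

print(recommend("Avatar"))
```

Calling recommend with a title that is not in the list raises a ValueError, so a real system would need to handle unknown titles.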
BIBLIOGRAPHY
https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata
https://medium.com/web-mining-is688-spring-2021/content-based-movie-recommendation-system-72f122641eab
https://www.techtarget.com/searchenterpriseai/definition/machine-learning-ML#:~:text=Machine%20learning%20(ML)%20is%20a,to%20predict%20new%20output%20values.
CONCLUSION