
Movie Recommender System - Mushkan Keshri

Uploaded by

Sayan Debnath
Copyright © All Rights Reserved

MOVIE RECOMMENDER SYSTEM

PROJECT REPORT
In partial fulfilment of the requirements for the award of the degree
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Under the guidance of

Sourav Goswami

(Note: All entries of the proforma of approval should be filled up with appropriate and
complete information. Incomplete proforma of approval in any respect will be summarily
rejected.)
• Title of the Project – Sentiment Analysis
• Project Members – Rudraneel Paul, Sayan Debnath, Ayandeep Dutta, Simanta Saha, Vinayak Sharma
• Guide Name – Mr Sourav Goswami

Project Version Control History –

Version | Primary Author | Description of Version | Date Completed
Final | Rudraneel Paul | Project Report | 28TH FEB, 2024

Signature of Candidates –

Signature of Approver –

Date: 28TH FEB, 2024

For Office Use Only -


Approved / Not Approved

DECLARATION

We hereby declare that the project work being presented in the project
proposal entitled “SENTIMENT ANALYSIS”, in partial fulfilment of the
requirements for the award of the degree of BACHELOR OF TECHNOLOGY at
SILIGURI INSTITUTE OF TECHNOLOGY, is an authentic work carried out under
the guidance of MR. SOURAV GOSWAMI. To the best of our knowledge and
belief, the matter embodied in this project work has not been submitted
elsewhere for the award of any degree.

Date: 28TH FEB, 2024


CERTIFICATE

This is to certify that this proposal of minor project entitled
“SENTIMENT ANALYSIS” is a record of bona fide work carried out under my
guidance at SILIGURI INSTITUTE OF TECHNOLOGY. In my opinion, the report
in its present form is in partial fulfilment of the requirements for the
award of the degree of BACHELOR OF TECHNOLOGY and as per regulations. To
the best of my knowledge, the results embodied in this report are original
in nature and worthy of incorporation in the present version of the report.

Guide / Supervisor

Mr. Sourav Goswami (Project Engineer)


ACKNOWLEDGEMENT

Success of any project depends largely on the encouragement and
guidelines of many others. We take this opportunity to express our
sincere gratitude to the people who have been instrumental in the
successful completion of this project work.

We would like to show our greatest appreciation to Mr. Sourav Goswami,
Project Engineer. We always felt motivated and encouraged by his valuable
advice and constant inspiration; without his encouragement and guidance,
this project would not have materialized.

Words are inadequate to offer our thanks to the other trainees, project
assistants and other members at Ardent Computech Pvt. Ltd. for their
encouragement and cooperation in carrying out this project work. The
guidance and support received from all the members who contributed to
this project were vital for its success.
CONTENTS

• Overview
• History of Python
• Environment Setup
• Basic Syntax
• Variable Types
• Functions
• Modules
• Packages
• Artificial Intelligence
  o Machine Learning
  o Natural Language Processing
• Machine Learning
  o Supervised and Unsupervised Learning
  o NumPy
  o Scikit-learn
  o Pandas
• Movie Recommender System
  1. Introduction
  2. Problem Statement
  3. Advantages & Disadvantages
  4. Future Scope
OVERVIEW

Python is a high-level, interpreted, interactive and object-oriented
scripting language. Python is designed to be highly readable. It uses
English keywords frequently, whereas other languages use punctuation,
and it has fewer syntactical constructions than other languages.

Python is Interpreted: Python is processed at runtime by the interpreter.
You do not need to compile your program before executing it. This is
similar to Perl and PHP.

Python is Interactive: You can sit at a Python prompt and interact with
the interpreter directly to write your programs.

Python is Object-Oriented: Python supports the object-oriented style or
technique of programming that encapsulates code within objects.

Python is a Beginner's Language: Python is a great language for
beginner-level programmers and supports the development of a wide range
of applications, from simple text processing to WWW browsers to games.
HISTORY OF PYTHON

Python was developed by Guido van Rossum in the late eighties and early
nineties at the National Research Institute for Mathematics and Computer
Science (CWI) in the Netherlands. Python draws on many other languages,
including ABC, Modula-3, C, C++, Algol-68, Smalltalk, the UNIX shell, and
other scripting languages. Python is copyrighted, but its source code is
freely available under the Python Software Foundation License, an
open-source licence that is compatible with the GNU General Public License
(GPL). Python is now maintained by a core development team under the
Python Software Foundation; Guido van Rossum directed its progress as lead
developer until he stepped down from that role in 2018.
FEATURES OF PYTHON

Easy-to-Learn: Python has few keywords, a simple structure and a clearly defined syntax.
This allows a student to pick up the language quickly.

Easy-to-Read: Python code is clearly defined and easy to read.

Easy-to-Maintain: Python's source code is fairly easy to maintain.

A Broad Standard Library: The bulk of Python's library is very portable and cross-platform
compatible on UNIX, Windows, and Macintosh.

Interactive Mode: Python supports an interactive mode, which allows interactive testing
and debugging of snippets of code.

Portable: Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.

Extendable: You can add low-level modules to the Python interpreter. These modules enable
programmers to add to or customize their tools to be more efficient.

Databases: Python provides interfaces to all major commercial databases.

GUI Programming: Python supports GUI applications that can be created and ported to
many system calls, libraries, and windowing systems, such as Windows MFC, Macintosh,
and the X Window System of Unix.

Scalable: Python provides a better structure and support for large programs than shell
scripting.

Apart from the above-mentioned features, Python has a long list of good features; a few
are listed below:
• It supports functional and structured programming methods as well as OOP.
• It can be used as a scripting language or can be compiled to byte code for
building large applications.
• It provides very high-level dynamic data types and supports dynamic type checking.
• It supports automatic garbage collection.
• It can be easily integrated with C, C++, COM, ActiveX, CORBA and Java.
ENVIRONMENT SETUP

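Open a terminal window and type python --version (or python3 --version on
many systems) to find out whether Python is already installed and which
version is installed. Typing python on its own starts the interactive
interpreter, whose banner also shows the version.

Python is available on a wide variety of platforms, including:

• UNIX (Solaris, Linux, FreeBSD, AIX, HP/UX, SunOS, IRIX, etc.)
• Win 9x/NT/2000
• Macintosh (Intel, PPC, 68K)
• OS/2
• DOS (multiple versions)
• PalmOS
• Nokia mobile phones
• Windows CE
• Acorn/RISC OS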
BASIC SYNTAX OF PYTHON PROGRAM

Type the following text at the Python prompt and press Enter:

>>> print("Hello, Python!")

This produces the following result:

Hello, Python!

In Python 2, print was a statement and could be written without
parentheses (print "Hello, Python!"); in Python 3 it is a function, so
the parentheses are required.
Python Identifiers
A Python identifier is a name used to identify a variable, function, class,
module or other object. An identifier starts with a letter A to Z or a to z or an
underscore (_), followed by zero or more letters, underscores and digits (0 to 9).
Python does not allow punctuation characters such as @, $, and % within
identifiers. Python is a case-sensitive programming language, so Manpower and
manpower are two different identifiers.
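The identifier rules above can be checked with the built-in str.isidentifier() method. A minimal sketch; the candidate names are arbitrary examples. Note that isidentifier() does not reject keywords such as class, which are handled separately.

```python
# str.isidentifier() applies the naming rules described above
candidates = ["counter", "_private", "total2", "2total", "user@name", "price$"]
valid = [name for name in candidates if name.isidentifier()]
print(valid)  # ['counter', '_private', 'total2']

# Identifiers are case-sensitive: these are two different variables
Manpower = 1
manpower = 2
print(Manpower, manpower)  # 1 2
```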

Python Keywords

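The following list shows the Python 2 keywords. These are reserved words, and you
cannot use them as constants, variables or any other identifier names. All of these
keywords contain lowercase letters only. In Python 3, print and exec are built-in
functions rather than keywords, and new keywords such as True, False, None,
nonlocal, async and await were added (True, False and None are capitalized).

and, exec, not
assert, finally, or
break, for, pass
class, from, print
continue, global, raise
def, if, return
del, import, try
elif, in, while
else, is, with
except, lambda, yield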
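Because the keyword set differs between Python versions, it is safer to ask the running interpreter. A short sketch using the standard-library keyword module, assuming it is run under Python 3.

```python
import keyword

# keyword.kwlist holds the keywords of the interpreter running this code
print(len(keyword.kwlist), keyword.kwlist[:5])

# Under Python 3: lambda True, yield True, print False, exec False, True True
for word in ["lambda", "yield", "print", "exec", "True"]:
    print(word, keyword.iskeyword(word))
```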

Lines & Indentation

Python provides no braces to indicate blocks of code for class and function
definitions or flow control. Blocks of code are denoted by line indentation,
which is rigidly enforced.
The number of spaces in the indentation is variable, but all statements within
the block must be indented the same amount. For example:

if True:
    print("True")
else:
    print("False")

Command Line Arguments

Many programs can be run to provide you with some basic information about
how they should be run. Python enables you to do this with -h:

$ python -h
usage: python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Options and arguments (and corresponding environment variables):
-c cmd : program passed in as string (terminates option list)
-d     : debug output from parser (also PYTHONDEBUG=x)
-E     : ignore environment variables (such as PYTHONPATH)
-h     : print this help message and exit
[ etc. ]
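Your own scripts can offer the same kind of -h help through the standard-library argparse module, which adds -h/--help automatically. A minimal sketch; the program name greet.py and the --name and --times options are hypothetical. In a real script you would call main() with no argument so that it reads sys.argv; here an explicit list is passed so the example is self-contained.

```python
import argparse

def build_parser():
    # argparse generates the -h/--help option and its help text for us
    parser = argparse.ArgumentParser(prog="greet.py", description="Print a greeting.")
    parser.add_argument("--name", default="Python", help="who to greet")
    parser.add_argument("-n", "--times", type=int, default=1, help="how many times to greet")
    return parser

def main(argv=None):
    # argv=None makes parse_args read the real command line (sys.argv[1:])
    args = build_parser().parse_args(argv)
    lines = [f"Hello, {args.name}!"] * args.times
    for line in lines:
        print(line)
    return lines

lines = main(["--name", "Ardent", "--times", "2"])
print(build_parser().format_help())
```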


VARIABLE TYPES

In Python, a variable is a name that refers to a value stored in memory. When
you create a variable, Python stores the value as an object and binds the name to it.

Assigning Values to Variables

Python variables do not need explicit declaration to reserve memory space. The
declaration happens automatically when you assign a value to a variable. The
equal sign (=) is used to assign values to variables.

counter = 10      # An integer assignment
weight = 10.60    # A floating point
name = "Ardent"   # A string

Multiple Assignment

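Python allows you to assign a single value to several variables simultaneously,
or several values to several variables in one statement. For example:

a = b = c = 1            # a, b and c all refer to 1
a, b, c = 1, 2, "hello"  # a is 1, b is 2, c is "hello"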

Standard Data Types

The data stored in memory can be of many types. For example, a person's age is
stored as a numeric value and his or her address is stored as alphanumeric
characters. The five standard data types covered here are:
• String
• List
• Tuple
• Dictionary
• Number
Data Type Conversion

Sometimes, you may need to perform conversions between the built-in types.
To convert between types, you simply use the type name as a function.

There are several built-in functions to perform conversion from one data type to
another.
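A short sketch of the conversion functions mentioned above, using only built-ins; the sample values are arbitrary.

```python
# Built-in type names double as conversion functions
number_text = "42"
as_int = int(number_text)              # str -> int
as_float = float(as_int)               # int -> float
as_str = str(3.5)                      # float -> str
as_list = list("abc")                  # str -> list of characters
as_tuple = tuple([1, 2, 3])            # list -> tuple
as_dict = dict([("a", 1), ("b", 2)])   # list of key/value pairs -> dict
truncated = int(3.9)                   # float -> int truncates toward zero: 3

print(as_int, as_float, as_str, as_list, as_tuple, as_dict, truncated)
```

Conversions that make no sense raise an error rather than guessing: int("3.5"), for instance, raises ValueError.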
Defining a Function

def functionname(parameters):
    "function_docstring"
    function_suite
    return [expression]
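A concrete instance of the template above; the function names and values are illustrative. It also shows that a function whose return statement has no expression (or has no return at all) gives back None.

```python
def area_of_rectangle(width, height):
    """Return the area of a width x height rectangle."""
    return width * height

def greet(name):
    """Print a greeting; there is no return expression, so the result is None."""
    print("Hello,", name)

result = area_of_rectangle(3, 4)
print(result)   # 12
nothing = greet("Ardent")
print(nothing)  # None
```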

Pass by reference vs Pass by value

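Python passes arguments by object reference (sometimes called "call by
sharing"): the function receives a reference to the same object the caller
passed. If the function changes that object in place, for example by
appending to a list, the change is visible in the calling function; if it
merely rebinds the parameter name to a new object, the caller is unaffected.
For example: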

# Function definition is here
def changeme(mylist):
    "This appends a list to the list passed into this function"
    mylist.append([1, 2, 3, 4])
    print("Values inside the function:", mylist)
    return

# Now you can call changeme function
mylist = [10, 20, 30]
changeme(mylist)
print("Values outside the function:", mylist)

Here, the function keeps the reference to the passed object and appends values
to that same object. So, this produces the following result:

Values inside the function: [10, 20, 30, [1, 2, 3, 4]]
Values outside the function: [10, 20, 30, [1, 2, 3, 4]]
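The other half of the rule is worth seeing side by side: rebinding a parameter inside a function does not affect the caller, while mutating the passed object does. A minimal sketch with illustrative names.

```python
def rebind(values):
    # Rebinding the local name points it at a new list; the caller's list is untouched
    values = [1, 2, 3, 4]
    return values

def mutate(values):
    # Mutating in place changes the very object the caller passed in
    values.append(40)

original = [10, 20, 30]
returned = rebind(original)
print("After rebind:", original)   # After rebind: [10, 20, 30]
mutate(original)
print("After mutate:", original)   # After mutate: [10, 20, 30, 40]
```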

Global vs. Local variables


Variables that are defined inside a function body have a local scope, and those
defined outside have a global scope. For example:

total = 0  # This is a global variable.

# Function definition is here
def add(arg1, arg2):
    # Add both the parameters and return them.
    total = arg1 + arg2  # Here total is a local variable.
    print("Inside the function local total:", total)
    return total

# Now you can call the add function
add(10, 20)
print("Outside the function global total:", total)

When the above code is executed, it produces the following result:

Inside the function local total: 30
Outside the function global total: 0
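If a function really should update the module-level variable, it has to say so with the global statement. A minimal sketch; the function name add_to_total is illustrative.

```python
total = 0  # module-level (global) variable

def add_to_total(amount):
    # 'global' makes the assignment below rebind the module-level name
    global total
    total = total + amount
    return total

add_to_total(10)
add_to_total(20)
print("Global total:", total)  # Global total: 30
```

Global state like this is usually kept to a minimum; returning a value and assigning it at the call site is often clearer.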
MODULES

A module allows you to logically organize your Python code. Grouping related
code into a module makes the code easier to understand and use. A module is a
Python object with arbitrarily named attributes that you can bind and reference.

The Python code for a module named aname normally resides in a file named
aname.py. Here's an example of a simple module, support.py:

def print_func(par):
    print("Hello :", par)
    return

The import Statement

You can use any Python source file as a module by executing an import
statement in some other Python source file. The import statement has the
following syntax:

import module1[, module2[, ... moduleN]]
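With support.py on the module search path, import support followed by support.print_func("Ardent") would call the function above. The runnable sketch below shows the main import forms using standard-library modules instead, so it needs no extra file.

```python
import math                  # whole module; names are reached as math.sqrt
import os, sys               # several modules in one statement (one per line is more common)
from math import pi, sqrt    # bind selected names directly
import statistics as stats   # import under a shorter alias

root = math.sqrt(16)                      # 4.0
same_function = sqrt is math.sqrt         # True: both names refer to one function object
average = stats.mean([1, 2, 3, 4])        # 2.5
print(root, same_function, average, round(pi, 2))
print(sys.version_info.major, os.path.basename("/tmp/support.py"))
```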


PACKAGES

A package is a hierarchical file directory structure that defines a single
Python application environment consisting of modules, subpackages,
sub-subpackages, and so on.

Consider a file Pots.py available in the Phone directory. This file has the
following lines of source code:

def Pots():
    print("I'm Pots Phone")

In a similar way, we have two other files containing functions with the same
names as the files:

• Phone/Isdn.py, containing the function Isdn()
• Phone/G3.py, containing the function G3()

Now, create one more file, __init__.py, in the Phone directory:

• Phone/__init__.py

To make all of your functions available when you've imported Phone, you need
to put explicit import statements in __init__.py as follows (Python 3 requires
the relative form with a leading dot):

from .Pots import Pots
from .Isdn import Isdn
from .G3 import G3
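To try the Phone layout described above without creating files by hand, the sketch below writes the package into a temporary directory and imports it. It is an illustration under assumptions: the Isdn and G3 messages are hypothetical, and each function returns its message instead of printing it so the result is easy to check.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Sources for the Phone package; __init__.py re-exports one function per module
SOURCES = {
    "Pots.py": 'def Pots():\n    return "I\'m Pots Phone"\n',
    "Isdn.py": 'def Isdn():\n    return "I\'m ISDN Phone"\n',
    "G3.py": 'def G3():\n    return "I\'m 3G Phone"\n',
    "__init__.py": "from .Pots import Pots\nfrom .Isdn import Isdn\nfrom .G3 import G3\n",
}

root = Path(tempfile.mkdtemp())
package_dir = root / "Phone"
package_dir.mkdir()
for filename, source in SOURCES.items():
    (package_dir / filename).write_text(source)

sys.path.insert(0, str(root))     # make the directory that contains Phone importable
importlib.invalidate_caches()     # let the import system see the freshly written files
Phone = importlib.import_module("Phone")

messages = [Phone.Pots(), Phone.Isdn(), Phone.G3()]
print(messages)
```

Because __init__.py imports the functions, callers can write Phone.Pots() instead of Phone.Pots.Pots().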
ARTIFICIAL INTELLIGENCE

Introduction

According to the father of Artificial Intelligence, John McCarthy, it is
“The science and engineering of making intelligent machines, especially
intelligent computer programs”.

Artificial Intelligence is a way of making a computer, a computer-controlled
robot, or software think intelligently, in a manner similar to the way
intelligent humans think.

AI is accomplished by studying how the human brain thinks and how humans
learn, decide, and work while trying to solve a problem, and then using the
outcomes of this study as a basis for developing intelligent software and
systems.

The development of AI started with the intention of creating in machines the
kind of intelligence that we find, and regard highly, in humans.

Goals of AI

To Create Expert Systems: systems that exhibit intelligent behaviour, learn,
demonstrate, explain, and advise their users.
To Implement Human Intelligence in Machines: creating systems that understand,
think, learn, and behave like humans.
Applications of AI

AI has been dominant in various fields, such as:

Gaming: AI plays a crucial role in strategic games such as chess, poker and
tic-tac-toe, where a machine can evaluate a large number of possible positions
based on heuristic knowledge.

Natural Language Processing: It is possible to interact with a computer that
understands natural language spoken by humans.

Expert Systems: Some applications integrate machine, software, and special
information to impart reasoning and advice. They provide explanations and
advice to their users.

Vision Systems: These systems understand, interpret, and comprehend visual
input on the computer. For example:
• A spy aeroplane takes photographs, which are used to figure out spatial
information or a map of the area.
• Doctors use a clinical expert system to diagnose patients.
• Police use computer software that can recognize the face of a criminal
from the stored portrait made by a forensic artist.

Speech Recognition: Some intelligent systems are capable of hearing and
comprehending language in terms of sentences and their meanings while a human
talks to them. They can handle different accents, slang words, noise in the
background, changes in a person's voice due to a cold, etc.

Handwriting Recognition: Handwriting recognition software reads text written
on paper with a pen or on a screen with a stylus. It can recognize the shapes
of the letters and convert them into editable text.

Intelligent Robots: Robots are able to perform tasks given by a human. They
have sensors to detect physical data from the real world, such as light, heat,
temperature, movement, sound, bump, and pressure. They have efficient
processors, multiple sensors and large memory to exhibit intelligence. In
addition, they are capable of learning from their mistakes, and they can adapt
to new environments.

MACHINE LEARNING

Machine learning is a field of computer science that gives computers the
ability to learn without being explicitly programmed. Evolved from the study
of pattern recognition and computational learning theory in artificial
intelligence, machine learning explores the study and construction of
algorithms that can learn from and make predictions on data.

NATURAL LANGUAGE PROCESSING

Natural language processing (NLP) is a subfield of linguistics, computer
science, and artificial intelligence concerned with the interactions between
computers and human language, in particular how to program computers to
process and analyze large amounts of natural language data. The goal is a
computer capable of "understanding" the contents of documents, including the
contextual nuances of the language within them. The technology can then
accurately extract information and insights contained in the documents, as
well as categorize and organize the documents themselves.
INTRODUCTION TO MACHINE LEARNING

Machine learning is a field of computer science that gives computers the ability
to learn without being explicitly programmed.

Arthur Samuel, an American pioneer in the field of computer gaming and
artificial intelligence, coined the term "Machine Learning" in 1959 while at
IBM. Evolved from the study of pattern recognition and computational learning
theory in artificial intelligence, machine learning explores the study and
construction of algorithms that can learn from and make predictions on data.

Machine learning tasks are typically classified into two broad categories,
depending on whether there is a learning "signal" or "feedback" available to a
learning system:

SUPERVISED LEARNING

Supervised learning is the machine learning task of inferring a function from
labelled training data.[1] The training data consist of a set of training
examples. In supervised learning, each example is a pair consisting of an input
object (typically a vector) and a desired output value.

A supervised learning algorithm analyses the training data and produces an
inferred function, which can be used for mapping new examples. An optimal
scenario will allow the algorithm to correctly determine the class labels for
unseen instances. This requires the learning algorithm to generalize from the
training data to unseen situations in a "reasonable" way.
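A tiny supervised learner makes the idea concrete: labelled (input, output) pairs go in, and a function that labels unseen inputs comes out. This sketch uses a 1-nearest-neighbour classifier, chosen only for brevity; the training pairs and labels are hypothetical.

```python
import math

# Hypothetical labelled training data: (input vector, desired output label)
training_data = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
]

def predict(x):
    """1-nearest-neighbour: return the label of the closest training example."""
    _, nearest_label = min(training_data, key=lambda example: math.dist(x, example[0]))
    return nearest_label

print(predict((2.0, 1.5)))  # small
print(predict((7.0, 9.0)))  # large
```

The "generalization" here is the assumption that nearby inputs share a label; other algorithms make different assumptions.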

UNSUPERVISED LEARNING

Unsupervised learning is the machine learning task of inferring a function to
describe hidden structure from "unlabelled" data (a classification or
categorization is not included in the observations). Since the examples given to
the learner are unlabelled, there is no evaluation of the accuracy of the structure
that is output by the relevant algorithm, which is one way of distinguishing
unsupervised learning from supervised learning and reinforcement learning.

A central case of unsupervised learning is the problem of density estimation in
statistics, though unsupervised learning encompasses many other problems
(and solutions) involving summarizing and explaining key features of the
data.
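As a contrast with the supervised case, the sketch below finds structure in unlabelled numbers using k-means clustering. k-means is swapped in here as a simple, common unsupervised technique (the text itself names density estimation); the data and starting centroids are hypothetical.

```python
def k_means_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data: only the points are used, never any labels."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty)
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 10.0]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```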
NUMPY

NumPy is a library for the Python programming language that adds support for
large, multi-dimensional arrays and matrices, along with a large collection of
high-level mathematical functions to operate on these arrays. The ancestor of
NumPy, Numeric, was originally created by Jim Hugunin.

NumPy targets the CPython reference implementation of Python, which is a
non-optimizing bytecode interpreter. Mathematical algorithms written for this
version of Python often run much slower than compiled equivalents. NumPy
addresses this by providing multidimensional arrays, together with functions
and operators that operate efficiently on whole arrays in compiled code.

Using NumPy in Python gives functionality comparable to MATLAB, since they are
both interpreted, and they both allow the user to write fast programs as long as
most operations work on arrays or matrices instead of scalars.
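The difference between scalar-style and array-style code can be seen on a small example. A sketch with arbitrary prices and quantities: both versions compute the same total, but the second expresses the whole computation as array operations, so the loop runs inside NumPy rather than in the Python interpreter.

```python
import numpy as np

prices = np.array([120.0, 80.0, 45.5, 300.0])
quantities = np.array([2, 5, 10, 1])

# Scalar style: an explicit Python loop over individual elements
loop_total = 0.0
for p, q in zip(prices, quantities):
    loop_total += p * q

# Array style: element-wise product and dot product on whole arrays
revenue = prices * quantities               # [240. 400. 455. 300.]
vector_total = float(np.dot(prices, quantities))

print(loop_total, vector_total)             # 1395.0 1395.0
```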

NUMPY ARRAY
NumPy’s main object is the homogeneous multidimensional array. It is a
table of elements (usually numbers), all of the same type, indexed by a tuple of
non-negative integers. In NumPy, dimensions are called axes, and the number of
axes is the array's rank.
For example, the coordinates of a point in 3D space, [1, 2, 1], form an array of
rank 1, because it has one axis. That axis has a length of 3. In the example
below, the array has rank 2 (it is 2-dimensional). The first dimension (axis)
has a length of 2, and the second dimension has a length of 3.

[[1., 0., 0.],
 [0., 1., 2.]]

NumPy’s array class is called ndarray. It is also known by the alias array.
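The two arrays described above can be inspected directly. In NumPy's attributes, ndim is what the text calls the rank and shape gives the length of each axis; the current NumPy documentation prefers "number of dimensions" to avoid confusion with the rank of a matrix in linear algebra.

```python
import numpy as np

point = np.array([1, 2, 1])          # one axis of length 3
matrix = np.array([[1., 0., 0.],
                   [0., 1., 2.]])    # two axes: lengths 2 and 3

print(point.ndim, point.shape)       # 1 (3,)
print(matrix.ndim, matrix.shape)     # 2 (2, 3)
print(type(matrix).__name__)         # ndarray
print(matrix.dtype)                  # float64
```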

SLICING NUMPY ARRAY


import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])
print('Our array is:')
print(a)
print()

print('The items in the second column are:')
print(a[..., 1])
print()

print('The items in the second row are:')
print(a[1, ...])
print()

print('The items in column 1 onwards are:')
print(a[..., 1:])
OUTPUT

Our array is:
[[1 2 3]
 [3 4 5]
 [4 5 6]]

The items in the second column are:
[2 4 5]

The items in the second row are:
[3 4 5]

The items in column 1 onwards are:
[[2 3]
 [4 5]
 [5 6]]

SCIKIT-LEARN

Scikit-learn is a free software machine learning library for the Python
programming language. It features various classification, regression and
clustering algorithms, including support vector machines, random forests,
gradient boosting, k-means and DBSCAN, and is designed to interoperate with
the Python numerical and scientific libraries NumPy and SciPy.

The scikit-learn project started as scikits.learn, a Google Summer of Code
project by David Cournapeau. Its name stems from the notion that it is a
"SciKit" (SciPy Toolkit), a separately developed and distributed third-party
extension to SciPy.[4] The original codebase was later rewritten by other
developers. In 2010, Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort
and Vincent Michel, all from INRIA, took leadership of the project and made the
first public release on 1 February 2010.[5] Of the various scikits, scikit-learn
as well as scikit-image were described as "well-maintained and popular" in
November 2012.

PANDAS

In computer programming, pandas is a software library written for the Python
programming language for data manipulation and analysis. In particular, it
offers data structures and operations for manipulating numerical tables and
time series. It is free software released under the three-clause BSD license.
The name is derived from "panel data", an econometrics term for
multidimensional, structured data sets.
LIBRARY FEATURES
• DataFrame object for data manipulation with integrated indexing.
• Tools for reading and writing data between in-memory data structures
and different file formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, fancy indexing, and subsetting of large data sets.
• Data structure column insertion and deletion.
• Group-by engine allowing split-apply-combine operations on data sets.
• Data set merging and joining.
• Hierarchical axis indexing to work with high-dimensional data in a
lower-dimensional data structure.
• Time-series functionality: date range generation.
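Several of the features listed above fit in a few lines. A minimal sketch: the ratings table, including its missing value, is hypothetical; the movie titles are borrowed from the data set used later in this report.

```python
import pandas as pd

# Hypothetical ratings table with one missing value
ratings = pd.DataFrame({
    "movie": ["Avatar", "Avatar", "Spectre", "Spectre", "John Carter"],
    "rating": [4.5, 4.0, 3.5, None, 3.0],
})
titles = pd.DataFrame({
    "movie": ["Avatar", "Spectre", "John Carter"],
    "year": [2009, 2015, 2012],
})

clean = ratings.dropna(subset=["rating"])                       # handling of missing data
mean_rating = clean.groupby("movie")["rating"].mean()           # split-apply-combine
summary = titles.merge(mean_rating.reset_index(), on="movie")   # merging and joining
days = pd.date_range("2024-02-01", periods=3, freq="D")         # date range generation

print(summary)
print(days)
```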
ALGORITHM

• Data Collection
• Data Formatting
• Model Selection
• Training
• Testing

Data Collection: We collected weather data sets from an online website and
downloaded the .csv files containing the information.

Data Formatting: The collected data is formatted into suitable data sets. We
then check the correlation of each attribute with the mean temperature, and the
attributes whose correlation is closest to 1.0 are selected.
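A small sketch of the correlation check described above, assuming Pearson correlation computed with NumPy's corrcoef. The readings, the column names and the 0.9 cut-off are hypothetical, not taken from the original weather data.

```python
import numpy as np

# Hypothetical daily readings; the target is the mean temperature
mean_temp = np.array([20.0, 22.0, 25.0, 27.0, 30.0])
features = {
    "max_temp": np.array([25.0, 27.0, 30.0, 32.0, 35.0]),
    "min_temp": np.array([15.0, 17.5, 19.0, 22.5, 25.0]),
    "humidity": np.array([70.0, 40.0, 65.0, 45.0, 60.0]),
}

# Pearson correlation of each feature with the target: entry [0, 1] of the 2x2 matrix
correlations = {name: float(np.corrcoef(values, mean_temp)[0, 1])
                for name, values in features.items()}

# Keep the features whose correlation is close to 1.0 (threshold chosen arbitrarily)
selected = [name for name, r in correlations.items() if r >= 0.9]
print(correlations)
print(selected)  # ['max_temp', 'min_temp']
```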

Model Selection: We selected several models in order to minimize the error of
the predicted values. The models used are Linear Regression, Ridge, Lasso and
Bayesian Ridge, all of which are linear models.

Training: The data set was divided into a training set and a test set. The
model is trained on x_train with the corresponding y_train values, while x_test
and y_test are kept aside for testing.

Testing: The trained model was used to predict the target for x_test, and the
predictions were stored in y_predict. y_test and y_predict were then compared.
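The training and testing steps above can be sketched with scikit-learn, which provides all four models. This is an illustration under assumptions: the data are synthetic, and the 80/20 split, the Lasso alpha and mean squared error as the comparison metric are choices made here, not details from the original project.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends linearly on two features plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(0, 1.0, size=200)

# Hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "Bayesian Ridge": BayesianRidge(),
}

errors = {}
for name, model in models.items():
    model.fit(x_train, y_train)                                 # train on the training split
    y_predict = model.predict(x_test)                           # predict the held-out rows
    errors[name] = float(mean_squared_error(y_test, y_predict)) # compare with the true values

for name, mse in sorted(errors.items(), key=lambda item: item[1]):
    print(f"{name}: MSE = {mse:.3f}")
```

The model with the lowest test error would be the one to keep; on real data the ranking depends on noise, collinearity and the regularization strength.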
MOVIE RECOMMENDER SYSTEM

INTRODUCTION

Sentiment analysis, also known as opinion mining, is a Natural Language
Processing (NLP) technique used to determine the sentiment or emotional tone
conveyed in a piece of text. It involves analyzing the subjective information
expressed by users to understand their opinions, attitudes, and emotions
towards a particular topic, product, service, or event.
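One of the simplest ways to assign a sentiment label is a lexicon-based scorer: look up each word's polarity and add them up. A minimal sketch under stated assumptions: the word list, the scores and the one-word negation rule are hypothetical and far smaller than a real lexicon or a trained model.

```python
import re

# Tiny hypothetical lexicon: positive words score above zero, negative words below
LEXICON = {"good": 1, "great": 2, "love": 2, "excellent": 2,
           "bad": -1, "boring": -2, "terrible": -2, "hate": -2}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    """Return (label, score) where label is 'positive', 'negative' or 'neutral'."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        value = LEXICON.get(word, 0)
        if i > 0 and words[i - 1] in NEGATIONS:
            value = -value  # "not good" counts as negative
        score += value
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score

print(sentiment("I love this movie, the acting was great!"))    # ('positive', 4)
print(sentiment("Not good. Honestly a boring, terrible plot."))  # ('negative', -5)
```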

PROBLEM STATEMENT

In building a sentiment analyzer from scratch, we face several problems. Many
existing analysis systems rely on user information, so the first question is
what to do when a website does not yet have enough users. Next, we must decide
how to represent a sentiment, that is, how a system can understand a sentiment;
this representation is a precondition for comparing the similarity between
emotions. Finally, each feature of an emotion may deserve a different weight,
since each plays a different role in the recommendation. This gives the
following questions:
• How can emotions be determined when there is no user information?
• What kind of sentiment can be used for the recommender system?
• How can the similarity between two sentiments be calculated?
• Is it possible to set a weight for each feature?
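One possible answer to the last two questions is a weighted cosine similarity, in which each feature's contribution is multiplied by a weight. This is a sketch of one common variant, sum(w*a*b) / (sqrt(sum(w*a*a)) * sqrt(sum(w*b*b))); the emotion features, profiles and weights are hypothetical.

```python
import math

def weighted_cosine(a, b, weights):
    """Weighted cosine similarity of feature vectors a and b (non-negative weights)."""
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)

# Raising the weight of the second feature, where a and b disagree, lowers the similarity
a, b = [1.0, 1.0], [1.0, 0.0]
print(round(weighted_cosine(a, b, [1.0, 1.0]), 4))  # 0.7071 (plain cosine)
print(weighted_cosine(a, b, [1.0, 3.0]))            # 0.5

# Hypothetical emotion profiles (joy, sadness, anger, fear) of two reviews
review_1 = [0.8, 0.1, 0.0, 0.1]
review_2 = [0.7, 0.2, 0.1, 0.0]
emotion_weights = [2.0, 1.0, 1.0, 0.5]  # assumed importance of each emotion
print(weighted_cosine(review_1, review_2, emotion_weights))
```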

ADVANTAGES –

1) Easy recommendations reduce the number of searches a user has to make and
can lead to good deals.

2) It allows companies to efficiently analyze large volumes of customer
feedback from diverse sources like social media.

3) It provides valuable insights into market trends, competitor analysis, and
consumer sentiment.
DISADVANTAGES –

1) If the system recommends products with bias, customers may end up in bad
deals.

2) Some websites may recommend products wrongly because their analysis is
based on too little information.

FUTURE SCOPE –

Integrating multiple modalities, such as text, images, and audio, can capture
richer sentiment representations and improve the accuracy of sentiment
analysis models. (…) movie is exceptionally either high or low. As an
improvement on this project, other methods such as adjusted cosine similarity
can be used to compute similarity.
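Adjusted cosine similarity, mentioned above, compares two items using only the users who rated both, after subtracting each user's mean rating, so that users who rate everything high (or everything low) do not distort the result. A minimal sketch; the ratings are hypothetical and the function name adjusted_cosine is illustrative.

```python
import math

# Hypothetical user -> {movie: rating} data
ratings = {
    "u1": {"Avatar": 5, "Spectre": 3, "John Carter": 4},
    "u2": {"Avatar": 4, "Spectre": 1, "John Carter": 5},
    "u3": {"Avatar": 2, "Spectre": 4},
}

def adjusted_cosine(item_i, item_j, ratings):
    """Item-item similarity computed on ratings centred by each user's mean rating."""
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if item_i in user_ratings and item_j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            di = user_ratings[item_i] - mean
            dj = user_ratings[item_j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0  # no co-ratings, or no variation to compare
    return num / (math.sqrt(den_i) * math.sqrt(den_j))

print(adjusted_cosine("Avatar", "John Carter", ratings))  # positive: rated alike
print(adjusted_cosine("Avatar", "Spectre", ratings))      # negative: rated oppositely
```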
Code -
(Screenshot of the Jupyter notebook movie-recommender-system. Only the last lines of a helper cell are legible:)

for i in ast.literal_eval(obj):
    if i['job'] == 'Director':
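The helper whose last lines survive above walks through a crew list that is stored as a string. A minimal sketch of how such helpers might look, assuming the columns hold Python-literal strings of dicts with 'job' and 'name' keys. The names fetch_director and convert, the break after the first director, and the sample strings are hypothetical, since the rest of the original cell is not legible.

```python
import ast

# A crew value as a string that looks like a list of dicts (shortened, hypothetical)
crew_text = "[{'job': 'Producer', 'name': 'Jon Landau'}, {'job': 'Director', 'name': 'James Cameron'}]"

def fetch_director(obj):
    """Return a list holding the first director name found in a stringified crew list."""
    names = []
    for i in ast.literal_eval(obj):   # safely parse the string into Python objects
        if i['job'] == 'Director':
            names.append(i['name'])
            break                     # keep only the first director
    return names

def convert(obj):
    """Return the 'name' field of every dict, e.g. for genres or keywords."""
    return [i['name'] for i in ast.literal_eval(obj)]

print(fetch_director(crew_text))  # ['James Cameron']
print(convert('[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'))  # ['Action', 'Adventure']
```

ast.literal_eval only accepts literals (strings, numbers, lists, dicts and so on), which makes it a safer choice than eval for parsing such columns.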

In [25]: movies['overview'] = movies['overview'].apply(lambda x: x.split())

In [26]: movies.head()
Out[26]: (first five rows of movies: Avatar, Pirates of the Caribbean: At World's End, Spectre, The Dark Knight Rises and John Carter; the overview column now holds a list of words, e.g. [In, the, 22nd, century, a, paraplegic, Marine, ...])

In [27]: movies['genres'] = movies['genres'].apply(lambda x: [i.replace(" ", "") for i in x])
         movies['keywords'] = movies['keywords'].apply(lambda x: [i.replace(" ", "") for i in x])
         movies['cast'] = movies['cast'].apply(lambda x: [i.replace(" ", "") for i in x])
         movies['crew'] = movies['crew'].apply(lambda x: [i.replace(" ", "") for i in x])

In [28]: movies.head()
Out[28]: (first five rows; multi-word names now have their spaces removed, e.g. SamWorthington, ZoeSaldana, JohnnyDepp, OrlandoBloom, DanielCraig, ChristophWaltz)

In [29]: movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']

In [30]: movies.head()
Out[30]: (first five rows, now with a tags column that concatenates the overview, genres, keywords, cast and crew lists)
""·
In [31]: new_df = movies[['movie_id', 'title', 'tags']]

In [32]: new_df['tags'] = new_df['tags'].apply(lambda x: " ".join(x))

<ipython-input-32-ad44e9ca1347>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df['tags'] = new_df['tags'].apply(lambda x: " ".join(x))
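The SettingWithCopyWarning above appears because new_df was created by selecting columns from movies, so pandas treats it as a possible view of movies and warns that assigning to it might not behave as expected. A minimal sketch of one common remedy, assuming a tiny stand-in DataFrame with hypothetical values: take an explicit copy when creating new_df, after which the same join and lower-case steps run without the warning.

```python
import pandas as pd

# Tiny stand-in for the movies DataFrame (hypothetical, shortened values)
movies = pd.DataFrame({
    "movie_id": [19995, 285],
    "title": ["Avatar", "Pirates of the Caribbean: At World's End"],
    "tags": [["In", "the", "22nd", "century"], ["Captain", "Barbossa"]],
    "genres": [["Action"], ["Adventure"]],
})

# .copy() makes new_df an independent DataFrame rather than a possible view of movies
new_df = movies[["movie_id", "title", "tags"]].copy()
new_df["tags"] = new_df["tags"].apply(lambda x: " ".join(x))  # list of words -> one string
new_df["tags"] = new_df["tags"].apply(lambda x: x.lower())    # normalize case
print(new_df)
```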

In [33]: new_df
Out[33]: (4806 rows × 3 columns: movie_id, title and tags; each tags value is now a single string, e.g. for Avatar "In the 22nd century, a paraplegic Marine is di...")

In [34]: new_df("t.:,gs') "new_df['tags').apply(lMbda 1':x.lower-())

<ipython •input .34. 8b60b591a07f;,:1: SettingWi thCopyWarning:


A value i'ii trying to be set Ofl a -copy of a dice fro11 .ii PcltaFr-.ime.
Try using .locrrow_inde:,cer,col_indexer] ■ value instead

s.ee the ca11eats in the docuMeOtation: https: / /pandas ,pydata. org/paodas-docs/stable/user_guide/iodexing,ht l#returoiog-a-11ie101-
Ve rsus-a-copy
new_df['tags'] .. new_df('tags'].;ipply(lanibda :ex.lower(})
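Lowercasing can also be written with pandas' vectorized string accessor instead of a lambda. For a column of strings, Series.str.lower() gives the same result as .apply(lambda x: x.lower()).

The small sketch below uses a hypothetical two-row frame in place of new_df.

```python
import pandas as pd

# Hypothetical two-row frame standing in for new_df after In [32].
new_df = pd.DataFrame({
    "movie_id": [19995, 206647],
    "title": ["Avatar", "Spectre"],
    "tags": ["In the 22nd century, a paraplegic Marine",
             "A cryptic message from Bond's past"],
})

# Vectorized lowercasing of every tags string.
new_df["tags"] = new_df["tags"].str.lower()

print(new_df["tags"].tolist())
```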

In [35]: new_df.head()
Out[35]:
   movie_id  title                                     tags
0  19995     Avatar                                    in the 22nd century, a paraplegic marine is di...
1  285       Pirates of the Caribbean: At World's End  captain barbossa, long believed to be dead, ha...
2  206647    Spectre                                   a cryptic message from bond's past sends him o...
3  49026     The Dark Knight Rises                     following the death of district attorney harve...
4  49529     John Carter                               john carter is a war-weary, former military ca...

In [36]: import nltk

In [37]: from nltk.stem.porter import PorterStemmer

         ps = PorterStemmer()

In [38]: def stem(text):
             # Reduce every whitespace-separated word to its Porter stem
             y = []
             for i in text.split():
                 y.append(ps.stem(i))
             return " ".join(y)
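Stemming maps inflected forms to a common stem, so that "loved" and "loving" count as the same feature. Porter stems need not be dictionary words: "century" becomes "centuri".

The sketch below restates stem() as a single join and applies it to a few illustrative words; these words are hypothetical inputs, not taken from the dataset. It assumes NLTK is installed. PorterStemmer is rule-based and needs no extra data download.

```python
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()

def stem(text):
    """Porter-stem each whitespace-separated token and rejoin with spaces."""
    return " ".join(ps.stem(word) for word in text.split())

print(stem("loved loving lovers century"))  # love love lover centuri
```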

In [39]: new_df['tags'] = new_df['tags'].apply(stem)

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

In [40]: from sklearn.feature_extraction.text import CountVectorizer

         # Keep the 5000 most frequent terms, ignoring English stop words
         cv = CountVectorizer(max_features=5000, stop_words='english')

In [41]: # Dense (movies x terms) matrix of term counts
         vectors = cv.fit_transform(new_df['tags']).toarray()
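CountVectorizer turns each tags string into a vector of term counts over a fixed vocabulary. To make that step concrete without scikit-learn, here is a simplified pure-Python approximation. It is not CountVectorizer itself, and it differs in the following ways:

- It borrows CountVectorizer's default token pattern and its rule of keeping the max_features terms that are most frequent across the corpus.
- The stop-word list is a tiny hypothetical one.
- Ties at the max_features cut-off may be broken differently.
- The result is a list of lists rather than a sparse matrix.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; scikit-learn's 'english' list is far longer.
STOP_WORDS = {"a", "an", "and", "from", "in", "is", "of", "the", "to"}

# CountVectorizer's default token_pattern: runs of two or more word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def count_vectorize(docs, max_features):
    """Toy bag-of-words: returns (vocabulary, one count list per document)."""
    tokenized = [
        [tok for tok in TOKEN_RE.findall(doc.lower()) if tok not in STOP_WORDS]
        for doc in docs
    ]
    # Keep the max_features terms with the highest total count in the corpus,
    # then sort the kept terms alphabetically to fix the column order.
    totals = Counter(tok for toks in tokenized for tok in toks)
    vocab = sorted(term for term, _ in totals.most_common(max_features))
    column = {term: j for j, term in enumerate(vocab)}

    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok in toks:
            if tok in column:  # terms cut by max_features are ignored
                row[column[tok]] += 1
        rows.append(row)
    return vocab, rows

# Hypothetical, already-stemmed tag strings for three movies.
docs = ["space war space colony", "war hero in space", "pirate ocean ship"]
vocab, rows = count_vectorize(docs, max_features=2)
print(vocab)  # ['space', 'war']
print(rows)   # [[2, 1], [1, 1], [0, 0]]
```

The third movie shares no kept term with the others, so its row is all zeros.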


In [42]: cv.get_feature_names()  # removed in scikit-learn 1.2; use cv.get_feature_names_out() there
Out[42]: ['000', ..., '100', '11', ..., '14', '15', '16', '17', '17th', ..., '18th', '18thcenturi', ..., '1920', '1930', ...]

In [43]: from sklearn.metrics.pairwise import cosine_similarity

In [44]: similarity = cosine_similarity(vectors)

In [45]: similarity
Out[45]: array([[1.        , 0.        , 0.03184649, ..., 0.02475369, 0.        , 0.        ],
                [0.        , 1.        , 0.        , ..., 0.02592379, 0.        , 0.0277137 ],
                [0.03184649, 0.        , 1.        , ..., 0.02680281, 0.        , 0.        ],
                ...,
                [0.02475369, 0.02592379, 0.02680281, ..., 1.        , 0.0412393 , ...],
                ...])
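Each entry of the similarity matrix is the cosine of the angle between two count vectors, cos(a, b) = a·b / (‖a‖ ‖b‖). This is why the diagonal is 1 and why movies with no terms in common score 0.

The NumPy sketch below performs the same computation: normalize every row, then take all pairwise dot products. It runs on a hypothetical 3-movie count matrix and uses a hypothetical helper name, cosine_similarity_matrix. All-zero rows are left at zero rather than producing NaN, as scikit-learn's normalization does.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of a 2-D count matrix."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # An all-zero row has no direction; dividing it by 1 keeps it at zero,
    # so all of its similarities come out as 0 instead of NaN.
    norms[norms == 0] = 1.0
    unit = vectors / norms   # every non-zero row now has length 1
    return unit @ unit.T     # dot products of unit vectors are cosines

# Hypothetical count matrix: three movies over a two-term vocabulary.
vectors = [[2, 1], [1, 1], [0, 0]]
sim = cosine_similarity_matrix(vectors)
print(np.round(sim, 4))
```

For movies 0 and 1, the similarity is (2·1 + 1·1) / (√5 · √2) = 3/√10 ≈ 0.9487.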
1.

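In [46]: def recommend(movie):
             # index[0] is a row label; get_loc turns it into the row position used by
             # similarity (labels have gaps: Out[33] shows 4806 rows but labels up to 4808)
             movie_index = new_df.index.get_loc(new_df[new_df['title'] == movie].index[0])
             distances = similarity[movie_index]
             # Sort (position, score) pairs by score; entry 0 is normally the movie itself
             movies_list = sorted(list(enumerate(distances)), reverse=True, key=lambda x: x[1])[1:6]
             for i in movies_list:
                 print(new_df.iloc[i[0]].title)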

In [47]: recommend('Avatar')

Aliens vs Predator: Requiem
Aliens
Battle: Los Angeles
Independence Day
Falcon Rising
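recommend() sorts every (position, score) pair in Python and then drops the first one, on the assumption that it is the query movie itself. A vectorized variant excludes the query explicitly and lets NumPy do the ranking.

This sketch uses a hypothetical helper, recommend_indices. The 4-movie similarity matrix and its titles are made-up inputs, not notebook results.

```python
import numpy as np

def recommend_indices(similarity, movie_index, k=5):
    """Row positions of the k movies most similar to movie_index (hypothetical helper)."""
    scores = np.array(similarity[movie_index], dtype=float)  # copy of one row
    scores[movie_index] = -np.inf  # never recommend the query movie itself
    # Stable sort on negated scores: highest similarity first, and ties keep
    # their original order (lower position first).
    order = np.argsort(-scores, kind="stable")
    return order[:k].tolist()

# Made-up similarity matrix and titles for four movies.
titles = ["Avatar", "Aliens", "Titanic", "Independence Day"]
similarity = np.array([
    [1.00, 0.60, 0.10, 0.40],
    [0.60, 1.00, 0.00, 0.30],
    [0.10, 0.00, 1.00, 0.05],
    [0.40, 0.30, 0.05, 1.00],
])

recommended = [titles[i] for i in recommend_indices(similarity, titles.index("Avatar"), k=2)]
print(recommended)  # ['Aliens', 'Independence Day']
```

Because the query is masked out, a second movie with identical tags (similarity 1.0) can no longer displace it from the first slot and push the query into the results, as could happen with the [1:6] slice.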

In [48]: import pickle

In [49]: pickle.dump(new_df, open('movies.pkl', 'wb'))

In [50]: pickle.dump(similarity, open('similarity.pkl', 'wb'))
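pickle.dump(obj, open(path, 'wb')) leaves closing the file to the garbage collector. Opening the file in a with block closes it deterministically. This matters more for large objects: a 4806 × 4806 float64 similarity matrix is 4806² × 8 ≈ 185 MB.

The sketch below saves and reloads a matrix this way. It assumes a 3 × 3 identity matrix as a stand-in for the real matrix and writes to a temporary directory rather than the notebook's working folder.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical stand-in for the notebook's 4806 x 4806 similarity matrix.
similarity = np.eye(3)

path = os.path.join(tempfile.mkdtemp(), "similarity.pkl")

# The with-block closes the file as soon as pickling finishes (or fails).
with open(path, "wb") as f:
    pickle.dump(similarity, f)

# Reading it back uses the same pattern in "rb" mode.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(np.array_equal(loaded, similarity))  # True
```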

In [51]: new_df.to_dict()

Out[51]: {'movie_id': {0: 19995,
  1: 285,
  2: 206647,
  3: 49026,
  4: 49529,
  5: 559,
  6: 38757,
  7: 99861,
  8: 767,
  9: 209112,
  ...

In [52]: pickle.dump(new_df.to_dict(), open('movie_dict.pkl', 'wb'))
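With its default orient='dict', to_dict() produces the {column: {index label: value}} layout shown in Out[51]. pd.DataFrame() turns that dictionary back into a frame with the same columns and row labels.

The sketch below shows the round trip on a hypothetical two-row frame with the same columns as new_df.

```python
import pandas as pd

# Hypothetical two-row frame with the same columns as new_df.
new_df = pd.DataFrame({
    "movie_id": [19995, 285],
    "title": ["Avatar", "Pirates of the Caribbean: At World's End"],
    "tags": ["in the 22nd centuri", "captain barbossa"],
})

# Default orient='dict': {column -> {index label -> value}}, the layout of Out[51].
movie_dict = new_df.to_dict()

# Rebuilding the frame from the dictionary, e.g. after unpickling
# movie_dict.pkl, restores the same columns, values and row labels.
restored = pd.DataFrame(movie_dict)
pd.testing.assert_frame_equal(restored, new_df)  # raises if anything differs
print(restored)
```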

BIBLIOGRAPHY

• https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata
• https://medium.com/web-mining-is688-spring-2021/content-based-movie-recommendation-system-72f122641eab
• https://www.techtarget.com/searchenterpriseai/definition/machine-learning-ML#:~:text=Machine%20learning%20(ML)%20is%20a,to%20predict%20new%20output%20values.
CONCLUSION

Recommender systems are a powerful new technology for extracting additional value for a business from its user databases. They benefit users by helping them find items they like or want to buy, and they benefit the business by generating more sales. Recommender systems are rapidly becoming a crucial tool in e-commerce on the Web. They are already stressed by the huge volume of user data in existing corporate databases, and they will be stressed even more by the increasing volume of user data available on the Web. New technologies are needed that can dramatically improve the scalability of recommender systems.

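In this project, I presented and experimentally evaluated a new algorithm for content-based recommender systems. Each movie is represented as a bag-of-words vector of its stemmed tags, and the five movies with the highest cosine similarity to a chosen title are recommended. My results show that item-based techniques hold the promise of allowing content-based algorithms to scale to large data sets while producing high-quality recommendations.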
THANK YOU
