NLP Mini Project


BHARATI VIDYAPEETH COLLEGE OF ENGINEERING,
NAVI MUMBAI - 400 706

DEPARTMENT OF COMPUTER ENGINEERING

Natural Language Processing

PRESENTS

MINI PROJECT

ON

“AUTOMATED RESUME SCREENING SYSTEM
USING NATURAL LANGUAGE PROCESSING AND SIMILARITY”
(2020-2021)
COURSE OUTCOMES

CO1:
Students will have a broad understanding of the field of natural language
processing.

CO2:
Students will have a sense of the capabilities and limitations of current
natural language technologies.

CO3:
Students will be able to model linguistic phenomena with formal grammars.

CO4:
Students will be able to design, implement, and test algorithms for NLP
problems.

CO5:
Students will be able to understand the mathematical and linguistic foundations
underlying approaches to the various areas in NLP.

CO6:
Students will be able to apply NLP techniques to design real-world NLP
applications such as machine translation, text summarization, etc.
BHARATI VIDYAPEETH COLLEGE OF ENGINEERING,
NAVI MUMBAI - 400 706

DEPARTMENT OF COMPUTER ENGINEERING


(2020-2021)

PROJECT REPORT
ON
“AUTOMATED RESUME SCREENING SYSTEM
USING NATURAL LANGUAGE PROCESSING AND
SIMILARITY”

PROJECT MEMBERS

SR NO  NAME                 ROLL NO  MARKS
1      Bhagyashree Pathak   48
2      Sayali Patil         49
3      Mayuri Pawar         52
4      Rutuja Pawar         53

LAB - INCHARGE
Prof. Dr. D. R. Ingle
TABLE OF CONTENTS

SR. NO.  CHAPTER                  PAGE NO.
1        Introduction             5
2        System Working           6
3        System Approach          10
4        System Implementation    11
5        Acknowledgement          18
6        References               19
CHAPTER 1

INTRODUCTION

Hiring the right talent is a challenge for all businesses. This challenge is
magnified by the high volume of applicants if the business is labour-intensive,
growing, and facing high attrition rates.

An example of such a business is the IT services industry, which hires in
large numbers for fast-growing markets. In a typical service organization,
professionals with a variety of technical skills and business domain expertise
are hired and assigned to projects to resolve customer issues. This task of
selecting the best talent among many applicants is known as Resume Screening.

Typically, large companies do not have enough time to open each CV, so
they use machine learning algorithms for the Resume Screening task.
CHAPTER 2

SYSTEM WORKING

Software Requirements

The software requirements for this project include:

• Python
• Scrapy
• NLTK
• scikit-learn
• Elasticsearch DSL

1) Python:
Python is used for creating the backbone structure of the system. Python is
intended to be a highly readable language. It is designed to have an
uncluttered visual layout; it uses whitespace indentation rather than curly
braces or keywords. Python has a large standard library, commonly cited as
one of Python's greatest strengths.

2) Web Crawler: Scrapy (Python Package)

Scrapy is an application framework for crawling websites and extracting
structured data, which can be used for a wide range of useful applications,
like data mining, information processing, or historical archival. Even though
Scrapy was originally designed for web scraping, it can also be used to
extract data using APIs (such as Amazon Associates Web Services) or as a
general-purpose web crawler. Scrapy is controlled through the scrapy
command-line tool, referred to here as the “Scrapy tool” to differentiate it
from the sub-commands, which we just call “commands” or “Scrapy commands”.
The Scrapy tool provides several commands, for multiple purposes, and each
one accepts a different set of arguments and options.
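
As an illustration, the following is a minimal sketch of a Scrapy spider; the
start URL and CSS selectors are hypothetical placeholders, since the real
values depend on the site being crawled.

# Minimal Scrapy spider sketch for collecting listing pages.
# The URL and selectors below are hypothetical placeholders.
import scrapy

class ResumeSpider(scrapy.Spider):
    name = "resume_spider"
    start_urls = ["https://example.com/resumes"]  # placeholder URL

    def parse(self, response):
        # Yield one structured item per listing on the page.
        for listing in response.css("div.listing"):
            yield {
                "title": listing.css("h2::text").get(),
                "summary": listing.css("p.summary::text").get(),
            }
        # Follow the pagination link, if present.
        next_page = response.css("a.next::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

Such a spider can be run with the Scrapy tool, for example:
scrapy runspider resume_spider.py -o resumes.json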

3) Natural Language Processing Tool: Natural Language Toolkit (NLTK)
(Python Package)

NLTK was originally created in 2001 as part of a computational linguistics
course in the Department of Computer and Information Science at the
University of Pennsylvania. Since then it has been developed and expanded
with the help of dozens of contributors. It has now been adopted in courses
in dozens of universities, and serves as the basis of many research projects.
NLTK was designed with four primary goals in mind:

Simplicity
To provide an intuitive framework along with substantial building
blocks, giving users a practical knowledge of NLP without getting
bogged down in the tedious housekeeping usually associated with
processing annotated language data.

Consistency
To provide a uniform framework with consistent interfaces and
data structures, and easily guessable method names.

Extensibility
To provide a structure into which new software modules can be
easily accommodated, including alternative implementations and
competing approaches to the same task.

Modularity
To provide components that can be used independently without
needing to understand the rest of the toolkit. A significant fraction of
any NLP syllabus deals with algorithms and data structures. On their
own these can be rather dry, but NLTK brings them to life with the
help of interactive graphical user interfaces that make it possible to
view algorithms step by step. Most NLTK components include a
demonstration that performs an interesting task without requiring any
special input from the user. An effective way to deliver the material
is through interactive presentation of examples: entering them in a
Python session, observing what they do, and modifying them to
explore some empirical or theoretical issue.
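
For instance, a short NLTK session of the kind used for resume preprocessing
might tokenize a text and remove stopwords. This is a minimal sketch; the
sample sentence is invented for illustration.

# Minimal NLTK preprocessing sketch: tokenize, lowercase, and
# drop English stopwords and punctuation.
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt")      # tokenizer models
nltk.download("stopwords")  # stopword lists

text = "Experienced Python developer skilled in NLP and machine learning."
tokens = word_tokenize(text.lower())
stop_words = set(stopwords.words("english"))
keywords = [t for t in tokens if t.isalpha() and t not in stop_words]
print(keywords)
# ['experienced', 'python', 'developer', 'skilled', 'nlp', 'machine', 'learning']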
4) Machine Learning Tool: scikit-learn (Python Package)

It is a Python module integrating classic machine learning algorithms into
the tightly-knit scientific Python world (NumPy, SciPy, Matplotlib). It aims
to provide simple and efficient solutions to learning problems, accessible to
everybody and reusable in various contexts: machine learning as a versatile
tool for science and engineering.

In general, a learning problem considers a set of n samples of data and
tries to predict properties of unknown data. If each sample is more than a
single number, for instance a multidimensional entry (aka multivariate data),
it is said to have several attributes, or features.
We can separate learning problems into a few large categories:

• Supervised learning, in which the data comes with additional
attributes that we want to predict. This problem can be either:

– Classification: samples belong to two or more classes and we want to
learn from already labeled data how to predict the class of unlabeled data.
An example of a classification problem is digit recognition, in which the
aim is to assign each input vector to one of a finite number of discrete
categories.

– Regression: if the desired output consists of one or more continuous
variables, the task is called regression. An example of a regression problem
would be the prediction of the length of a salmon as a function of its age
and weight.

• Unsupervised learning, in which the training data consists of a set of
input vectors x without any corresponding target values. The goal in such
problems may be to discover groups of similar examples within the data,
which is called clustering; to determine the distribution of data within the
input space, known as density estimation; or to project the data from a
high-dimensional space down to two or three dimensions for the purpose of
visualization.
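
In the context of this project, a typical use of scikit-learn is to vectorize
resumes and a job description with TF-IDF and rank the resumes by cosine
similarity. The sketch below uses invented sample texts; a real system would
read parsed resume files.

# Sketch: rank resumes against a job description using TF-IDF
# vectors and cosine similarity (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

job_description = "Looking for a Python developer with NLP experience."
resumes = [
    "Python developer, 3 years of NLP and machine learning work.",
    "Accountant with experience in auditing and taxation.",
    "Java engineer with web development background.",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([job_description] + resumes)

# Similarity of each resume (rows 1..n) to the job description (row 0).
scores = cosine_similarity(matrix[0:1], matrix[1:]).flatten()
for resume, score in sorted(zip(resumes, scores), key=lambda p: -p[1]):
    print(f"{score:.2f}  {resume}")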

5) Elasticsearch DSL

It is a high-level library whose aim is to help with writing and running
queries against Elasticsearch. It is built on top of the official low-level
client (elasticsearch-py). It provides a more convenient and idiomatic way to
write and manipulate queries. It stays close to the Elasticsearch JSON DSL,
mirroring its terminology and structure. It exposes the whole range of the
DSL from Python, either directly using defined classes or via queryset-like
expressions. It also provides an optional wrapper for working with documents
as Python objects: defining mappings, retrieving and saving documents, and
wrapping the document data in user-defined classes. To use the other
Elasticsearch APIs (e.g. cluster health), just use the underlying client.
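
A minimal elasticsearch-dsl query of the kind used for candidate search might
look like the sketch below; the index name and field names are hypothetical.

# Sketch: search a hypothetical "resumes" index with elasticsearch-dsl.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch("http://localhost:9200")

s = (
    Search(using=client, index="resumes")
    .query("match", skills="python nlp")           # full-text match
    .filter("range", experience_years={"gte": 2})  # structured filter
)

for hit in s.execute():
    print(hit.meta.score, hit.name)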

Hardware Requirements

Linux: GNOME or KDE desktop, GNU C Library (glibc) 2.15 or later, 2 GB
RAM minimum, 4 GB RAM recommended, 1280 x 800 minimum screen
resolution.

Windows: Microsoft® Windows® 8/7/Vista (32- or 64-bit), 2 GB RAM
minimum, 4 GB RAM recommended, 1280 x 800 minimum screen resolution,
Intel® processor with support for Intel® VT-x, Intel® EM64T (Intel® 64), and
Execute Disable (XD) Bit functionality.

1) Supported Operating Systems:

The supported operating systems for the client include:
• Windows XP onwards
• Any Linux flavour

Windows and Linux are the two operating systems that will support the
system. Since Linux is an open-source operating system, the system developed
in this project is built on the Linux platform but is made compatible with
Windows too. The system will be tested on both Linux and Windows. The
supported operating system for the server is Linux, which is used as the
server operating system. For the web server, we are using Apache 2.0.
CHAPTER 3

SYSTEM APPROACH

Python Programming Language

Python has become one of the leading languages for statistical analysis,
which makes it highly suitable for data science applications. Although the
learning curve for programming with Python can be steep, especially for
people without prior programming experience, the tools now available for
carrying out text analysis in Python make it easy to perform powerful,
cutting-edge text analytics using only a few simple commands.

One of the keys to Python's explosive growth has been its densely populated
collection of extension software libraries, known in Python's terminology as
packages, supplied and maintained by Python's extensive user community.
Each package extends the functionality of the base Python language and core
packages, and in addition to functions and data typically includes
documentation and examples demonstrating the use of the package. The
best-known package repository, the Python Package Index (PyPI), currently
hosts hundreds of thousands of published packages.

Text analysis in particular has become well established in Python. There is a
vast collection of dedicated text processing and text analysis packages: from
low-level string operations to advanced text modeling techniques such as
fitting Latent Dirichlet Allocation models, Python provides it all. One of the
main advantages of performing text analysis in Python is that it is often
possible, and relatively easy, to switch between different packages or to
combine them.
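
As one example of such a technique, the sketch below fits a small Latent
Dirichlet Allocation model with scikit-learn; the toy corpus is invented for
illustration, and a recent scikit-learn version is assumed.

# Sketch: fit a small LDA topic model on a toy corpus (scikit-learn).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "python machine learning data science",
    "resume screening hiring recruitment",
    "deep learning neural networks python",
    "job interview candidate recruitment",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the top words for each discovered topic.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:]]
    print(f"Topic {idx}: {top}")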

Recent efforts among the Python text analysis developers' community are
designed to promote this interoperability, to maximize flexibility and choice
among users. As a result, learning the basics of text analysis in Python
provides access to a wide range of advanced text analysis features.
CHAPTER 4

SYSTEM IMPLEMENTATION
Code for Resume Screening
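
The original code listing was included as screenshots. A minimal sketch of
the kind of pipeline it implements is given below, assuming a CSV dataset
with "Category" and "Resume" columns (the file name is hypothetical) and the
pandas, matplotlib, scikit-learn, and wordcloud packages.

# Sketch of the resume screening pipeline: clean the text, visualize
# the data (pie chart, word cloud), then train a category classifier.
import re

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from wordcloud import WordCloud

df = pd.read_csv("UpdatedResumeDataSet.csv")  # hypothetical file name

def clean(text):
    text = re.sub(r"http\S+", " ", text)     # strip URLs
    text = re.sub(r"[^a-zA-Z ]", " ", text)  # keep letters only
    return re.sub(r"\s+", " ", text).lower().strip()

df["cleaned"] = df["Resume"].apply(clean)

# Pie chart of resume categories.
df["Category"].value_counts().plot.pie(autopct="%1.1f%%")
plt.ylabel("")
plt.show()

# Word cloud of the most frequent terms across all resumes.
cloud = WordCloud(width=800, height=400).generate(" ".join(df["cleaned"]))
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()

# Train and evaluate a simple TF-IDF + Naive Bayes classifier.
X = TfidfVectorizer(stop_words="english", max_features=2000).fit_transform(df["cleaned"])
X_train, X_test, y_train, y_test = train_test_split(
    X, df["Category"], test_size=0.2, random_state=0
)
model = MultinomialNB().fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))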
Output:
• Useful information extracted (analysis of the resume)
• Pie chart of resume categories
• Word cloud of frequent resume terms
CHAPTER 5

ACKNOWLEDGEMENT

I would like to express my special thanks of gratitude to our subject
in-charge and HOD, Prof. Dayanand Ingle, who gave me the golden opportunity
to do this project on the topic “AUTOMATED RESUME SCREENING SYSTEM USING
NATURAL LANGUAGE PROCESSING AND SIMILARITY”, which also led me to do a lot
of research, through which I came to know about many new things. I am really
thankful to him.
CHAPTER 6

REFERENCES

▪ Python programming
▪ https://www.google.co.in/
▪ https://www.python-project.org/
▪ http://yann.lecun.com/exdb/mnist/
