NLP Mini Project
BHARATI VIDYAPEETH COLLEGE OF ENGINEERING,
NAVI MUMBAI-400706
PRESENTS
MINI PROJECT
ON
"AUTOMATED RESUME SCREENING SYSTEM USING NATURAL LANGUAGE PROCESSING AND SIMILARITY"
COURSE OUTCOMES
CO1:
Students will have a broad understanding of the field of natural language
processing.
CO2:
Students will have a sense of the capabilities and limitations of current
natural language technologies.
CO3:
Students will be able to model linguistic phenomena with formal grammars.
CO4:
Students will be able to design, implement and test algorithms for NLP
problems.
CO5:
Students will be able to understand the mathematical and linguistic foundations
underlying approaches to the various areas in NLP.
CO6:
Students will be able to apply NLP techniques to design real-world NLP
applications such as machine translation, text summarization etc.
BHARATI VIDYAPEETH COLLEGE OF ENGINEERING,
NAVI MUMBAI 400 706
PROJECT REPORT
ON
“AUTOMATED RESUME SCREENING SYSTEM
USING NATURAL LANGUAGE PROCESSING AND
SIMILARITY”
PROJECT MEMBERS
1 Bhagyashree Pathak 48
2 Sayali Patil 49
3 Mayuri Pawar 52
4 Rutuja Pawar 53
LAB IN-CHARGE
Prof. Dr. D. R. Ingle
TABLE OF CONTENTS
1 Introduction
2 System Working
3 System Approach
4 System Implementation
5 Acknowledgement
6 References
CHAPTER 1
INTRODUCTION
Hiring the right talent is a challenge for all businesses. This challenge is
magnified by the high volume of applicants if the business is labour-intensive,
growing, and facing high attrition rates.
Typically, large companies do not have enough time to open each CV, so
they use machine learning algorithms for the resume screening task.
CHAPTER 2
SYSTEM WORKING
Software Requirements
1) Python:
Python is used for creating the backbone structure. Python is intended to be a
highly readable language. It is designed to have an uncluttered visual layout;
it uses whitespace indentation rather than curly braces or keywords. Python
has a large standard library, commonly cited as one of Python's greatest
strengths.
NLTK (Natural Language Toolkit), the library used here for text processing,
is designed around four goals:
Simplicity
To provide an intuitive framework along with substantial
building blocks, giving users a practical knowledge of NLP without
getting bogged down in the tedious housekeeping usually associated
with processing annotated language data.
Consistency
To provide a uniform framework with consistent interfaces and
data structures, and easily guessable method names.
Extensibility
To provide a structure into which new software modules can be
easily accommodated, including alternative implementations and
competing approaches to the same task.
Modularity
To provide components that can be used independently without
needing to understand the rest of the toolkit.
A significant fraction of any NLP syllabus deals with algorithms and data
structures. On their own these can be rather dry, but NLTK brings them to
life with the help of interactive graphical user interfaces that make it
possible to view algorithms step by step. Most NLTK components include a
demonstration that performs an interesting task without requiring any
special input from the user. An effective way to deliver the materials is
through interactive presentation of the examples, entering them in a Python
session, observing what they do, and modifying them to explore some
empirical or theoretical issue.
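For illustration, here is a minimal NLTK session sketch (not taken from the
report's original code): it tokenizes a resume-style sentence and tags parts
of speech, assuming the standard tokenizer and tagger models are fetched on
first run.

# A minimal NLTK sketch: tokenize a resume-style sentence and tag
# parts of speech. Model names below may vary slightly by NLTK version.
import nltk

nltk.download("punkt", quiet=True)                       # word tokenizer model
nltk.download("averaged_perceptron_tagger", quiet=True)  # POS tagger model

from nltk.tokenize import word_tokenize

sentence = "Experienced Python developer skilled in NLP and machine learning."
tokens = word_tokenize(sentence)   # ['Experienced', 'Python', 'developer', ...]
tags = nltk.pos_tag(tokens)        # [('Experienced', 'JJ'), ('Python', 'NNP'), ...]
print(tags)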
4) Machine Learning Tool: Scikit-learn (Python Package)
It is a Python module integrating classic machine learning algorithms
into the tightly-knit scientific Python world (NumPy, SciPy, Matplotlib). It
aims to provide simple and efficient solutions to learning problems,
accessible to everybody and reusable in various contexts: machine learning
as a versatile tool for science and engineering.
In general, a learning problem considers a set of n samples of data and
tries to predict properties of unknown data. If each sample is more than a
single number, for instance a multidimensional entry (aka multivariate
data), it is said to have several attributes, or features.
We can separate learning problems into a few large categories (a short
scikit-learn sketch follows this list):
• Supervised learning, in which the data comes with additional
attributes that we want to predict.
This problem can be either:
– classification: samples belong to two or more classes and we want to
learn from already labelled data how to predict the class of unlabelled
data. An example of a classification problem would be digit recognition,
in which the aim is to assign each input vector to one of a finite number
of discrete categories.
– regression: if the desired output consists of one or more continuous
variables, then the task is called regression. An example of a regression
problem would be the prediction of the length of a salmon as a function
of its age and weight.
• Unsupervised learning, in which the training data consists of a set of
input vectors x without any corresponding target values. The goal in
such problems may be to discover groups of similar examples within
the data (clustering), to determine the distribution of data within the
input space (density estimation), or to project the data from a
high-dimensional space down to two or three dimensions for the
purpose of visualization.
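As a concrete illustration of the supervised (classification) case, the
following minimal scikit-learn sketch trains a classifier on the built-in
digits dataset mentioned above; the model and parameter choices are
illustrative, not the report's original code.

# A minimal supervised-learning sketch: learn from labelled digit images
# and predict the class of held-out samples.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()                          # 1797 8x8 digit images, labels 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

clf = SVC(gamma=0.001)                          # support vector classifier
clf.fit(X_train, y_train)                       # learn from labelled data
print("accuracy:", clf.score(X_test, y_test))   # evaluate on unseen data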
5) Elasticsearch DSL
It is a high-level library whose aim is to help with writing and running
queries against Elasticsearch. It is built on top of the official low-level
client (elasticsearch-py). It provides a more convenient and idiomatic way
to write and manipulate queries. It stays close to the Elasticsearch JSON
DSL, mirroring its terminology and structure, and exposes the whole range
of the DSL from Python, either directly using defined classes or via
queryset-like expressions. It also provides an optional wrapper for working
with documents as Python objects: defining mappings, retrieving and saving
documents, and wrapping the document data in user-defined classes. To use
the other Elasticsearch APIs (e.g. cluster health), just use the underlying
client.
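A minimal elasticsearch-dsl sketch follows; the index name "resumes", the
"skills" and "name" fields, and the local cluster address are assumptions
made for illustration.

# Build and run a match query with elasticsearch-dsl; stays close to the
# Elasticsearch JSON DSL. Assumes a cluster running at localhost:9200.
from elasticsearch_dsl import Search, connections

# Register a default connection to the cluster.
connections.create_connection(hosts=["http://localhost:9200"])

# Compose a query against the (assumed) "resumes" index.
s = Search(index="resumes").query("match", skills="python")

response = s.execute()
for hit in response:
    print(hit.meta.score, hit.name)   # 'name' is an assumed document field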
Hardware Requirements
CHAPTER 3
SYSTEM APPROACH
One of the keys to Python's explosive growth has been its densely populated
collection of extension software libraries, known in Python's terminology as
packages, supplied and maintained by Python's extensive user community.
Each package extends the functionality of the base Python language and core
packages, and in addition to functions and data must include documentation
and examples, often in the form of vignettes demonstrating the use of the
package. The best-known package repository, the Python Package Index
(PyPI), currently hosts hundreds of thousands of published packages.
Recent efforts among the Python text analysis developers' community are
designed to promote this interoperability to maximize flexibility and choice
among users. As a result, learning the basics for text analysis in Python
provides access to a wide range of advanced text analysis features.
CHAPTER 4
SYSTEM IMPLEMENTATION
Code for Resume Screening
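The report's original listing is not reproduced in this text version. As a
stand-in, the sketch below illustrates the approach named in the project
title: resumes and a job description are converted to TF-IDF vectors and
ranked by cosine similarity. The file names and text contents are
hypothetical.

# A minimal sketch of resume screening via TF-IDF and cosine similarity;
# the job description and resume strings below are illustrative data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

job_description = ("Looking for a Python developer with NLP and "
                   "machine learning experience.")
resumes = {
    "candidate_a.txt": "Python developer, 3 years of NLP and scikit-learn work.",
    "candidate_b.txt": "Accountant familiar with Excel and bookkeeping.",
}

# Fit TF-IDF on the job description together with all resumes so that
# they share one vocabulary, then score each resume against the description.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([job_description] + list(resumes.values()))
scores = cosine_similarity(matrix[0:1], matrix[1:]).flatten()

# Rank candidates by similarity to the job description (highest first).
for name, score in sorted(zip(resumes, scores), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")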
Output (figures omitted in this text version): useful information extracted
from the resume, an analysis of the resume as a pie chart of the category
distribution, and a word cloud of frequent terms.
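A minimal sketch of how the two visualizations above could be produced,
assuming a hypothetical list of predicted resume categories and a resume's
text (uses the third-party wordcloud package alongside Matplotlib):

# Pie chart of category distribution and word cloud of a resume's terms.
from collections import Counter

import matplotlib.pyplot as plt
from wordcloud import WordCloud   # third-party 'wordcloud' package

categories = ["Data Science", "Data Science", "HR", "Testing", "Data Science"]
resume_text = "python machine learning nlp data analysis python modelling"

# Pie chart of the (assumed) predicted category distribution.
counts = Counter(categories)
plt.figure()
plt.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.1f%%")
plt.title("Resume category distribution")

# Word cloud of the most frequent terms in a resume.
cloud = WordCloud(width=400, height=300, background_color="white").generate(resume_text)
plt.figure()
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()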
CHAPTER 5
ACKNOWLEDGEMENT
REFERENCES
▪ Python programming
▪ http://www.google.co.in/
▪ https://www.python-project.org/
▪ http://yann.lecun.com/exdb/mnist/