A Novel Entity Resolution For Entity Matching Using Mapreduce


A NOVEL ENTITY RESOLUTION FOR ENTITY

MATCHING USING MAPREDUCE


A project report submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology
in
Computer Science & Engineering
Submitted by

K.H.Anil Kumar (14KD1A0568)


K.Surya Haritha (14KD1A0569)
P.Avinash (14KD1A05A3)
M.Babu Nikhil (14KD1A0590)
Under the guidance of
Mr. U.Kartheek Chandra Patnaik, M.Tech
Associate Professor
Department of CSE

Department of Computer Science & Engineering


LENDI INSTITUTE OF ENGINEERING & TECHNOLOGY
Accredited by NAAC with “A” Grade
Affiliated to JNTUK, Approved by A.I.C.T.E,
Jonnada, Vizianagaram Dist.535005.
2015-2019

BONAFIDE CERTIFICATE
This is to certify that the project entitled “A Novel Entity Resolution for Entity
Matching using MapReduce” is a bonafide record of the work done by K.H.Anil Kumar
(14KD1A0568), K.Surya Haritha (14KD1A0569), P.Avinash (14KD1A05A3), and M.Babu Nikhil
(14KD1A0590) under the supervision and guidance of Mr. U.Kartheek Chandra Patnaik,
Associate Professor, in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology in Computer Science and Engineering from Lendi Institute of
Engineering and Technology (Affiliated to JNTUK), Jonnada, Vizianagaram, for the year 2018.

Internal Guide
Mr. U.Kartheek Chandra Patnaik, M.Tech
Associate Professor
Department of CSE

Head of the Department
Prof. A.Rama Rao, M.Tech, (PhD)
Professor
Department of CSE

External Examiner

ACKNOWLEDGEMENT

With great solemnity and sincerity, we offer our profuse thanks to our
management for providing all the resources to complete our project successfully. We express
our deepest sense of gratitude and pay our sincere thanks to our guide, Mr. U.Kartheek
Chandra Patnaik, Associate Professor, Department of CSE, who evinced keen interest in our
efforts and provided his valuable guidance throughout our project work.

We thank our project coordinators, Dr. R. Rajender, Mr. U.Kartheek Chandra Patnaik,
and Mr. V.Hima Sankar, who supported us in a number of ways and helped us
to complete our project work in the correct manner.

We thank Prof. A.Rama Rao, Head of the Department of Computer Science &
Engineering, who helped us to complete our project work in the right manner.

We express our gratitude to our principal, Dr. V.V.Rama Reddy, for his kind attention
and valuable guidance throughout this course in carrying out the project.

We wish to express our gratitude to our Management Members, who supported us by
providing good lab facilities.

We are also thankful to all staff members of the Department of Computer Science &
Engineering for helping us to complete this project work by giving valuable suggestions.

Above all, we gratefully acknowledge and thank our parents, who played a vital role
in the success of this project.

K.H.Anil Kumar (14KD1A0568)


K.Surya Haritha (14KD1A0569)
P.Avinash (14KD1A05A3)
M.Babu Nikhil (14KD1A0590)
DECLARATION

We hereby declare that the project work entitled “A Novel Entity Resolution for
Entity Matching using MapReduce” submitted to JNTU Kakinada is a record of original
work done by K.H.Anil Kumar (14KD1A0568), K.Surya Haritha (14KD1A0569), P.Avinash
(14KD1A05A3), and M.Babu Nikhil (14KD1A0590) under the esteemed guidance of
Mr. U.Kartheek Chandra Patnaik, Associate Professor, Computer Science & Engineering, Lendi
Institute of Engineering & Technology. This project work is submitted in partial fulfillment
of the requirements for the award of the degree of Bachelor of Technology in Computer Science
& Engineering. This entire project was done to the best of our knowledge and has not been
submitted to any other university for the award of a degree or diploma.

K.H.Anil Kumar (14KD1A0568)


K.Surya Haritha (14KD1A0569)
P.Avinash (14KD1A05A3)
M.Babu Nikhil (14KD1A0590)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

VISION

To be a frontier in computing technologies to produce globally competent computer
science engineering graduates with moral values to build a vibrant society and nation.

MISSION

• Providing a strong theoretical and practical background in computer science engineering
  with an emphasis on software development.

• Inculcating professional behavior, strong ethical values, innovative research capabilities,
  and leadership abilities.

• Imparting the technical skills necessary for continued learning towards their professional
  growth and contribution to society and rural communities.

PROGRAM EDUCATIONAL OBJECTIVES (PEOs)

PEO-1: Graduates will have strong knowledge and skills to comprehend the latest tools and
techniques of Computer Engineering so that they can analyze, design and create
computing products and solutions for real life problems.
PEO-2: Graduates shall have a multidisciplinary approach, professional attitude and ethics,
communication and teamwork skills, and an ability to relate and solve social issues
through their Employment, Higher Studies and Research.
PEO-3: Graduates will engage in life-long learning and professional development to
adapt to rapidly changing technology.

PROGRAM SPECIFIC OUTCOMES (PSOs)

PSO-1: Ability to grasp advanced programming techniques to solve contemporary issues.
PSO-2: Have knowledge and expertise to analyze data and networks using the latest tools
and technologies.
PSO-3: Qualify in national and international competitive examinations for successful
higher studies and employment.

PROGRAM OUTCOMES (POs)


PO-1 Engineering Knowledge: Apply the knowledge of mathematics, science,
engineering fundamentals, and an engineering specialization to the solution of complex
engineering problems.

PO-2 Problem Analysis: Identify, formulate, review research literature, and analyze
complex engineering problems reaching substantiated conclusions using first principles
of mathematics, natural sciences, and engineering sciences.

PO-3 Design/Development of Solutions: Design solutions for complex engineering
problems and design system components or processes that meet the specified needs with
appropriate consideration for the public health and safety, and the cultural, societal, and
environmental considerations.

PO-4 Conduct Investigations of Complex Problems: Use research-based knowledge and
research methods including design of experiments, analysis and interpretation of data,
and synthesis of the information to provide valid conclusions.

PO-5 Modern Tool Usage: Create, select, and apply appropriate techniques, resources,
and modern engineering and IT tools including prediction and modeling to complex
engineering activities with an understanding of the limitations.

PO-6 The Engineer and Society: Apply reasoning informed by the contextual
knowledge to assess societal, health, safety, legal and cultural issues and the consequent
responsibilities relevant to the professional engineering practice.

PO-7 Environment and Sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and demonstrate the
knowledge of, and need for sustainable development.

PO-8 Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.

PO-9 Individual and Team Work: Function effectively as an individual, and as a
member or leader in diverse teams, and in multidisciplinary settings.

PO-10 Communication: Communicate effectively on complex engineering activities with
the engineering community and with society at large, such as, being able to comprehend
and write effective reports and design documentation, make effective presentations, and
give and receive clear instructions.

PO-11 Project Management and Finance: Demonstrate knowledge and understanding of
the engineering and management principles and apply these to one’s own work, as a
member and leader in a team, to manage projects and in multidisciplinary environments.

PO-12 Life-Long Learning: Recognize the need for, and have the preparation and ability
to engage in independent and life-long learning in the broadest context of technological
change.
ABSTRACT

Accurate and efficient entity resolution is an open challenge of particular relevance to
intelligence organizations that collect large datasets from disparate sources with differing levels
of quality and standards. Starting from a first-principles formulation of entity resolution, this
project presents a novel entity resolution algorithm that introduces a data-driven blocking and
record linkage technique based on the identification of entity signatures in data. The scalability
and accuracy of the proposed algorithm are evaluated using benchmark datasets and shown to
achieve state-of-the-art results. The proposed algorithm can be implemented simply on modern
parallel databases, which allows it to be deployed with relative ease in large industrial
applications.

Keywords- Entity Resolution, blocking, record linkage, entity signatures.
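
To make the idea of blocking on entity signatures concrete, the following is a minimal sketch in plain Java. It is not the algorithm developed in this project: it assumes a hypothetical signature (the sorted, lower-cased tokens of a record) and imitates the map, shuffle, and reduce steps with in-memory collections instead of the Hadoop API. The class name BlockingSketch and its method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class BlockingSketch {

    // Hypothetical signature (an assumption, not the report's definition):
    // lower-cased alphanumeric tokens, sorted and joined with spaces.
    static String signature(String record) {
        List<String> tokens = new ArrayList<>();
        for (String t : record.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        Collections.sort(tokens);
        return String.join(" ", tokens);
    }

    // "Map" step: key each record by its signature. Grouping by key in a
    // TreeMap stands in for the shuffle that MapReduce performs.
    static Map<String, List<String>> block(List<String> records) {
        Map<String, List<String>> blocks = new TreeMap<>();
        for (String r : records) {
            blocks.computeIfAbsent(signature(r), k -> new ArrayList<>()).add(r);
        }
        return blocks;
    }

    // "Reduce" step: only records that share a block are paired up as
    // candidate matches; records in different blocks are never compared.
    static List<String> candidatePairs(List<String> records) {
        List<String> pairs = new ArrayList<>();
        for (List<String> group : block(records).values()) {
            for (int i = 0; i < group.size(); i++) {
                for (int j = i + 1; j < group.size(); j++) {
                    pairs.add(group.get(i) + " <-> " + group.get(j));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("John Smith", "smith, john", "Jane Doe");
        for (String pair : candidatePairs(records)) {
            System.out.println(pair);
        }
    }
}
```

On the three sample records, main prints the single candidate pair formed by the two spellings of the same name. Restricting comparisons to records within one block is what keeps the number of pairwise comparisons small on large datasets.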

Outcomes:
Our project titled “A Novel Entity Resolution for Entity Matching using MapReduce” is mapped
to the following outcomes:

Program Outcomes: PO1, PO2, PO3, PO4, PO5, PO6, PO7, PO8, PO9, PO10, PO11, PO12

Program Specific Outcomes: PSO1, PSO2, PSO3.


LIST OF CONTENTS

S No  Title (POs and PSOs Mapped)

1  INTRODUCTION (PO1, PO4, PO7, PO9)
   1.1 Project Overview
   1.2 Project Deliverables
   1.3 Project Scope
2  LITERATURE SURVEY (PO1, PO4, PO8, PO9, PO7)
3  PROBLEM ANALYSIS (PO1, PO2, PO3, PO4, PO9)
   3.1 Existing System
       3.1.1 Challenges
   3.2 Proposed System
       3.2.1 Advantages
4  SYSTEM ANALYSIS (PO1, PO2, PO3, PO4, PO6, PO9, PO10)
   4.1 System Requirement Specification
       4.1.1 Functional Requirements
       4.1.2 Non-Functional Requirements
   4.2 Feasibility Study
   4.3 Use Case Scenarios
       4.3.1 Use Case Diagrams
   4.4 System Requirements
       4.4.1 Software Requirements
       4.4.2 Hardware Requirements
5  SYSTEM DESIGN (PO3, PO5, PO9, PO11; PSO1, PSO2, PSO3)
   5.1 Introduction
       5.1.1 Class Diagram
       5.1.2 Deployment Diagram
   5.2 System Architecture
       5.2.1 Algorithm Description
6  IMPLEMENTATION (PO1, PO2, PO4, PO5, PO9, PO12; PSO1, PSO2, PSO3)
   6.1 Technology Description
       6.1.1 Ubuntu
       6.1.2 Java
       6.1.3 Apache Hadoop
   6.2 Sample Source Code
7  TESTING (PO1, PO2, PO4, PO5, PO9, PO12; PSO1, PSO2, PSO3)
   7.1 Introduction
   7.2 Test Cases
8  SAMPLE SCREEN SHOTS (PO5)
9  CONCLUSION (PO1, PO4)
10 BIBLIOGRAPHY (PO8, PO10)

LIST OF FIGURES

Fig. No  Figure Caption

4.1   Use Case Diagram
5.1   Class Diagram for MapReduce
5.2   Deployment Diagram
5.3   System Architecture
6.1   Hadoop Architecture
8.1   Starting Hadoop
8.2   Hadoop HDFS Directory
8.3   Input File
8.4   Execution of Jar File NER1
8.5   Output Log 1
8.6   Output File 1
8.7   Execution of Jar File NER2
8.8   Output Log 2
8.9   Output File 2
8.10  Execution of Jar File NER3
8.11  Output Log 3
8.12  Output Log 4
8.13  Output Log 5
8.14  Final Output

LIST OF TABLES

Table No  Table Caption

4.1  Graphical Representation of Use Case Diagram
5.1  Graphical Representation of Sequence Diagram
7.1  Representing Test Cases and Status

Project Book Rules:

1) Use 16-pt, Times New Roman, Bold, Justified for Major Heading, 14-pt for Sub heading
and 12-pt for second sub heading.
Eg. 1. Main Heading
    1.1 Sub Heading
    1.1.1 Second Sub Heading

2) Page numbering for the project book should be in the middle of the page and it should
start from Chapter-1, i.e. from Introduction till Last Chapter.

3) Figure numbers and Table Captions should be numbered chapter-wise. The Figure Number
should be written below the figure and the Table Caption should be written above the Table.
Use 11-pt Times New Roman, Bold, Justified, italic font for both table and figure numbering.
Eg. Fig. 2.1 System Design
    Table 4.5 Inference Rules



4) The Bibliography should be in IEEE format: 12-pt, Times New Roman, Justified.
Eg.
[1] A. Mukherjee, S. A. A. Fakoorian, J. Huang, and A. L. Swindlehurst,
“Principles of physical layer security in multiuser wireless networks: A survey,”
IEEE Commun. Surveys Tuts., vol. 16, no. 3, pp. 1550–1573, Aug. 2014.

5) Every Project book should contain a DVD attached to the last page of the book containing
the following:
a) Batch, year, Project Title on DVD
b) Single Page Abstract
c) Base Papers
d) All software related to the project
e) Documentation and PPT related to the project
f) Publication related to the project
g) Project Source Code

6) Total project book bindings are: Project Batch Size + 3

7) Book Binding page Borders: (Depending upon your book)
Left : 2, Top-Right-Left : 1

8) Project Book Cover Page: Black Color with Gold Lettering
