7. Data mining RDBMS Spatial analysis and modelling


MGY-006
SPATIAL ANALYSIS AND MODELLING
Indira Gandhi National Open University
School of Sciences

BLOCK 1
BASICS OF SPATIAL ANALYSIS

BLOCK 2
SPATIAL ANALYSIS

BLOCK 3
MODELLING, PROJECT MANAGEMENT AND PROGRAMMING
MGY-006: SPATIAL ANALYSIS AND MODELLING
Programme Design Committee
Prof. Sujatha Verma Dr. I. M. Bahuguna Mr. Manish Parmar
Former Director Deputy Director (Rtd.) Scientist
School of Sciences Space Applications Centre Space Applications Centre
IGNOU, New Delhi Indian Space Research Organisation (ISRO) Ahmedabad, Gujarat
Dr. Shailesh Nayak (ISRO), Ahmedabad, Gujarat Dr. Akella V.S. Aswani
Director Prof. Shamita Kumar ESRI India Technologies Pvt.
National Institute of Advanced Studies Institute of Environment Education and Ltd.
Bangaluru, Karnataka Research Hyderabad, Telangana
Dr. P.S. Acharya Bharati Vidyapeeth University Dr. O.M. Murali
Head, NRDMS, NSDI Division Pune, Maharashtra GIS Consultant
Department of Science and Technology Ms. Asima Misra Chennai, Tamil Nadu
Ministry of Science & Technology Associate Director
New Delhi Prof. Manish Trivedi
ES & e-Governance Group School of Sciences
Dr. Debapriya Dutta Centre for Development of Advanced
Scientist ‘G’ & Associate Head IGNOU, New Delhi
Computing (C-DAC)
National Geospatial Programme Ministry of Electronics and Information Dr. Rajesh Kaliraman
Department of Science and Technology Technology (MeitY) School of Sciences
Ministry of Science & Technology Pune, Maharashtra IGNOU, New Delhi
New Delhi Dr. V. Venkat Ramanan
Dr. Sameer Saran
Dr. L.K. Sinha Head School of Inter-Disciplinary and
Former Director Geoinformatics Department Trans-Disciplinary Studies
Defence Terrain Research Lab. (DTRL), Indian Institute of Remote Sensing IGNOU, New Delhi
Delhi & Defence Geoinformatics Dehradun, U.K.
Research Establishment (DGRE) Faculty of Geology Discipline
Defence R&D Organisation (DRDO) Prof. Daljeet Singh School of Sciences, IGNOU
Chandigarh Department of Geography Prof. Meenal Mishra
Prof. P.K. Garg Swami Shraddhanand College Prof. Benidhar Deshmukh
Civil Engineering Department University of Delhi, New Delhi Prof. R. Baskar
IIT Roorkee, Roorkee, U.K. Dr. D.R. Rajak Dr. M. Prashanth
Prof. P.K. Verma Scientist Dr. Kakoli Gogoi
School of Studies in Earth Science Space Applications Centre (ISRO)
Ahmedabad, Gujarat Dr. Omkar Verma
Vikram University, Ujjain, M.P.

Course Design Committee


Prof. Shamita Kumar Dr. Dharmendra G. Shah Dr. Neha Garg
Institute of Environment Education and Department of Botany School of Sciences
Research MS University of Baroda IGNOU, New Delhi
Bharati Vidyapeeth University Vadodara, Gujarat Dr. Rajesh Kaliraman
Pune, Maharashtra Dr. Sadhana Jain School of Sciences
Prof. R. Jaishanker Regional Remote Sensing Centre IGNOU, New Delhi
CV Raman Laboratory of Ecological (RRSC), ISRO Dr. V. Venkat Ramanan
Informatics Nagpur, Maharashtra School of Inter-Disciplinary and
Digital University Kerala Dr. Neeti Trans-Disciplinary Studies
(formerly IIITM-K) Centre for Climate Change and IGNOU, New Delhi
Thiruvananthapuram, Kerala Sustainability, Azim Premji University
Ms. Asima Misra Faculty of Geology Discipline
Bengaluru, Karnataka
ES & e-Governance Group School of Sciences, IGNOU
Prof. P.V.K. Sasidhar
Centre for Development of Advanced School of Extension and Development Prof. Meenal Mishra
Computing (C-DAC), MeitY Studies, IGNOU, New Delhi Prof. Benidhar Deshmukh
Pune, Maharashtra
Prof. Nehal Farooqi Prof. R. Baskar
Dr. Amit Kumar
School of Extension and Development Dr. M. Prashanth
Environmental Technology Division Studies, IGNOU, New Delhi
CSIR-Institute of Himalayan Dr. Kakoli Gogoi
Bioresource Technology Prof. Deepika Dr. Omkar Verma
Palampur, H.P. School of Sciences
IGNOU, New Delhi

Programme Coordinators: Prof. Benidhar Deshmukh and Prof. Meenal Mishra

MGY-006: SPATIAL ANALYSIS AND MODELLING
Preparation Team
Course Contributors
Dr. Nikhil Lele (Unit 1) Dr. Shukla Acharjee (Unit 2 and 3) Dr. O. M. Murali (Unit 4)
Space Applications Centre, ISRO Centre for Studies in Geography GIS Consultant,
Ahmedabad, Gujarat Dibrugarh University, Assam Chennai, Tamil Nadu
Prof. Sarat Phukan (Units 5 and 6) Ms. Swati Grover (Unit 7) Dr. M. Prashanth (Unit 8)
Department of Geological Sciences GIS Specialist School of Sciences
Gauhati University, Assam Ghaziabad, NCR Delhi IGNOU, New Delhi
Dr. Sunil L. Londhe (Unit 9) Prof. Kiranmay Sarma (Unit 10) Dr. Maya Kumari (Unit 10)
General Manager Department of Geology Assistant Professor
Deepak Fertilisers and Petrochemicals GGSIPU, Delhi Amity School of Natural Resources
Corporation Limited, Pune Dr. Akella V.S. Aswani (Unit 12) & Sustainable Development,
Amity University, Noida
Prof. Benidhar Deshmukh (Unit 11) ESRI India Technologies Pvt. Ltd. Mr. I Prabu (Unit 13)
School of Sciences Hyderabad, Telangana Joint Director
IGNOU, New Delhi Dr. R. K. Chingkhei (Unit 12) ES & e-Governance Group
Department of Earth Science Centre for Development of
Manipur University, Imphal Advanced Computing (C-DAC),
Manipur MeitY, Pune, Maharashtra
Content Editors
Prof. Sarat Phukan (Block 1) Shri G. Sajeevan (Block 2) Prof. Benidhar Deshmukh (Block 3)
Department of Geological Sciences Emerging Solutions and School of Sciences
Gauhati University, Assam e-Governance Group IGNOU, New Delhi
C-DAC, Pune

Course Coordinator: Dr. M. Prashanth and Prof. Benidhar Deshmukh

Transformation and Formatting: Block 1 & 2 – Dr. M. Prashanth


Block 3 – Prof. Benidhar Deshmukh

Programme Coordinators: Prof. Benidhar Deshmukh and Prof. Meenal Mishra


Volume Production
Mr. Rajiv Girdhar Mr. Hemant Kumar
A.R. (P), MPDD, IGNOU S.O. (P), MPDD, IGNOU
Acknowledgement: Ms. Savita Sharma for assistance in the preparation of CRC and some of the figures.
Cover Page Design: Prof. Benidhar Deshmukh
May, 2024
© Indira Gandhi National Open University, 2024
Disclaimer: Any materials adapted from web-based resources in this module are being used for educational purposes
only and not for commercial purposes.
All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other means, without
permission in writing from the Indira Gandhi National Open University.
Further information on the Indira Gandhi National Open University courses may be obtained from the University’s office
at Maidan Garhi, New Delhi-110 068 or visit University official website http://www.ignou.ac.in.
Printed and published on behalf of Indira Gandhi National Open University, New Delhi by the Registrar,
MPDD, IGNOU.

MGY-006: SPATIAL ANALYSIS AND MODELLING
Block 1 : Basics of Spatial Analysis
Unit 1 : Integration of RS and GIS
Unit 2 : Data Mining and Spatial Data Management
Unit 3 : Overview of Geostatistics and Spatial Data Measurements
Block 2 : Spatial Analysis
Unit 4 : Vector Data Analysis
Unit 5 : Raster Data Analysis
Unit 6 : Connectivity Analysis
Unit 7 : Analysis of 3-Dimensional Data
Unit 8 : Viewshed and Watershed Analysis
Unit 9 : Multicriteria Analysis
Block 3 : Modelling, Project Management and Programming
Unit 10 : Modelling Spatial Data
Unit 11 : Dynamic Modelling
Unit 12 : GIS Implementation and Project Management
Unit 13 : Introduction to GIS Programming

MGY-006: SPATIAL ANALYSIS AND MODELLING
You have studied remote sensing and image classification techniques in the course
MGY-105 ‘Techniques in Remote Sensing and Digital Image Processing’. Spatial analysis
and modelling are important components of the functionality of a GIS. Spatial analysis
lets you combine independent data sources and derive results through a set of spatial tools.
Its outcome is geographical data that is more informative than the unorganised data
originally collected. Spatial modelling is an essential part of spatial analysis: applying
models, or a defined set of rules and procedures, to spatial data in a GIS leads to a better
understanding and representation of information with enhanced accuracy.
This course comprises three blocks. The first block introduces spatial analysis and the
second describes it comprehensively. The third block deals with GIS modelling,
implementation and project management.
Block 1 “Basics of Spatial Analysis” discusses raster and vector integration, methods
of integration, and software and hardware considerations. It also introduces you to spatial
data mining, database modelling and data organisation. Further, it provides an overview of
geostatistics and data measurement.
Block 2 “Spatial Analysis” introduces vector data analysis along with proximity analysis,
buffering and overlay analysis. It also discusses the basics of raster analysis and
connectivity analysis. In addition, it provides an overview of the analysis of 3-dimensional
data, viewshed and watershed analysis, and multicriteria analysis.
Block 3 “Modelling, Project Management and Programming”, as its name suggests,
deals with the modelling of spatial data, GIS design and project management, and GIS
programming. Its first unit gives an account of spatial modelling in general with an emphasis
on static modelling, and its second unit covers dynamic modelling. GIS design,
implementation and project management are discussed in the third unit. The last unit of the
block provides an overview of GIS programming, a comparison of various languages and
the scope of GIS programming.
Expected Learning Outcomes
After studying this course, you should be able to:
• discuss integration of remote sensing and GIS;
• describe concepts of data mining and spatial data mining;
• explain fundamentals of geostatistics and spatial data measurements;
• describe vector data analysis, raster data analysis and connectivity analysis;
• discuss analysis of 3-dimensional data, viewshed and watershed analysis and multicriteria
analysis;
• illustrate modelling spatial data and dynamic modelling; and
• elucidate GIS design, implementation and project management and basics of GIS
programming.
We wish you all the best and hope you will enjoy reading this course!


Block 1
BASICS OF SPATIAL ANALYSIS
UNIT 1
Integration of RS and GIS
UNIT 2
Data Mining and Spatial Data Management
UNIT 3
Overview of Geostatistics and Spatial Data Measurements

Glossary

BLOCK 1: BASICS OF SPATIAL ANALYSIS
GIS is one of the most accessible tools for mapping and analysis across a wide range of
applications. It is well suited to storing and manipulating large volumes of complex,
spatially referenced data. Turning such data into a meaningful form is challenging and
generally requires simplifying the data enough to make them coherent. GIS is a major tool
for spatial data integration, which plays a major role in mapping and analysis. Spatial data
integration is the method of combining multiple spatial data types and providing
applications for their storage, retrieval, analysis and display. One of the basic ways of
integration is to spatially extract an area of a raster dataset using a polygon boundary from
vector data. Another simple way is to extract the value of a given raster cell by overlaying a
point vector layer. This interaction between layers uses the concept of spatial overlay, which
allows data to be transferred between objects of different types and in different layers
according to their spatial relationship with each other. Spatial data mining is a specialised
form of general data mining that balances computational scalability against mathematical
accuracy in handling spatial data. It is the process of discovering potentially useful patterns
in large spatial datasets. Geostatistics is a branch of statistics used to analyse and predict
the values associated with spatial or spatiotemporal phenomena; its analyses incorporate
the spatial coordinates of the data. Many geostatistical tools were originally developed as a
practical way to characterise spatial patterns and estimate values for areas where samples
were not gathered. Since then, these tools and techniques have developed to offer not just
interpolated values but also measures of uncertainty for those values. Thus, spatial analysis
in GIS helps in processing and managing geospatial data, which makes it a significant tool
in widespread applications.
Unit 1 “Integration of RS and GIS” introduces you to the key concepts of raster and vector data
integration, its methods and techniques, its uses, and the software and hardware requirements
necessary to perform it.
Unit 2 “Data Mining and Spatial Data Management” familiarises you with the basic concepts of
data mining. You will learn about spatial data mining, with emphasis on spatial data mining
techniques, the concepts of DBMS and SDBMS, and data modelling and data organisation.
Unit 3 “Overview of Geostatistics and Spatial Data Measurements” discusses the basics of
geostatistics and its techniques, and distance and length measurements, including spatial
and spectral distance measurements. You will learn to measure polygon perimeter and area
and to recognise spatial relationships between variables and mathematical operations.
Expected Learning Outcomes
After studying this block, you should be able to:
• know the basics of raster and vector data integration, methods of raster and vector data
integration;
• explain software and hardware requirements for raster and vector data integration;
• discuss the concepts of data mining and spatial data mining;
• describe the spatial data mining covering spatial data mining techniques, data modelling and
data organisation;
• identify essentials of geostatistics and its techniques and distance and length
measurements;
• illustrate the polygon perimeter and area measurement; and
• recognise spatial relationships between variables and mathematical operations.

We wish you all the best and hope you will enjoy reading this course.

UNIT 1

INTEGRATION OF RS AND GIS

Structure
1.1 Introduction
    Expected Learning Outcomes
1.2 Raster and Vector Data Integration
    Data Integration
    Stages of Integration
    GIS Integration
1.3 Methods of Integration
    Contributions of RS in Integration with GIS
    Contributions of GIS in Integration with RS
    Integration of RS and GIS for Analysis and Modelling
1.4 Software and Hardware Considerations
1.5 Summary
1.6 Activity
1.7 Terminal Questions
1.8 References
1.9 Further/Suggested Readings
1.10 Answers


1.1 INTRODUCTION
Data interpretation and analysis have become common in today’s world with the availability of large
volumes of digital data in various formats. Data integration refers to the various approaches that
combine or merge data obtained from different sources to extract better and more accurate
information. It covers data of different resolutions as well as multi-temporal, multi-sensor and
multi-type data. Data integration supports many more applications by making it possible to design
models, run simulations and widen the scope for effective decision making. Although the technique
of integrating raster and vector data has existed for a couple of decades, recent large-scale
developments in analytical methods, such as machine learning based algorithms and visualisation
techniques, have contributed to solving complex problems. In this unit, we shall discuss raster and
vector data integration, methods of integration, and software and hardware considerations.

Expected Learning Outcomes

After studying this unit, you should be able to:
• know the key concepts of raster and vector data integration;
• describe methods and techniques of raster and vector data integration;
• explain the utilities of vector and raster data integration; and
• discuss software and hardware requirements necessary to perform
raster and vector data integration.

1.2 RASTER AND VECTOR DATA INTEGRATION


GIS has become one of the basic tools for mapping and analysis in various
applications, and it is a major tool for spatial data integration, which plays a
major role in mapping and analysis. Spatial data integration is “the process of
combining multiple spatial data types and providing applications for its storage,
retrieval, analysis and display”. Let us discuss spatial data integration, which
includes vector and raster data integration, in detail.
1.2.1 Data Integration
The purpose of geographical enquiry is to examine the relationships among
different features at a given point in space and time, and to describe or analyse a
phenomenon in the real world. Data alone have little worth unless a user finds
trends and patterns in them with respect to other events or objects. Data
integration therefore plays an important role in enhancing the usefulness of data.
One of the basic ways of integration is to spatially extract an area of a raster
dataset using a polygon boundary from vector data. Another simple way is to
extract the value of a given raster cell (pixel) by overlaying a point vector layer.
This interaction between layers uses the concept of spatial overlay, which
allows data to be transferred between objects of different types and in different
layers according to their spatial relationship with each other. Analysing a
phenomenon located in a user-defined geographic space often requires
extracting information simultaneously from two or more layers. For instance, a
user who wants to determine the average surface temperature of certain
locations within a city can choose a raster grid representing the city's surface
temperature as the source layer and a vector layer representing the specific
locations as the target layer. The user is then dealing with one polygon dataset
and one raster dataset. Geospatial analysis can therefore be carried out using
spatial overlay operations, in which data are transferred from the source layer to
the target layer through the spatial interaction between the datasets.
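The two overlay operations described above (averaging the raster cells that fall inside a polygon, and reading the raster value under a point) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the raster is a small list of rows with square cells, a cell belongs to the polygon when its centre lies inside it, and the function names, temperatures and coordinates are all hypothetical. A production GIS would use its own overlay tools.

```python
from statistics import mean

def cell_centre(row, col, origin_x, origin_y, cell_size):
    """Map coordinates of the centre of cell (row, col).

    origin_x, origin_y are the coordinates of the raster's top-left corner.
    """
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (a list of (x, y) vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def zonal_mean(raster, polygon, origin_x, origin_y, cell_size):
    """Mean of the raster cells whose centres fall inside the polygon."""
    values = [
        raster[r][c]
        for r in range(len(raster))
        for c in range(len(raster[0]))
        if point_in_polygon(*cell_centre(r, c, origin_x, origin_y, cell_size), polygon)
    ]
    return mean(values) if values else None

def sample_at_point(raster, x, y, origin_x, origin_y, cell_size):
    """Value of the raster cell that contains the point (x, y), or None if outside."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return None

# Hypothetical surface temperature grid (degrees C), top-left corner at (0, 4), 1-unit cells.
temperature = [
    [30, 31, 32, 33],
    [30, 32, 34, 33],
    [29, 33, 35, 32],
    [28, 29, 30, 31],
]
ward = [(1, 1), (3, 1), (3, 3), (1, 3)]  # hypothetical polygon (target layer)

print(zonal_mean(temperature, ward, 0, 4, 1))            # mean over the four enclosed cells
print(sample_at_point(temperature, 2.2, 1.7, 0, 4, 1))   # value under a point
```

Here the raster is the source layer and the polygon or point is the target layer, so the result can be stored as a new attribute of the vector feature.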
1.2.2 Stages of Integration
There are three stages in RS and GIS integration, i.e. raster and vector
integration (Fig. 1.1). Let us discuss these three stages one after another.
Stage I: In this stage, GIS and image processing are treated as separate
systems. However, they are connected by a data exchange format that
permits the exchange of data. At this first level of integration, geometric
registration of images to a common coordinate system is possible.
Stage II: This is known as the stage of seamless integration. The GIS and the
image processing system share the same user interface but act individually,
complementing each other through the seamless exchange of data.
Stage III: This is the stage of total integration. The vector and raster
dichotomy ceases, and the GIS and image processing systems become a single
system. In this integration, object-based and phenomenon-based representations
of geographic data can be handled flexibly. As a part of total integration,
remote sensing is also treated as an input function in connection with the
handling of data.

Fig. 1.1: Three stages in remote sensing and GIS integration. (Source: modified
after Lo and Yeung 2009)

1.2.3 GIS Integration


Data integration in GIS is the method of combining spatial data procured from
different sources and in different formats to create an integrated dataset used
for analysis and decision making. Let us discuss GIS integration in detail.
GPS
Integrating GPS with GIS combines their data and provides capabilities that
neither the GIS nor the GPS can offer individually. When the two technologies
are combined, the actual field site can be shown on a computer monitor, which
helps in drawing better conclusions without making site visits or referring to
various related documents and drawings. Likewise, the integration of GPS and
GIS benefits stakeholders in different departments, who can seamlessly share
and analyse data for their regular use. A further advancement is that digital
photographs of the actual features can be viewed in the GIS alongside the
information derived from the GIS.
Internet GIS
The terms WebGIS and Internet GIS are commonly used as synonyms, but
there is a slight difference between them. The Internet supports many services,
of which the Web is one. A system can therefore be called an Internet GIS if it
uses several Internet services and not only the Web, and a WebGIS if it uses
only the Web. By this definition, Internet GIS is broader than WebGIS. In
practice, the Web is the most attractive service of the Internet, which is why
WebGIS is more common than Internet GIS.
Internet Map Servers
An Internet Map Server (IMS) application allows the custodians of a GIS database
to make spatial data easily accessible to end users through a web browser.
In a working IMS, one component of the system is a data processing engine
that runs on the server side as a service, and the other is a standard web
server that manages incoming requests and responds to the client with the
appropriate map data.
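The split between the two IMS components can be illustrated with a small Python sketch. It is not the interface of any particular map server product. It assumes a hypothetical in-memory feature list in place of a GIS database, a hypothetical `bbox` query parameter, and a GeoJSON response (chosen here only as a convenient map data format). `query_features` plays the role of the data processing engine, and `handle_request` plays the role of the web server's request handling.

```python
import json
from urllib.parse import urlparse, parse_qs

# Hypothetical stand-in for a GIS database: point features with one attribute.
FEATURES = [
    {"name": "Well A", "x": 77.20, "y": 28.60},
    {"name": "Well B", "x": 77.35, "y": 28.70},
    {"name": "Well C", "x": 78.10, "y": 29.40},
]

def query_features(min_x, min_y, max_x, max_y):
    """Data processing component: select the features inside a bounding box."""
    return [
        f for f in FEATURES
        if min_x <= f["x"] <= max_x and min_y <= f["y"] <= max_y
    ]

def handle_request(url):
    """Request handling component: parse the URL and return (status, body)."""
    query = parse_qs(urlparse(url).query)
    try:
        min_x, min_y, max_x, max_y = (float(v) for v in query["bbox"][0].split(","))
    except (KeyError, ValueError):
        # Missing parameter, non-numeric value, or wrong number of values.
        return 400, json.dumps({"error": "expected bbox=min_x,min_y,max_x,max_y"})
    collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [f["x"], f["y"]]},
                "properties": {"name": f["name"]},
            }
            for f in query_features(min_x, min_y, max_x, max_y)
        ],
    }
    return 200, json.dumps(collection)

# A browser-style request for the map extent currently on screen.
status, body = handle_request("/map?bbox=77.0,28.0,77.5,29.0")
print(status, body)
```

In a deployed system the handler would sit behind a real web server and the query would run against the spatial database, but the division of work stays the same.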
Wireless Technology
Integrating geospatial data with wireless technology provides a wide variety of
services, and industries keep adding innovations to widen the business options
available for serving end users. Users increasingly prefer to run GIS
applications on smartphones or PDAs (Personal Digital Assistants).
Wireless communication, nowadays accessed mostly through mobile phones,
has opened up a vast range of information and topics for users. Information is
now literally available at one's fingertips.
Wireless technology implemented on hand-held devices allows users to take
GIS out of the office and directly into the field. For the majority of applications
it is an extension of desktop GIS, but because users can take measurements
and directly upload or feed the data to online tools, the technology has become
very popular in recent years. Furthermore, there is increasing demand for, and
scope in, the use of drones and small UAVs, which are also controlled using
wireless technology. They enable data collection and real-time assessment of
events so that quick decisions can be made.
Web Services
GIS software has enabled users to view spatial data that exists locally or,
typically, over an intranet connection within an organisation. In its early days it
was quite costly and required huge investment and specialised skills. Increasing
demand and the falling cost of computers and peripherals have brought the use
of, and access to, GIS data within reach of common users and have made
spatial data easier to interpret. Unfortunately, not every user has access to GIS
software or can spend the time necessary to use it efficiently. Web services
offered via internet technology have become a cheap and easy way of
disseminating geospatial data and processing tools.
This integration of GIS with Internet technology has had revolutionary effects,
such as interactive access to geospatial data, real-time data integration and
transmission, and access to platform-independent GIS analysis tools.

SAQ I
a) What is data integration in GIS?
b) What is the Stage II of GIS integration?
c) Define Internet GIS.

1.3 METHODS OF INTEGRATION


In Unit 11 of the MGY 103 Course, you have been introduced to basic concepts
and methods of raster and vector data integration. In addition, we will discuss
the three methods by which remote sensing and GIS technologies can be
combined to support and improve the process of integration. Let us discuss
these methods in detail.
1.3.1 Contributions of RS in Integration with GIS
In the process of integration, RS is used as a resource for collecting data for
use in GIS. Let us discuss these contributions in detail.
a) Thematic Information Extraction
Thematic data extracted from RS images are used to create thematic layers in
GIS. There are three methods of integrating the resulting thematic layers in GIS:

i) Aerial photographs or satellite images are interpreted manually to produce
maps that portray the boundaries between thematic classes (e.g., forest or
agriculture classes). The boundaries separating the thematic classes are then
digitised in such a way that they are compatible with the GIS environment.

ii) Digital RS data are classified by automated methods to produce paper maps
and images, which are then digitised for input to the GIS environment.

iii) Digital RS data are classified by automated methods and retained in
digital form for input to GIS. Alternatively, digital RS data can be entered
directly in raw form for further analysis.
b) Cartographic Information Extraction
The automatic extraction of cartographic information such as lines, polylines,
polygons and other geographical entities is one of the major achievements in
RS data input to GIS. Geographic feature extraction is accomplished by
applying pattern recognition, edge extraction and segmentation algorithms.
RS images therefore aid in producing new base maps and improving existing
ones. Further, the extracted cartographic information can be applied to enhance
the process of image classification.
In recent times, high resolution satellite imagery such as IKONOS and
QuickBird has proved to be a potential source for feature extraction using
automated extraction techniques. Such imagery has been used in topographic
mapping and 3D object reconstruction. In addition, light detection and ranging
(LiDAR) data are being applied in cartographic information extraction because
they provide land surface elevation information with better than 1 m accuracy
(vertical and horizontal).
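As an illustration of the edge extraction step mentioned above, the following Python sketch applies the Sobel operator, one common edge detector, to a tiny grid of brightness values and lists the cells where the gradient is strong. The text does not name a specific algorithm, so the choice of Sobel, the threshold and the sample image are assumptions made for this sketch.

```python
def sobel_magnitude(image):
    """Gradient magnitude of a 2-D grid (list of rows); border cells stay at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Horizontal change: right-hand column minus left-hand column.
            gx = (image[r - 1][c + 1] + 2 * image[r][c + 1] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r][c - 1] - image[r + 1][c - 1])
            # Vertical change: lower row minus upper row.
            gy = (image[r + 1][c - 1] + 2 * image[r + 1][c] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r - 1][c] - image[r - 1][c + 1])
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

def edge_cells(image, threshold):
    """(row, col) positions whose gradient magnitude reaches the threshold."""
    magnitude = sobel_magnitude(image)
    return [
        (r, c)
        for r, row in enumerate(magnitude)
        for c, value in enumerate(row)
        if value >= threshold
    ]

# Hypothetical image: a dark field (10) on the left, a bright road surface (200) on the right.
image = [[10, 10, 200, 200, 200] for _ in range(5)]
print(edge_cells(image, threshold=100))
```

The flagged cells line up along the boundary between the dark and bright areas. Tracing such cells into lines or polygons is what turns the raster edge map into vector cartographic features.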
c) Data Used in Updating GIS Database
A GIS database needs to be updated from time to time, and RS data is one of
the important and economical sources for updating it. Additionally, RS data is
used in change detection studies in GIS. In recent times, RS and GIS
integration has been applied to query the raster pixels covered by vector
polygons and to carry out analyses without format conversions and overlays.
The image statistics derived for the associated vector polygons can be used to
detect changes that may have occurred, and the necessary map updates can
then be initiated.
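A rough Python sketch of this idea follows: the mean pixel value of each map polygon is computed for two image dates, and polygons whose mean has shifted by more than a threshold are flagged for map updating. To keep the sketch short, each polygon is simplified to a rectangle given in row and column indices, which is an assumption of this example and not how real parcels are stored. The parcel names, pixel values and threshold are also hypothetical.

```python
from statistics import mean

def parcel_mean(raster, parcel):
    """Mean pixel value inside a parcel (row_min, row_max, col_min, col_max), inclusive."""
    r0, r1, c0, c1 = parcel
    return mean(raster[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1))

def changed_parcels(raster_t1, raster_t2, parcels, threshold):
    """Return {parcel_id: change in mean} for parcels whose mean moved by >= threshold."""
    changes = {}
    for parcel_id, extent in parcels.items():
        diff = parcel_mean(raster_t2, extent) - parcel_mean(raster_t1, extent)
        if abs(diff) >= threshold:
            changes[parcel_id] = diff
    return changes

# Hypothetical vegetation index images (scaled 0-100) for two dates.
image_t1 = [[60, 60, 20, 20] for _ in range(4)]
image_t2 = [[30, 30, 20, 20] for _ in range(4)]

# Hypothetical map polygons, simplified to rectangles in cell indices.
parcels = {"forest": (0, 3, 0, 1), "field": (0, 3, 2, 3)}

print(changed_parcels(image_t1, image_t2, parcels, threshold=10))
```

Only the parcel whose statistics changed is reported, so the map update can be limited to that polygon.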
d) RS Used as a Backdrop in Representation of GIS and Cartographic
Database
RS imagery is used in cartographic representation as an input to GIS. RS
images used with digital elevation models (DEM) for terrain visualisation have
proved to be a promising tool in environmental applications. Recent
developments in cartographic animation technology have moved terrain
visualisation from a static to a dynamic state. Together, these advances mark
an extensive shift from the traditional 2D to a 3D method of representation in
GIS, which has proved significant in modelling studies in fields such as
geology, soil sciences, climatology and marine sciences.
1.3.2 Contributions of GIS in Integration with RS
In the process of integration, GIS data play an important role by enhancing the
functionality of RS at various stages of image processing, such as selection of
the area of interest for processing, pre-processing and image classification.
Let us discuss these contributions in detail.
a) Use of GIS Database as a Source of Ancillary Data in RS Image
Classification
Ancillary data are data collected separately from the remotely sensed data.
Ancillary data have long been helpful in the manual interpretation of aerial
photos, where they supplement the identification and delineation of features.
Similarly, in digital RS, ancillary data are incorporated into the analysis in an
organised manner so that they are directly connected to the RS data analysis.
Ancillary data are used to improve the whole process of image classification,
which includes pre-classification stratification, classifier modification and
post-classification sorting. During the pre-classification stage, ancillary data
help in selecting training samples or in dividing the study scene into smaller
areas or strata following designated standards or procedures. During
post-classification sorting, ancillary data are applied to correct misclassified
pixels by following expert-designated rules. Table 1.1 shows approaches for
linking ancillary data with RS imagery to improve image classification. For
example, ancillary data derived from topographic maps are valuable for
improving land cover classification accuracy because the distribution of land
cover is associated with topography. In addition to elevation, other variables
derived from a DEM, such as slope and aspect, have been used in image
classification. Topographic data also aid in all three stages of image
classification: as a pre-classification stratification tool, as an additional channel
during classification, and in the post-classification smoothing process.
Table 1.1: Major approaches to link ancillary data with remote sensing
images to establish enhanced classification accuracy. (Source:
modified after Weng 2010).

Method                          Features

Ancillary data usage            Topography, land use, and soil maps
                                Road density
                                Road coverage
                                Census data

Stratification                  Created on the basis of topography
                                Created on the basis of illumination and ecologic zone
                                Created on the basis of census data
                                On the basis of shape index of the patches

Post-classification processing  Kernel-based spatial reclassification
                                Use of zoning and housing-density data to modify the initial
                                classification result
                                Use of contextual correction
                                Use of co-occurrence matrix-based filtering
                                Using polygon and rectangular mode filters
                                Using an expert system to perform post-classification sorting
                                Use of knowledge-based system to correct misclassification

Multisource data usage          Spectral, texture, and ancillary data (like DEM, geology, soil,
                                existing GIS-based maps)
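Slope and aspect, the DEM-derived variables mentioned before Table 1.1, can be computed from neighbouring elevation values. The Python sketch below uses simple central differences on a small grid. This is one of several possible estimators and not necessarily the one used by any given GIS package. It assumes that row 0 is the northern edge of the DEM, that cells are square, and that aspect is reported as the downslope direction in degrees clockwise from north. The sample elevations are hypothetical.

```python
import math

def slope_aspect(dem, row, col, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north) at an interior cell.

    Assumes row 0 is the northern edge and column 0 is the western edge.
    Aspect is None on flat ground.
    """
    # Rate of elevation change towards the east and towards the north.
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_dy = (dem[row - 1][col] - dem[row + 1][col]) / (2 * cell_size)

    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None

    # The surface faces downhill, i.e. opposite to the gradient direction.
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360
    return slope, aspect

# Hypothetical DEM (metres) with 10 m cells, rising steadily towards the east.
dem = [
    [0, 10, 20],
    [0, 10, 20],
    [0, 10, 20],
]
print(slope_aspect(dem, 1, 1, cell_size=10))  # a 45 degree slope that faces west
```

Slope and aspect layers produced this way can be stacked with the spectral bands as additional channels, or used to stratify the scene before classification.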

b) Ancillary Data Usage in Image Pre-processing


During the image rectification step of the pre-processing stage, ancillary
GIS data such as vector points, area data, and DEMs are increasingly used
for geometric and radiometric corrections. High-resolution topographic data
are now used intensively in radar image interpretation. The influence of
undulating topography on the radiometric characteristics of digital imagery
can be corrected with the help of a DEM: during pre-processing, variables
derived from the DEM are applied in topographic correction to normalise the
terrain influence on land-cover reflectance. Interestingly, the most
commonly used ancillary data (vector data sets) in image rectification are
ground control points, where accurately recognisable points are carefully
picked from a related existing map with defined coordinates and used for
image registration.

Contributor: Dr. Nikhil Lele
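The DEM-based topographic correction described in (b) can be illustrated with a minimal sketch. The text does not name a specific method, so the widely used cosine correction is assumed here; the function name, the parameter names, and the 0.05 cut-off for poorly lit slopes are all hypothetical choices made for illustration.

```python
import math

def cosine_correction(radiance, slope_deg, aspect_deg,
                      sun_zenith_deg, sun_azimuth_deg):
    """Terrain-normalise one pixel value with the cosine method (assumed here).

    slope_deg and aspect_deg are the DEM-derived variables; the sun angles
    would normally come from the image metadata.
    """
    slope = math.radians(slope_deg)
    zenith = math.radians(sun_zenith_deg)
    # Angle between the sun direction and the direction the slope faces.
    rel_azimuth = math.radians(sun_azimuth_deg - aspect_deg)
    # Cosine of the local solar incidence angle on the tilted surface.
    cos_i = (math.cos(zenith) * math.cos(slope)
             + math.sin(zenith) * math.sin(slope) * math.cos(rel_azimuth))
    if cos_i <= 0.05:
        # Shadowed or grazing illumination: the ratio becomes unstable,
        # so this sketch simply returns the value unchanged.
        return radiance
    return radiance * math.cos(zenith) / cos_i

flat = cosine_correction(100.0, 0, 0, 30, 135)           # flat terrain
sun_facing = cosine_correction(100.0, 20, 135, 30, 135)  # slope faces the sun
sun_averted = cosine_correction(100.0, 20, 315, 30, 135) # slope faces away
```

On flat terrain the value is unchanged; a sun-facing slope is darkened and a slope facing away from the sun is brightened, which is the normalising effect the paragraph describes.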
c) Ancillary Data Usage for Selection of the Area of Interest
Vector polygons can be used to confine processing to the area of interest of
an image with the help of currently available image processing software
(e.g., ERDAS IMAGINE). This permits masking operations without applying
raster masks and enables faster image processing without any intermediate
data storage, thus reducing the problems associated with data integration.
However, there are certain practical and conceptual problems with the
ancillary data used in image analysis. Preferably, the ancillary data must be
compatible with the RS imagery in terms of scale, accuracy, geographic
reference system, and date of acquisition. On some occasions ancillary data
are represented as discrete classes (nominal or ordinal data), whereas RS
data are represented as ratio or interval data, so the compatibility between
the two kinds of data has to be addressed carefully.
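As a rough illustration of confining processing to an area of interest with a vector polygon, the sketch below keeps only the raster cells whose centres fall inside a polygon. It is not how ERDAS IMAGINE or any other package implements the operation; the function names, the tiny 4 x 4 grid, and the convention that cell (row, col) has its centre at x = col + 0.5, y = row + 0.5 are assumptions made for the example.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def clip_to_aoi(raster, polygon, nodata=None):
    """Keep cells whose centre is inside the polygon; set the rest to nodata."""
    return [
        [value if point_in_polygon(col + 0.5, row + 0.5, polygon) else nodata
         for col, value in enumerate(values)]
        for row, values in enumerate(raster)
    ]

# Hypothetical 4 x 4 raster and a square area of interest covering its centre.
raster = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]
aoi = [(1, 1), (3, 1), (3, 3), (1, 3)]
clipped = clip_to_aoi(raster, aoi)
```

Only the four central cells survive; every cell outside the polygon is replaced by the nodata marker.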
d) GIS as Tool to Organise Field/Reference Data for Use in RS
Applications
Besides aiding RS at various stages of image processing, GIS facilitates
several digital operations (entering, analysing, managing, and displaying)
on data obtained from the various sources essential for RS applications.
Most RS projects require a GIS database for storing, organising, and
displaying ancillary, reference, and field data in addition to aerial
photographs and satellite imagery. Likewise, GPS technologies are
required in RS projects for the collection and observation of in situ sample data.
1.3.3 Integration of RS and GIS for Analysis and
Modelling
The integration of RS and GIS technologies has wide application and is
acknowledged as an effective tool in various kinds of analysis and modelling.
Combinations of several RS-derived variables with GIS thematic layers are
potential data sources for most analyses. The data (multispectral,
multiresolution, and multitemporal) collected from different sources are
converted into information that is used for monitoring land processes and
for extracting various biophysical and socioeconomic variables. GIS provides
a favourable environment for carrying out various functions (entering,
analysing, and displaying) on digital data derived from different sources,
which are used for feature identification, change detection, and database
development in different applications.
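One simple way to combine an RS-derived variable with a GIS thematic layer is a zonal summary. The sketch below is only an illustration: the choice of a vegetation index as the RS variable, the land-use zone labels, and all values are hypothetical, and both grids are assumed to be already co-registered with the same shape.

```python
from collections import defaultdict

def zonal_mean(values, zones):
    """Mean of the raster values per zone; cells with value None are skipped."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if value is None:
                continue
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}

# Hypothetical RS-derived grid (e.g. a vegetation index) and a GIS zone layer.
index_grid = [[0.8, 0.6, 0.2],
              [0.7, None, 0.1]]
land_use = [["forest", "forest", "urban"],
            ["forest", "urban", "urban"]]
means = zonal_mean(index_grid, land_use)
```

The result is one summary value per thematic class, which is the kind of derived information that can feed monitoring or modelling.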

1.4 SOFTWARE AND HARDWARE CONSIDERATIONS
In the previous section you have read about the methods of RS and GIS
integration. In this section let us discuss the software and hardware
considerations.

Unit 1 Integration of Remote Sensing and GIS

GIS applications and GIS-based projects form the core of the work of many
industries and small-scale companies in the geospatial domain. In addition,
a large number of service-provider industries acquire satellite datasets,
generate vector and attribute datasets, and use various automated
techniques for data processing and data extraction in order to achieve and
deliver the desired outputs. These individuals and industries need
infrastructure to handle databases efficiently.
Hardware and software considerations can vary significantly depending on the
task in hand. Recent trends show that data volumes are increasing day by day.
Hence, system requirements for storing and processing large volumes of data
are also growing. Nevertheless, listed below are a few minimum
configurations that allow modern applications and their sub-components to
function properly. These specifications are recommended requirements;
packages may also run below these specifications when working on very small
spatial extents.
Hardware Requirements
The following are the essential hardware requirements for a well-equipped GIS
Lab:
 Computer Workstations: High-performance desktop or laptop computers
with sufficient processing power, RAM, and storage are essential. For
intensive GIS tasks, a multi-core processor (e.g., Intel Core i7 or AMD
Ryzen series) with at least 16 GB of RAM is recommended. Additionally,
solid-state drives (SSD) provide faster data access and reduce loading
times.
 Graphics Processing Unit (GPU): A dedicated GPU with good processing
capabilities can significantly enhance GIS performance, especially when
working with large datasets and 3D visualizations. NVIDIA GeForce or AMD
Radeon graphics cards are popular choices for GIS applications.
 Display Monitors: High-resolution monitors (e.g., 24 inches or larger) with
accurate color reproduction are necessary for better visualization and data
analysis. Dual monitors can improve productivity by allowing users to view
multiple maps and applications simultaneously.
 Storage Solutions: Ample storage capacity is crucial to accommodate GIS
data and projects. A combination of fast SSDs for the operating system and
applications, along with larger capacity HDDs for data storage, is a
recommended setup.
 Peripherals: Standard peripherals like keyboard, mouse, and speakers are
required. Additionally, a digitizing tablet can be beneficial for precise
mapping and digitization tasks. Large format scanners play a vital role in
digitizing large, old paper maps, or hard copy satellite images, enabling their
conversion into digital formats for further processing.
 Network Infrastructure: A reliable and high-speed network connection is
essential to enable data sharing and collaboration within the GIS Lab. A
high-speed internet connection is crucial for leveraging cloud computing,
accessing GIS and RS data from diverse online sources, and effectively
working with web/internet GIS.
Software Requirements
The choice of software depends on the functionalities the user needs, the
user's budget, the number of intranet and internet GIS users, etc. Here are
some of the most widely used GIS and image processing software options to
choose from.
1) ESRI GIS Software Packages: ESRI (Environmental Systems Research
Institute) offers several products for different categories of users. A few
leading and common products include ArcGIS, ArcSDE, ArcIMS and
ArcWeb services. ESRI ArcGIS is a powerful and widely-used geographic
information system (GIS) software. It offers comprehensive tools for spatial
analysis, mapping, data integration, management, and visualization, making
it a go-to solution for professionals in various fields, such as urban planning,
environmental science, and geospatial research. With its user-friendly
interface and extensive geospatial capabilities, ArcGIS empowers users to
explore, interpret, and present complex geographical data, facilitating better
decision-making and understanding of the world around us.
2) QGIS: It is one of the most powerful open source GIS software packages
and can be freely downloaded. It works on the basis of user-developed
plug-ins, which are the key to its success. The user looks for the desired
plug-in in the repository, then simply downloads and installs it. Because of
this plug-in support and the online help available among user groups, it has
become one of the most popular open source GIS software packages.
3) Hexagon Geospatial Software Packages: Intergraph Geomedia and
ERDAS Imagine are two powerful geospatial software offerings by Hexagon
Geospatial, a leader in the geospatial industry. Geomedia is a versatile
Geographic Information System (GIS) platform that enables users to collect,
manage, analyze, and visualize geospatial data from various sources. It
offers advanced tools for spatial analysis, cartography, and data integration.
ERDAS Imagine, on the other hand, is specialized remote sensing and image
processing software and one of the most widely used. It allows users
to process, analyze, and interpret satellite and aerial imagery, enabling
tasks such as image classification, change detection, and 3D visualization.
ERDAS Imagine is widely used in agriculture, forestry, and environmental
research, providing essential insights for land use mapping, crop
monitoring, and natural resource management. In addition, ERDAS Imagine
offers photogrammetry and Radar mapping suites. ERDAS Imagine also
offers support for LiDAR data processing.
4) Autodesk GIS Software Packages: Autodesk offers a range of powerful
GIS software packages that cater to various geospatial needs. With
products like Autodesk AutoCAD Map and Autodesk InfraWorks,
professionals can efficiently create, manage, and analyze geographic data,
enhancing their design and planning workflows. These software solutions
facilitate seamless integration with industry-standard CAD tools, providing a
comprehensive platform for geospatial modeling, visualization, and data
sharing. Whether it's urban planning, infrastructure design, or environmental
analysis, Autodesk's GIS software empowers users with the tools to make
informed decisions and drive efficiency in their geospatial projects.
5) ENVI: ENVI is a widely used and powerful image processing software package
designed specifically for remote sensing and geospatial analysis.
Developed by NV5 Geospatial Solutions, ENVI enables users to process,
analyze, and interpret various types of remotely sensed data, such as
optical, Radar and hyperspectral satellite images. With advanced algorithms
and tools, ENVI supports tasks like image classification, change detection,
and vegetation analysis. It also offers support for photogrammetric analysis
and a module for SAR interferometry. Its user-friendly interface and extensive
capabilities allow researchers, scientists, and professionals to extract
valuable insights from complex geospatial data, ultimately supporting
informed decision-making and enhancing geospatial research and
applications.
6) Sentinel Toolbox: The Sentinel Toolbox consists of 3 separate
applications:
Sentinel-1 Toolbox (SAR applications)
Sentinel-2 Toolbox (High-resolution optical applications)
Sentinel-3 Toolbox (Ocean and land applications)
A standout feature is the Sen2Cor plugin, which can correct for atmospheric
effects and classify images. Moreover, downloaded Sentinel-1 synthetic
aperture radar data can be efficiently processed with the Sentinel-1
Toolbox, as it enables the application of advanced techniques such as
interferometry. If additional support is required, the open STEP Forum, a
vibrant community of remote sensing enthusiasts, will assist with and
address any questions that you may have.
7) PCI Geomatica: PCI Geomatica is an all-encompassing and advanced
geospatial software package developed by PCI Geomatics, centered around image
processing. With a strong emphasis on geospatial analysis and image
manipulation, Geomatica provides a diverse array of tools for effectively
handling satellite images of various types. Its capabilities include image
rectification, orthorectification, image classification, and change detection.
Geomatica seamlessly integrates conventional divisions in remote sensing,
photogrammetry, GIS, cartography, web, and development tools into one
cohesive environment, resulting in reduced errors, minimized time wastage,
and increased productivity.
The majority of the software packages mentioned above allow the utilization
of geo-located data from various features or events, finding widespread
applications across diverse fields. Geospatial data integration finds
application in numerous domains, spanning from mineral mapping and
precision farming to resource management, land use policy making,
transport management, environmental monitoring, emergency response,
and business development, among others. These applications leverage web
technology, enabling online decision-making, policy formulation, and
scenario building for users worldwide.

SAQ II
a) List the methods of GIS integration.
b) What is the important role played by GIS in integration with RS?
c) What is QGIS?

1.5 SUMMARY
Let us summarise what we have studied in this unit.
 GIS is one of the basic tools for mapping and analysis in various
applications and a major tool for spatial data integration, which plays a
major role in mapping and analysis.
 Data integration plays an important role in enhancing the usage of data.
One basic way of integration is to spatially extract an area from raster
data using a polygon boundary from vector data. Another simple way is to
extract the value of a given raster cell by overlaying a point vector layer.
 There are three stages in RS and GIS integration. In Stage I, the GIS and
image processing are treated as separate systems, connected by means of a
data exchange format that permits the exchange of data. Stage II is known
as the stage of seamless integration and Stage III is considered the stage
of total integration.
 Integration of GPS with GIS makes it possible to combine data and provides
capabilities that neither GIS nor GPS can provide individually. The terms
WebGIS and Internet GIS are normally used synonymously, although the
Internet supports many services, of which the Web is one.
 An Internet Map Server (IMS) application allows custodians of a GIS
database to easily make spatial data accessible to end users through a web
browser. Integration of geospatial data with wireless technology provides a
wide variety of services, and industries keep adding innovations to widen
the business alternatives for serving end users.
 Web services that are offered via internet technology have become a cheap
and easy way of disseminating geospatial data and processing tools.
Integration of GIS with the Internet technology has revolutionary effects like
interactive access to geospatial data, real-time data integration and
transmission, and access to platform-independent GIS analysis tools.
 RS and GIS technologies can be combined in three ways to support and
improve their integration: contributions of RS in integration with GIS,
contributions of GIS in integration with RS, and integration of RS and GIS
for analysis and modelling.
 GIS applications and GIS-based projects form the core of the work of many
industries and small-scale companies in the geospatial domain. Hardware and
software considerations can vary significantly depending on the system
requirements.

1.6 ACTIVITY
 You have read in this Unit that QGIS works on the basis of user developed
plugins. List the popularly used plugins in QGIS.

1.7 TERMINAL QUESTIONS


1. What is data integration? Describe the stages in RS and GIS integration.
2. Discuss the contributions of RS in Integration with GIS.
3. Discuss the contributions of GIS in Integration with RS.
4. Describe the software and hardware considerations in RS and GIS
integration.

1.8 REFERENCES
 Davis, Jr. C. and Lacerda Alves, L. (2008) Web Services, Geospatial. In:
Shekhar, S., Xiong, H. (eds.), Encyclopedia of GIS. Springer, Boston, MA.
https://doi.org/10.1007/978-0-387-35973-1_1490
 Gao, J. (2002) Integration of GPS with Remote Sensing and GIS: Reality
and Prospect. Photogrammetric Engineering and Remote Sensing 68(5):
447-453.
 Karnatak, H.C., Shukla, R., Sharma, V.K., Murthy, Y.V.S., Bhanumurthy, V.
(2012) Spatial mashup technology and real time data integration in geo-web
application using open source GIS-a case study for disaster management.
Geocarto Int 27(6): 499-514.
 Lo, C.P. and Yeung, A.K.W. (2009) Concepts and techniques of
geographic information system. PHI Learning Private Limited, New Delhi,
532p.
 Weng, Q. (2010) Remote Sensing and GIS Integration: Theories, Methods,
and Applications. New York: McGraw-Hill, 416p.
 https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/clip.htm
 https://enterprise.arcgis.com/en/get-started/10.9.1/windows/what-is-arcgis-enterprise-.htm
 https://slideplayer.com/slide/5822638/
 https://www.geospatialworld.net/blogs/how-gnss-works/
 https://doi.org/10.1080/10106049.2011.650651
 https://gis4africa.files.wordpress.com/2013/06/gisforafrica-soft-and-hardware-requiremenst.pdf (Retrieved on 24/4/2023)

1.9 FURTHER/SUGGESTED READINGS


 Alesheikh, A.A. and Helali, H. (2001) Distributing National Geospatial
Information. Proceedings of Digital Earth 2001, Fredericton, NB, Canada.

 Weng, Q. (2010) Remote Sensing and GIS Integration: Theories, Methods,
and Applications. New York: McGraw-Hill, 416p.

1.10 ANSWERS
SAQ I
a) Data integration in GIS is the method of combining spatial data procured
from different sources and in different formats to create an integrated
dataset used for analysis and decision making.
b) Stage II of GIS integration is also known as the stage of seamless
integration. In this stage the GIS and the image processing system share
the same user interface, but act individually and complement each other
through the seamless exchange of data.
c) A GIS that uses many Internet services, including web services, is known
as Internet GIS.

SAQ II
a) The three methods of integration are i) Contributions of RS in integration
with GIS ii) Contributions of GIS in integration with RS iii) Integration of RS
and GIS for analysis and modelling.
b) In the process of integration, GIS data play an important role by
enhancing the functionality of RS at various stages of image processing,
such as selection of the area of interest for processing, pre-processing,
and image classification.
c) QGIS is an open source GIS software package that works on the basis of
user-developed plug-ins.

Terminal Questions
1. Please refer to subsections 1.2.1 and 1.2.2.
2. Please refer to subsection 1.3.1.
3. Please refer to subsection 1.3.2.
4. Please refer to section 1.4.



UNIT 2

DATA MINING AND SPATIAL DATABASE MANAGEMENT

Structure____________________________________________________
2.1 Introduction
    Expected Learning Outcomes
2.2 Basic Concepts
    Spatial Database
    Data Mining
    Database Management
2.3 Spatial Data Mining
    Spatial Data Mining Techniques
2.4 Concepts of DBMS and SDBMS
    Database Structures
    Database Management Functions
    Characteristics of Good DBMS
2.5 Database Modelling
    Types of Database Models
2.6 Database Organisation
2.7 Summary
2.8 Activity
2.9 Terminal Questions
2.10 References
2.11 Further/Suggested Readings
2.12 Answers

2.1 INTRODUCTION
Spatial data mining (SDM) is derived from general data mining and has to
balance computational scalability against mathematical accuracy in managing
spatial data. It is the process of discovering potentially useful patterns
in large spatial datasets. SDM is important because it has vast potential in
various applications such as climate change studies, epidemiology, and
public safety. For instance, SDM is useful for identifying locations with a
high concentration of disease outbreaks, such as cholera, so that suitable
measures can be taken to control the further spread of the disease.
In the previous unit, raster and vector data integration, methods of
integration, and software and hardware considerations were discussed. In
this unit, we will discuss the basic concepts of data mining and spatial
data mining. We will also discuss the concepts of DBMS and SDBMS, database
modelling, and database organisation.

Expected Learning Outcomes_______________________


After studying this unit, you should be able to:
 discuss basic concepts of data mining;
 describe spatial data mining with emphasis on spatial data mining
techniques;
 know the concepts of DBMS and SDBMS; and
 explain data modelling and data organisation.

2.2 BASIC CONCEPTS


Let us discuss some of the basic concepts related to data mining and spatial
database management.
2.2.1 Spatial Database
A spatial database is a specialized type of database that efficiently manages
and stores geographical and spatial data, enabling spatial queries and analysis.
For instance, cartographic databases store two-dimensional spatial
descriptions of map objects such as continents, regions, countries, and
states, along with cities, towns, road networks, rivers, seas, etc. You are
aware that spatial databases are managed using GIS and are applied in
various fields such as environmental management, transportation, forest
applications, and emergency response systems.
Furthermore, there are databases that hold three-dimensional spatial
representations, for example meteorological data such as temperature,
rainfall, and other weather-related information. Vector and raster data are
considered classical examples of spatial data. On the whole, spatial
databases store the spatial features of objects that share spatial
relationships with one another. The associations between spatial objects
are essential because they are used in querying a spatial database. Spatial
databases are built to optimally store, query, and retrieve spatial objects
comprising points, lines and polygons.
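The idea of storing and querying spatial objects can be sketched with Python's built-in sqlite3 module. This is only a toy stand-in for a spatial database: a real system would typically use dedicated geometry types and a spatial index rather than two plain coordinate columns. The table name, the function name, and the approximate city coordinates are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, lon REAL, lat REAL)")
# Approximate coordinates, for illustration only.
conn.executemany("INSERT INTO city VALUES (?, ?, ?)", [
    ("New Delhi", 77.21, 28.61),
    ("Pune", 73.86, 18.52),
    ("Chennai", 80.27, 13.08),
    ("Dehradun", 78.03, 30.32),
])
# An ordinary index on the coordinates stands in for a true spatial index.
conn.execute("CREATE INDEX city_xy ON city (lon, lat)")

def cities_in_window(conn, min_lon, min_lat, max_lon, max_lat):
    """Spatial window query: names of point features inside a bounding box."""
    rows = conn.execute(
        "SELECT name FROM city "
        "WHERE lon BETWEEN ? AND ? AND lat BETWEEN ? AND ? "
        "ORDER BY name",
        (min_lon, max_lon, min_lat, max_lat),
    )
    return [name for (name,) in rows]

northern = cities_in_window(conn, 76.0, 27.0, 79.0, 31.0)
```

The query selects objects by where they are rather than by an attribute value, which is what distinguishes a spatial query from an ordinary one.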
2.2.2 Data Mining
Data mining can be defined as the process of extracting, or mining,
information from large volumes of data that are generally stored in data
warehouses. It is also known as knowledge discovery in databases (KDD). A
data warehouse is a central repository, or storehouse, of data collated
from several information systems of an organisation or an enterprise.
Though the term 'data warehouse' is comparatively new, it can be traced in
the earlier scientific literature to the 1950s, when computers were first
used to analyse large volumes of experimental data by applying statistical
and machine learning (a branch of artificial intelligence) approaches.
Information extraction through data mining differs from traditional methods
in numerous ways.

Contributor: Dr. (Ms.) Shukla Acharjee
A noteworthy characteristic of data mining is that it can handle large data
warehouses, which generally comprise millions of data records and thousands
of attributes. Such data handling capabilities are not possible with
conventional methods.
Data mining performs secondary analysis of large data sets to reveal hidden,
previously unknown knowledge, including unforeseen patterns, relations, and
trends.
Data mining follows a definite approach to the analysis of data, in which
machine learning algorithms are applied to acquire knowledge gradually from
the analysed data. The analysis is performed without assuming anything
beforehand or formulating a hypothesis about what knowledge will be gained
at the end. Besides, data mining emphasises the probable characteristics of,
and correlations between, the large number of attributes in a data set
rather than the cause and effect between individual attributes. This
indicates the exploratory and probabilistic approach followed in data mining.
Finally, data mining aims both to describe past patterns and to anticipate
future probabilistic scenarios. The ability to link past patterns with
predictions of future scenarios makes it an exceptional decision-making tool
for an organisation. The analytical approach of data mining makes it an
important decision support system for extracting information and
transforming it into a logical structure for further use.
Data Mining: Process and Techniques
Data mining is not a single independent process for extracting stored
information from a data set; it follows a logical sequence of data
processing steps. Let us discuss the steps followed in the data mining
process, as given below (Fig. 2.1).
1. Data Integration and Cleaning: This is the first step in the data mining
process, in which heterogeneous, multi-source data are collated and stored
in a single data store (data warehouse). During this step the data are
cleaned and rectified by removing incorrect, missing, and erratic data.
2. Data Selection and Transformation: In this step the data relevant to a
particular data mining task are retrieved from the database. The retrieved
data are then converted into a form suitable for the application of data
mining techniques.
3. Data Mining: This is the first step of the knowledge discovery process,
in which hidden information is extracted and disclosed from the targeted
data set by applying machine learning, visualisation, and statistical
techniques.
4. Knowledge Discovery and Construction: In this step the mined
information is processed by evaluation and interpretation, with subsequent
visualisation. Further, it can be incorporated into a computer as a
knowledge base, or documented and reported to end users.
5. Disposition: The last step is to customise the knowledge base gained by
extraction so that it can be used for scientific research, policy design,
and decision making.

Fig. 2.1: Steps followed in data mining process for extraction of hidden
information from a given data set. (Source: modified after Lo and Yeung
2009)
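The first four steps of the process can be walked through on a toy data set. Everything in this sketch is hypothetical: the two sources, the district records, the choice of a Pearson correlation as the "mining" technique, and the 0.7 threshold used for interpretation are assumptions chosen only to make the sequence concrete.

```python
from statistics import mean

# Made-up records from two hypothetical sources; one record is incomplete.
source_a = [{"district": "A", "rainfall_mm": 820, "cases": 12},
            {"district": "B", "rainfall_mm": None, "cases": 7}]
source_b = [{"district": "C", "rainfall_mm": 1430, "cases": 41},
            {"district": "D", "rainfall_mm": 1210, "cases": 35},
            {"district": "E", "rainfall_mm": 640, "cases": 9}]

def integrate_and_clean(*sources):
    """Step 1: collate all sources and drop records with missing values."""
    merged = [record for source in sources for record in source]
    return [r for r in merged if all(v is not None for v in r.values())]

def select_and_transform(records):
    """Step 2: keep the attributes of interest as (x, y) pairs."""
    return [(r["rainfall_mm"], r["cases"]) for r in records]

def mine(pairs):
    """Step 3: a very simple pattern measure, the Pearson correlation."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def interpret(r):
    """Step 4: turn the mined number into a statement for end users."""
    if r > 0.7:
        return "strong positive association"
    if r < -0.7:
        return "strong negative association"
    return "no strong linear association"

cleaned = integrate_and_clean(source_a, source_b)
r = mine(select_and_transform(cleaned))
finding = interpret(r)
```

Step 5 (disposition) is not shown: it would mean handing `finding` to whoever uses it for research, policy design, or decision making.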

Nowadays machine learning techniques are being adapted for data mining.
In machine learning, the computer first applies a learning algorithm to
learn the characteristics of a training data set. Based on the training
data set, a model is created onto which new data sets are mapped in order
to carry out classification and prediction and to find patterns and trends.

Machine learning is classified into supervised machine learning and
unsupervised machine learning (Fig. 2.2), based on the level of human
intervention necessary in the learning process. Supervised machine
learning, also known as predictive data mining, focuses on problem solving.
It is termed 'supervised' because the data analyst is required to identify
a target field, or dependent attribute, in the data set that is being mined.
A selected algorithm inspects the data, trying to detect patterns and
relationships between the independent and dependent variables. The patterns
and relationships detected by the algorithm are used to construct a model
of the discovered knowledge, which is then applied to forecast the behaviour
or characteristics of new data objects or data sets.
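A minimal sketch of the supervised case follows. The analyst supplies the target field (the class label) for each training sample; the algorithm builds a model from it and applies the model to new objects. A nearest-centroid classifier is used here purely as an easy-to-read example, and the two-band reflectance values and class names are hypothetical.

```python
import math

def train_nearest_centroid(samples):
    """Build a model {label: centroid} from (features, label) training pairs."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Assign the label whose centroid is closest to the new object."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Hypothetical training data: (red, near-infrared) reflectance with labels.
training = [
    ((0.05, 0.02), "water"),
    ((0.06, 0.03), "water"),
    ((0.04, 0.45), "vegetation"),
    ((0.06, 0.55), "vegetation"),
]
model = train_nearest_centroid(training)
label_1 = predict(model, (0.05, 0.40))
label_2 = predict(model, (0.07, 0.04))
```

The labels in `training` play the role of the dependent attribute, and the reflectance values are the independent variables.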
In contrast, unsupervised machine learning, also known as descriptive data
mining, is a method of exploration-oriented data mining. It is used to
summarise the characteristics of a data set. Algorithms in this type of
data mining do not make any assumptions or hypotheses regarding the target
data set; they try to find associations, clusters, and trends in the data
independently of any pre-defined objective.
Several techniques have been recommended for data mining by machine
learning. Based on the type of knowledge discovered, however, they reduce
to two major groups of supervised techniques, namely classification and
prediction, and five classes of unsupervised techniques, namely
class/concept description, association, clustering, outlier analysis, and
time-series analysis.

Fig. 2.2: Categorisation of machine learning data mining techniques.
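For the unsupervised branch of Fig. 2.2, clustering is the most easily demonstrated technique. The sketch below uses k-means, which is one common clustering algorithm and is not named in the text; starting from the first k points is a simplifying assumption that keeps the example deterministic, and the points themselves are made up.

```python
import math

def k_means(points, k, iterations=10):
    """Group points into k clusters without any labels (plain k-means)."""
    centroids = list(points[:k])  # deterministic start: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster; keep it if empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up points forming two visually obvious groups.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
```

No target field is supplied anywhere: the grouping emerges from the data alone, which is the defining property of the unsupervised techniques.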

2.2.3 Database Management


Database management, also known as data management, comprises the
organisation, storage, maintenance, manipulation, and retrieval of data
throughout the data cycle. The data cycle represents the entire ongoing
process of a project from its start to its end. Database management covers
everything from storage and encryption to maintaining access logs that help
to track which data were accessed and what manipulations were made to them.
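The logging of manipulations mentioned above can be sketched with a database trigger, here using Python's built-in sqlite3 module. This is one possible mechanism, not a description of how any particular DBMS does it; the table, column, and trigger names and the sample parcel are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parcel (id INTEGER PRIMARY KEY, land_use TEXT);
CREATE TABLE change_log (
    parcel_id INTEGER,
    old_value TEXT,
    new_value TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Record every change to land_use automatically.
CREATE TRIGGER log_parcel_update AFTER UPDATE OF land_use ON parcel
BEGIN
    INSERT INTO change_log (parcel_id, old_value, new_value)
    VALUES (OLD.id, OLD.land_use, NEW.land_use);
END;
""")

conn.execute("INSERT INTO parcel VALUES (1, 'agriculture')")
conn.execute("UPDATE parcel SET land_use = 'residential' WHERE id = 1")

log = conn.execute(
    "SELECT parcel_id, old_value, new_value FROM change_log"
).fetchall()
```

Because the trigger lives in the database itself, the manipulation is logged regardless of which application made it.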

2.3 SPATIAL DATA MINING


You have learnt some of the basic concepts related to data mining and spatial
database management in the previous section. Now let us discuss spatial
data mining.
Spatial data mining (SDM) is derived from conventional attribute-based data
mining (DM). However, it differs from DM in numerous ways because of the
complex nature of geospatial data. Let us discuss how SDM is different from DM.
 SDM handles spatial data in a continuous two- or three-dimensional
geographic space, whereas DM concentrates on attribute-oriented
information in a discrete object space.
 SDM differs from conventional DM in handling not only attribute data but
also spatial data (points, lines, areas, and surfaces).
 SDM is commonly concerned with detecting local information, whereas DM is
inclined towards finding global knowledge.
 In geographical terms, SDM algorithms depend mainly on the concept of
neighbourhood, assuming that the features of spatial data objects are
affected more directly by their immediate neighbours than by spatial
objects farther away.
 SDM predicates (such as overlay, intersect, besides, etc.) are implicit and
vast in number, in comparison with conventional DM predicates (equal_to,
more_than, less_than, etc.), which are explicit and limited in number.
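The difference between explicit and implicit predicates in the last point can be made concrete. In the sketch below, `more_than` compares values that are stored directly as attributes, while `intersects` has to be derived from geometry because the relationship is not stored anywhere. Using axis-aligned rectangles instead of real polygons, and the feature names and coordinates, are simplifying assumptions.

```python
def more_than(a, b):
    """Conventional DM predicate: compares explicitly stored attribute values."""
    return a > b

def intersects(box_a, box_b):
    """Spatial predicate for rectangles given as (min_x, min_y, max_x, max_y).

    The relationship is implicit: it must be computed from the coordinates.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

# Hypothetical features with an attribute (area) and a geometry (extent).
lake = {"area_ha": 12.0, "extent": (2, 2, 6, 5)}
park = {"area_ha": 16.0, "extent": (5, 4, 9, 8)}
farm = {"area_ha": 6.0, "extent": (10, 0, 12, 3)}

park_is_larger = more_than(park["area_ha"], lake["area_ha"])
lake_meets_park = intersects(lake["extent"], park["extent"])
lake_meets_farm = intersects(lake["extent"], farm["extent"])
```

With n objects there are on the order of n squared pairs to examine for each spatial predicate, which gives a sense of why such relationships are described as vast in number.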
Conventional attribute-oriented DM algorithms cannot simply be applied to
spatial data because they may not always perform successfully. To work
with spatial data successfully and to fulfil the requirements of SDM, the
old concepts of conventional DM techniques have been replaced by new
techniques. In addition, working with SDM requires not only an
understanding of spatial data and of the procedures for handling DM
techniques but also knowledge of the geographic location and the subject
matter that the data describe. Moreover, it involves knowing clearly the
objectives of the specific SDM task. Improper understanding of these
prerequisites may result in serious consequences due to erroneous spatial
databases.
2.3.1 Spatial Data Mining Techniques
Spatial data mining techniques (SDM techniques) are grouped into seven
categories based on their knowledge discovery goals. They are
i) spatial classification
ii) spatial prediction
iii) spatial class/concept description
30 Contributor: Dr. (Ms.) Shukla Acharjee
iv) spatial association
v) spatial clustering
vi) spatial outlier analysis
vii) spatial time-series analysis.
Let us discuss them in detail.
i) Spatial Classification: In this type of SDM technique, the attribute values
and spatial associations of neighbouring data objects are combined to
determine the possible criteria by which the objects of a geospatial dataset
can be organised into classes.
ii) Spatial Prediction: The regression methods used for estimation in
attribute-oriented DM can also be applied for estimation in SDM. Commonly
applied extensions of the classical regression model are a) the generalised
linear model (GLM), b) geographically weighted regression (GWR), and
c) trend modelling.
iii) Spatial Class/Concept Description: Spatial class/concept description
is a process of generalisation. Although generalisation results in some
loss of information, it helps in comprehending spatial knowledge in a
simpler and easier way. It is of two types: a) the spatial characteristic
rule, and b) the spatial discrimination rule.
iv) Spatial Association: Many types of predicates can be used to represent
the spatial associations in geospatial datasets. They include
a) topological relations (for instance, intersect, overlap, disjoint),
b) spatial orientation (such as west_of, right_of), and c) distance
expressions (for example, closer_to, far_away_from). Although a large
number of associations exist between the objects in a typical geospatial
dataset, only a few of them are important to users.
v) Spatial Clustering: The main objective of spatial clustering is to find
the maximum number of clusters in a particular geospatial dataset and to
know how the clusters are distributed. In general, the clustering process
provides comprehensive knowledge about the spatial distribution patterns
of objects in a given spatial dataset. There are two methods of spatial
clustering: spatial data dominant clustering and non-spatial data dominant
clustering.
vi) Spatial Outlier Analysis: If a particular data object in a given dataset
has attributes that differ from those of the remaining data objects, it is
known as an outlier. If, in addition, the data object is georeferenced, it is
called a spatial outlier. Spatial outliers are identified from anomalies in
their attributes using contemporary techniques of spatial outlier analysis.
Interestingly, a spatial outlier can also be identified by its unique shape
and size, or by a specific rate or mode of change over time in comparison
with its neighbours in a given dataset. Spatial outlier analysis can be
performed using two methods: graphical outlier detection and quantitative
outlier detection.
vii) Spatial Time-Series Analysis: It is the most complex and logical operation
of all the SDM techniques. It is also known as spatiotemporal analysis. The
Event-based Spatio-Temporal Data Model (ESTDM) is considered one of the
best time-based, conceptual-level models for spatial data analysis. A
visual language known as the change description language is also used for
identifying temporal changes in spatial objects.
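
Quantitative spatial outlier detection can be illustrated with a minimal Python
sketch. It shows one simple neighbourhood-based test, not the only possible
method: each value is compared with the mean of its neighbours and the
differences are standardised. The grid of readings, the search radius and the
threshold of 2.0 are assumptions made for this example.

```python
import math
from statistics import mean, pstdev


def neighbour_ids(points, i, radius):
    """Indices of all points within `radius` of point i (excluding i itself)."""
    xi, yi, _ = points[i]
    return [j for j, (x, y, _) in enumerate(points)
            if j != i and math.hypot(x - xi, y - yi) <= radius]


def spatial_outliers(points, radius, threshold=2.0):
    """Flag points whose value differs unusually from their neighbours' mean."""
    diffs = {}
    for i, (_, _, value) in enumerate(points):
        nbrs = neighbour_ids(points, i, radius)
        if nbrs:
            diffs[i] = value - mean(points[j][2] for j in nbrs)
    if not diffs:
        return []
    mu = mean(diffs.values())
    sigma = pstdev(diffs.values())
    if sigma == 0:
        return []
    # Standardise each difference and keep those beyond the threshold.
    return [i for i, d in diffs.items() if abs(d - mu) / sigma > threshold]


# Made-up 3 x 3 grid of readings; the centre cell (index 4) is anomalous.
points = [(x, y, 10.0) for y in range(3) for x in range(3)]
points[4] = (1, 1, 50.0)

print(spatial_outliers(points, radius=1.5))  # [4]
```

Because the test compares each object with its immediate neighbours, it follows
the neighbourhood assumption on which SDM algorithms depend.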

SAQ I
a) What is a spatial database?
b) Define data mining.
c) List the different classes of SDM techniques.

2.4 CONCEPTS OF DBMS AND SDBMS


You have read about spatial data mining in the previous section. In this
section we will discuss the concepts of DBMS and SDBMS.
Before studying the database structure and database management functions, let
us discuss what DBMS and SDBMS are.
i) DBMS
The system software used to create and administer databases is known as a
database management system (DBMS). Users can create, protect, read,
update, and remove data in a database with the help of a DBMS. The DBMS,
the most common kind of data management platform, effectively acts as an
interface between databases and users or application programmes, ensuring
that data are consistently organised and remain accessible (Fig. 2.3).
ii) SDBMS

SDBMS is the acronym of spatial database management system. It is a software
module that can work with an underlying DBMS, such as an object-relational
database management system or an object-oriented database management
system. This software system supports spatial data models, spatial abstract
data types (ADTs), a query language from which these ADTs are callable,
spatial indexing, efficient algorithms for processing spatial operations, and
domain-specific rules for query optimisation. Examples of software that works
with an underlying DBMS in this way include ESRI SDE, the MySQL spatial
extension, Oracle Spatial, etc. An SDBMS can be simply described as a DBMS
with the capacity to handle spatial data.


Fig. 2.3: Database system architecture. (Source: modified after
https://www.ques10.com/p/9453/describe-overall-architecture-of-dbms-with-diagram/)

2.4.1 Database Structure


A systematic collection of data is called a database. A database offers a
structure for arranging the data rather than holding it all in a list in random
order. The database table is one of the most commonly used data structures.
A database table is made up of rows and columns and can be thought of as a
two-dimensional array. Each value in an array is identified by a unique index,
as in a list of items. A two-dimensional array uses two indices, which stand
for the rows and columns of a table. Each row in a database table is referred
to as a record. A record is also called an object or an entity. To put it
another way, a database table is a grouping of records.
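The idea of a table as a two-dimensional array can be sketched in a few lines
of Python. The station table, its column names and the helper function cell are
hypothetical and serve only to illustrate row and column indexing.

```python
# Column names give meaning to the second (column) index.
columns = ["station_id", "name", "elevation_m"]

# Each inner list is one row, i.e. one record (made-up values).
table = [
    [101, "North Ridge", 845.0],
    [102, "River Bend", 312.5],
    [103, "Lake View", 410.0],
]


def cell(table, columns, row_index, column_name):
    """Return one value using a row index and a column name."""
    return table[row_index][columns.index(column_name)]


# A record viewed as an entity with named attributes.
record = dict(zip(columns, table[1]))

print(cell(table, columns, 2, "name"))  # Lake View
print(record["elevation_m"])            # 312.5
```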
The database schema is the description of a database's structure, expressed in
a formal language supported by the database management system (DBMS). The
word "schema" refers to the way the data are organised, as a blueprint for
constructing the database (divided into database tables in the case of
relational databases). Formally, a database schema is defined as a set of
formulas (sentences), known as integrity constraints, imposed on a database.
These integrity constraints ensure that the components of the schema are
compatible with one another, and all of them can be expressed in the same
language. A database can be regarded as a structure in a realisation of the
database language. The states of a defined conceptual schema are converted
into a database schema by an explicit mapping, which describes how real-world
objects are represented in the data.
"A database schema specifies, based on the database administrator's
knowledge of possible applications, the facts that can enter the database, or
those of interest to the possible end-users". The concept of a database
schema plays the same role as the notion of a theory in predicate calculus. A
model of this "theory" corresponds closely to a database, which can be viewed
at any instant as a mathematical object. Thus, a schema can contain formulas
that represent integrity constraints specific to an application as well as
constraints specific to a type of database, all expressed in the same database
language. The schema of a relational database specifies elements such as
tables, fields, relationships, views, indexes, packages, procedures, functions,
queues, triggers, types, sequences, materialised views, synonyms, database
links, directories, and XML schemas.
A data dictionary often contains the schema for a database. Although a schema
is defined in a textual database language, the term "schema" is frequently
used to refer to a graphical representation of the database structure. In
other words, the schema is the database structure that specifies the items in
the database.
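A schema with integrity constraints can be illustrated with a minimal sketch
that uses Python's built-in sqlite3 module. The district and well tables, their
columns and the sample rows are hypothetical; SQLite is used only because it
needs no installation, and other DBMSs express the same ideas with slightly
different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# The schema: tables, fields, and the integrity constraints placed on them.
conn.executescript("""
CREATE TABLE district (
    district_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE well (
    well_id     INTEGER PRIMARY KEY,
    district_id INTEGER NOT NULL REFERENCES district(district_id),
    depth_m     REAL CHECK (depth_m > 0)
);
""")
conn.execute("INSERT INTO district VALUES (1, 'Pune')")
conn.execute("INSERT INTO well VALUES (10, 1, 42.5)")


def try_insert(sql):
    """Run an INSERT and report whether the schema accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert("INSERT INTO well VALUES (11, 1, -5.0)"),   # violates CHECK
    try_insert("INSERT INTO well VALUES (12, 99, 30.0)"),  # no such district
]

# The DBMS keeps the schema itself in a catalogue (its data dictionary).
schema_objects = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

print(results)
print(schema_objects)  # ['district', 'well']
```

Both invalid rows are refused by the DBMS itself, so the constraints hold no
matter which application tries to write the data.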
2.4.2 Database Management Functions
A typical DBMS performs a number of functions. Let us discuss these functions
in detail.
 Data definition: The DBMS offers functions to define the structure of the
data in the application. These include the record structure, the type and
size of fields, and the various conditions that the data in each field must
meet.
 Data manipulation: After the data structure has been established, data
needs to be added, changed, or removed. The DBMS also includes the
functions that carry out these processes. Both planned and unforeseen
needs for data processing can be met by these services.
 Data recovery and concurrency: The DBMS also manages the recovery
of data following a system failure and concurrent access to records by
numerous users.
 Data dictionary maintenance: One of the functions of a DBMS is to
maintain the data dictionary, which holds the application's data definition.
 Performance: One of the crucial functions is query performance
optimisation, which compares the various ways of executing a query and
selects the best one. This is essential because a significant amount of data
and numerous transactions must be processed.
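Some of these functions can be illustrated with a minimal sketch using Python's
built-in sqlite3 module: data definition, data manipulation, and recovery of a
consistent state after a failed transaction. The sample table and its values
are hypothetical, and a production DBMS offers far more elaborate recovery and
concurrency control than this example shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: record structure, field types and field conditions.
conn.execute("""
    CREATE TABLE sample (
        sample_id    INTEGER PRIMARY KEY,
        moisture_pct REAL NOT NULL
    )
""")

# Data manipulation: add and change records inside one transaction.
with conn:
    conn.execute("INSERT INTO sample VALUES (1, 18.2)")
    conn.execute("INSERT INTO sample VALUES (2, 21.7)")
    conn.execute("UPDATE sample SET moisture_pct = 19.0 WHERE sample_id = 1")

# Recovery: if any step of a transaction fails, all of its steps are undone.
try:
    with conn:
        conn.execute("DELETE FROM sample")
        conn.execute("INSERT INTO sample VALUES (3, NULL)")  # violates NOT NULL
except sqlite3.IntegrityError:
    pass  # the DELETE above has been rolled back with the failed INSERT

rows = conn.execute(
    "SELECT sample_id, moisture_pct FROM sample ORDER BY sample_id").fetchall()
print(rows)  # [(1, 19.0), (2, 21.7)]
```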
2.4.3 Characteristics of a Good DBMS
A good DBMS provides the following advantages over a conventional system.
a) Independence of data and programme: Data independence is a form of
database management that keeps data separate from all the programmes that
use it. A cornerstone of the DBMS concept, it ensures that the data cannot
be redefined or reorganised by any of the applications that use it. As a
result, the data remain accessible as well as stable and incorruptible by
the applications accessing them. The user application and the database can
each be changed independently of the other, which saves the time and money
otherwise needed to keep them consistent.
b) Data shareability and reduced redundancy of data: The ideal situation is
for applications to share one integrated database containing all the data
they need, thereby minimising the need for redundant data storage. Data
redundancy means that some data are stored in the database more than
once, so that any change to the redundant data must be made in several
places in the database.
c) Integrity: Data integrity is the accuracy and consistency of the stored
data; it is indicated by the absence of any change in the data between two
updates of a data record. Data integrity is imposed within a database
during the design phase through standard rules and procedures, and it is
maintained through error checking and validation processes. Multiple users
can then access and modify the data in a database while its integrity is
maintained.
d) Centralised control: The database administrator can make sure that
standards are followed in the representation of data with the help of
centralised control over the database.
e) Security: The database administrator, who has control over the database,
can specify any user's access privileges to any data item or defined subset
of the database, and can ensure that access to the database occurs only
through authorised channels. The security system helps to prevent
malicious or unintentional corruption of the existing data.
f) Performance and efficiency: Good performance and efficiency are vital
due to the size of the database and the rigorous database accessing needs.
The database administrator can organise the database system to deliver an
overall service that is best for the enterprise.

2.5 DATABASE MODELLING


You have learnt the concepts of DBMS and SDBMS in the previous section.
Now let us discuss database modelling.
A database model displays the relationships and restrictions that control how
data can be stored and retrieved, as well as the logical structure of a database.

Individual database models are created based on the principles and guidelines
of the larger data model that the designers choose to use. The majority of data
models have a corresponding database diagram.
2.5.1 Types of Database Models
You can pick one of several models to describe a database, depending on a
number of variables. The first important consideration is whether the database
management system you are using supports a specific model. Although some
support multiple models, most database management systems were created
with a particular data model in mind and require users to adopt it.
Additionally, different models apply at different phases of the database
design process. High-level conceptual data models work well for mapping out
relationships between data in the way that people understand those data. On
the other hand, record-based logical models represent more accurately how the
data are kept on the server.
The choice of a data model depends on whether the priority is speed, cost
savings, usability, or something else entirely. Selecting a data model
therefore involves matching those requirements with the qualities of a
particular model.
There are many kinds of data models. Some of the most common ones include:
 Hierarchical database model
 Relational model
 Network model
 Object-oriented database model
 Entity-relationship model
 Document model
 Entity-attribute-value model
 Star schema
 Object-relational model, which combines the object-oriented and relational
models
Some of the data models listed above (the hierarchical database model,
relational model, network model and object-oriented database model) were
already discussed in Unit 11 of the MGY 103 Course. Please read Unit 11 of the
MGY 103 Course to learn about these models. Let us now discuss the remaining
data models.
Object-relational model
This hybrid database model combines some of the advanced capabilities of the
object-oriented database model with the simplicity of the relational model.
Essentially, it enables designers to include objects in the familiar table
structure. Its languages and call interfaces are extensions of those used by
the relational model and include SQL3, vendor languages, ODBC, JDBC, and
proprietary call interfaces (Fig. 2.4).


Fig. 2.4: Object-relational model. (Source: modified after
https://www.researchgate.net/figure/A-simplified-Object-Relational-model_fig1_279500891).

Entity-relationship model
Similar to the network model, this model records relationships between
real-world items, although it is less closely tied to the physical structure
of the database. Instead, it is frequently used for designing a database
conceptually (Fig. 2.5). The people, places, and things about which data
points are kept are referred to as entities. Each entity has a set of
attributes that together make up its domain. The cardinality, or the
relationships between entities, is also mapped. The star schema, in which a
central fact table relates to numerous dimension tables, is a popular
variation of the ER diagram.
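A star schema can be sketched with Python's built-in sqlite3 module. The fact
table fact_yield, the two dimension tables and all the values in them are
made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the entities.
CREATE TABLE dim_district (district_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_crop     (crop_id     INTEGER PRIMARY KEY, name TEXT);

-- The central fact table holds measurements and points to each dimension.
CREATE TABLE fact_yield (
    district_id INTEGER REFERENCES dim_district(district_id),
    crop_id     INTEGER REFERENCES dim_crop(crop_id),
    year        INTEGER,
    tonnes      REAL
);

INSERT INTO dim_district VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_crop     VALUES (1, 'Rice'), (2, 'Wheat');
INSERT INTO fact_yield   VALUES (1, 1, 2022, 120.0), (1, 2, 2022, 80.0),
                                (2, 1, 2022, 200.0), (2, 1, 2023, 210.0);
""")

# A typical query joins the fact table to its dimensions and aggregates.
totals = conn.execute("""
    SELECT d.name, c.name, SUM(f.tonnes)
    FROM fact_yield AS f
    JOIN dim_district AS d ON d.district_id = f.district_id
    JOIN dim_crop     AS c ON c.crop_id     = f.crop_id
    GROUP BY d.name, c.name
    ORDER BY d.name, c.name
""").fetchall()

print(totals)
# [('North', 'Rice', 120.0), ('North', 'Wheat', 80.0), ('South', 'Rice', 410.0)]
```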
A variety of other database models have been used in the past or are still in
use today. Let us discuss them in detail.
Inverted file model
A database built on the inverted file structure is designed to support fast
full-text searches. In this model, the data content is indexed as a series of
keys in a lookup table, and the values of the keys point to the locations of
the associated files. This structure can, for example, provide almost
instantaneous reporting for big data and analytics. The Software AG ADABAS
database management system has used this model since 1970, and it is still
supported today.
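The lookup-table idea behind the inverted file model can be illustrated with a
minimal Python sketch. The three documents are made up, and the functions
build_inverted_index and search are hypothetical helpers, not part of any
particular database product.

```python
from collections import defaultdict

# Made-up documents keyed by an identifier standing in for a file location.
documents = {
    1: "forest cover map of the northern district",
    2: "soil classification map",
    3: "road network and forest roads",
}


def build_inverted_index(docs):
    """Map every word (key) to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, *words):
    """Return the documents that contain all of the given words."""
    sets = [index.get(word.lower(), set()) for word in words]
    return sorted(set.intersection(*sets)) if sets else []


index = build_inverted_index(documents)
print(search(index, "forest"))         # [1, 3]
print(search(index, "forest", "map"))  # [1]
```

A search consults only the lookup table, so no document has to be read in full
to find out whether it contains a word.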


Fig. 2.5: Entity-relationship model. (Source: modified after
https://jcsites.juniata.edu/faculty/rhodes/dbms/ermodel.htm)

Flat model
The earliest and most basic data model is the flat model. It merely lists every
piece of information in a single table with columns and rows. This technique is
inefficient except for small data sets since the computer must read the full flat
file into memory in order to access or change the data.
Multidimensional model

This is a variation of the relational model designed to facilitate improved
analytical processing. While the relational model is optimised for online
transaction processing (OLTP), this model is designed for online analytical
processing (OLAP).
Each cell of a dimensional database contains data about the dimensions it
tracks. Visually, it resembles a collection of cubes rather than
two-dimensional tables.
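A small cube can be sketched in plain Python to show how cells are addressed by
their dimensions. The dimensions (district, crop, year), the values and the
helper functions roll_up and slice_cube are assumptions made for this example;
OLAP systems provide these operations on a much larger scale.

```python
from collections import defaultdict

DIMENSIONS = ("district", "crop", "year")

# Each cell is addressed by one value per dimension (made-up yields in tonnes).
cube = {
    ("North", "Rice", 2022): 120.0,
    ("North", "Wheat", 2022): 80.0,
    ("South", "Rice", 2022): 200.0,
    ("South", "Rice", 2023): 210.0,
}


def roll_up(cube, keep):
    """Aggregate the cube so that only the dimensions in `keep` remain."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for key, value in cube.items():
        totals[tuple(key[p] for p in positions)] += value
    return dict(totals)


def slice_cube(cube, dimension, value):
    """Keep only the cells with the given value on one dimension."""
    position = DIMENSIONS.index(dimension)
    return {key: v for key, v in cube.items() if key[position] == value}


print(roll_up(cube, ["district"]))     # {('North',): 200.0, ('South',): 410.0}
print(slice_cube(cube, "year", 2023))  # {('South', 'Rice', 2023): 210.0}
```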
Semi-structured model
The structural data that is typically part of the database schema is integrated
into the data itself in this model. The line between data and schema in this case
is, at best, hazy. This approach is helpful for defining systems that we treat as
databases but cannot constrain with a schema, like some Web-based data
sources. It can be used to describe interactions between databases that follow
different schemas.
Context model

If necessary, this model can combine components from other database models.
It combines components from network models, object-oriented models, and
semi-structured models.

Associative model
This model divides all the data points according to whether they describe an
entity or an association. In this model, an entity is anything that exists
independently, whereas an association is something that exists only in
relation to something else.
The associative model structures the data into two sets:
 A set of items, each with a unique identifier, a name, and a type.
 A set of links, each with a unique identifier and the unique identifiers of a
source, verb, and target. The stored fact has to do with the source, and
each of the three identifiers may refer either to a link or an item.
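The two sets of the associative model can be sketched with Python data classes.
The Item and Link classes follow the description above, while the sample items,
the identifiers and the helper function describe are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    id: int
    name: str
    type: str


@dataclass(frozen=True)
class Link:
    id: int
    source: int  # identifier of an item or of another link
    verb: int
    target: int


items = {
    1: Item(1, "Ganga", "River"),
    2: Item(2, "flows through", "Verb"),
    3: Item(3, "Varanasi", "City"),
    4: Item(4, "recorded in", "Verb"),
    5: Item(5, "2020 survey", "Survey"),
}
links = {
    100: Link(100, source=1, verb=2, target=3),
    101: Link(101, source=100, verb=4, target=5),  # the source is itself a link
}


def describe(identifier):
    """Render an item by name, or a link as (source verb target)."""
    if identifier in items:
        return items[identifier].name
    link = links[identifier]
    return (f"({describe(link.source)} {describe(link.verb)} "
            f"{describe(link.target)})")


print(describe(101))
# ((Ganga flows through Varanasi) recorded in 2020 survey)
```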
Other, less common database models include:
 Semantic model, which includes information about how the stored data
relates to the real world.
 XML database, which allows data to be specified and even stored in XML
format.
 Named graph.
 Triple store.

2.6 DATABASE ORGANISATION


We have discussed database modelling in the previous section. In this section
we will discuss database organisation.
Most GIS software organises spatial data in a thematic approach that
categorises data in vertical layers. The definition of layers depends entirely
on the requirements of the organisation. For example, typical layers used in
natural resource management agencies include forest cover, soil
classification, elevation, road network, ecological areas, hydrology, etc.
Other organisations use other kinds of data layers according to their
requirements.
Spatial data layers are commonly input one at a time, e.g., forest cover.
Accordingly, attribute data are entered one layer at a time. Depending on the
attribute data model used by the data storage subsystem, data must be
organised in a format that facilitates the required manipulation and analysis
tasks. Most often, the spatial and attribute data are entered at different
times and linked together at a later stage. However, this depends entirely on
the source of data.
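
The later linking of spatial and attribute data can be illustrated with a
minimal Python sketch in which both parts of a layer share a feature
identifier. The forest-cover polygons, the attribute values and the function
link_layer are hypothetical.

```python
# Spatial data for a forest-cover layer, entered first (made-up polygons).
forest_geometry = {
    "F1": [(0, 0), (4, 0), (4, 3), (0, 3)],
    "F2": [(4, 0), (7, 0), (7, 3), (4, 3)],
    "F3": [(0, 3), (7, 3), (7, 5), (0, 5)],
}

# Attribute data for the same layer, entered later and keyed by the same ID.
forest_attributes = {
    "F1": {"species": "Teak", "canopy": "dense"},
    "F2": {"species": "Sal", "canopy": "open"},
}


def link_layer(geometry, attributes):
    """Join geometry and attribute records that share a feature ID."""
    return {fid: {"geometry": shape, "attributes": attributes.get(fid)}
            for fid, shape in geometry.items()}


layer = link_layer(forest_geometry, forest_attributes)

# Features whose attributes have not been entered yet remain visible.
unlinked = [fid for fid, rec in layer.items() if rec["attributes"] is None]
print(unlinked)  # ['F3']
```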

SAQ II
a) Define DBMS.
b) List the database management functions performed by a typical DBMS.
c) What is a semantic model?

2.7 SUMMARY
Let us summarise what you have studied in this unit.

 A spatial database is a database that provides support for finding objects
in a special domain. Spatial databases hold two-dimensional and
three-dimensional spatial descriptions.
 Data mining can be considered as the process of extracting, or mining,
information from large volumes of data that are generally stored in data
warehouses. A data warehouse is a central repository, or storehouse, of
data collated from several information systems of an organisation or an
enterprise.
 Data mining follows a logical sequence of data processing, as it is not a
single independent process of extracting stored information from a required
dataset. The steps followed in the data mining process are i) data
integration and cleaning, ii) data selection and transformation, iii) data
mining, iv) knowledge discovery and construction, and v) disposition.
 Spatial data mining is derived from conventional attribute-based data
mining. However, it differs from data mining in numerous ways owing to the
complex nature of geospatial data. It is commonly concerned with detecting
local information, whereas data mining is inclined towards finding global
knowledge.
 Spatial data mining techniques are grouped into seven categories based on
their knowledge discovery goals. They are i) spatial classification,
ii) spatial prediction, iii) spatial class/concept description, iv) spatial
association, v) spatial clustering, vi) spatial outlier analysis, and
vii) spatial time-series analysis.
 The system software used to create and administer databases is known as
a database management system (DBMS). A spatial database management
system (SDBMS) is a software module that can work with an underlying
DBMS, such as an object-relational database management system or an
object-oriented database management system.
 A database model displays the relationships and restrictions that control
how data can be stored and retrieved, as well as the logical structure of a
database. Individual database models are created based on the principles
and guidelines of the larger data model that the designers choose to use.
 Most of the GIS software organises spatial data in a thematic approach that
groups data in vertical layers. The definition of layers is fully dependent on
the requirement of organisation.

2.8 ACTIVITY
Match the following:

(i) Data mining                a) local information
(ii) Spatial outlier           b) vendor languages
(iii) Spatial data mining      c) spatial abstract data types
(iv) SDBMS                     d) global knowledge discovery
(v) Object-relational model    e) georeferenced data object


2.9 TERMINAL QUESTIONS


1. Discuss the process and techniques of data mining.
2. Explain spatial data mining.
3. Discuss the concepts of DBMS and SDBMS.
4. Discuss various types of database models.


2.11 FURTHER/SUGGESTED READINGS


 Lo, C.P. and Yeung, A. K.W. (2009) Concepts and techniques of
geographic information system. PHI Learning Private Limited, New Delhi,
532p.

2.12 ANSWERS
SAQ I
a) A spatial database is a database that provides support for finding objects
in a special domain.

b) Data mining can be defined as the process of extraction or mining of
information from large volumes of data that is generally stored in data
warehouses.
c) The different classes of SDM techniques are i) spatial classification ii)
spatial prediction iii) spatial class/concept description iv) spatial association
v) spatial clustering vi) spatial outlier analysis vii) spatial time-series
analysis.

SAQ II
a) DBMS is defined as the system software used to create and administer
databases.
b) There are various database management functions performed by a DBMS.
They are 1) data definition 2) data manipulation 3) data recovery and
concurrency 4) data dictionary maintenance 5) performance.
c) The semantic model is a less common database model that includes
information about how the stored data relates to the real world.
Terminal Questions
1. Please refer to section 2.2.1.
2. Please refer to section 2.3.
3. Please refer to section 2.4.
4. Please refer to section 2.5.



UNIT 3

OVERVIEW OF GEOSTATISTICS AND SPATIAL DATA MEASUREMENTS

Structure____________________________________________________
3.1 Introduction
    Expected Learning Outcomes
3.2 Overview of Geostatistics and Techniques
    Geostatistics Tools and Techniques
3.3 Distance and Length Measurements
    Spatial Distance Measurements
    Spectral Distance Measurements
3.4 Perimeter and Area Measurement
3.5 Exploring Spatial Relationships between Variables
3.6 Mathematical Operations
3.7 Summary
3.8 Activity
3.9 Terminal Questions
3.10 References
3.11 Further/Suggested Readings
3.12 Answers

3.1 INTRODUCTION
The spatial data measurements generally performed in a typical GIS project are
accomplished automatically and efficiently by algorithms that form part of the
GIS software. However, it is important to know the nature of the computations
that run in the background while a particular operation is performed in a GIS
environment. This familiarity will help you apply suitable spatial analysis
functions.
In the previous unit, the basic concepts of data mining and spatial data
mining, the concepts of DBMS and SDBMS, database modelling and database
organisation were discussed. In this unit, we will give an overview of
geostatistics and its techniques, and discuss distance and length
measurements, including spatial and spectral distance measurements, and
polygon perimeter and area measurement. We will also explore spatial
relationships between variables and mathematical operations.

Expected Learning Outcomes________________________

After studying this unit, you should be able to:
 describe the basics of geostatistics and its techniques;
 discuss distance and length measurements, including spatial distance
measurements and spectral distance measurements;
 explain polygon perimeter and area measurement; and
 identify spatial relationships between variables and mathematical
operations.

3.2 OVERVIEW OF GEOSTATISTICS AND TECHNIQUES
Geostatistics is a branch of statistics used to analyse and predict the values
associated with spatial or spatiotemporal phenomena. Its analyses incorporate
the spatial (and, in some cases, temporal) coordinates of the data. Many
geostatistical tools were originally developed as a practical means of
describing spatial patterns and interpolating values for locations where
samples were not taken. Since then, these tools and techniques have evolved to
provide not only interpolated values but also measures of uncertainty for
those values. Uncertainty assessment is essential for well-informed
decision-making because it provides information on the possible outcomes
(values) at each location rather than just a single interpolated value.
Geostatistical analysis has also progressed from uni- to multivariate analysis
and offers mechanisms for supplementing a (potentially sparse) primary
variable of interest with secondary datasets, which allows more accurate
interpolation and uncertainty models to be built.
The three main tools that geostatistics provides are:
a) Semi-variograms to model the relationship between all pairs of points;

b) Kriging modelling to predict values at unsampled locations; and

c) Standard error to measure confidence in the values predicted at unsampled
locations.

Let us discuss how the geostatistical tools can be used by taking an example.
Suppose you have collected soil samples from certain locations in a particular
area. The geostatistical tools can help in answering queries such as:
 What is the probable soil moisture at locations where sampling was not
undertaken?
 To what extent is the spatial prediction of soil moisture correct?
Contributor: Dr. (Ms.) Shukla Acharjee

Unit 3 Overview of Geostatistics and Spatial Data Measurement

A geostatistical method such as kriging differs from a deterministic
interpolation method such as Inverse Distance Weighting (IDW), although both
estimate values at unsampled locations from nearby samples (Fig. 3.1). IDW
weights the samples using a predetermined mathematical power function of
distance. Kriging instead derives the weights from a statistical model of the
data, the semi-variogram, which is discussed in detail in the following
subsection. Both methods produce a predicted surface, but IDW does not provide
a level of confidence for its predictions.
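The IDW idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name idw_predict, the sample coordinates and the soil moisture values are hypothetical, and the power of 2 is a common default rather than a value given in this unit.

```python
import math

def idw_predict(samples, target, power=2.0):
    """Estimate the value at `target` from (x, y, value) samples using IDW."""
    tx, ty = target
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        d = math.hypot(x - tx, y - ty)
        if d == 0.0:
            return float(value)  # the target coincides with a sample point
        w = 1.0 / d ** power     # predetermined power function of distance
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical soil moisture samples: (x, y, moisture in %)
soil_moisture = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0)]
print(idw_predict(soil_moisture, (1.0, 1.0)))
```

Note that the function returns only a predicted value; nothing in the calculation yields a measure of confidence, which is the limitation of IDW mentioned above.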

Fig. 3.1: Inverse Distance Weighting (IDW) interpolation method. (Source:
www.gisgeography.com)

3.2.1 Geostatistics Tools and Techniques


Let us discuss the geostatistical tools and techniques in detail.
i) Semi-variograms
Semi-variograms are one of the descriptive tools offered by geostatistics for
identifying underlying trends in spatial phenomena. A semi-variogram portrays
the spatial autocorrelation of the measured sample points. It is based on
Tobler’s First Law of Geography, which states that closer things are more
related than distant ones. This law is the fundamental notion behind the
concept of spatial autocorrelation (Fig. 3.2).
The semi-variogram plots every pair of sample locations on a graph against the
distance separating them. Observations made close together are more strongly
correlated. Once each pair of locations has been plotted, a model is fitted
through the points. Beyond a certain distance (the range), however, there is no
longer a relationship between locations.
When interpreting a semi-variogram, three characteristic features are generally
noted:
1. Sill: The value on the y-axis at which the model first flattens out is
known as the sill. The partial sill is the sill minus the nugget.
2. Range: The distance at which the model first flattens out is known as the
range. Sample locations separated by distances shorter than the range are
spatially autocorrelated, whereas locations farther apart than the range
are not.
3. Nugget: The value at which the semi-variogram (almost) intercepts the
y-axis is known as the nugget. In theory, the semi-variogram value at zero
separation distance is 0. At a very small separation distance, however,
the semi-variogram often exhibits a nugget effect, that is, a small value
greater than 0. For instance, if the semi-variogram model intersects the
y-axis at 1, then the nugget is 1. The nugget effect may arise from
measurement errors, instrumental errors, etc.
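The points through which a semi-variogram model is fitted can be computed from the sample data. The sketch below assumes the usual empirical estimator, in which the semivariance of a distance class is half the mean squared difference between the values of all pairs of points falling in that class. The function name, the lag settings and the sample values are hypothetical.

```python
import math
from itertools import combinations

def empirical_semivariogram(samples, lag_width, n_lags):
    """Return (mean pair distance, semivariance, pair count) for each lag bin."""
    sq_diff = [0.0] * n_lags
    dist_sum = [0.0] * n_lags
    counts = [0] * n_lags
    for (x1, y1, z1), (x2, y2, z2) in combinations(samples, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        k = int(d // lag_width)          # index of the distance class
        if k < n_lags:
            sq_diff[k] += (z1 - z2) ** 2
            dist_sum[k] += d
            counts[k] += 1
    return [(dist_sum[k] / counts[k], sq_diff[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k] > 0]

# Hypothetical samples along a line: (x, y, measured value)
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (2.0, 0.0, 5.0)]
for distance, semivariance, pairs in empirical_semivariogram(points, 1.5, 2):
    print(distance, semivariance, pairs)
```

In this small example the semivariance grows with distance, which is the behaviour expected below the range; the sill, range and nugget are then read from a model fitted to such points.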

Block 1 Basics of Spatial Analysis

Fig. 3.2: Semi-variogram representing nugget, range and sill. (Source: modified
after https://gisgeography.com/kriging-interpolation-prediction/)

The semi-variogram shows that samples remain related until the curve reaches
the sill; samples separated by greater distances are no longer related. The
goal is to fit a mathematical function (Spherical, Circular, Exponential,
Gaussian or Linear) that represents the trend of the semi-variogram. For
example, you can select one of the semi-variogram functions shown in Fig. 3.3.

Fig. 3.3: Semi-variogram mathematical functions. (Source: modified after
https://gisgeography.com/kriging-interpolation-prediction/).
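Three of the model functions named above can be written out in terms of the nugget, partial sill and range. The sketch below is an illustration only: the function and parameter names are hypothetical, and the factor 3 in the exponential and Gaussian forms assumes the "practical range" convention (the distance at which about 95% of the sill is reached), which is one of several conventions in use.

```python
import math

def spherical(h, nugget, partial_sill, range_):
    """Spherical model: reaches the sill exactly at the range."""
    if h <= 0.0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    ratio = h / range_
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)

def exponential(h, nugget, partial_sill, range_):
    """Exponential model: approaches the sill asymptotically."""
    if h <= 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / range_))

def gaussian(h, nugget, partial_sill, range_):
    """Gaussian model: very smooth behaviour near the origin."""
    if h <= 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / range_) ** 2))

# Hypothetical parameters: nugget = 1, partial sill = 4 (sill = 5), range = 10
for h in (0.0, 5.0, 10.0, 20.0):
    print(h, spherical(h, 1.0, 4.0, 10.0), exponential(h, 1.0, 4.0, 10.0),
          gaussian(h, 1.0, 4.0, 10.0))
```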

ii) Kriging Interpolation

Kriging is a method of interpolation that uses the spatial correlation between
samples to predict values at unsurveyed locations. Its primary distinction from
deterministic interpolation methods is that its weights are derived from the
mathematical function fitted to the semi-variogram. A spatial prediction model
derived from kriging is shown in Fig. 3.4.
Let us discuss the different types of kriging available in geostatistics.
 Co-kriging adds a second, related variable as additional data to improve
the prediction. For instance, elevation can be used as a covariate of
rainfall to predict changes in precipitation in mountain regions.
 Empirical Bayesian Kriging (EBK) is a geostatistical interpolation
technique designed to automate the most difficult steps of constructing a
sound kriging model. Unlike other kriging methods within Geostatistical
Analyst, which require manual parameter adjustments for accurate results,
EBK estimates these parameters automatically through a series of subsets
and simulations.
 Universal kriging augments conventional kriging with trend surface
analysis (or drift) so that trends in the data are taken into account.
 Indicator kriging carries out ordinary kriging on binary data (0 and 1),
such as urban and non-urban cells.
 Probability kriging, like indicator kriging, employs binary data, and it
estimates unknown points for a sequence of cut-offs.

Fig. 3.4: Spatial model developed using kriging method. (Source:
https://gisgeography.com/kriging-interpolation-prediction/)
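To make the link between the semi-variogram and the prediction concrete, the following is a minimal sketch of ordinary kriging at a single location. It assumes the standard ordinary kriging system, in which the weights are constrained to sum to one by a Lagrange multiplier, and it returns both the prediction and the kriging standard error. All names, the spherical model parameters and the sample values are hypothetical; GIS packages implement this far more efficiently and with neighbourhood searches that are omitted here.

```python
import math

def spherical(h, nugget, partial_sill, range_):
    """Spherical semi-variogram model (assumed already fitted to the data)."""
    if h <= 0.0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    ratio = h / range_
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def ordinary_kriging(samples, target, gamma):
    """Return (prediction, standard error) at `target` from (x, y, value) samples."""
    n = len(samples)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Semivariances between all sample pairs, bordered by the unbiasedness row.
    a = [[gamma(dist(samples[i], samples[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Semivariances between each sample and the target location.
    b = [gamma(dist(s, target)) for s in samples] + [1.0]
    solution = solve(a, b)
    weights, lagrange = solution[:n], solution[n]
    prediction = sum(w * s[2] for w, s in zip(weights, samples))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + lagrange
    return prediction, math.sqrt(max(variance, 0.0))

# Hypothetical samples (x, y, value) and semi-variogram parameters
model = lambda h: spherical(h, 0.0, 25.0, 30.0)
data = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
print(ordinary_kriging(data, (5.0, 5.0), model))
```

Unlike the IDW weights, the kriging weights depend on the fitted semi-variogram, and the same system also yields a standard error for each predicted location.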

iii) Standard Error

Geostatistics is useful because it evaluates the uncertainty of predictions at
unsampled locations using a standard error surface map. A standard error map
indicates how likely a prediction is to be accurate, and the standard error is
a measure of the robustness of the kriging model (Fig. 3.5). Uncertainty is
evaluated by comparing actual and predicted values to create a residual
surface.

When there are few observations, the standard error is typically larger. When
the error rises above a critical level, the variogram fitting procedure
benefits from expert knowledge.

Fig. 3.5: Higher standard error where observations are sparse. (Source:
https://gisgeography.com/kriging-interpolation-prediction/)

3.3 DISTANCE AND LENGTH MEASUREMENTS


In the previous section we have discussed the basics of geostatistics tools and
techniques. In this section we discuss the spatial distance measurements and
spectral distance measurements.

3.3.1 Spatial Distance Measurements


Distance expresses, in measured form, the concept of near and far. In simple
terms, distance is a standard for measuring the traversable space between any
two points, as you have read in Unit 14 of the MGY 103 Course. For instance, if
you consider travelling between the two points shown in Fig. 3.6, you can
traverse the space in four different ways. It is clear from the figure that one
may travel between two places in several ways. These various ways of traversing
space give rise to different methods of distance measurement.

Let us discuss some of the common linear distance measurements such as
Euclidean (Pythagorean Theorem) and Manhattan distance methods used in
GIS for distance measurement.


Fig. 3.6: Different ways of traversing the space between two points. (Source:
modified after https://gisgeography.com/kriging-interpolation-prediction/)

i) Euclidean Distance Measurement (Pythagorean Theorem)

In GIS, Euclidean distance is one of the most commonly used measures of
distance between any two given points and is known as the ‘as the crow flies’
distance between them. It is applied to find the distance to immediate
neighbours or the number of points falling within a particular buffer distance
of a point of interest. The Euclidean distance is based on the Pythagorean
Theorem, which describes the relationship among the three sides of a
right-angled triangle. It can be stated as “the sum of the squares on the legs
of a right triangle (a triangle where one interior angle = 90°) is equal to the
square on the hypotenuse (the side opposite the right angle)”. The relationship
between them is shown in Fig. 3.7.
Mathematically, the equation is represented as: $C^2 = A^2 + B^2$
where A and B represent the lengths of the two legs and C the length of the
hypotenuse. Hence, if the projected coordinates (X, Y) of two points are known,
it is easy to determine the distance between them: the lengths of the two legs
are the coordinate differences, $A = X_2 - X_1$ and $B = Y_2 - Y_1$, and the
length of the hypotenuse is $C = \sqrt{A^2 + B^2}$.
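The calculation can be sketched as follows, together with the buffer use mentioned above. The function names and coordinates are hypothetical, and the coordinates are assumed to be projected (planar) X, Y values in the same units.

```python
import math

def euclidean_distance(p1, p2):
    """Straight-line distance between two (x, y) points."""
    a = p2[0] - p1[0]   # length of the leg along the X axis
    b = p2[1] - p1[1]   # length of the leg along the Y axis
    return math.sqrt(a ** 2 + b ** 2)

def points_within_buffer(centre, points, radius):
    """Return the points lying within `radius` of `centre`."""
    return [p for p in points if euclidean_distance(centre, p) <= radius]

print(euclidean_distance((1.0, 2.0), (4.0, 6.0)))                  # 5.0
print(points_within_buffer((0, 0), [(1, 1), (3, 4), (6, 8)], 5))   # [(1, 1), (3, 4)]
```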


Fig. 3.7: Calculation of Euclidean distance between two points (Point #1 and
Point #2) in an X, Y Cartesian coordinate system by applying
Pythagorean Theorem. (Source: modified after Jensen and Jensen 2018)

ii) Manhattan Distance Measurement


Although linear distance measurement based on the Pythagorean Theorem has many
applications, it has some limitations. One limitation is that in urban
clusters, and sometimes in natural areas, it is not always possible to travel
from one point to another along a straight line (the hypotenuse), so the
Euclidean (‘as the crow flies’) distance does not represent the route actually
taken. In such cases the Manhattan distance between two points can be
calculated using the equation given below.
$\text{Manhattan distance} = |X_1 - X_2| + |Y_1 - Y_2|$

Fig. 3.8: Calculation of Manhattan distance between two points (Point #1 and
Point #2) in an X, Y Cartesian coordinate system. (Source: modified after
Jensen and Jensen 2018)
The Manhattan distance (sometimes called the “round-the-block” or “city block”
distance) between two points uses the sum of the lengths of the two legs of the
right triangle instead of the hypotenuse (Fig. 3.6). For instance, in an urban
area, going from Point #1 to Point #2 in a straight line would mean walking
through houses or climbing over buildings. The better option is to walk around
the block to get from Point #1 to Point #2 (Fig. 3.8).
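A short sketch of the Manhattan distance is given below. The function name and the coordinates are hypothetical; the streets are assumed to run parallel to the X and Y axes.

```python
def manhattan_distance(p1, p2):
    """City-block distance between two (x, y) points."""
    return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])

# For the same two points the straight-line (Euclidean) distance is 5,
# whereas walking around the block covers 3 + 4 = 7 units.
print(manhattan_distance((1, 2), (4, 6)))   # 7
```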
3.3.2 Spectral Distance Measurements
Spectral distance is a spectral measure generally used in unsupervised
classification and supervised minimum distance classification. The formula
used to determine the spectral distance between a pixel spectrum and a
reference spectrum is given below.
$$d = \sqrt{\sum_{i=1}^{n} (y_i - r_i)^2}$$
where d is the spectral distance, $y_i$ is the reflectance in band i for a
pixel, $r_i$ is the reflectance in band i for the reference and n is the number
of bands in the image.
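The sketch below computes a spectral distance and uses it for a minimum distance classification. It assumes the Euclidean form of the spectral distance (the square root of the sum of squared band differences); the function names, class names and reflectance values are hypothetical.

```python
import math

def spectral_distance(pixel, reference):
    """Euclidean distance between two spectra with the same number of bands."""
    if len(pixel) != len(reference):
        raise ValueError("pixel and reference must have the same number of bands")
    return math.sqrt(sum((y - r) ** 2 for y, r in zip(pixel, reference)))

def minimum_distance_class(pixel, references):
    """Assign the pixel to the class whose reference spectrum is nearest."""
    return min(references, key=lambda name: spectral_distance(pixel, references[name]))

# Hypothetical reference spectra (reflectance in green, red and near-infrared)
references = {
    "healthy":  [0.05, 0.08, 0.45],
    "stressed": [0.10, 0.15, 0.25],
}
pixel = [0.06, 0.09, 0.40]
print(spectral_distance(pixel, references["healthy"]))
print(minimum_distance_class(pixel, references))   # healthy
```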
The spectral distance has the potential to quantify the variation in crop growth
and yield. Let us discuss this with an example. For grain sorghum, the best
phenological stage for yield estimation is around peak vegetative development
(Yang and Everitt 2002). If a pure healthy crop canopy is used as the
reference, the spectral distance between healthy plants and the reference will
be small, whereas the spectral distance between stressed plants and the
reference will be large. As a result, spectral distance can be used as a proxy
indicator of the health and abundance of plants.

SAQ I
a) What are the three main tools of geostatistics used in interpolation and
uncertainty models?
b) What is Tobler’s First Law of Geography?
c) What is kriging interpolation?
d) Based on which theorem does Euclidean distance measurement work?

3.4 PERIMETER AND AREA MEASUREMENT


You have learnt about distance and length measurements in the previous section.
In this section we discuss perimeter and area measurement.
In many geospatial projects the need arises to calculate perimeter and area;
the related concepts have been discussed in Unit 14 of the MGY 103 Course. As
an example, imagine that you want to understand the growth of a forest within a
specific region and compare it with data from past years. You might also want
to determine the perimeter of this area. Likewise, perimeter and area
calculations may be needed for population census data at various levels such as
village, block, state and country. Such measurements are required in many
geospatial studies, so it is useful to know the methods by which they are
calculated.
Let us discuss how to measure perimeter and area.
a) Perimeter Measurement
The perimeter of a polygon is found by determining the length of each of the n
line segments that make up the polygon and adding them. The perimeter is
calculated using the equation given below:
$$\text{Perimeter} = \sum_{i=1}^{n} \sqrt{(X_{i+1} - X_i)^2 + (Y_{i+1} - Y_i)^2}$$
where $(X_i, Y_i)$ are the coordinates of vertex i, and vertex n + 1 is the
same as vertex 1. The length of each line segment is calculated by applying the
Pythagorean Theorem as discussed in the previous section (Section 3.3).
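The perimeter calculation can be sketched as below. The function name and the example polygon are hypothetical; the vertices are assumed to be listed in order around the polygon, without repeating the first vertex at the end.

```python
import math

def polygon_perimeter(vertices):
    """Sum of the lengths of the n line segments of a closed polygon."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]   # the last vertex connects back to the first
        total += math.hypot(x2 - x1, y2 - y1)
    return total

rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(polygon_perimeter(rectangle))   # 14.0
```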
b) Area Measurement
The polygon area is a measurement of the geographic area enclosed by a polygon.
It is very simple to calculate when the polygon has a regular geometric shape
such as a rectangle, circle, square or right-angled triangle. The areas of
these regular shapes are calculated using the equations given in Table 3.1.
Such regular shapes are mostly found in man-made environments, for example
rectangular or square buildings, water harvesting structures, road networks and
farm lands. They are uncommon in natural landscapes, where irregularly shaped
polygons require more complex calculations.
Table 3.1: Equations used for calculating the area of regular geometric shapes.

Shape             Area Formula
Square            side²
Rectangle         length × width
Circle            π × radius²
Right Triangle    (base × height) / 2

To calculate the area of a complex polygon, the Cartesian coordinates (X1, Y1),
(X2, Y2), ..., (Xn, Yn) of all its vertices, listed in order, must be known.
The equation for the calculation is given below:
$$\text{Area} = \frac{1}{2} \left| \sum_{i=1}^{n} Y_i \, (X_{i+1} - X_{i-1}) \right|$$
where $Y_i$ is the Y-axis (or Northing) coordinate of vertex i, $X_{i+1}$ is
the X-axis (or Easting) coordinate of the next vertex, and $X_{i-1}$ is the
X-axis (or Easting) coordinate of the preceding vertex. The vertices wrap
around, so the vertex preceding the first is the last, and the vertex following
the last is the first. The equation requires the vertices to be taken
sequentially in a consistent direction (all clockwise or all anticlockwise).
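The area calculation for an irregular polygon can be sketched as below. The sketch assumes the coordinate form in which each vertex's Y value is multiplied by the difference between the X values of the next and the preceding vertex; the function name and example polygons are hypothetical, and the polygon is assumed not to cross itself.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its ordered (x, y) vertices."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x_prev = vertices[i - 1][0]          # index -1 wraps to the last vertex
        x_next = vertices[(i + 1) % n][0]    # wraps back to the first vertex
        total += vertices[i][1] * (x_next - x_prev)
    # The sign depends on the direction of the vertices, so take the absolute value.
    return abs(total) / 2.0

rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
triangle = [(0, 0), (4, 0), (0, 3)]
print(polygon_area(rectangle))   # 12.0, the same as length x width
print(polygon_area(triangle))    # 6.0, the same as (base x height) / 2
```

The two results agree with the regular-shape formulas, which is a convenient check before applying the function to irregular polygons.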
Once the basic perimeter and area properties of the different polygons in a
given landscape have been calculated, these properties can be used to calculate
the various landscape ecology metrics generally used in geospatial applications
and research.
3.5 EXPLORING SPATIAL RELATIONSHIPS BETWEEN VARIABLES
You have read about perimeter and area measurement in the previous section. Now
let us explore spatial relationships between variables.
When two variables are related, you can infer information about one by looking
at the values of the other. Modelling relationships is helpful for
investigating correlations, making predictions about unknown values, or
understanding key factors. Linear regression is a statistical technique used to
estimate linear relationships between variables. Such a relationship may be
strong, weak, or nonexistent. Linear regression determines the strength of the
relationship between one or more explanatory variables (x) and the dependent
variable (y). There will be over- and under-predictions since no model is
perfectly accurate. These differences between observed and predicted values are
called residuals. Let us discuss these relationships in detail.
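For a single explanatory variable, a linear regression can be sketched as below using the ordinary least squares formulas for slope and intercept. The function name and the elevation and rainfall figures are hypothetical; GIS regression tools handle several explanatory variables and report many more diagnostics.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Differences between observed and predicted values:
    # positive = under-prediction, negative = over-prediction.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals

# Hypothetical data: elevation (hundreds of metres) and annual rainfall (cm)
elevation = [1.0, 2.0, 3.0, 4.0, 5.0]
rainfall = [21.0, 39.0, 62.0, 78.0, 100.0]
intercept, slope, residuals = fit_line(elevation, rainfall)
print(intercept, slope)
print(residuals)
```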
Generalised Linear Regression (GLR) Tool
Generalised Linear Regression (GLR) is used for prediction and for modelling
the relationship between a dependent variable and a set of explanatory
variables. Before using the tool, it is necessary to define an input dataset, a
dependent variable, a model type, and the explanatory variable(s). The Ordinary
Least Squares (OLS) tool has been renamed the Generalised Linear Regression
tool. It incorporates three model types: the previous OLS model type (called
Gaussian and suitable for continuous data), a logistic model type for binary
data, and a Poisson model type for count data. The two additional model types
may be applicable when the data distribution is not bell-shaped. To use the
logistic model with a continuous variable, the variable must first be
transformed into a binary variable of zeros and ones (for example, indicating
whether each value is above or below the mean). Binary data can be used to
predict the presence or absence of something, such as insurance fraud, fire
damage, or a pass/fail inspection. A Poisson model is used to represent a count
variable, such as monthly sales, crime rates, or traffic accidents. These
values cannot contain decimals and cannot be negative.
Exploratory Regression Tool
The Exploratory Regression tool evaluates all possible combinations of the
input candidate explanatory variables to find the Ordinary Least Squares (OLS)
models that best explain the dependent variable within the context of
user-specified criteria. Because it examines every variable combination for
completeness, importance, bias, and performance, it is a useful place to start
when studying a dataset. The primary output of the tool is its message window,
which lists the passing models. The tool also uses the Spatial Autocorrelation
tool (Global Moran's I), which measures spatial autocorrelation based on the
locations of features and the values of their attributes. The Spatial
Autocorrelation tool can also be run independently of the Exploratory
Regression tool.
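The Global Moran's I statistic itself can be sketched as below. The sketch assumes the standard definition of Moran's I with a user-supplied spatial weights matrix; the function name, the chain of four neighbouring cells and the attribute values are hypothetical, and the significance testing reported by GIS tools is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for attribute values and a spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                  # deviations from the mean
    s0 = sum(sum(row) for row in weights)           # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Four cells in a row; each cell is a neighbour of the cell(s) next to it.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
print(morans_i([1, 2, 3, 4], chain))   # positive: similar values are neighbours
print(morans_i([1, 5, 1, 5], chain))   # negative: dissimilar values are neighbours
```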

Geographically Weighted Regression Tool (GWR)
Like the Generalised Linear Regression (GLR) tool discussed earlier, this
method models the relationship between a dependent variable and explanatory
variables, but it is used for examining geographic variation in that
relationship. The GLR tool builds a single global model for all features in a
study region, whereas GWR uses only data from nearby features to examine local
variations. It concentrates on nearby features because it is based on the
assumption that things close to one another are more strongly related than
things far apart.
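The local character of GWR can be illustrated with a weighted regression fitted at one location, in which nearby features receive larger weights than distant ones. This is only a sketch of the idea, not the algorithm of any GIS tool: it assumes a Gaussian distance-decay kernel with a fixed bandwidth and a single explanatory variable, and all names and data values are hypothetical.

```python
import math

def gwr_local_fit(features, location, bandwidth):
    """Local (intercept, slope) at `location`.

    Each feature is (x, y, explanatory value, dependent value).
    """
    lx, ly = location
    sw = swe = swd = swee = swed = 0.0
    for x, y, e, d in features:
        dist = math.hypot(x - lx, y - ly)
        w = math.exp(-0.5 * (dist / bandwidth) ** 2)   # nearer features weigh more
        sw += w
        swe += w * e
        swd += w * d
        swee += w * e * e
        swed += w * e * d
    mean_e = swe / sw
    mean_d = swd / sw
    slope = (swed - sw * mean_e * mean_d) / (swee - sw * mean_e ** 2)
    return mean_d - slope * mean_e, slope

# Hypothetical features: the relationship differs between west and east.
features = [
    (0.0, 0.0, 1.0, 2.0), (0.0, 1.0, 2.0, 4.0), (1.0, 0.0, 3.0, 6.0),        # west
    (100.0, 0.0, 1.0, 5.0), (100.0, 1.0, 2.0, 4.0), (101.0, 0.0, 3.0, 3.0),  # east
]
print(gwr_local_fit(features, (0.0, 0.0), 5.0))     # slope close to 2
print(gwr_local_fit(features, (100.0, 0.0), 5.0))   # slope close to -1
```

A single global regression over all six features would hide the fact that the relationship is positive in the west and negative in the east.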
Local Bivariate Relationships Tool
This tool quantifies the relationship between two variables on the same map. It
determines whether the values of one variable depend on or are influenced by
the values of the other, and whether that relationship changes with location.
The tool first tests whether the relationship between the two variables is
statistically significant. Each local relationship is then classified into one
of six categories: not significant, positive linear, negative linear, concave,
convex, or undefined complex. The tool accepts point and polygon layers as
input and is best used with continuous variables.

3.6 MATHEMATICAL OPERATIONS


In the previous section we have discussed exploring spatial relationships
between variables. In this section we will learn about mathematical operations.
Map algebra accomplishes its tasks by using mathematical operators and
functions. These are grouped into four types: arithmetic operators, arithmetic
functions, logarithmic functions, and power functions. Let us discuss these
operators and functions in detail.
 Arithmetic operators
Arithmetic operators permit the addition (plus), subtraction (minus),
multiplication (times) and division (divide) of rasters and numbers. For
instance, the result of adding two rasters with a map algebra expression is
shown in Fig. 3.9. These operators can also be used to convert one unit of
measurement to another (e.g., feet × 0.3048 = metres).

Fig. 3.9: Arithmetic operator-Map Algebra expression showing OutRas = InRas1 +
InRas2. (Source: modified after%20operations%20and%20functions)
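The cell-by-cell behaviour of the arithmetic operators can be imitated with plain Python lists. This is an illustrative sketch rather than the map algebra syntax of any GIS package: a raster is represented as a list of rows, the function name local_operation is hypothetical, and NoData handling is omitted.

```python
import operator

def local_operation(op, raster, other):
    """Apply `op` cell by cell; `other` may be a raster or a single number."""
    if isinstance(other, (int, float)):
        return [[op(cell, other) for cell in row] for row in raster]
    return [[op(c1, c2) for c1, c2 in zip(r1, r2)]
            for r1, r2 in zip(raster, other)]

in_ras1 = [[1, 2], [3, 4]]
in_ras2 = [[10, 20], [30, 40]]

out_ras = local_operation(operator.add, in_ras1, in_ras2)
print(out_ras)   # [[11, 22], [33, 44]]

# Unit conversion: elevation in feet multiplied by 0.3048 gives metres.
feet = [[100.0, 250.0], [500.0, 1000.0]]
metres = local_operation(operator.mul, feet, 0.3048)
print(metres)
```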

 Arithmetic functions
There are six arithmetic functions. The Abs function returns the absolute value
of the values in an input raster. Round Up and Round Down are two rounding
functions that convert decimal values into whole numbers. Int and Float convert
values between integer and floating-point formats. The Negate function (unary
minus in Map Algebra) changes the sign of the values by multiplying the input
values by -1. For example, the result of using the Abs function on a raster
with negative values is shown in Fig. 3.10.

Fig. 3.10: Arithmetic function-Map Algebra expression showing OutRas =
Abs(InRas1). (Source: modified after
https://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=mathematical%20operations%20and%20functions)
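Python equivalents of these functions, applied cell by cell, are sketched below. The helper name apply_to_raster and the raster values are hypothetical, and the Python functions are stand-ins chosen for illustration; in particular, Int is assumed here to truncate towards zero, which is what Python's int does.

```python
import math

def apply_to_raster(func, raster):
    """Apply a one-argument function to every cell of a raster (list of rows)."""
    return [[func(cell) for cell in row] for row in raster]

in_ras = [[-1.5, 2.3], [0.5, -4.7]]

print(apply_to_raster(abs, in_ras))              # Abs:        [[1.5, 2.3], [0.5, 4.7]]
print(apply_to_raster(math.ceil, in_ras))        # Round Up:   [[-1, 3], [1, -4]]
print(apply_to_raster(math.floor, in_ras))       # Round Down: [[-2, 2], [0, -5]]
print(apply_to_raster(int, in_ras))              # Int:        [[-1, 2], [0, -4]]
print(apply_to_raster(float, [[1, 2], [3, 4]]))  # Float:      [[1.0, 2.0], [3.0, 4.0]]
print(apply_to_raster(lambda c: -c, in_ras))     # Negate:     [[1.5, -2.3], [-0.5, 4.7]]
```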

 Logarithmic functions
The logarithmic functions work with input rasters or values to execute
exponential and logarithmic calculations. The natural (Ln), base 2 (Log2), and
base 10 (Log10) logarithmic functions as well as the base e (Exp), base 2
(Exp2), and base 10 (Exp10) exponential functions are accessible. For
example, the result of taking the log of the values in a raster is shown in Fig.
3.11.

Fig. 3.11: Logarithmic function-Map Algebra expression showing OutRas =
Ln(InRas1). (Source: modified after
https://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=mathematical%20operations%20and%20functions)
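The logarithmic and exponential functions can be imitated cell by cell with Python's math module, as sketched below. The helper name and raster values are hypothetical; the sketch assumes all input cells are positive, since logarithms are undefined for zero and negative values.

```python
import math

def apply_to_raster(func, raster):
    """Apply a one-argument function to every cell of a raster (list of rows)."""
    return [[func(cell) for cell in row] for row in raster]

in_ras = [[1.0, 2.0], [4.0, 8.0]]

ln_ras = apply_to_raster(math.log, in_ras)            # Ln
log2_ras = apply_to_raster(math.log2, in_ras)         # Log2
log10_ras = apply_to_raster(math.log10, in_ras)       # Log10
exp_ras = apply_to_raster(math.exp, ln_ras)           # Exp undoes Ln
exp2_ras = apply_to_raster(lambda c: 2.0 ** c, log2_ras)     # Exp2 undoes Log2
exp10_ras = apply_to_raster(lambda c: 10.0 ** c, log10_ras)  # Exp10 undoes Log10

print(log2_ras)   # [[0.0, 1.0], [2.0, 3.0]]
print(ln_ras)
```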

 Power functions
The general math tools incorporate three power functions. These functions
manipulate the values of the input raster in different ways: by calculating the
square root (Square Root), by squaring (Square), or by raising the values to a
specified power (Power). For example, each cell value in the input raster is
raised to the power of 2 as shown in Fig. 3.12.
OutRas = Pow (InRas, 2)

Fig. 3.12: Power function-Map Algebra expression showing OutRas = Pow(InRas,
2). (Source: modified after
https://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=mathematical%20operations%20and%20functions)
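The three power functions can be sketched cell by cell as below. The function names square_root, square and power are hypothetical stand-ins for the corresponding tools, a raster is represented as a list of rows, and the input values are invented for illustration.

```python
import math

def square_root(raster):
    """Square root of every cell (cells are assumed to be non-negative)."""
    return [[math.sqrt(cell) for cell in row] for row in raster]

def square(raster):
    """Square of every cell."""
    return [[cell * cell for cell in row] for row in raster]

def power(raster, exponent):
    """Every cell raised to the given exponent."""
    return [[cell ** exponent for cell in row] for row in raster]

in_ras = [[1, 2], [3, 4]]
print(power(in_ras, 2))                          # [[1, 4], [9, 16]]
print(square(in_ras))                            # [[1, 4], [9, 16]]
print(square_root([[4.0, 9.0], [16.0, 25.0]]))   # [[2.0, 3.0], [4.0, 5.0]]
```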

SAQ II
a) How is polygon perimeter measured?
b) In what way is Poisson model useful?
c) What are the functions of the arithmetic operators?
d) What are the power functions available in General Math tools?

3.7 SUMMARY
Let us summarise what we have studied in this unit.
 Geostatistics, a branch of statistics, is employed to examine and predict
the values associated with spatial or spatiotemporal phenomena. The
analyses include the spatial (and, in some cases, temporal) coordinates of
the data.
 The three main tools that geostatistics provides are i) Semi-variograms to
model the relationship between all pairs of points; ii) Kriging modeling to
predict values at unsampled locations; and iii) Standard error to measure
confidence at unsampled values.
 Commonly used linear distance measurement tools in GIS are the
Euclidean (Pythagorean Theorem) and Manhattan distance methods. The
Euclidean distance method is applied to know the distance between
immediate neighbours or the number of points covered under a particular
buffer distance from a point of interest.
 The Manhattan distance, sometimes referred to as the 'round-the-block' or
'city block' distance between two points, replaces the hypotenuse of the
right triangle with the sum of the lengths of the two sides. The spectral
distance, on the other hand, serves as a spectral metric commonly
employed in unsupervised classification and supervised minimum distance
classification.
 The polygon perimeter is known by determining the length measurement of
each of the n line segments associated with a polygon and adding them.
The polygon area is a measurement of the geographic area that is enclosed
by a polygon. It is very simple to calculate the polygon area when it has a
regular geometric shape.
 When two variables are connected, you can infer information about one by
looking at the values of the other variable. For investigating correlations,
making predictions about unknown variables, or comprehending crucial
elements, modelling relationships is helpful.
 Map algebra achieves its objectives through the utilization of mathematical
operations and functions. The mathematical operators and functions are
grouped into four types, viz., arithmetic operators, arithmetic functions,
logarithmic functions, and power functions.
3.8 ACTIVITY
 In the diagram given below, mark the Euclidean and Manhattan distances
between the points A and B and write their applications.

3.9 TERMINAL QUESTIONS


1. Discuss the tools and techniques used in geostatistics.
2. Discuss the Euclidean and Manhattan distance methods used in spatial
distance measurement.
3. Describe the perimeter and area measurement of spatial data.
4. Discuss the mathematical operations in GIS.
3.10 REFERENCES
 Bhatta, B. (2022) Remote Sensing and GIS. Oxford University Press, New
Delhi, 732p

 Bonczek, R.H., Holsapple, C.W. and Whinston, A.B. (1981) Foundations of
Decision Support Systems. Academic Press, New York. 393p
 Burrough, P. A., McDonnell, R.A. and Lloyd, C. D. (1998) Principles of
Geographical Information Systems. Oxford University Press, New York,
352p
 Chrisman, N. (1992) Exploring Geographical Information Systems. John
Wiley and Sons Inc., New York, 198p.
 Chakraborty, D. and Sahoo, R.N. (2008) Fundamentals of Geographic
Information Systems. Viva Books Private Limited, India, 280p.
 Chang, K-t. (2002) Introduction to Geographic Information Systems. Tata
McGraw Hill, New Delhi.
 Chrisman, N.R. (2002) Exploring Geographic Information Systems. Wiley,
New York, 305p.
 Date, C.J., Kannan, A. and Swamynathan, S. (2009) An Introduction to
Database Systems (8th Ed.), Pearson Education.
 DeMers, M.N. (2008) Fundamentals of geographic information system,
Wiley, New York, 443p.
 Elmasri, R. and Navathe, S.B. (2011) Fundamentals of Database Systems
(6th Ed.), Addison-Wesley, Boston, 1200p.
 Goodchild, M.F. (1978) Statistical Aspects of the Polygon Overlay
Problem, In: Dutton, G. (ed.), Harvard Papers on GIS, Addison-Wesley,
Reading.
 Harvey, F. (2008) A Primer of GIS: Fundamental Geographic and
Cartographic Concepts, The Guilford Press, New York.
 Huxhold, W.E. (1991) An Introduction to Urban Information Systems. New
York, OUP.
 Jensen, J.R. and Jensen R.R. (2018) Introductory Geographic Information
Systems. Pearson Education, Inc. 385p.
 Laurini, R. and Thompson, D. (1992) Fundamentals of Spatial Information
Systems. London, Academic Press.
 Lo, C.P. and Yeung, A. K.W. (2009) Concepts and techniques of
geographic information system. PHI Learning Private Limited, New Delhi,
532p.
 Mac Donald, A. (1999) Building a Geodatabase, Redlands CA: ESRI Press.
 Maguire, D. J., Goodchild, M.F. and Rhind, D. W. (1991) Geographical
Information Systems: Principles and Applications. Longman, U.K.
 Malczewski, J. (1999) GIS and Multicriteria Decision Analysis. John Wiley
and Sons, New York.
 Martin, D. (1991) Geographical Information Systems and their
Socioeconomic Applications. Routledge, London.

 Masser, I. and Blakemore, M (1991) Handling Geographical Information:
Methodology and Potential Applications, Ed.
 Peuquet, D.J. and Marble, D.F. (1990) Introductory Readings in Geographic
Information Systems. London, Taylor and Francis.
 Ramakrishnan, R. and Gehrke, J. (2002) Database Management Systems.
McGraw-Hill.
 Samet, H. (1990) The Design and Analysis of Spatial Data Structures.
Addison–Wesley.
 Silberschatz, A. and Korth, H. F. (1998) Database System Concepts, 3rd
Edition, TMH.
 Sprague, R.H. (1980) A framework for the development of decision support
systems. Management Information Sciences Quarterly 4: Source for DSS
development model. 26p
 Sprague, R.H. and Carlson, E.D. (1982) Building Effective Decision Support
Systems. Prentice-Hall, Englewood Cliffs
 Yang, C. and Everitt, J. H. (2002) Relationships between yield monitor data
and airborne multidate multispectral digital imagery for grain sorghum.
Precision Agriculture, 3(4), 373-388p
 https://pro.arcgis.com/en/pro-app/latest/help/analysis/geostatistical-
analyst/understanding-a-semivariogram-the-range-sill-and-nugget.htm
 https://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=mathem
atical%20operations%20and%20functions
 http://www.geography.hunter.cuny.edu/~jochen/gtech361/lectures/lecture11
/concepts/Map%20algebra%20operators.htm

3.11 FURTHER/SUGGESTED READINGS


 Chang, K-t (2002) Introduction to Geographic Information Systems. Tata
McGraw Hill, New Delhi.
 Jensen, J. R. and Jensen R.R. (2018) Introductory Geographic Information
Systems. Pearson Education, Inc. 385p.

3.12 ANSWERS
SAQ I
a) The three main tools of geostatistics used in interpolation and uncertainty
models are 1) semi-variograms, 2) kriging technique, and 3) standard error.
b) Tobler’s First Law of Geography states that closer things are more linked
than further ones.
c) Kriging interpolation is a method of interpolation that uses the spatial
correlation between samples to forecast values at unsurveyed locations.
d) Euclidean distance measurement works on the basis of the Pythagorean
Theorem.

SAQ II
a) The polygon perimeter is measured by determining the length measurement
of each of the n line segments associated with a polygon and adding them.
b) A Poisson model is used to represent a count variable, such as monthly
sales, crime rates, or traffic accidents. These values cannot contain
decimals and cannot be negative.
c) Arithmetic operators permit the addition (plus), subtraction (minus),
multiplication (times) and division (divide) of rasters and numbers.
d) The three power functions available in the General Math tools are Square
Root, which calculates the square root of the values in the input raster;
Square, which calculates their square; and Power, which raises the values
to a specified power.
Terminal Questions
1. Please refer to subsection 3.2.1.
2. Please refer to subsection 3.3.1.
3. Please refer to section 3.4.
4. Please refer to section 3.6.

GLOSSARY
Data integration in GIS : It is the method of combining spatial data procured
from different sources and formats to create an integrated dataset used for
analysis and decision making.

Database management : It encompasses organizing, storage, maintenance,
manipulation, and retrieval of data during the entire process of a data cycle.

Data mining : It can be defined as the process of extraction or mining of
information from large volumes of data that is generally stored in data
warehouses.

Database schema : It is the formal language-supported description of a
database's structure in a database management system.

Data warehouse : It is a central repository or store house of data collated
from several information systems of an organisation or an enterprise.

DBMS (Database Management System) : It is the system software used to create
and administer databases.

Euclidean distance : It is the distance measurement based on the Pythagorean
Theorem.

Kriging Interpolation : A method of interpolation that uses the spatial
correlation between samples to predict values at unsurveyed locations.

Outlier : A particular data object that has different attributes from the
remaining data objects.

Pythagorean Theorem : It states that the sum of the squares on the legs of a
right triangle (a triangle where one interior angle = 90°) is equal to the
square on the hypotenuse (the side opposite the right angle).

SDBMS : A DBMS with the capacity to handle spatial data.

Spatial database : It is a specialized type of database that efficiently
manages and stores geographical and spatial data, enabling spatial queries and
analysis.

Spatial data integration : It is the process of combining multiple spatial data
types and providing applications for its storage, retrieval, analysis and
display.

Semi-variograms : Descriptive tools offered by geostatistics for identifying
underlying trends in spatial phenomena.

Tobler’s First Law of Geography : The law that states that closer things are
more related than distant ones.
