GIS Notes
OCE552 – GEOGRAPHIC INFORMATION SYSTEM UNIT- I
Introduction to GIS - Basic spatial concepts - Coordinate Systems - GIS and Information
Systems – Definitions – History of GIS - Components of a GIS – Hardware, Software, Data,
People, Methods – Proprietary and open source Software - Types of data – Spatial, Attribute
data – types of attributes – scales/levels of measurement.
INTRODUCTION TO GIS:
A geographic information system (GIS) is a computer system for capturing, storing,
querying, analyzing and displaying geospatial data. One of many applications of GIS is
disaster management.
On March 11, 2011, a magnitude 9.0 earthquake struck off the east coast of Japan,
registering as the most powerful earthquake to hit Japan on record. The earthquake triggered
powerful tsunami waves that reportedly reached heights of up to 40 meters and traveled up to
10 kilometers inland. In the aftermath of the earthquake and tsunami, GIS played an
important role in helping responders and emergency managers to conduct rescue operations,
map severely damaged areas and infrastructure, prioritize medical needs, and locate
temporary shelters. GIS was also linked with social media such as Twitter, YouTube, and
Flickr so that people could follow events in near real time and view map overlays of streets,
satellite imagery, and topography.
Figure: An example of geospatial data. The street network is based on a plane coordinate system. The box on
the right lists the x and y coordinates of the end points and other attributes of a street segment.
The ability of a GIS to handle and process geospatial data distinguishes GIS from
other information systems and allows GIS to be used for integration of geospatial data and
other data.
HISTORY OF GIS:
The first operational GIS is reported to have been developed by Roger Tomlinson in
the early 1960s for storing, manipulating, and analyzing data collected for the Canada Land
Inventory (Tomlinson 1984). In 1964, Howard Fisher founded the Harvard Laboratory for
Computer Graphics, where several well-known computer programs of the past, such as
SYMAP, SYMVU, GRID, and ODYSSEY, were developed and distributed throughout the 1970s.
These earlier programs were run on mainframes and minicomputers, and maps were made on
line printers and pen plotters. In the 1980s, commercial and free GIS packages appeared in
the market.
As GIS continually evolves, two trends have emerged in recent years. One, as the core
of geospatial technology, GIS has increasingly been integrated with other geospatial data
such as satellite images and GPS data. Two, GIS has been linked with Web services, mobile
technology, social media and cloud computing.
COORDINATE SYSTEMS:
A basic principle in a geographic information system (GIS) is that map layers to be
used together must align spatially. Obvious mistakes can occur if they do not. For example,
the figure below shows the interstate highway maps of Idaho and Montana downloaded
separately from the Internet. The two maps do not register spatially. To connect the highway
networks across the shared state border, we must convert them to a common spatial reference
system. The coordinate system provides spatial reference.
GIS users typically work with map features on a plane (flat surface). These map
features represent spatial features on the Earth’s surface. The locations of map features are
based on a plane coordinate system expressed in x and y coordinates, whereas the locations of
spatial features on the Earth’s surface are based on a geographic coordinate system expressed
in longitude and latitude values. A map projection bridges the two types of coordinate
systems. The process of projection transforms the Earth’s surface to a plane, and the outcome
is a map projection.
Figure: (a) The interstate highways in Idaho and Montana based on different coordinate
systems. (b) The connected interstate networks based on the same coordinate system.
We can measure the longitude value of a point on the Earth’s surface as 0° to 180° east or west of
the prime meridian. Meridians are therefore used for measuring location in the E–W direction.
The flattening f of a reference ellipsoid is based on the difference between the semimajor axis a
and the semiminor axis b: f = (a − b)/a.
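The flattening definition can be checked numerically. The sketch below assumes the WGS84 reference ellipsoid, whose axis lengths (a = 6,378,137 m, b ≈ 6,356,752.314 m) are standard published values that are not given in these notes; the function name flattening is illustrative only.

```python
# Flattening of a reference ellipsoid: f = (a - b) / a
# WGS84 axis lengths in metres (standard published values, assumed for this example).
A_WGS84 = 6378137.0          # semimajor axis a
B_WGS84 = 6356752.314245     # semiminor axis b


def flattening(a: float, b: float) -> float:
    """Return the flattening f = (a - b) / a of an ellipsoid."""
    return (a - b) / a


f = flattening(A_WGS84, B_WGS84)
print(f"f = {f:.10f}, 1/f = {1 / f:.3f}")   # 1/f is close to 298.257 for WGS84
```

Flattening is usually quoted as its reciprocal (about 298.257 for WGS84) because f itself is such a small number.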
The angular measures of longitude and latitude may be expressed in degrees-minutes-
seconds (DMS), decimal degrees (DD), or radians (rad). Given that 1 degree equals 60
minutes and 1 minute equals 60 seconds, we can convert between DMS and DD. For
example, a latitude value of 45°52'30" would be equal to 45.875° (45 + 52/60 + 30/3600).
Radians are typically used in computer programs. One radian equals 57.2958°, and one
degree equals 0.01745 rad.
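The DMS, DD and radian conversions described above can be written as a few small functions. This is a minimal sketch that assumes non-negative angles (the hemisphere or sign is handled separately); the function names dms_to_dd and dd_to_dms are illustrative, while math.radians and math.degrees are Python standard library functions.

```python
import math


def dms_to_dd(degrees: int, minutes: int, seconds: float) -> float:
    """Convert degrees-minutes-seconds to decimal degrees (non-negative angles)."""
    return degrees + minutes / 60 + seconds / 3600


def dd_to_dms(dd: float) -> tuple[int, int, float]:
    """Convert non-negative decimal degrees back to (degrees, minutes, seconds)."""
    degrees = int(dd)
    remainder = (dd - degrees) * 60      # leftover fraction of a degree, in minutes
    minutes = int(remainder)
    seconds = (remainder - minutes) * 60
    return degrees, minutes, seconds


lat_dd = dms_to_dd(45, 52, 30)   # the latitude used in the text
print(lat_dd)                    # approximately 45.875
print(dd_to_dms(45.875))         # back to 45 degrees, 52 minutes, about 30 seconds
print(math.radians(1))           # about 0.01745 rad in one degree
print(math.degrees(1))           # about 57.2958 degrees in one radian
```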
Map Projections:
A map projection transforms the geographic coordinates on an ellipsoid into locations
on a plane. The outcome of this transformation process is a systematic arrangement of
parallels and meridians on a flat surface representing the geographic coordinate system. A
map projection provides a couple of distinctive advantages. First, a map projection allows us
to use two-dimensional maps, either paper or digital. Second, a map projection allows us to
work with plane coordinates rather than longitude and latitude values.
Map projections can be grouped by either the preserved property or the projection
surface. Cartographers group map projections by the preserved property into the following
four classes: conformal, equal area or equivalent, equidistant, and azimuthal or true direction.
A conformal projection preserves local angles and shapes. An equivalent projection
represents areas in correct relative size. An equidistant projection maintains consistency of
scale along certain lines. And an azimuthal projection retains certain accurate directions. The
preserved property of a map projection is often included in its name, such as the Lambert
conformal conic projection or the Albers equal-area conic projection.
Figure: The projected coordinate system. (a) Representation of points in a geographic coordinate
system; (b) equivalent representation in a projected coordinate system.
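To make the idea of a projection concrete, here is a minimal sketch of the Mercator projection, a conformal cylindrical projection, in its simple spherical form. The radius R = 6,371,000 m is an assumed mean Earth radius, and the sample point is hypothetical; production work would normally use ellipsoidal formulas through a projection library such as PROJ rather than this sketch.

```python
import math

R = 6_371_000.0  # assumed mean Earth radius in metres (spherical approximation)


def mercator_forward(lon_deg: float, lat_deg: float, radius: float = R) -> tuple[float, float]:
    """Project longitude/latitude in degrees to Mercator x/y in metres (sphere).

    x = R * lambda, y = R * ln(tan(pi/4 + phi/2)).
    """
    if not -90.0 < lat_deg < 90.0:
        raise ValueError("Mercator y is unbounded at the poles")
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def mercator_inverse(x: float, y: float, radius: float = R) -> tuple[float, float]:
    """Recover longitude/latitude in degrees from spherical Mercator x/y."""
    lon = math.degrees(x / radius)
    lat = math.degrees(2 * math.atan(math.exp(y / radius)) - math.pi / 2)
    return lon, lat


x, y = mercator_forward(-114.0, 46.87)   # a point in western Montana
print(round(x), round(y))                # plane coordinates in metres
print(mercator_inverse(x, y))            # approximately (-114.0, 46.87)
```

Because the Mercator projection is conformal, local angles and shapes are preserved, but areas are increasingly exaggerated toward the poles; this is the trade-off the classification by preserved property describes.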
Three coordinate systems are commonly used in the United States: the Universal
Transverse Mercator (UTM) grid system, the Universal Polar Stereographic (UPS) grid
system, and the State Plane Coordinate (SPC) system.
COMPONENTS OF GIS:
A GIS is an organized collection of computer hardware, software, geographic data,
and personnel designed to efficiently capture, store, update, manipulate, analyze, and display
all forms of geographically referenced information. GIS technology integrates common
database operations, such as query and statistical analysis, with the unique visualization and
geographic analysis benefits offered by maps. A working GIS integrates the following key
components: hardware, software, data, people, and methods.
o Hardware - GIS hardware includes computers for data processing, data storage,
and input/output; printers and plotters for reports and hard-copy maps; digitizers
and scanners for digitization of spatial data; and GPS (Global Positioning System)
and mobile devices for fieldwork.
o Software - GIS software, either commercial or open source, includes programs
and applications to be executed by a computer for data management, data analysis,
data display, and other tasks. Additional applications, written in Python,
JavaScript, VB.NET, or C++, may be used in GIS for specific data analyses.
o Method - A successful GIS operates according to a well-designed plan and
business rules, which are the models and operating practices unique to each
organization. Each organization should document its process plan for GIS
operation. This document addresses a number of questions about GIS methods:
the number of GIS experts required, the GIS software and hardware, the process
for storing the data, the type of DBMS (database management system) to use,
and more. A well-designed plan will address all these questions.
o People - GIS technology is of limited value without the people who manage the
system and develop plans for applying it. GIS users range from technical
specialists who design and maintain the system, to those who use it to help them
do their everyday work.
o Data - Perhaps the most important component of a GIS is the data. Geographic
data and related tabular data can be collected in-house or bought from a
commercial data provider. Most GIS employ a DBMS to create and maintain a
database to help organize and manage data. The data that a GIS operates on
consists of any data bearing a definable relationship to space, including any data
about things and events that occur in nature. At one time this consisted of hard-
copy data, like traditional cartographic maps, surveyor’s logs, demographic
statistics, geographic reports, and descriptions from the field. Advances in spatial
data collection, classification, and accuracy have allowed more and more standard
digital base-maps to become available at different scales.
o Organization - GIS operations exist within an organizational environment;
therefore, they must be integrated into the culture and decision-making processes
of the organization for such matters as the role and value of GIS, GIS training,
data collection and dissemination, and data standards.
WORKING OF GIS:
The working of a GIS involves the following elements: geospatial data, data acquisition,
data management, data display, data exploration, and data analysis.
Geospatial Data: By definition, geospatial data cover the location of spatial features.
To locate spatial features on the Earth’s surface, we can use either a geographic or a
projected coordinate system, and many such coordinate systems are available for use in GIS.
A GIS represents geospatial data as either vector data or raster data.
The vector data model uses points, lines, and polygons to represent spatial features
with a clear spatial location and boundary such as streams, land parcels, and
vegetation stands. Each feature is assigned an ID so that it can be associated with its
attributes.
The raster data model uses a grid and grid cells to represent spatial features: point
features are represented by single cells, line features by sequences of neighbouring
cells, and polygon features by collections of contiguous cells. The cell value
corresponds to the attribute of the spatial feature at the cell location. Raster data are
ideal for continuous features such as elevation and precipitation.
Figure: The raster data model uses cells in a grid to represent point features.
Data Acquisition: Data acquisition is usually the first step in conducting a GIS
project. The need for geospatial data by GIS users has been linked to the development
of data clearinghouses and geoportals. Since the early 1990s, government agencies at
different levels in the United States as well as many other countries have set up
websites for sharing public data and for directing users to various data sources.
Data acquisition involves compilation of existing and new data. To be used in a GIS,
a newly digitized map or a map created from satellite images requires geometric
transformation (i.e., geo-referencing). Additionally, both existing and new spatial data
must be edited if they contain digitizing and/or topological errors.
Data Display: A routine GIS operation is mapmaking because maps are an interface
to GIS. Mapmaking can be informal or formal in GIS. It is informal when we view
geospatial data on maps, and formal when we produce maps for professional
presentations and reports. A professional map combines the title, map body, legend,
scale bar, and other elements together to convey geographic information to the map
reader.
To make a “good” map, we must have a basic understanding of map symbols, colors,
and typography, and their relationship to the mapped data. Additionally, we must be
familiar with map design principles such as layout and visual hierarchy. After a map
is composed in a GIS, it can be printed or saved as a graphic file for presentation. It
can also be converted to a KML file, imported into Google Earth, and shared publicly
on a web server.
Data Exploration: Feature-based query can involve either attribute or spatial data.
Attribute data query is basically the same as database query using a DBMS. In contrast,
spatial data query allows GIS users to select features based on their spatial relationships
such as containment, intersection, and proximity. A combination of attribute and spatial
data queries provides a powerful tool for data exploration.
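A combination of attribute and spatial queries can be sketched in plain Python. The parcel layer, its land use attribute and the helper point_in_polygon below are hypothetical; the containment test uses the standard ray-casting (even-odd) rule rather than any particular GIS package's implementation, and points lying exactly on a boundary are not treated specially.

```python
# Hypothetical parcels: ID -> polygon ring, plus an attribute table keyed by the same IDs.
parcels = {
    101: [(0, 0), (4, 0), (4, 4), (0, 4)],
    102: [(4, 0), (8, 0), (8, 4), (4, 4)],
    103: [(0, 4), (4, 4), (4, 8), (0, 8)],
}
land_use = {101: "residential", 102: "commercial", 103: "residential"}


def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a ray from (x, y) toward +x crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge straddles the ray's y-level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                            # crossing lies to the right of the point
                inside = not inside
    return inside


# Attribute query (like a DBMS query): which parcels are residential?
residential = {pid for pid, use in land_use.items() if use == "residential"}

# Spatial query (containment): which parcel contains the point (5, 2)?
containing = {pid for pid, ring in parcels.items() if point_in_polygon(5, 2, ring)}

# Combined query: residential parcels that contain the point (1, 6).
combined = {pid for pid in residential if point_in_polygon(1, 6, parcels[pid])}
print(residential, containing, combined)
```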
Data Analysis: A GIS has a large number of tools for data analysis. Some are basic
tools, meaning that they are regularly used by GIS users. Other tools tend to be
discipline or application specific. Two basic tools for vector data are buffering and
overlay: buffering creates buffer zones from select features, and overlay combines the
geometries and attributes of the input layers.
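A full buffer operation creates new buffer-zone polygons; the simpler sketch below only tests whether point features fall within a given distance of a line feature, which is how a buffer is often used to select nearby features. The stream, the wells and the 30 m distance are hypothetical, and coordinates are assumed to be in a projected system measured in metres.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:                       # degenerate segment: a == b
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp the parameter t to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_buffer(point, polyline, distance):
    """True if the point lies inside the buffer zone of the given width around a polyline."""
    return any(
        point_segment_distance(point, polyline[i], polyline[i + 1]) <= distance
        for i in range(len(polyline) - 1)
    )


# Hypothetical stream (a line feature) and wells (point features), in metres.
stream = [(0, 0), (100, 0), (200, 50)]
wells = {"W1": (50, 20), "W2": (150, 80), "W3": (190, 40)}

near_stream = [wid for wid, xy in wells.items() if within_buffer(xy, stream, 30)]
print(near_stream)   # wells within 30 m of the stream
```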
ArcGIS extensions include 3D Analyst, Network Analyst, Spatial Analyst, Geostatistical Analyst,
and others.
GRASS GIS (Geographic Resources Analysis Support System), the first FOSS for
GIS, was originally developed by the U.S. Army Construction Engineering Research
Laboratories in the 1980s. Well known for its analysis tools, GRASS GIS is currently
maintained and developed by a worldwide network of users. Academicians, government
agencies (NASA, NOAA, USDA and USGS) and GIS practitioners use this open source
software because its code can be inspected and tailored to their needs.
SAGA GIS (System for Automated Geoscientific Analyses) is one of the classics in
the world of free GIS software. It started out primarily for terrain analysis such as
hillshading, watershed extraction and visibility analysis. It has since grown into a powerful
package that delivers a fast-growing set of geoscientific methods to the geoscientific
community.
GeoDa is a free GIS software program used primarily to introduce new users to
spatial data analysis. Its main functionality is exploratory statistical analysis of spatial
data. It comes with sample datasets that new users can explore right away. From simple
box plots all the way to regression, GeoDa provides a broad set of statistical tools for
spatial data.
APPLICATION OF GIS:
GIS is a useful tool because a high percentage of information we routinely encounter
has a spatial component. An often-cited figure among GIS users is that 80 percent of data is
geographic. Since its beginning, GIS has been important for land use planning, natural hazard
assessment, wildlife habitat analysis, riparian zone monitoring, timber management, and
urban planning. The list of fields that have benefited from the use of GIS has expanded
significantly over the past three decades.
In the United States, the U.S. Geological Survey (USGS) is a leading agency in the
development and promotion of GIS. The USGS website provides case studies as well as
geospatial data for applications in climate and land use change, ecosystem analysis, geologic
mapping, petroleum resource assessment, watershed management, coastal zone management,
natural hazards (volcano, flood, and landslide), aquifer depletion, and ground water
management.
In the private sector, most GIS applications are integrated with the Internet, GPS,
wireless technology, and Web services. The following shows some of these applications:
Online mapping websites offer locators for finding real estate listings, vacation
rentals, banks, restaurants, coffee shops, and hotels.
Location-based services allow mobile phone users to search for nearby banks,
restaurants, and taxis; and to track friends, dates, children, and the elderly.
Mobile GIS allows field workers to collect and access geospatial data in the field.
Mobile resource management tools track and manage the location of field crews and
mobile assets in real time.
Automotive navigation systems provide turn-by-turn guidance and optimal routes
based on precise road mapping using GPS and cameras.
Augmented reality lets a smartphone user look through the phone’s camera with
superimposed data or images (e.g., 3-D terrain from a GIS, monsters in Pokémon Go)
about the current location.
SCALES/LEVELS OF MEASUREMENTS:
Scales of measurement, or levels of measurement, provide a system for classifying attribute
data into four categories, namely nominal, ordinal, interval and ratio.
Nominal: In this level of measurement, the numbers or labels of the variable are used only
to classify the data. Words, letters, and alphanumeric symbols can be used. Suppose there
are data about people belonging to three different gender categories. In this case, a person
of the female gender could be classified as F, a person of the male gender as M, and a
transgender person as T. This type of classification is the nominal level of measurement.
Ordinal: This level of measurement depicts some ordered relationship among the
variable’s observations. Suppose a student scores the highest grade of 100 in the
class. In this case, he would be assigned the first rank. Then, another classmate
scores the second highest grade of 92; she would be assigned the second rank. A
third student scores 81 and he would be assigned the third rank, and so on. The
ordinal level of measurement indicates an ordering of the measurements.
Figure: Levels of measurement
Interval: The interval level of measurement not only classifies and orders the
measurements, but it also specifies that the distances between each interval on the
scale are equivalent along the scale from low interval to high interval. For
example, on an interval scale measuring a student’s anxiety, the difference between scores
of 10 and 11 is the same as the difference between scores of 40 and 41. A popular example
of this level of measurement is temperature in degrees Celsius (centigrade), where, for
example, the distance between 94°C and 96°C is the same as the distance between 100°C
and 102°C.
Ratio: In this level of measurement, the observations, in addition to having equal
intervals, have a true (meaningful) zero point, so ratios between values are meaningful.
A common geographic example of ratio data is density (e.g., population density). Any
percent value from 0 to 100 will also have a meaningful zero.
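The practical difference between the four levels is which summaries make sense. The mapping below follows the usual textbook convention and is an assumption of this sketch, as are the sample values; it shows why a Celsius ratio is misleading, while converting to kelvin (which has a true zero) gives a meaningful ratio.

```python
from statistics import mean, median, mode

# Assumed, textbook-style mapping from level of measurement to meaningful summaries
# (each level also allows the summaries of the levels before it).
LEVEL_STATS = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean", "difference"],
    "ratio": ["mode", "median", "mean", "difference", "ratio"],
}

# Nominal: category codes can only be counted, so the mode is the typical summary.
land_cover = ["F", "F", "W", "U", "F"]   # hypothetical codes: forest, water, urban
print(mode(land_cover))                   # F

# Ordinal: ranks can be ordered, so the median makes sense.
ranks = [1, 2, 3, 4, 5]
print(median(ranks))                      # 3

# Interval: Celsius has an arbitrary zero, so 20 C is not "twice as hot" as 10 C.
c_low, c_high = 10.0, 20.0
k_low, k_high = c_low + 273.15, c_high + 273.15   # kelvin is a ratio scale (true zero)
print(c_high / c_low, round(k_high / k_low, 3))   # 2.0 1.035

# Ratio: a density of 200 people per km^2 really is twice a density of 100.
densities = [100.0, 200.0]
print(mean(densities), densities[1] / densities[0])   # 150.0 2.0
```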
*****
DATABASE MODEL:
Data model defines the logical structure of a database. Data Models are fundamental
entities to introduce abstraction in a DBMS. Data models define how data is connected to
each other and how they are processed and stored inside the system. There are a number of
different database data models. Amongst those that have been used for attribute data in GIS
are the hierarchical, network, relational, object-relational and object-oriented data models. Of
these the relational data model has become the most widely used model.
The data in a relational database are stored as a set of base tables with the
characteristics described above. Other tables are created as the database is queried and these
represent virtual views. The table structure is extremely flexible and allows a wide variety of
queries on the data. Queries are possible on one table at a time (for example, you might ask
‘which hotels have more than 14 rooms?’ or ‘which hotels are luxury standard?’), or on more
than one table by linking through key fields (for instance, ‘which passengers originating from
the UK are staying in luxury hotels?’ or ‘which ski lessons have pupils who are over 50 years
of age?’). Queries generate further tables, but these new tables are not usually stored. There
are few restrictions on the types of query possible.
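The single-table and two-table queries described above can be tried with SQLite, which ships with Python as the sqlite3 module. The table layouts, hotel names and guests below are hypothetical stand-ins for the hotel example in the text; hotel_id plays the role of the key field that links the two tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = con.cursor()
cur.executescript("""
CREATE TABLE hotel (hotel_id INTEGER PRIMARY KEY, name TEXT, rooms INTEGER, standard TEXT);
CREATE TABLE guest (guest_id INTEGER PRIMARY KEY, name TEXT, origin TEXT,
                    hotel_id INTEGER REFERENCES hotel(hotel_id));
INSERT INTO hotel VALUES (1, 'Alpine Lodge', 12, 'budget'),
                         (2, 'Grand Chalet', 40, 'luxury'),
                         (3, 'Snow View', 18, 'standard');
INSERT INTO guest VALUES (1, 'Ann', 'UK', 2), (2, 'Bob', 'France', 2),
                         (3, 'Cat', 'UK', 1), (4, 'Dan', 'UK', 3);
""")

# Query on one table: which hotels have more than 14 rooms?
big = cur.execute("SELECT name FROM hotel WHERE rooms > 14 ORDER BY name").fetchall()

# Query across two tables linked by the key field hotel_id:
# which guests originating from the UK are staying in luxury hotels?
uk_luxury = cur.execute("""
    SELECT g.name FROM guest AS g
    JOIN hotel AS h ON g.hotel_id = h.hotel_id
    WHERE g.origin = 'UK' AND h.standard = 'luxury'
""").fetchall()

print(big)        # [('Grand Chalet',), ('Snow View',)]
print(uk_luxury)  # [('Ann',)]
con.close()
```

As the text notes, each query result is itself a table, but it is returned to the caller rather than stored in the database.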
ER Diagram:
An Entity–relationship model (ER model) describes the structure of a database with
the help of a diagram, which is known as Entity Relationship Diagram (ER Diagram). An ER
model is a design or blueprint of a database that can later be implemented as a database. ER
Model is best used for the conceptual design of a database. The main components of E-R
model are:
Entity − An entity in an ER Model is a real-world entity having properties
called attributes. Every attribute is defined by its set of values called domain. For
example, in a school database, a student is considered as an entity. Student has
various attributes like name, age, class, etc.
Relationship − The logical association among entities is called relationship.
Relationships are mapped with entities in various ways. Mapping cardinalities define
the number of associations between two entities. The mapping cardinalities are
one-to-one, one-to-many, many-to-one and many-to-many.
The figure shows the ER diagram for the GPS tracking system. The design has three
entities namely User-generated Data, Publisher and Subscriber.
Topology:
Topology refers to the study of those properties of geometric objects that remain
invariant under certain transformations such as bending or stretching. An example of a
topological map is a subway map.
A subway map depicts correctly the connectivity between the subway lines and
stations on each line but has distortions in distance and direction. In GIS, vector data can be
topological or non-topological, depending on whether topology is built into the data or not.
Topology can be explained through directed graphs (digraphs), which show the arrangements
of geometric objects and the relationships among objects. An edge or arc is a directed line
with a starting point and an ending point. The end points of an arc are nodes, and
intermediate points, if any, are vertices. And a face refers to a polygon bounded by arcs. If an
arc joins two nodes, the nodes are said to be adjacent and incident with the arc.
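The arc-node relationships can be represented as a simple arc table. In this sketch the arc and node names are hypothetical: each arc records its from-node, its to-node and any intermediate vertices, node adjacency and arc incidence are derived from the table, and a small depth-first search answers a subway-map style connectivity question.

```python
from collections import defaultdict

# Hypothetical arc table: arc ID -> (from-node, to-node, intermediate vertices).
arcs = {
    "a1": ("N1", "N2", [(1.0, 2.0)]),
    "a2": ("N2", "N3", []),
    "a3": ("N3", "N1", [(0.5, 0.5), (0.2, 1.0)]),
    "a4": ("N3", "N4", []),
}

adjacent = defaultdict(set)   # node -> nodes joined to it by some arc
incident = defaultdict(set)   # node -> arcs that start or end at it

for arc_id, (start, end, _vertices) in arcs.items():
    adjacent[start].add(end)
    adjacent[end].add(start)
    incident[start].add(arc_id)
    incident[end].add(arc_id)


def reachable(start, goal):
    """Depth-first search over node adjacency: is there a path of arcs from start to goal?"""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in adjacent[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


print(sorted(adjacent["N3"]))    # ['N1', 'N2', 'N4']
print(sorted(incident["N3"]))    # ['a2', 'a3', 'a4']
print(reachable("N4", "N1"))     # True
```

Only connectivity is stored explicitly; as with a subway map, distances and directions play no part in the query.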
TIGER:
An early application example of topology is the Topologically Integrated Geographic
Encoding and Referencing (TIGER) data base from the U.S. Census Bureau. The TIGER
database links statistical area boundaries such as counties, census tracts, and block groups to
roads, railroads, rivers, and other features by topology.
Topology has three main advantages. First, it ensures data quality and integrity.
Second, topology can enhance GIS analysis. Third, topological relationships between spatial
features allow GIS users to perform spatial data query.
In a simple raster data structure, such as the one illustrated in the figure above, different
spatial features must be stored as separate data layers. Thus, to store more raster entities,
separate data files would be required, each representing a different layer of spatial data.
However, if the entities do not occupy the same geographic location (or cells in the raster
model), then it is possible to store them all in a single layer, with an entity code given to each
cell. This code informs the user which entity is present in which cell.
The figure above shows how different land uses can be coded in a single raster layer. The values
1, 2 and 3 have been used to classify the raster cells according to the land use present at a
given location. The value 1 represents residential area; 2, forest; and 3, farmland.
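A single-layer raster with entity codes can be held as a grid of integers. The 5 by 5 grid and the 30 m cell size below are assumptions for illustration; the codes follow the text (1 = residential, 2 = forest, 3 = farmland).

```python
from collections import Counter

LAND_USE = {1: "residential", 2: "forest", 3: "farmland"}
# Hypothetical 5 x 5 land-use raster: every cell stores one entity code.
grid = [
    [1, 1, 2, 2, 2],
    [1, 1, 2, 2, 2],
    [3, 3, 3, 2, 2],
    [3, 3, 3, 3, 2],
    [3, 3, 3, 3, 3],
]
CELL_SIZE_M = 30  # assumed cell width in metres

counts = Counter(value for row in grid for value in row)
for code, n_cells in sorted(counts.items()):
    area_ha = n_cells * CELL_SIZE_M ** 2 / 10_000   # 1 ha = 10,000 m^2
    print(f"{code} {LAND_USE[code]:<12} {n_cells:>2} cells  {area_ha:.2f} ha")

# Which entity is present in the cell at row 2, column 0?
print(LAND_USE[grid[2][0]])   # farmland
```

Note that all 25 values are stored even though only three classes occur, which is the storage problem discussed next.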
One of the major problems with raster data sets is their size, because a value must be
recorded and stored for each cell in an image. Thus, a complex image made up of a mosaic of
different features (such as a soil map with 20 distinct classes) requires the same amount of
storage space as a similar raster map showing the location of a single forest. To address this
problem, a range of data compaction methods has been developed.
The figure above shows such a vector data structure for the Happy Valley car park.
Note how a closed ring of co-ordinate pairs defines the boundary of the polygon. The
limitations of simple vector data structures start to emerge when more complex spatial
entities are considered. For example, consider the Happy Valley car park divided into
different parking zones (Figure: b). The car park consists of a number of adjacent polygons.
If the simple data structure, illustrated in Figure: a, were used to capture this entity then the
boundary line shared between adjacent polygons would be stored twice. This may not appear
too much of a problem in the case of this example, but consider the implications for a map of
the 50 states in the USA.
The amount of duplicate data would be considerable. This method can be improved
by adjacent polygons sharing common co-ordinate pairs (points). To do this all points in the
data structure must be numbered sequentially and contain an explicit reference which records
which points are associated with which polygon. This is known as a point dictionary. The
data structure in Figure: b, shows how such an approach has been used to store data for the
different zones in the Happy Valley car park.
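A point dictionary can be sketched with two lookup tables. The two rectangular zones below are hypothetical stand-ins for the Happy Valley parking zones: coordinates are stored once under a point number, and each polygon is a ring of point numbers.

```python
# Hypothetical point dictionary: each coordinate pair is stored once, under a point number.
points = {
    1: (0.0, 0.0), 2: (5.0, 0.0), 3: (10.0, 0.0),
    4: (10.0, 4.0), 5: (5.0, 4.0), 6: (0.0, 4.0),
}

# Each zone is a ring of point numbers rather than coordinates, so the
# boundary shared by zone_A and zone_B (points 2 and 5) is recorded only once.
polygons = {
    "zone_A": [1, 2, 5, 6],
    "zone_B": [2, 3, 4, 5],
}


def ring_coords(poly_id):
    """Look up a polygon's point numbers in the dictionary to get its coordinates."""
    return [points[pid] for pid in polygons[poly_id]]


def shared_points(p, q):
    """Point numbers used by both polygons, i.e. their common boundary."""
    return sorted(set(polygons[p]) & set(polygons[q]))


simple_count = sum(len(ring) for ring in polygons.values())  # pairs a simple structure stores
print(ring_coords("zone_B"))
print(shared_points("zone_A", "zone_B"))   # [2, 5]
print(simple_count, "coordinate pairs without a point dictionary,", len(points), "with one")
```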
There is a considerable range of topological data structures in use by GIS. All the
structures available try to ensure that:
no node or line segment is duplicated;
line segments and nodes can be referenced to more than one polygon;
all polygons have unique identifiers; and
island and hole polygons can be adequately represented.
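A lossless compression preserves the cell values and allows the original raster to be
reconstructed exactly; run length encoding, described below, is one example. Other methods
include LZW (Lempel–Ziv–Welch) and related dictionary-based methods (e.g., LZ77, LZMA).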
A lossy compression cannot reconstruct fully the original image but can achieve
higher compression ratios than a lossless compression. Lossy compression is therefore useful
for raster data that are used as background images rather than for analysis. Image degradation
through lossy compression can affect GIS-related tasks such as extracting ground control
points from aerial photographs or satellite images for the purpose of georeferencing.
Run length encoding:
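Run length encoding stores cells on a row-by-row basis. Instead of recording each
individual cell’s value, it records each run of adjacent cells in a row that share the same
value as a single entry, for example the value and the number of cells in the run.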
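A minimal run length encoding sketch, storing each run as a (value, count) pair; some implementations store the start and end columns of each run instead. The raster is hypothetical.

```python
def rle_encode_row(row):
    """Encode one raster row as a list of (value, run_length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1            # same value as the previous cell: extend the run
        else:
            runs.append([value, 1])     # value changed: start a new run
    return [tuple(run) for run in runs]


def rle_decode_row(runs):
    """Expand (value, run_length) pairs back into the original row."""
    return [value for value, length in runs for _ in range(length)]


raster = [
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
encoded = [rle_encode_row(row) for row in raster]
for runs in encoded:
    print(runs)
# Lossless: decoding reproduces the raster exactly.
print([rle_decode_row(r) for r in encoded] == raster)   # True
```

The uniform third row shrinks from eight values to a single pair, while rows with many changes gain little, so the savings depend on how homogeneous the raster is.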
Block coding:
The block coding raster storage technique reduces redundancy by grouping cells with the
same value into square blocks. It is an extension of the run length encoding technique to
two dimensions: each block is recorded by its value, the position of its origin cell and its
size.
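One simple way to build a block code is to scan the raster row by row and, at each cell not yet covered, grow the largest uniform square possible. This greedy strategy is an assumption of this sketch and does not guarantee the minimum number of blocks; each block is stored as (value, row, column, size), and the raster is hypothetical.

```python
def block_encode(grid):
    """Greedy block coding: cover the raster with uniform square blocks."""
    n_rows, n_cols = len(grid), len(grid[0])
    covered = [[False] * n_cols for _ in range(n_rows)]
    blocks = []
    for r in range(n_rows):
        for c in range(n_cols):
            if covered[r][c]:
                continue
            value, size = grid[r][c], 1
            # Grow the square by one row and one column while it stays uniform and uncovered.
            while r + size < n_rows and c + size < n_cols and all(
                grid[i][j] == value and not covered[i][j]
                for i in range(r, r + size + 1)
                for j in range(c, c + size + 1)
            ):
                size += 1
            for i in range(r, r + size):
                for j in range(c, c + size):
                    covered[i][j] = True
            blocks.append((value, r, c, size))   # origin is the block's upper-left cell
    return blocks


def block_decode(blocks, n_rows, n_cols):
    """Paint each block back into an empty grid."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for value, r, c, size in blocks:
        for i in range(r, r + size):
            for j in range(c, c + size):
                grid[i][j] = value
    return grid


grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
blocks = block_encode(grid)
print(blocks)                                   # four 2 x 2 blocks instead of 16 cells
print(block_decode(blocks, 4, 4) == grid)       # True
```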
Chain Coding:
Chain coding defines the outer boundary of a feature using relative positions from a start
point. The boundary is stored as a sequence of moves that ends back at the start point.
During encoding, each direction is stored as an integer. With the four cardinal directions,
for example, the value 0 can denote north and 1 east.
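The sketch below decodes and re-encodes a four-direction chain code. It follows the convention that 0 is north and 1 is east, and assumes (continuing clockwise) 2 for south and 3 for west, with one-cell steps; the rectangle is hypothetical.

```python
# Direction code -> (dx, dy) step in cell units, with y increasing northward.
MOVES = {0: (0, 1), 1: (1, 0), 2: (0, -1), 3: (-1, 0)}   # N, E, S, W (2 and 3 assumed)


def decode_chain(start, codes):
    """Rebuild the boundary vertices from a start point and a list of direction codes."""
    x, y = start
    path = [(x, y)]
    for code in codes:
        dx, dy = MOVES[code]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def encode_chain(path):
    """Turn consecutive unit steps between vertices back into direction codes."""
    inverse = {move: code for code, move in MOVES.items()}
    return [inverse[(x2 - x1, y2 - y1)] for (x1, y1), (x2, y2) in zip(path, path[1:])]


# A 2 x 1 rectangle traced clockwise from its lower-left corner.
codes = [0, 1, 1, 2, 3, 3]
boundary = decode_chain((0, 0), codes)
print(boundary)
print(boundary[-1] == boundary[0])   # True: the chain closes at the start point
```

Only the start point needs absolute coordinates; every other vertex costs a single small integer.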
Quadtree encoding:
Quadtrees are raster data structures based on the successive subdivision of a raster into
homogeneous regions. A quadtree recursively subdivides a raster image into quarters
(quadrants). The subdivision process continues until each quadrant contains cells of a
single value.
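A region quadtree can be built with a short recursive function. This sketch assumes a square raster whose side is a power of two; a homogeneous quadrant is stored as a single value (a leaf), and any other quadrant as a list of four child quadrants in NW, NE, SW, SE order. The 4 by 4 raster is hypothetical.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Region quadtree: return a leaf value or a list [NW, NE, SW, SE] of subtrees."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first for r in range(row, row + size) for c in range(col, col + size)):
        return first                     # homogeneous quadrant: stop subdividing
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                 # NW
        build_quadtree(grid, row, col + half, half),          # NE
        build_quadtree(grid, row + half, col, half),          # SW
        build_quadtree(grid, row + half, col + half, half),   # SE
    ]


def count_leaves(tree):
    """Number of stored leaves (homogeneous quadrants)."""
    return sum(count_leaves(t) for t in tree) if isinstance(tree, list) else 1


raster = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(raster)
print(tree)                # [0, 1, [0, 1, 0, 0], 1]
print(count_leaves(tree))  # 7 leaves instead of 16 cells
```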
MrSID uses the wavelet transform for data compression. The wavelet-based
compression is also used by JPEG 2000 and ECW (Enhanced Compressed Wavelet). The
wavelet transform treats an image as a wave and progressively decomposes the wave into
simpler wavelets (Addison 2002). Using a wavelet (mathematical) function, the transform
repetitively averages groups of adjacent pixels (e.g., 2, 4, 6, 8, or more) and, at the same time,
records the differences between the original pixel values and the average. The differences,
also called wavelet coefficients, can be 0, greater than 0, or less than 0. In parts of an image
that have few significant variations, most pixels will have coefficients of 0 or very close to 0.
To save data storage, these parts of the image can be stored at lower resolutions by rounding
off low coefficients to 0, but storage at higher resolutions is required for parts of the same
image that have significant variations (i.e., more details). Box 4.4 shows a simple example of
using the Haar function for the wavelet transform.
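The averaging-and-differencing step can be illustrated with one level of an unnormalized Haar transform on a single row of pixel values. This is a simplified one-dimensional sketch with a hypothetical row; MrSID, JPEG 2000 and ECW apply more elaborate two-dimensional wavelet schemes. Each coefficient here is the difference between the first pixel of a pair and the pair's average, so flat areas yield zeros, and rounding small coefficients to zero gives a lossy reconstruction.

```python
def haar_step(values):
    """One level of the unnormalized Haar transform: pairwise averages and differences."""
    if len(values) % 2:
        raise ValueError("need an even number of values")
    pairs = list(zip(values[0::2], values[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]   # equals a minus the pair's average
    return averages, details


def haar_inverse_step(averages, details):
    """Rebuild each original pair: a = average + detail, b = average - detail."""
    out = []
    for avg, d in zip(averages, details):
        out.extend([avg + d, avg - d])
    return out


row = [9, 7, 3, 5, 6, 6, 2, 2]
avgs, coeffs = haar_step(row)
print(avgs)    # [8.0, 4.0, 6.0, 2.0]
print(coeffs)  # [1.0, -1.0, 0.0, 0.0]  (the flat pairs give zero coefficients)

# Lossy idea: round small coefficients off to 0 before storing them.
threshold = 1.0
kept = [c if abs(c) > threshold else 0.0 for c in coeffs]
print(haar_inverse_step(avgs, kept))   # [8.0, 8.0, 4.0, 4.0, 6.0, 6.0, 2.0, 2.0]
```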
VECTOR vs RASTER:
Property | Vector | Raster
Data structure | Usually complex | Usually simple
Overlay operations | Difficult | Efficient
High spatial variability | Inefficiently represented | Efficiently represented
File size | Small | Large
Typical use | Discrete features with definable boundaries | Continuous spatial features
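Because terrain cannot be measured at every point, a digital terrain model (DTM) is built
from a sample of height observations; an appropriate number of observations must therefore
be selected, along with their geographical location.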
The ‘resolution’ of a DTM is determined by the frequency of observations used.
DTMs are created from a series of either regularly or irregularly spaced (x,y,z) data points
(where x and y are the horizontal co-ordinates and z is the vertical or height co-ordinate).
DTMs may be derived from a number of data sources. These include contour and spot height
information found on topographic maps, stereoscopic aerial photography, satellite images and
field surveys.
Triangulated Irregular Networks:
A commonly used data structure in GIS software is the triangulated irregular network
(TIN). It is one of the standard implementation techniques for digital terrain models, but it can be
used to represent any continuous field. The principles behind a TIN are simple. It is built
from a set of locations for which we have a measurement for instance an elevation. The
locations can be arbitrarily scattered in space and are usually not on a nice regular grid. Any
location together with its elevation value can be viewed as a point in three dimensional space.
This is illustrated in the figure below. From these 3D points, we can construct an irregular
tessellation made of triangles.
If we base our elevation computation for location P on the left-hand shaded triangle, we will
get a different value than from the right-hand shaded triangle. The second will provide a better
approximation because the average distance from P to the three triangle anchors is smaller.
The triangulation shown in the figure below happens to be a Delaunay triangulation, which in a
sense is an optimal triangulation. There are multiple ways of defining such a triangulation,
but it suffices here to state two important properties. The first is that the triangles are as
equilateral (‘equal-sided’) as they can be, given the set of anchor points (more precisely, the
smallest angle over all triangles is as large as possible). The
second property is that for each triangle, the circumcircle through its three anchor points does
not contain any other anchor point. One such circumcircle is depicted on the right of Figure
(b).
Figure: Two triangulations based on the same input locations: (a) one with many ‘stretched’
triangles; (b) one whose triangles are more equilateral (a Delaunay triangulation).
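The empty-circumcircle property can be checked with the standard in-circle determinant test. The four anchor points and both triangulations below are hypothetical: a thin quadrilateral split either along its long diagonal (stretched triangles) or its short diagonal. Collinear triples are not handled in this sketch.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc: positive if a, b, c run counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc."""
    if orientation(a, b, c) < 0:          # the determinant test assumes counter-clockwise order
        b, c = c, b
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0


def is_delaunay(triangles, points):
    """No anchor point may lie inside the circumcircle of any triangle."""
    for tri in triangles:
        a, b, c = (points[i] for i in tri)
        for j, p in enumerate(points):
            if j not in tri and in_circumcircle(a, b, c, p):
                return False
    return True


points = [(0, 0), (2, -1), (4, 0), (2, 1)]   # a thin quadrilateral
stretched = [(0, 1, 2), (0, 2, 3)]           # split along the long diagonal
delaunay = [(0, 1, 3), (1, 2, 3)]            # split along the short diagonal
print(is_delaunay(stretched, points))        # False
print(is_delaunay(delaunay, points))         # True
```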
A TIN clearly is a vector representation: each anchor point has a stored georeference.
Yet, we might also call it an irregular tessellation, as the chosen triangulation provides a
partitioning of the entire study space. However, in this case, the cells do not have an
associated stored value as is typical of tessellations, but rather a simple interpolation function
that uses the elevation values of its three anchor points.
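The interpolation within a TIN triangle can be written with barycentric weights: the elevation at P is a weighted sum of the three anchor elevations, which amounts to fitting a plane through the anchors. This linear scheme is an assumption of the sketch, and the triangle coordinates are hypothetical.

```python
def interpolate_in_triangle(p, a, b, c):
    """Linear interpolation of elevation at p = (x, y) inside triangle a, b, c.

    a, b, c are (x, y, z) anchor points. Returns None if p lies outside the triangle.
    """
    (x, y), (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights: non-negative and summing to 1 for points inside the triangle.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None
    return w1 * z1 + w2 * z2 + w3 * z3


a, b, c = (0, 0, 100.0), (10, 0, 120.0), (0, 10, 140.0)   # anchors with elevations in metres
print(interpolate_in_triangle((2, 3), a, b, c))         # about 116.0
print(interpolate_in_triangle((9, 9), a, b, c))         # None: outside this triangle
```

Here the three anchors lie on the plane z = 100 + 2x + 4y, so the interpolated value at (2, 3) matches that plane exactly.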
The Open Geospatial Consortium (OGC), whose members include GIS software vendors, is working
to deliver spatial interface specifications that are available for global use (OGC, 2001). The
OGC has proposed the Geography Markup Language (GML) as a new GIS data standard.
The Geography Markup Language (GML) is a non-proprietary computer language
designed specifically for the transfer of spatial data over the Internet. GML is based on XML
(eXtensible Markup Language), a widely used standard for structured data exchange on the
Internet, and allows the exchange of spatial information and the construction of distributed
spatial relationships.
GML has been proposed by the Open Geospatial Consortium as a universal spatial
data standard. GML is likely to become very widely used because it is:
Internet friendly;
not tied to any proprietary GIS;
specifically designed for feature-based spatial data;
open to use by anyone;
compatible with industry-wide IT standards.
It is also likely to set the standard for the delivery of spatial information content to
PDA and WAP devices, and so form an important component of mobile GIS and location-based
services (LBS). The collection of geoportals and various other complementary services
creates a Spatial Data Infrastructure (SDI).
Data Accuracy:
In GIS, data quality is used to give an indication of how good data are. It describes the overall fitness or suitability of data for a specific purpose, or indicates that data are free from errors and other problems. Examining issues such as error, accuracy, precision and bias
can help to assess the quality of individual data sets. In addition, the resolution and
generalization of source data, and the data model used, may influence the portrayal of
features of interest. Data sets used for analysis need to be complete, compatible and
consistent, and applicable for the analysis being performed.
Accuracy is the extent to which an estimated data value approaches its true value
(Aronoff, 1989). If a GIS database is accurate, it is a true representation of reality. It is
impossible for a GIS database to be 100 per cent accurate, though it is possible to have data
that are accurate to within specified tolerances. For example, a ski lift station co-ordinate may
be accurate to within plus or minus 10 metres.
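A tolerance such as plus or minus 10 metres can be checked by comparing captured coordinates against reference (for example, surveyed) coordinates. The sketch below assumes planar coordinates in metres; the root-mean-square error (RMSE) it also reports is a common summary statistic for positional accuracy that the notes do not mention. All data and names are hypothetical.

```python
import math


def positional_errors(measured, reference):
    """Horizontal error (same units as the coordinates) for each measured/reference pair."""
    return [math.dist(m, r) for m, r in zip(measured, reference)]


def within_tolerance(measured, reference, tolerance):
    """True if every measured position is within +/- tolerance of its reference position."""
    return all(e <= tolerance for e in positional_errors(measured, reference))


def rmse(measured, reference):
    """Root-mean-square positional error, a common summary of positional accuracy."""
    errors = positional_errors(measured, reference)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical digitized vs. surveyed coordinates, in metres.
digitized = [(100.0, 200.0), (306.0, 404.0)]
surveyed = [(103.0, 204.0), (300.0, 400.0)]
print(positional_errors(digitized, surveyed))       # errors of 5.0 m and about 7.2 m
print(within_tolerance(digitized, surveyed, 10.0))  # True
print(rmse(digitized, surveyed))
```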
Several types of error can arise when accuracy and/or precision requirements are not met during data capture and creation. The five types of error in a geospatial dataset are related to the following:
Positional Accuracy:
The identification of positional accuracy is important. This includes consideration of
inherent error (source error) and operational error (introduced error). A more detailed review
is provided in the next section.
Attribute Accuracy:
Consideration of the accuracy of attributes also helps to define the quality of the data.
This quality component concerns the identification of the reliability, or level of purity
(homogeneity), in a data set.
Logical Consistency:
This component is concerned with determining the faithfulness of the data structure
for a data set. This typically involves spatial data inconsistencies such as incorrect line
intersections, duplicate lines or boundaries, or gaps in lines. These are referred to as spatial
or topological errors.
Completeness:
The final quality component involves a statement about the completeness of the data
set. This includes consideration of holes in the data, unclassified areas, and any compilation
procedures that may have caused data to be eliminated.
*****
Scanner - Raster Data Input – Raster Data File Formats – Vector Data Input – Digitiser –
Topology - Adjacency, connectivity and containment – Topological Consistency rules –
Attribute Data linking – ODBC – GPS - Concept GPS based mapping.
Introduction:
Data encoding is the process of getting data into the computer. It is a process that is
fundamental to almost every GIS project. For example:
An archaeologist may encode aerial photographs of ancient remains to integrate with
newly collected field data.
A planner may digitize outlines of new buildings and plot these on existing
topographical data.
An ecologist may add new remotely sensed data to a GIS to examine changes in
habitats.
A historian may scan historical maps to create a virtual city from the past.
A utility company may encode changes in pipeline data to record changes and
upgrades to their pipe network.
Once in a GIS, data almost always need to be corrected and manipulated to ensure
that they can be structured according to the required data model. Problems that may have to
be addressed at this stage of a GIS project include:
the re-projection of data from different map sources to a common projection;
the generalization of complex data to provide a simpler data set; or
the matching and joining of adjacent map sheets once the data are in digital form.
This unit looks in detail at the range of methods available to get data into a GIS.
These include keyboard entry, digitizing, scanning and electronic data transfer. Then,
methods of data editing and manipulation are reviewed, including re-projection,
transformation and edge matching. The whole process of data encoding and editing is often
called the ‘data stream’.
Analogue data are normally in paper form, and include paper maps, tables of statistics
and hard-copy (printed) aerial photographs. These data all need to be converted to digital
form before use in a GIS, thus the data encoding and correction procedures are longer than
those for digital data. Digital data are already in computer-readable formats and are supplied
on CD-ROM or across a computer network. Map data, aerial photographs, satellite imagery,
data from databases and automatic data collection devices (such as data loggers and GPS) are
all available in digital form.
SCANNER:
Scanning converts paper maps into digital format by capturing features as individual cells, or pixels, producing a raster image. Maps are generally considered the backbone of any GIS activity. However, paper maps are often not available in a form that computers can readily use. Most paper maps have been prepared on the basis of old conventional surveys. New maps can be produced using improved technologies, but this takes time and adds to the volume of work. Thus, we have to rely on the available maps. These paper maps must first be converted into a digital format usable by the computer. This is a critical step, as the quality of the analog document must be preserved in the transition to the computer domain.
The technology used for this kind of conversion is known as scanning, and the instrument used for it is known as a scanner. A scanner can be thought of as an electronic input device that converts the analog information in a document such as a map, photograph or overlay into a digital format that can be used by the computer. Scanning automatically captures map features, text and symbols as individual cells, or pixels, and produces a raster image.
Working of a Scanner:
The most important component inside a scanner is the scanner head, which moves along the length of the scanner. The scanner head contains either a charge-coupled device (CCD) sensor or a contact image sensor (CIS). A CCD consists of a number of photosensitive cells, or pixels, packed together on a chip. The most advanced large-format scanners use CCDs with 8000 pixels per chip to provide very good image quality.
While scanning, a bright white light from the scanner strikes the image to be scanned and is reflected onto the photosensitive surface of the sensor on the scanner head. Each pixel transfers a gray-tone value to the scan board (software); gray-tone values describe the different shades in the image and range from 0 (black) to 255 (white), i.e. 256 values. The software interprets each value as either 0 (black) or 1 (white), thereby forming a monochrome image of the scanned portion. As the head moves ahead, it scans the image in tiny strips, and the sensor continues to store the information sequentially. The software running the scanner pieces together the information from the sensor into a digital form of the image. This type of scanning is known as one-pass scanning.
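The step in which 0-255 gray tones become a 0/1 monochrome image is a thresholding operation. A minimal sketch, assuming a single fixed cutoff of 128 (scanner software may choose the threshold differently or adaptively); the function name and pixel values are hypothetical.

```python
def to_monochrome(gray_rows, cutoff=128):
    """Convert 8-bit gray-tone values (0 = black ... 255 = white) to a binary image
    of 0 (black) and 1 (white). The cutoff of 128 is an assumed midpoint threshold."""
    binary = []
    for row in gray_rows:
        for v in row:
            if not 0 <= v <= 255:
                raise ValueError(f"gray value out of range: {v}")
        binary.append([1 if v >= cutoff else 0 for v in row])
    return binary


# One scanned strip of four pixels (hypothetical values).
print(to_monochrome([[0, 127, 128, 255]]))  # [[0, 0, 1, 1]]
```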
Scanning a colour image is slightly different: the scanner head has to scan the same image for three different colours, i.e. red, green and blue. In older colour scanners, this was accomplished by scanning the same area three times over, once for each colour. This type of scanner is known as a three-pass scanner. However, most colour scanners now capture all three colours in a single pass by using colour filters. In principle, a colour CCD works in the same way as a monochrome CCD, except that each colour is constructed by mixing red, green and blue. Thus, a 24-bit RGB CCD represents each pixel with 24 bits of information (8 bits per colour). A scanner using these three colours in full 24-bit RGB mode can create up to 16.8 million colours.
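The 16.8 million figure follows from 8 bits (256 levels) per channel: 256 × 256 × 256 = 2^24 = 16,777,216 combinations. The sketch below shows one common way to pack a 24-bit pixel into a single integer; the 0xRRGGBB layout is an assumed convention for illustration, not necessarily a scanner's internal format.

```python
BITS_PER_CHANNEL = 8                         # red, green and blue each get 8 bits
TOTAL_COLOURS = 2 ** (3 * BITS_PER_CHANNEL)  # 2**24 = 16,777,216, i.e. about 16.8 million


def pack_rgb(r, g, b):
    """Pack three 0-255 channel values into one 24-bit integer (0xRRGGBB)."""
    for v in (r, g, b):
        if not 0 <= v <= 255:
            raise ValueError(f"channel value out of range: {v}")
    return (r << 16) | (g << 8) | b


def unpack_rgb(value):
    """Recover the (r, g, b) channels from a 24-bit integer."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


print(TOTAL_COLOURS)               # 16777216
print(hex(pack_rgb(255, 128, 0)))  # 0xff8000
```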
Types of Scanners:
Hand-held scanners, although portable, can only scan images up to about four inches wide. They require a very steady hand to move the scan head over the document. They are useful for scanning small logos or signatures but are of virtually no use for scanning maps and photographs.
The most commonly used scanner is the flatbed scanner, also known as a desktop scanner. It has a glass plate on which the picture or document is placed. The scanner head beneath the glass plate moves across the picture, and the result is a good-quality scanned image. For scanning large maps or toposheets, wide-format flatbed scanners can be used.
Drum scanners are mostly used by printing professionals. In this type of scanner, the image or document is placed on a glass cylinder that rotates at very high speed around a centrally located sensor, which uses a photomultiplier tube instead of a CCD. Before the advances in sheet-fed scanners, drum scanners were extensively used for scanning maps and other documents.
Vector data are not made up of grids of pixels. Instead, vector graphics are composed of vertices and paths. The three basic symbol types for vector data are points, lines and polygons (areas).
Esri Shapefile (.SHP, .DBF, .SHX): you need a complete set of three files that are mandatory to make up a shapefile. The three required files are .SHP (feature geometry), .SHX (shape index) and .DBF (attribute table).
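A small sketch of two checks a program might make on a shapefile: that the three mandatory files exist side by side, and what the 100-byte header of the .SHP file says. The header layout (file code 9994 big-endian; version, shape type and bounding box little-endian) follows the Esri Shapefile Technical Description; the function names and the sample header built in the test are hypothetical.

```python
import os
import struct

REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def missing_components(shp_path):
    """List the mandatory shapefile components that are absent next to shp_path."""
    base, _ = os.path.splitext(shp_path)
    missing = []
    for ext in REQUIRED_EXTENSIONS:
        # Accept either lower- or upper-case extensions (e.g. ROADS.SHX).
        if not (os.path.exists(base + ext) or os.path.exists(base + ext.upper())):
            missing.append(ext)
    return missing


def read_shp_header(header):
    """Decode the 100-byte main file header of a .shp file."""
    if len(header) < 100:
        raise ValueError("header must be at least 100 bytes")
    (file_code,) = struct.unpack(">i", header[0:4])            # big-endian
    (length_words,) = struct.unpack(">i", header[24:28])       # file length in 16-bit words
    version, shape_type = struct.unpack("<ii", header[28:36])  # little-endian
    bbox = struct.unpack("<4d", header[36:68])                 # xmin, ymin, xmax, ymax
    if file_code != 9994:
        raise ValueError("not a shapefile (file code is not 9994)")
    return {
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, shape_type),
        "file_length_bytes": length_words * 2,
        "bbox": bbox,
    }


# Usage with a real file:
# with open("roads.shp", "rb") as f:
#     print(read_shp_header(f.read(100)))
```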
DIGITIZING:
Digitizing in GIS is the process of converting geographic data, either from a hardcopy map or from a scanned image, into vector data by tracing the features. During the digitizing process, features from the traced map or image are captured as coordinates in point, line or polygon format.
The procedure followed when digitizing a paper map using a manual digitizer has the
following five stages:
Registration: The map to be digitized is fixed firmly to the table top with sticky tape.
Five or more control points are identified (usually the four corners of the map sheet
and one or more grid intersections in the middle). The geographic co-ordinates of the
control points are noted and their locations digitized by positioning the cross-hairs on
the cursor exactly over them and pressing the ‘digitize’ button on the cursor. This
sends the co-ordinates of a point on the table to the computer and stores them in a file
as ‘digitizer co-ordinates’.
Digitizing point features: Point features, for example spot heights, hotel locations or
meteorological stations, are recorded as a single digitized point. A unique code
number or identifier is added so that attribute information may be attached later. For
instance, the hotel with ID number ‘1’ would later be identified as ‘Mountain View’.
Digitizing line features: Line features (such as roads or rivers) are digitized as a series
of points that the software will join with straight line segments. In some GIS packages
lines are referred to as arcs, and their start and end points as nodes. This gives rise to
the term arc–node topology, used to describe a method of structuring line features.
Digitizing area (polygon) features: Area features or polygons, for example forested
areas or administrative boundaries, are digitized as a series of points linked together
by line segments in the same way as line features. Here it is important that the start
and end points join to form a complete area. Polygons can be digitized as a series of
individual lines, which are later joined to form areas. In this case it is important that
each line segment is digitized only once.
Adding attribute information: Attribute data may be added to digitized polygon
features by linking them to a centroid (or seed point) in each polygon. These are either
digitized manually (after digitizing the polygon boundaries) or created automatically
once the polygons have been encoded. Using a unique identifier or code number,
attribute data can then be linked to the polygon centroids of appropriate polygons. In
this way, the forest stand may have data relating to tree species, tree ages, tree
numbers and timber volume attached to a point within the polygon.
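Registration produces pairs of digitizer and geographic coordinates for the control points. One common way to turn these into a transformation (an assumption here; the notes do not name a method) is to fit a six-parameter affine transform by least squares, so that every later digitizer coordinate can be converted to map coordinates. The pure-Python sketch below does that; all names and sample coordinates are hypothetical.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m @ x = v by Gaussian elimination with partial pivoting."""
    a = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear; cannot fit an affine transform")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back-substitution
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def fit_affine(digitizer_pts, geo_pts):
    """Least-squares fit of X = a*x + b*y + c and Y = d*x + e*y + f from control points."""
    rows = [(x, y, 1.0) for x, y in digitizer_pts]
    # Normal equations (A^T A) p = A^T t, solved separately for X and Y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atx = [sum(r[i] * g[0] for r, g in zip(rows, geo_pts)) for i in range(3)]
    aty = [sum(r[i] * g[1] for r, g in zip(rows, geo_pts)) for i in range(3)]
    return solve3(ata, atx), solve3(ata, aty)


def to_geographic(params, pt):
    """Convert a digitizer coordinate to map coordinates with the fitted transform."""
    (a, b, c), (d, e, f) = params
    x, y = pt
    return a * x + b * y + c, d * x + e * y + f


# Five hypothetical control points: four sheet corners plus one central grid intersection.
digitizer = [(0, 0), (100, 0), (100, 80), (0, 80), (50, 40)]
geographic = [(2 * x - 0.5 * y + 1000, 0.5 * x + 2 * y + 3000) for x, y in digitizer]
params = fit_affine(digitizer, geographic)
print(to_geographic(params, (25, 60)))  # approximately (1020.0, 3132.5)
```

With more than three control points the fit is over-determined, so small digitizing errors at individual control points are averaged out rather than passed straight into the transform.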
Manual digitizers may be used in one of two modes: point mode or stream mode. In
point mode the user begins digitizing each line segment with a start node, records each
change in direction of the line with a digitized point and finishes the segment with an end
node. Thus, a straight line can be digitized with just two points, the start and end nodes. For
more complex lines, a greater number of points are required between the start and end nodes.
Smooth curves are problematic since they require an infinite number of points to record their
true shape.
Figure: Digitizing modes. (a) Point mode: the person digitizing decides where to place each individual point so as to represent the line most accurately within the accepted tolerances of the digitizer.
(b) Stream mode: the person digitizing sets a time or distance interval at which the digitizing hardware registers each point as the cursor is moved along the line.
In stream mode the digitizer is set up to record points according to a stated time interval or on a distance basis. Once the user has recorded the start of a line, the digitizer might be set to record a point automatically every 0.5 seconds, and the user must move the cursor along the line to record its shape. An end node is required to stop the digitizer recording further points. With time-based recording, the speed at which the cursor is moved along the line determines the number of points recorded. Thus, where the line is more complex and the cursor needs to be moved more slowly and with more care, a greater number of points will be recorded. Conversely, where the line is straight, the cursor can be moved more quickly and fewer points are recorded.
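Both stream-mode recording rules can be simulated on a cursor trace. In this sketch (function names and sample traces are hypothetical), the distance rule keeps a position once the cursor has moved at least a set distance from the last recorded point, while the time rule keeps one every fixed period, so a slower cursor yields more points.

```python
import math


def stream_by_distance(trace, interval):
    """Distance-based stream mode: keep a cursor position whenever it is at least
    `interval` away from the last recorded point. trace: list of (x, y)."""
    if not trace:
        return []
    recorded = [trace[0]]                 # start node
    for p in trace[1:]:
        if math.dist(p, recorded[-1]) >= interval:
            recorded.append(p)
    if recorded[-1] != trace[-1]:
        recorded.append(trace[-1])        # end node stops recording
    return recorded


def stream_by_time(samples, period):
    """Time-based stream mode: keep a cursor position every `period` seconds.
    samples: list of (t_seconds, x, y) in time order."""
    recorded, next_time = [], None
    for t, x, y in samples:
        if next_time is None or t >= next_time:
            recorded.append((x, y))
            next_time = t + period
    return recorded


trace = [(0, 0), (0.4, 0), (1.0, 0), (1.5, 0), (2.2, 0), (2.5, 0)]
print(stream_by_distance(trace, 1.0))  # [(0, 0), (1.0, 0), (2.2, 0), (2.5, 0)]

slow = [(i * 0.25, i * 0.25, 0) for i in range(9)]  # 2 units in 2 s
fast = [(i * 0.25, i * 0.5, 0) for i in range(5)]   # 2 units in 1 s
print(len(stream_by_time(slow, 0.5)), len(stream_by_time(fast, 0.5)))  # 5 3
```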
The choice between point mode and stream mode digitizing is largely a matter of
personal preference. Stream mode digitizing requires more skill than point mode digitizing,
and for an experienced user may be a faster method. Stream mode will usually generate more
points, and hence larger files, than point mode.
TOPOLOGY:
Topology is the mathematical representation of the spatial relationships that exist between geographic features. Topology has long been a key GIS requirement for data management and integrity. In general, a topological data model manages spatial relationships by representing spatial objects (point, line and area features) as an underlying graph of topological primitives: nodes, edges and faces. These primitives, together with their relationships to one another and to the features whose boundaries they represent, are defined by representing the feature geometries in a planar graph of topological elements.
Topology is useful in GIS because many spatial modeling operations don’t require
coordinates, only topological information. For example, to find an optimal path between two
points requires a list of the arcs that connect to each other and the cost to traverse each arc in
each direction. Coordinates are only needed for drawing the path after it is calculated.
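To illustrate that path finding needs only connectivity and costs, here is a sketch of Dijkstra's shortest-path algorithm (a standard technique, swapped in here for illustration) run over an arc list with a separate cost for each direction of travel. No coordinates appear anywhere; the network and names are hypothetical.

```python
import heapq


def shortest_path(arcs, start, goal):
    """Dijkstra's algorithm over an arc list.

    arcs: (from_node, to_node, forward_cost, backward_cost) tuples; a cost of None
    means the arc cannot be traversed in that direction (e.g. a one-way street).
    Returns (total_cost, [nodes]) or (None, []) if goal is unreachable.
    """
    graph = {}
    for f, t, fwd, back in arcs:
        if fwd is not None:
            graph.setdefault(f, []).append((t, fwd))
        if back is not None:
            graph.setdefault(t, []).append((f, back))

    best = {start: 0}
    previous = {}
    queue = [(0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best[node]:
            continue  # stale queue entry
        for nxt, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = node
                heapq.heappush(queue, (new_cost, nxt))

    if goal not in best:
        return None, []
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]


# Hypothetical network: only connectivity and traversal costs, no coordinates.
arcs = [(1, 2, 4, 4), (1, 3, 1, 1), (3, 2, 2, 2), (2, 4, 5, None), (3, 4, 8, 8)]
print(shortest_path(arcs, 1, 4))  # (8, [1, 3, 2, 4])
print(shortest_path(arcs, 4, 1))  # (9, [4, 3, 1])
```

The return trip differs from the outbound one because arc 2-4 is treated as one-way, which is exactly why a cost per direction is stored.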
Connectivity
Connectivity is defined through arc-node topology. This is the basis for many network
tracing and path finding operations. Connectivity allows you to identify a route to the airport,
connect streams to rivers, or follow a path from the water treatment plant to a house.
In the arc-node data structure, an arc is defined by two endpoints: the from-node
indicating where the arc begins and a to-node indicating where it ends. This is called arc-node
topology.
Arc-node topology is supported through an arc-node list. The list identifies the from-
and to-nodes for each arc. Connected arcs are determined by searching through the list for
common node numbers. In the example above, it is possible to determine that arcs 1, 2 and 3 are all connected because they share node 11. The computer can determine that it is possible to travel along arc 1 and turn onto arc 3 because they share a common node (11), but it is not possible to turn directly from arc 1 onto arc 5 because they do not share a common node.
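A sketch of the arc-node list search just described. The list below is hypothetical and only loosely modelled on the example (arcs 1, 2 and 3 meet at node 11; arc 5 shares no node with arc 1); the other node numbers are invented for illustration.

```python
# Hypothetical arc-node list: arc id -> (from-node, to-node).
ARC_NODE_LIST = {
    1: (10, 11),
    2: (11, 12),
    3: (13, 11),
    4: (12, 14),
    5: (14, 15),
}


def arcs_at_node(arc_list, node):
    """All arcs whose from-node or to-node is the given node."""
    return sorted(arc for arc, (f, t) in arc_list.items() if node in (f, t))


def connected(arc_list, arc_a, arc_b):
    """Two arcs are connected (one can turn onto the other) if they share a node."""
    return bool(set(arc_list[arc_a]) & set(arc_list[arc_b]))


print(arcs_at_node(ARC_NODE_LIST, 11))  # [1, 2, 3]
print(connected(ARC_NODE_LIST, 1, 3))   # True  (common node 11)
print(connected(ARC_NODE_LIST, 1, 5))   # False (no common node)
```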
Containment:
Many of the geographic features that may be represented cover a distinguishable area
on the surface of the earth, such as lakes, parcels of land, and census tracts. An area is
represented in the vector model by one or more boundaries defining a polygon. Although this
sounds counterintuitive, consider a lake with an island in the middle. The lake actually has
two boundaries: one that defines its outer edge and the island that defines its inner edge. In
the terminology of the vector model, an island defines an inner boundary (or hole) of a
polygon.
The arc-node structure represents polygons as an ordered list of arcs rather than a
closed loop of x,y coordinates. This is called polygon-arc topology. In the illustration below,
polygon F is made up of arcs 8, 9, 10, and 7 (the 0 before the 7 indicates that this arc creates
an island in the polygon).
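Containment for a polygon with an island can be tested by checking that a point is inside the outer boundary but not inside any inner boundary. The sketch below uses the even-odd ray-casting test, a standard point-in-polygon technique swapped in for illustration (it works on coordinate rings rather than on the polygon-arc list); points lying exactly on a boundary are not handled specially. The lake and island coordinates are hypothetical.

```python
def point_in_ring(pt, ring):
    """Even-odd ray-casting test: count how often a ray from pt towards +x crosses the ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]   # wrap around to close the ring
        if (y1 > y) != (y2 > y):     # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon(pt, outer, holes=()):
    """Inside the outer boundary but not inside any inner boundary (island/hole)."""
    return point_in_ring(pt, outer) and not any(point_in_ring(pt, h) for h in holes)


lake = [(0, 0), (10, 0), (10, 10), (0, 10)]      # outer boundary (hypothetical)
island = [(4, 4), (6, 4), (6, 6), (4, 6)]        # inner boundary
print(point_in_polygon((2, 2), lake, [island]))  # True: on the water
print(point_in_polygon((5, 5), lake, [island]))  # False: on the island
```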
Contiguity:
Two geographic features that share a boundary are called adjacent. Contiguity is the
topological concept that allows the vector data model to determine adjacency. Polygon
topology defines contiguity. Polygons are contiguous to each other if they share a common
arc. This is the basis for many neighbor and overlay operations.
Recall that the from-node and to-node define an arc. This indicates an arc's direction
so the polygons on its left and right sides can be determined. Left-right topology refers to the
polygons on the left and right sides of an arc. In the example below, polygon B is on the left of arc 6, and polygon C is on the right. Thus we know that polygons B and C are adjacent.
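Left-right topology turns adjacency into a table lookup: two polygons are contiguous if some arc has one on its left and the other on its right. A sketch with a hypothetical left-right table (only the arc 6 entry, B on the left and C on the right, comes from the example; the rest is invented):

```python
# Hypothetical left-right topology table: arc id -> (left polygon, right polygon).
LEFT_RIGHT = {
    6: ("B", "C"),
    7: ("A", "B"),
    8: ("A", "C"),
    9: ("C", "D"),
}


def adjacent(table, p, q):
    """Polygons are contiguous if some arc has one on its left and the other on its right."""
    return any({left, right} == {p, q} for left, right in table.values())


def neighbours(table, polygon):
    """All polygons that share at least one arc with the given polygon."""
    result = set()
    for left, right in table.values():
        if left == polygon:
            result.add(right)
        elif right == polygon:
            result.add(left)
    return sorted(result)


print(adjacent(LEFT_RIGHT, "B", "C"))  # True, via arc 6
print(neighbours(LEFT_RIGHT, "C"))     # ['A', 'B', 'D']
```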
Topology Rules:
There are many topology rules you can implement in your geodatabase, depending on
the spatial relationships that are most important for your organization to maintain. You
should carefully plan the spatial relationships you will enforce on your features. Some
topology rules govern the relationships of features within a given feature class, while others
govern the relationships between features in two different feature classes or subtypes.
Topology rules can also be defined between subtypes of features in the same or in different feature classes. This could be used, for example, to require street features to be connected to other street features at both ends, except in the case of streets belonging to the cul-de-sac or dead-end subtypes.
Many topology rules can be imposed on features in a geodatabase. A well-designed
geodatabase will have only those topology rules that define key spatial relationships needed
by an organization. Most topology violations have fixes that you can use to correct errors.
Must Be Disjoint:
Requires that points be separated spatially from other points in the same feature class
(or subtype). Any points that overlap are errors. This is useful for ensuring that points are not
coincident or duplicated within the same feature class, such as in layers of cities, parcel lot ID
points, wells, or streetlamp poles.
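A brute-force sketch of how Must Be Disjoint violations could be found. The optional tolerance stands in for the small snapping distance a GIS typically applies (an assumption; with the default of 0 only exact duplicates are reported). The well coordinates are hypothetical, and the pairwise comparison is fine for small layers but slow for large ones.

```python
import math
from itertools import combinations


def disjoint_violations(points, tolerance=0.0):
    """Index pairs of points that coincide (are no more than `tolerance` apart)."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if math.dist(p, q) <= tolerance
    ]


wells = [(0, 0), (5, 5), (0, 0), (5.0005, 5)]  # hypothetical well locations
print(disjoint_violations(wells))              # [(0, 2)]: exact duplicate
print(disjoint_violations(wells, 0.001))       # [(0, 2), (1, 3)]
```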
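Must Not Have Dangles:
Requires that a line feature touch lines from the same feature class (or subtype) at both endpoints. An endpoint that is not connected to another line is an error called a dangle. This rule is used when line features must form closed loops, such as when they are defining the boundaries of polygon features. It may also be used in cases where lines typically connect to other lines, as with streets. In this case, exceptions can be used where the rule is occasionally violated, as with cul-de-sac or dead-end street segments.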
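A simplified sketch of dangle detection: count how many line endpoints fall on each location, and report endpoints that touch nothing else, skipping line ids listed as exceptions (such as cul-de-sac segments). It matches endpoints exactly and does not detect an endpoint that touches the middle of another line, both simplifications relative to a real GIS. The street network is hypothetical.

```python
from collections import Counter


def find_dangles(lines, exceptions=()):
    """Endpoints that touch no other line endpoint.

    lines: dict of line id -> list of (x, y) vertices.
    exceptions: ids exempt from the rule (e.g. cul-de-sac segments).
    """
    endpoint_count = Counter()
    for coords in lines.values():
        endpoint_count[coords[0]] += 1
        endpoint_count[coords[-1]] += 1
    dangles = []
    for line_id, coords in lines.items():
        if line_id in exceptions:
            continue
        for end in (coords[0], coords[-1]):
            if endpoint_count[end] == 1:
                dangles.append((line_id, end))
    return dangles


streets = {
    "a": [(0, 0), (10, 0)],
    "b": [(10, 0), (10, 10)],
    "c": [(10, 10), (0, 0)],
    "d": [(10, 10), (15, 15)],  # dead-end spur
}
print(find_dangles(streets))                    # [('d', (15, 15))]
print(find_dangles(streets, exceptions={"d"}))  # []
```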
Contains Point:
Requires that a polygon in one feature class contain at least one point from another
feature class. Points must be within the polygon, not on the boundary. This is useful when
every polygon should have at least one associated point, such as when parcels must have an
address point.
Must Cover Each Other:
Requires that the polygons of one feature class (or subtype) share all of their area with the polygons of another feature class (or subtype). Polygons may share edges or vertices. Any area defined in either feature class that is not shared with the other is an error. This rule is used when two systems of classification are used for the same geographic area, and any given point defined in one system must also be defined in the other. One such case occurs with nested hierarchical datasets, such as census blocks and block groups or small watersheds and large drainage basins. The rule can also be applied to non-hierarchically related polygon feature classes, such as soil type and slope class.
Joins:
When our data are all in a single table, we can easily retrieve a particular row from that table. But if the data we are looking for are spread across two or more tables, joins can be used to retrieve them. A join fetches data from two or more tables and combines them so that they appear as a single set of data. It combines columns from two or more tables by using values common to both tables.
There are several types of JOIN: INNER, LEFT OUTER and RIGHT OUTER. They all do slightly different things, but the basic theory behind them is the same.
Inner Join:
An INNER JOIN returns a result set that contains the common elements of the tables, i.e. the intersection where they match on the join condition. An INNER JOIN focuses on the commonality between two tables: only rows that have matching data in both (or all) of the tables being compared are returned, so if nothing matches, the result is empty. INNER JOINs are the most frequently used JOIN operation.
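The difference between an INNER and a LEFT OUTER join can be seen on two small tables. This sketch uses SQLite through Python's standard sqlite3 module as a stand-in for a GIS attribute table; the tables and values are hypothetical. RIGHT OUTER JOIN is left out because older SQLite versions do not support it.

```python
import sqlite3


def join_examples():
    """Build two small hypothetical tables in memory and return INNER and LEFT OUTER join results."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, zone TEXT)")
        cur.execute("CREATE TABLE owners (parcel_id INTEGER, owner TEXT)")
        cur.executemany("INSERT INTO parcels VALUES (?, ?)",
                        [(1, "commercial"), (2, "residential"), (3, "industrial")])
        cur.executemany("INSERT INTO owners VALUES (?, ?)",
                        [(1, "Asha"), (3, "Ravi"), (4, "Meena")])
        inner = cur.execute(
            "SELECT p.parcel_id, p.zone, o.owner FROM parcels AS p "
            "INNER JOIN owners AS o ON p.parcel_id = o.parcel_id "
            "ORDER BY p.parcel_id").fetchall()
        left = cur.execute(
            "SELECT p.parcel_id, p.zone, o.owner FROM parcels AS p "
            "LEFT OUTER JOIN owners AS o ON p.parcel_id = o.parcel_id "
            "ORDER BY p.parcel_id").fetchall()
        return inner, left
    finally:
        conn.close()


inner, left = join_examples()
print(inner)  # [(1, 'commercial', 'Asha'), (3, 'industrial', 'Ravi')]
print(left)   # [(1, 'commercial', 'Asha'), (2, 'residential', None), (3, 'industrial', 'Ravi')]
```

The INNER JOIN keeps only parcels 1 and 3, which have a match in both tables; the LEFT OUTER JOIN keeps every parcel and fills the missing owner with NULL (None). The owner row for parcel 4 appears in neither result because no parcel 4 exists.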
Relates:
Relates can help us to discover specific information within our data. A relate (also called a table relate) is a property of a layer. We can create a table relate so that we can query and select features in one layer and see all the related features in another layer or table. Unlike joining tables, relating tables simply defines a relationship between two tables. The associated data are not appended to the layer's attribute table as they are with a join. Instead, we can access the related data through selected features or records in our layer or table.
Relationship Class:
A relationship class is an object in a geodatabase that stores information about a relationship between two feature classes, between a feature class and a non-spatial table, or between two non-spatial tables. Both participants in a relationship class must be stored in the same geodatabase.
A relationship class stores information about associations among features and records in a geodatabase and can help ensure your data's integrity. Relates that are added to a layer or table in a map are essentially the same as simple relationship classes defined in a geodatabase, except that they are saved with the map instead of in a geodatabase.
application’s request so that the request conforms to syntax supported by the associated database. A framework for easily building ODBC drivers is available from Simba Technologies, as are ODBC drivers for many data sources, such as Salesforce, MongoDB, Spark and more.
The following are the steps involved in connecting Java application programs with a database. (The methods named below belong to JDBC, Java Database Connectivity, which is Java's counterpart to the ODBC API.)
Load the driver: The forName() method of the Class class is used to register the driver class. This method dynamically loads the driver class. With JDBC 4.0 and later drivers this step is optional, because drivers on the classpath are loaded automatically.
Establish a connection: The getConnection() method of the DriverManager class is used to establish a connection with the database.
Prepare and execute SQL statements: The createStatement() method of the Connection interface is used to create a statement. The executeQuery() and execute() methods are used to execute queries against the database.
Process the result: The executeQuery() method returns a ResultSet object that can be used to get all the records of a table.
Close the connection: The close() method is used to close the connection in order to free the resources allocated to it.
The Java code below is used for connecting to a MySQL database using the JDBC application programming interface.
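The same five steps have a close analogue in Python's DB-API. The hedged sketch below substitutes the standard-library sqlite3 module (an in-memory database) for MySQL and JDBC, so that no database server is needed; the table and its values are hypothetical.

```python
import sqlite3  # step 1: "load the driver" (in Python, importing the DB-API module)


def list_stations():
    """Walk through the five connection steps with Python's DB-API and SQLite (in memory)."""
    conn = sqlite3.connect(":memory:")  # step 2: establish a connection
    try:
        cur = conn.cursor()             # step 3: prepare and execute SQL
        cur.execute("CREATE TABLE stations (id INTEGER, name TEXT)")
        cur.executemany("INSERT INTO stations VALUES (?, ?)",
                        [(1, "Rain gauge A"), (2, "Rain gauge B")])
        cur.execute("SELECT id, name FROM stations ORDER BY id")
        return cur.fetchall()           # step 4: process the result
    finally:
        conn.close()                    # step 5: close the connection


print(list_stations())  # [(1, 'Rain gauge A'), (2, 'Rain gauge B')]
```

Putting close() in a finally clause releases the connection even if a query raises an error, which is the point of the last step.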
Today, GPS receivers are included in many commercial products, such as automobiles, smartphones, exercise watches, and GIS devices.
repositioned, so that three of the extra satellites became part of the constellation baseline. As
a result, GPS now effectively operates as a 27-slot constellation with improved coverage in
most parts of the world.
Control Segments:
The GPS control segment consists of a global network of ground facilities that track
the GPS satellites, monitor their transmissions, perform analyses, and send commands and
data to the constellation. The current Operational Control Segment (OCS) includes a master
control station, an alternate master control station, 11 command and control antennas, and 16
monitoring sites. The locations of these facilities are shown in the map above.
User Segments:
Like the Internet, GPS is an essential element of the global information infrastructure.
The free, open, and dependable nature of GPS has led to the development of hundreds of
applications affecting every aspect of modern life. GPS technology is now in everything from cell phones and wristwatches to bulldozers, shipping containers, and ATMs.
*****
Vector Data Analysis tools - Data Analysis tools - Network Analysis - Digital Elevation
models - 3D data collection and utilisation.
The vector data model uses points and their x-, y-coordinates to construct spatial
features of points, lines, and polygons. These spatial features are used as inputs in vector data
analysis. Therefore, the accuracy of data analysis depends on the accuracy of these features in
terms of their location and shape and whether they are topological or not. Additionally, it is
important to note that an analysis may apply to all, or selected, features in a layer. The
following are the types of analysis used with vector data.
• Buffering
• Overlay
• Distance Measurement
• Pattern Analysis
• Feature Manipulation
Buffering:
Based on the concept of proximity, buffering creates two areas: one area that is within
a specified distance of select features and the other area that is beyond. The area within the
specified distance is the buffer zone. A GIS typically varies the value of an attribute to
separate the buffer zone (e.g., 1) from the area beyond the buffer zone (e.g., 0). Besides the
designation of the buffer zone, no other attribute data are added or combined.
Features for buffering may be points, lines or polygons (see the figure below). Buffering around points creates circular buffer zones. Buffering around lines creates a series of elongated buffer zones around each line segment. Buffering around polygons creates buffer zones that extend outward from the polygon boundaries.
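A sketch of the 1/0 buffer coding described above, assuming planar coordinates: a location gets 1 if it lies within the buffer distance of a point feature (a circular zone) or of any segment of a line feature, and 0 otherwise. Real GIS software constructs buffer polygons rather than testing locations one at a time; the road and test locations are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.dist(p, a)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def buffer_code(p, feature, distance):
    """1 if p falls in the buffer zone of a point or line feature, else 0.

    feature: list of (x, y) vertices; a single vertex means a point feature.
    """
    if len(feature) == 1:
        d = math.dist(p, feature[0])  # circular buffer around a point
    else:
        d = min(point_segment_distance(p, feature[i], feature[i + 1])
                for i in range(len(feature) - 1))
    return 1 if d <= distance else 0


road = [(0, 0), (100, 0), (100, 100)]  # hypothetical road centreline
print([buffer_code(p, road, 10) for p in [(50, 5), (50, 20), (105, 50)]])  # [1, 0, 1]
```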
Variations in Buffering:
The buffer distance or buffer size does not have to be constant; it can vary according to the values of a given field (see the figure below). For example, the width of a riparian buffer can vary depending on its expected function and the intensity of adjacent land use. A feature may have more than one buffer zone. As an example, a nuclear power plant may be buffered with distances of 5, 10, 15 and 20 miles, thus forming multiple rings around the plant. Although the interval of each ring is the same at 5 miles, the rings are not equal in area. The second ring from the plant, in fact, covers three times the area of the first ring: π(10² - 5²) = 75π square miles versus π(5²) = 25π square miles. One must consider this area difference if the buffer zones are part of an evacuation plan. Likewise, buffering around line features does not have to be on both sides of the lines; it can be on either the left side or the right side of the line feature. (The left or right side is determined by the direction from the starting point to the end point of a line.) Similarly, buffer zones around polygons can be extended either outward or inward from the polygon boundaries. Boundaries of buffer zones may remain intact so that each buffer zone is a separate polygon for further analysis.
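The unequal ring areas follow from the circle-area formula: with equal ring width w, ring k covers π((kw)² - ((k-1)w)²) = πw²(2k - 1), so successive rings grow in the ratio 1 : 3 : 5 : 7. A short check for the power-plant example (the function name is hypothetical):

```python
import math


def ring_areas(distances):
    """Area of each concentric buffer ring around a point, for increasing buffer distances."""
    areas, inner = [], 0.0
    for outer in distances:
        areas.append(math.pi * (outer ** 2 - inner ** 2))
        inner = outer
    return areas


areas = ring_areas([5, 10, 15, 20])             # miles, as in the power-plant example
print([round(a / areas[0], 6) for a in areas])  # [1.0, 3.0, 5.0, 7.0]
```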
Applications of Buffering:
Most applications of buffering are based on buffer zones. A buffer zone is often
treated as a protection zone and is used for planning or regulatory purposes:
Government regulations may set 2-mile buffer zones along streams to minimize
sedimentation from logging operations.
A national forest may restrict oil and gas well drilling within 500 feet of roads or
highways.
A planning agency may set aside land along the edges of streams to reduce the effects of nutrient, sediment, and pesticide runoff; to maintain shade to prevent the rise of stream temperature; and to provide shelter for wildlife and aquatic life.
A planning agency may create buffer zones around geographic features such as water,
wetlands, critical habitats, and wells to be protected and exclude these zones from
landfill consideration.
Sometimes buffer zones may represent the inclusion zones in GIS applications. For
example, the criteria for an industrial park may stipulate that a potential site must be
within 1 mile of a heavy-duty road. In this case, the 1-mile buffer zones of all heavy-
duty roads become the inclusion zones.
Buffer zones can also be used as indicators of the positional accuracy of point and line
features. This application is particularly relevant for historical data that do not include
geographic coordinates or data that are generated from poor-quality sources.
OVERLAY:
An overlay operation combines the geometries and attributes of two feature layers to create the output. The geometry of the output represents the geometric intersection of features from the input layers. The figure below illustrates an overlay operation with two simple polygon layers. Each feature on the output contains a combination of attributes from the input layers, and this combination differs from its neighbours. Feature layers to be overlaid must be spatially registered and based on the same coordinate system.
Figure: Overlay combines the geometries and attributes from two layers into a single layer. The dashed lines are for illustration only and are not included in the output.
Figure: Point-in-polygon overlay. The input is a point layer. The output is also a point layer but has attribute data from the polygon layer.
In a line-in-polygon overlay operation, the output contains the same line features as in
the input layer but each line feature is dissected by the polygon boundaries on the overlay
layer. Thus the output has more line segments than does the input layer. Each line segment on
the output combines attributes from the input layer and the underlying polygon. For example,
a line-in-polygon overlay can find soil data for a proposed road. The input layer includes the
proposed road. The overlay layer contains soil polygons.
Figure: Line-in-polygon overlay. The input is a line layer. The output is also a line layer.
Figure: Polygon-on-polygon overlay. In the illustration, the two layers for overlay have the same area extent. The output combines the geometries and attributes from the two layers into a single polygon layer.
Overlay Methods:
The overlay methods are based on the Boolean connectors AND, OR, and XOR.
Intersect uses the AND connector. Union uses the OR connector. Symmetrical Difference or
Difference uses the XOR connector. Identity or Minus uses the following expression: [(input
layer) AND (identity layer)] OR (input layer). The following explains in more detail these
four common overlay methods by using two polygon layers as the inputs.
Union preserves all features from the inputs. The area extent of the output combines
the area extents of both input layers.
Intersect preserves only those features that fall within the area extent common to the inputs. Intersect is often a preferred method of overlay because any feature on its output has
attribute data from both of its inputs. For example, a forest management plan may call for an
inventory of vegetation types within riparian zones. Intersect will be a more efficient overlay
method than Union in this case because the output contains only riparian zones with
vegetation types.
Figure: The Union method keeps all the areas of the two input layers in the output.
Figure: The Intersect method preserves only the area common to the two input layers in the output.
Symmetrical Difference preserves features that fall within the area extent that is common to only one of the inputs. In other words, Symmetrical Difference is the opposite of Intersect in terms of the output's area extent.
Identity preserves only features that fall within the area extent of the layer defined as
the input layer. The other layer is called the identity layer.
Figure: The Symmetrical Difference method preserves areas common to only one of the input layers in the output.
Figure: The Identity method produces an output that has the same extent as the input layer, but the output includes the geometry and attribute data from the identity layer.
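The Boolean logic behind the four methods can be demonstrated on a simplified model in which each layer is just the set of grid cells it covers, with one attribute per cell. This is an assumption made for illustration; real vector overlay intersects polygon geometries. The layers and attribute values are hypothetical.

```python
def overlay(input_layer, other_layer, method):
    """Boolean overlay on a simplified cell-based model. Layers map cell -> attribute;
    the output maps cell -> (input attribute, other attribute), None where a layer is absent."""
    a, b = set(input_layer), set(other_layer)
    if method == "union":
        cells = a | b                  # OR
    elif method == "intersect":
        cells = a & b                  # AND
    elif method == "symmetrical_difference":
        cells = a ^ b                  # XOR
    elif method == "identity":
        cells = (a & b) | a            # [(input AND identity) OR input]
    else:
        raise ValueError(f"unknown overlay method: {method}")
    return {c: (input_layer.get(c), other_layer.get(c)) for c in sorted(cells)}


vegetation = {(0, 0): "forest", (0, 1): "forest", (1, 1): "grass"}  # hypothetical layers
riparian = {(1, 1): "riparian", (1, 2): "riparian"}
print(overlay(vegetation, riparian, "intersect"))       # {(1, 1): ('grass', 'riparian')}
print(list(overlay(vegetation, riparian, "identity")))  # [(0, 0), (0, 1), (1, 1)]
```

The identity output keeps exactly the input layer's extent, while cells that overlap the identity layer also carry its attribute.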
Applications of Overlay:
The overlay methods play a central role in many querying and modeling applications.
Suppose an investment company is looking for a land parcel that is zoned commercial, not
subject to flooding, and not more than 1 mile from a heavy-duty road. The company can first
create the 1-mile road buffer and overlay the buffer zone layer with the zoning and floodplain
layers. A subsequent query of the overlay output can select land parcels that satisfy the
company's selection criteria.
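The buffer-overlay-query workflow above can be approximated in a few lines. This sketch assumes each parcel is represented by its centroid and the road by a polyline in miles, so the 1-mile buffer becomes a point-to-segment distance test; the parcel records and field names are hypothetical, and a GIS would instead overlay true buffer and parcel polygons.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping the projection to the segment ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def select_sites(parcels, road, buffer_miles=1.0):
    """Return IDs of parcels zoned commercial, outside the floodplain,
    and within buffer_miles of any segment of the road polyline."""
    hits = []
    for pid, info in parcels.items():
        near_road = any(
            point_segment_distance(info["xy"], road[i], road[i + 1]) <= buffer_miles
            for i in range(len(road) - 1)
        )
        if info["zoning"] == "commercial" and not info["floodplain"] and near_road:
            hits.append(pid)
    return hits

road = [(0.0, 0.0), (10.0, 0.0)]  # hypothetical heavy-duty road (miles)
parcels = {
    "P1": {"xy": (2.0, 0.5), "zoning": "commercial", "floodplain": False},
    "P2": {"xy": (3.0, 0.8), "zoning": "commercial", "floodplain": True},
    "P3": {"xy": (5.0, 3.0), "zoning": "commercial", "floodplain": False},
    "P4": {"xy": (6.0, 0.2), "zoning": "residential", "floodplain": False},
}
print(select_sites(parcels, road))  # ['P1']
```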
A more specific application of overlay is to help solve the areal interpolation problem.
Areal interpolation involves transferring known data from one set of polygons (source
polygons) to another (target polygons). For example, census tracts may represent source
polygons with known populations in each tract from the U.S. Census Bureau, and school
districts may represent target polygons with unknown population data. Using overlay, the
population of each school district can be estimated from the populations given for the census
tracts.
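One simple way to carry out the transfer is areal weighting: each source polygon contributes to a target polygon in proportion to the share of its area that falls inside the target. This is a minimal sketch, assuming population is spread evenly within each tract and, for brevity, that all polygons are axis-aligned rectangles `(xmin, ymin, xmax, ymax)`; the tract and district values are made up.

```python
def rect_area(r):
    """Area of a rectangle (xmin, ymin, xmax, ymax); 0 if it is empty."""
    xmin, ymin, xmax, ymax = r
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)

def rect_intersection(r1, r2):
    """Overlap of two rectangles (may be empty, i.e. have zero area)."""
    return (max(r1[0], r2[0]), max(r1[1], r2[1]),
            min(r1[2], r2[2]), min(r1[3], r2[3]))

def areal_weighting(sources, targets):
    """sources: {id: (rect, population)}; targets: {id: rect}.
    Each target receives pop * (overlap area / source area) from every source."""
    estimates = {}
    for tid, t in targets.items():
        total = 0.0
        for s_rect, s_pop in sources.values():
            overlap = rect_area(rect_intersection(s_rect, t))
            total += s_pop * overlap / rect_area(s_rect)
        estimates[tid] = total
    return estimates

tracts = {"T1": ((0, 0, 2, 2), 400), "T2": ((2, 0, 4, 2), 200)}
districts = {"D1": (0, 0, 3, 2), "D2": (3, 0, 4, 2)}
print(areal_weighting(tracts, districts))  # {'D1': 500.0, 'D2': 100.0}
```

The total population (600) is preserved across the targets, which is a useful sanity check for any areal interpolation.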
Distance Measurement:
Distance measurement refers to measuring straight line (Euclidean) distances between
features. Measurements can be made from points in a layer to points in another layer, or from
each point in a layer to its nearest point or line in another layer. In both cases, distance
measures are stored in a field. Distance measures can be used directly for data analysis.
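The "nearest point in another layer" case can be sketched as a brute-force search that stores the result in a new field. The layer structure and the `near_dist` field name are hypothetical; a GIS would normally use a spatial index rather than comparing every pair of points.

```python
import math

def near_distance(layer_a, layer_b):
    """For each point feature in layer_a, add a 'near_dist' field holding the
    straight-line distance to the closest point in layer_b (brute force)."""
    out = []
    for feat in layer_a:
        x, y = feat["xy"]
        d = min(math.hypot(x - bx, y - by) for bx, by in layer_b)
        out.append({**feat, "near_dist": d})
    return out

wells = [{"id": 1, "xy": (0.0, 0.0)}, {"id": 2, "xy": (10.0, 0.0)}]
stream_points = [(3.0, 4.0), (10.0, 2.0)]
for feat in near_distance(wells, stream_points):
    print(feat["id"], feat["near_dist"])  # 1 5.0, then 2 2.0
```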
Pattern Analysis:
Pattern analysis is the study of the spatial arrangements of point or polygon features in
two dimensional space. Pattern analysis uses distance measurements as inputs and statistics
(spatial statistics) for describing the distribution pattern. At the general (global) level, a
pattern analysis can reveal if a point distribution pattern is random, dispersed, or clustered.
A classic technique for point pattern analysis, nearest neighbour analysis uses the
distance between each point and its closest neighbouring point in a layer to determine if the
point pattern is random, regular, or clustered. The nearest neighbour statistic is the ratio (R)
of the observed average distance between nearest neighbours (d_obs) to the expected average
distance for a hypothetical random distribution (d_exp):
$$R = \frac{d_{obs}}{d_{exp}}$$
The R ratio is less than 1 if the point pattern is more clustered than random, and
greater than 1 if the point pattern is more dispersed than random.
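The ratio can be computed directly. This sketch assumes the commonly used Clark and Evans expectation for a random pattern, d_exp = 0.5 * sqrt(A / n), where A is the study area and n the number of points; it applies no correction for edge effects, and the sample points are made up.

```python
import math

def nearest_neighbour_ratio(points, area):
    """R = d_obs / d_exp, with d_exp = 0.5 * sqrt(area / n) (Clark and Evans)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    nn = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point.
        nn.append(min(math.hypot(xi - xj, yi - yj)
                      for j, (xj, yj) in enumerate(points) if j != i))
    d_obs = sum(nn) / n
    d_exp = 0.5 * math.sqrt(area / n)
    return d_obs / d_exp

grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
print(nearest_neighbour_ratio(grid, area=9.0))  # 2.0 (dispersed)
clustered = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(nearest_neighbour_ratio(clustered, area=100.0) < 1)  # True (clustered)
```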
Feature Manipulation:
Tools are available in a GIS package for manipulating and managing features in one
or more feature layers. When a tool involves two layers, the layers must be based on the same
coordinate system. Like overlay, these feature tools are often needed for data preprocessing
and data analysis; however, unlike overlay, these tools do not combine geometries and
attributes from input layers into a single layer.
Dissolve aggregates features in a feature layer that have the same attribute value or
values. For example, we can aggregate roads by highway number or counties by state. An
important application of Dissolve is to simplify a classified polygon layer.
Figure: Dissolve removes boundaries of polygons that have the same attribute value in (a) and creates a simplified layer (b).
Figure: Clip creates an output that contains only those features of the input layer that fall within the area extent of the clip layer. The output has the same feature type as the input.
Clip creates a new layer that includes only those features of the input layer, including
their attributes, that fall within the area extent of the clip layer. Clip is a useful tool, for
example, for cutting a map acquired elsewhere to fit a study area.
Append creates a new layer by piecing together two or more layers, which represent
the same feature and have the same attributes. For example, Append can put together a layer
from four input layers, each corresponding to the area extent of a USGS 7.5-minute
quadrangle.
Figure: Append pieces together two adjacent layers into a single layer but does not remove the shared boundary between the layers.
Figure: Select creates a new layer (b) with selected features from the input layer (a).
Select creates a new layer that contains features selected from a user-defined query
expression. For example, we can create a layer showing high-canopy closure by selecting
stands that have 60 to 80 percent closure from a stand layer.
Eliminate creates a new layer by removing features that meet a user-defined query
expression. For example, Eliminate can implement the minimum mapping unit concept by
removing polygons that are smaller than the defined unit in a layer.
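Select and Eliminate are both driven by a query expression; they differ only in whether matching features are kept or dropped. A minimal sketch on attribute records, assuming hypothetical `closure` (percent) and `area_ha` fields and a hypothetical 1-hectare minimum mapping unit:

```python
def select(layer, predicate):
    """Keep the features that satisfy the query expression."""
    return [f for f in layer if predicate(f)]

def eliminate(layer, predicate):
    """Remove the features that satisfy the query expression."""
    return [f for f in layer if not predicate(f)]

stands = [
    {"id": 1, "closure": 45, "area_ha": 12.0},
    {"id": 2, "closure": 65, "area_ha": 0.8},
    {"id": 3, "closure": 78, "area_ha": 5.5},
    {"id": 4, "closure": 85, "area_ha": 3.2},
]

# High-canopy closure: stands with 60 to 80 percent closure.
high_canopy = select(stands, lambda f: 60 <= f["closure"] <= 80)
print([f["id"] for f in high_canopy])  # [2, 3]

# Minimum mapping unit: drop polygons smaller than 1 ha (hypothetical value).
cleaned = eliminate(stands, lambda f: f["area_ha"] < 1.0)
print([f["id"] for f in cleaned])  # [1, 3, 4]
```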
Update uses a “cut and paste” operation to replace the input layer with the update
layer and its features. As the name suggests, Update is useful for updating an existing layer
with new features in limited areas.
Figure: Update replaces the input layer with the update layer and its features.
Figure: Erase removes features from the input layer that fall within the area extent of the erase layer.
Erase removes from the input layer those features that fall within the area extent of
the erase layer. Suppose a suitability analysis stipulates that potential sites cannot be within
300 meters of any stream. A 300-meter stream buffer layer can be used in this case as the erase
layer to remove the areas near streams from further consideration.
Split divides the input layer into two or more layers. A split layer, which shows area
subunits, is used as the template for dividing the input layer. For example, a national forest
can split a stand layer by district so that each district office can have its own layer.
In contrast with vector data analysis, which uses points, lines, and polygons, raster
data analysis uses cells and rasters. Raster data analysis can be performed at the level of
individual cells, or groups of cells, or cells within an entire raster. Some raster data operations
use a single raster; others use two or more rasters.
Local Operations:
Constituting the core of raster data analysis, local operations are cell-by-cell
operations. A local operation can create a new raster from either a single input raster or
multiple input rasters. The cell values of the new raster are computed by a function relating
the input to the output or are assigned by a classification table.
Reclassification:
Reclassification, a local operation, creates a new raster by classification.
Reclassification is also referred to as recoding, or transforming, through lookup tables. Two
reclassification methods may be used. The first method is a one-to-one change, meaning that
a cell value in the input raster is assigned a new value in the output raster. For example,
irrigated cropland in a land-use raster is assigned a value of 1 in the output raster. The second
method assigns a new value to a range of cell values in the input raster. For example, cells
with population densities between 0 and 25 persons per square mile in a population density
raster are assigned a value of 1 in the output raster and so on. An integer raster can be
reclassified by either method, but a floating-point raster can only be reclassified by the
second method.
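Both reclassification methods amount to a lookup applied cell by cell. This sketch represents a raster as a list of rows; the land-use codes, density values and class breaks are hypothetical, and ranges are assumed to be half-open (`low <= value < high`).

```python
def reclass_one_to_one(raster, table, nodata=None):
    """Method 1: map each input value to a new value via a lookup table."""
    return [[table.get(v, nodata) for v in row] for row in raster]

def reclass_ranges(raster, ranges, nodata=None):
    """Method 2: map each range [low, high) of input values to a new value.
    Works for floating-point rasters as well as integer ones."""
    def lookup(v):
        for low, high, new in ranges:
            if low <= v < high:
                return new
        return nodata
    return [[lookup(v) for v in row] for row in raster]

# Hypothetical land-use codes: 21 = irrigated cropland -> 1, others -> 0.
landuse = [[11, 21], [21, 31]]
print(reclass_one_to_one(landuse, {11: 0, 21: 1, 31: 0}))  # [[0, 1], [1, 0]]

# Population density (persons per square mile): 0-25 -> 1, 25-50 -> 2.
density = [[3.5, 40.2], [26.0, 0.0]]
print(reclass_ranges(density, [(0, 25, 1), (25, 50, 2)]))  # [[1, 2], [2, 1]]
```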
Reclassification serves three main purposes. First, reclassification can create a
simplified raster. For example, instead of having continuous slope values, a raster can have 1
for slopes of 0 to 10 percent, 2 for 10 to 20 percent, and so on. Second, reclassification can
create a new raster that contains a unique category or value such as slopes of 10 to 20
percent. Third, reclassification can create a new raster that shows the ranking of cell values in
the input raster. For example, a reclassified raster can show the ranking of 1 to 5, with 1
being least suitable and 5 being most suitable.
Figure: The cell value in (d) is the mean calculated from three input rasters (a, b, and c) in a local operation. The shaded cells have no data.
For example, the figure above shows a local operation that calculates the mean from three input
rasters. If a cell contains no data in one of the input rasters, the cell also carries no data in the
output raster by default.
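The mean example can be sketched as a cell-by-cell loop over rasters of equal size. This assumes rasters are lists of rows and that NoData is represented by `None`; any NoData input makes the output cell NoData, matching the default behaviour described above.

```python
def local_mean(*rasters):
    """Cell-by-cell mean of several equally sized rasters; None = NoData."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            vals = [ras[r][c] for ras in rasters]
            if any(v is None for v in vals):
                out_row.append(None)  # NoData in any input -> NoData in output
            else:
                out_row.append(sum(vals) / len(vals))
        out.append(out_row)
    return out

a = [[1, 2], [None, 4]]
b = [[3, 2], [1, 4]]
c = [[2, 2], [1, None]]
print(local_mean(a, b, c))  # [[2.0, 2.0], [None, None]]
```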
Some local operations do not involve statistics or computation. A local operation
called Combine assigns a unique output value to each unique combination of input values.
Suppose a slope raster has three cell values (0 to 20 percent, 20 to 40 percent, and greater
than 40 percent slope), and an aspect raster has four cell values (north, east, south, and west
aspects).
Figure: Each cell value in (c) represents a unique combination of cell values in (a) and (b). The combination codes and their representations are shown in (d).
The Combine operation creates an output raster with a value for each unique
combination of slope and aspect, such as 1 for greater than 40 percent slope and the south
aspect, 2 for 20 to 40 percent slope and the south aspect, and so on.
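Combine can be sketched as a dictionary that hands out a new code the first time each combination of input values is seen. This assumes codes are assigned in row-by-row scan order (a GIS may number them differently) and uses hypothetical slope classes 1 (0 to 20 percent), 2 (20 to 40 percent) and 3 (greater than 40 percent).

```python
def combine(*rasters):
    """Assign a unique output code to each unique combination of input values.
    Returns the output raster and the code table {combination: code}."""
    codes = {}
    out = []
    for rows in zip(*rasters):          # corresponding rows of every input
        out_row = []
        for combo in zip(*rows):        # values of one cell across all inputs
            if combo not in codes:
                codes[combo] = len(codes) + 1
            out_row.append(codes[combo])
        out.append(out_row)
    return out, codes

slope = [[3, 2], [3, 1]]
aspect = [["S", "S"], ["S", "N"]]
raster, table = combine(slope, aspect)
print(raster)  # [[1, 2], [1, 3]]
print(table)   # {(3, 'S'): 1, (2, 'S'): 2, (1, 'N'): 3}
```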
Neighborhood Operations:
A neighborhood operation, also called a focal operation, involves a focal cell and a set
of its surrounding cells. The surrounding cells are chosen for their distance and/or directional
relationship to the focal cell. A required parameter for neighborhood operations is the type of
neighborhood. Common neighborhoods include rectangles, circles, annuluses, and wedges.
Figure: Four common neighborhood types: rectangle (a), circle (b), annulus (c), and wedge (d). The cell marked with an x is the focal cell.
A rectangle is defined by its width and height in cells, such as a 3-by-3 area centered
at the focal cell. A circle extends from the focal cell with a specified radius. An annulus or
doughnut-shaped neighborhood consists of the ring area between a smaller circle and a larger
circle centered at the focal cell. And a wedge consists of a piece of a circle centered at the
focal cell.
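A rectangular neighborhood operation can be sketched as follows. The choice of the mean as the neighborhood statistic is an assumption for illustration, as is the edge handling: cells of the window that fall outside the raster are simply ignored.

```python
def focal_mean(raster, half=1):
    """Mean over a (2*half+1)-by-(2*half+1) rectangle centred on each focal
    cell; window cells outside the raster are ignored."""
    rows, cols = len(raster), len(raster[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            vals = [raster[i][j]
                    for i in range(max(0, r - half), min(rows, r + half + 1))
                    for j in range(max(0, c - half), min(cols, c + half + 1))]
            out_row.append(sum(vals) / len(vals))
        out.append(out_row)
    return out

elev = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
result = focal_mean(elev)   # 3-by-3 rectangle (half=1)
print(result[1][1])  # 5.0: all nine cells around the centre
print(result[0][0])  # 3.0: only four cells exist at the corner
```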
Zonal Operations:
A zonal operation works with groups of cells of the same value or like features. These
groups are called zones. Zones may be contiguous or noncontiguous. A contiguous zone
includes cells that are spatially connected, whereas a non-contiguous zone includes separate
regions of cells. A watershed raster is an example of a contiguous zone, in which cells that
belong to the same watershed are spatially connected. A land use raster is an example of a
non-contiguous zone, in which one type of land use may appear in different parts of the
raster.
Given two rasters in a zonal operation, one input raster and one zonal raster, a zonal
operation produces an output raster, which summarizes the cell values in the input raster for
each zone in the zonal raster. The summary statistics and measures include area, minimum,
maximum, sum, range, mean, standard deviation, median, majority, minority, and variety.
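A zonal operation groups input cells by their zone code before summarizing them. This sketch computes a few of the listed statistics per zone and then writes the zone mean back to every cell of that zone to form the output raster; the rasters are made up, and "count" stands in for area (count times cell area).

```python
def zonal_stats(value_raster, zone_raster):
    """Summarize input-raster values for each zone of the zonal raster."""
    groups = {}
    for vrow, zrow in zip(value_raster, zone_raster):
        for v, z in zip(vrow, zrow):
            groups.setdefault(z, []).append(v)
    return {z: {"count": len(vs), "min": min(vs), "max": max(vs),
                "mean": sum(vs) / len(vs)}
            for z, vs in groups.items()}

def zonal_mean_raster(value_raster, zone_raster):
    """Output raster in which each cell holds the mean of its zone."""
    stats = zonal_stats(value_raster, zone_raster)
    return [[stats[z]["mean"] for z in zrow] for zrow in zone_raster]

values = [[1, 2], [3, 6]]
zones = [["A", "B"], ["A", "B"]]
print(zonal_stats(values, zones)["B"])    # {'count': 2, 'min': 2, 'max': 6, 'mean': 4.0}
print(zonal_mean_raster(values, zones))   # [[2.0, 4.0], [2.0, 4.0]]
```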
NETWORK:
A network is a system of linear features that has the appropriate attributes for the flow
of objects. A road system is a familiar network. Other networks include railways, public
transit lines, bicycle paths, and streams. A network is typically topology-based: lines meet at
intersections, lines cannot have gaps, and lines have directions.
A link refers to a road segment defined by two end points. Also called edges or arcs,
links are the basic geometric features of a network. Link impedance is the cost of traversing a
link. A simple measure of the cost is the physical length of the link. But the length may not
be a reliable measure of cost, especially in cities where speed limits and traffic conditions
vary significantly along different streets.
A junction refers to a street intersection. A junction is also called a node. A turn is a
transition from one street segment to another at a junction. Turn impedance is the time it
takes to complete a turn, which is significant in a congested street network. Turn impedance
is directional.
NETWORK ANALYSIS:
A network with the appropriate attributes can be used for a variety of applications.
Some applications are directly accessible through GIS tools. Others require the integration of
GIS and specialized software in operations research and management science.
Shortest Path Analysis:
Shortest path analysis finds the path with the minimum cumulative impedance between nodes
on a network. Dijkstra's algorithm, a common method for this analysis, works as follows:
1. It marks all vertices as unvisited and places them in an unvisited list.
2. It chooses a vertex (the source) and assigns a maximum possible cost (i.e., infinity) to
every other vertex.
3. The cost of the source remains zero, as it takes nothing to travel from the source vertex
to itself.
4. In every subsequent step, the algorithm tries to improve (minimize) the cost of each
vertex. Here the cost can be distance, money, or the time taken to reach that vertex from
the source vertex. The minimization of cost is a multi-step process:
a. For each unvisited neighbor of the current vertex (e.g., vertices 2, 3, and 4 of the
current vertex 1), calculate the new cost through the current vertex.
b. For example, the new cost of vertex 2 is the minimum of (existing cost of vertex 2)
and (cost of vertex 1 + cost of the edge from vertex 1 to vertex 2).
5. When all the neighbors of the current vertex have been considered, the current vertex is
marked as visited and removed from the unvisited list.
6. Select the vertex in the unvisited list that has the smallest cost and repeat step 4.
7. When the unvisited list is empty, no cost can be improved further and the algorithm
ends.
Figure: Worked example of the shortest path algorithm.
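The steps above can be sketched with a priority queue that always yields the unvisited vertex with the smallest cost (step 6). This is a minimal sketch, assuming non-negative edge costs and a graph given as `{vertex: {neighbour: edge_cost}}` with every vertex present as a key; the six-vertex network below is hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Return (cost, previous): least cost from source to every vertex and
    the predecessor of each vertex on its shortest path."""
    cost = {v: float("inf") for v in graph}   # every other vertex starts at infinity
    cost[source] = 0                          # the source costs nothing
    previous = {}
    visited = set()
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)           # unvisited vertex with smallest cost
        if u in visited:
            continue                          # stale queue entry, already settled
        visited.add(u)
        for v, w in graph[u].items():         # try to improve each unvisited neighbour
            if v not in visited and d + w < cost[v]:
                cost[v] = d + w
                previous[v] = u
                heapq.heappush(queue, (cost[v], v))
    return cost, previous

def path_to(previous, source, target):
    """Walk the predecessors back from target (target must be reachable)."""
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]

g = {
    1: {2: 7, 3: 9, 6: 14},
    2: {1: 7, 3: 10, 4: 15},
    3: {1: 9, 2: 10, 4: 11, 6: 2},
    4: {2: 15, 3: 11, 5: 6},
    5: {4: 6, 6: 9},
    6: {1: 14, 3: 2, 5: 9},
}
cost, prev = dijkstra(g, 1)
print(cost)                    # {1: 0, 2: 7, 3: 9, 4: 20, 5: 20, 6: 11}
print(path_to(prev, 1, 5))     # [1, 3, 6, 5]
```

The heap may hold outdated entries for a vertex whose cost was later lowered; skipping already visited vertices on pop is a common way to handle that without a decrease-key operation.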
Closest Facility:
Closest facility is a network analysis that finds the closest facility among candidate
facilities to any location on a network. The analysis first computes the shortest paths from the
selected location to all candidate facilities, and then chooses the closest facility among the
candidates. A couple of options may be applied to the closest facility problem. First, rather
than getting a single facility, the user may ask for a number of closest facilities. Second, the
user may specify a search radius in distance or travel time, thus limiting the candidate
facilities.
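The closest facility analysis, including both options (several facilities, a search radius), can be sketched on top of a compact version of the shortest path algorithm. The network, travel times (minutes) and facility vertices are hypothetical; ties in cost are broken by facility name.

```python
import heapq

def network_costs(graph, source):
    """Least travel cost from source to every reachable vertex (Dijkstra)."""
    cost = {source: 0}
    queue = [(0, source)]
    done = set()
    while queue:
        d, u = heapq.heappop(queue)
        if u in done:
            continue
        done.add(u)
        for v, w in graph.get(u, {}).items():
            if d + w < cost.get(v, float("inf")):
                cost[v] = d + w
                heapq.heappush(queue, (cost[v], v))
    return cost

def closest_facilities(graph, location, facilities, k=1, radius=None):
    """Return up to k facilities nearest to location by network cost,
    optionally limited to those within the search radius."""
    cost = network_costs(graph, location)
    reachable = [(cost[f], f) for f in facilities if f in cost]
    if radius is not None:
        reachable = [(c, f) for c, f in reachable if c <= radius]
    return [f for c, f in sorted(reachable)[:k]]

g = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "D": 5},
    "C": {"A": 2, "D": 8, "E": 10},
    "D": {"B": 5, "C": 8},
    "E": {"C": 10},
}
print(closest_facilities(g, "A", ["D", "E"]))                  # ['D']
print(closest_facilities(g, "A", ["D", "E"], k=2))             # ['D', 'E']
print(closest_facilities(g, "A", ["D", "E"], k=2, radius=10))  # ['D']
```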
Allocation:
Allocation is the study of the spatial distribution of resources through a network.
Resources in allocation studies often refer to public facilities, such as fire stations, schools,
hospitals, and even open spaces (in case of earthquakes). Because the distribution of the
resources defines the extent of the service area, the main objective of spatial allocation
analysis is to measure the efficiency of these resources.
Optical Sensors:
To make DEMs, two or more optical satellite images of the same area taken from
different directions are needed. These stereo images should be taken within a short time
interval so that their spectral signatures do not differ significantly. Two optical sensors that
readily meet the requirement are Terra ASTER and SPOT 5. ASTER provides a nadir view
and a backward view within a minute, and the HRS (High Resolution Sensor) carried on
SPOT 5 provides a forward view and a backward view along its orbit. ASTER DEMs have a
spatial resolution of 30 meters. Airbus Defence and Space distributes SPOT 5 DEMs with a
spatial resolution of 20 meters. DEMs can also be generated from very high resolution
satellite images such as WorldView images, as long as stereo pairs are available.
InSAR:
InSAR uses two or more SAR images to generate elevations of the reflective surface,
which may be vegetation, man-made features, or bare ground. SRTM (Shuttle Radar
Topography Mission) DEMs, for example, are derived from SAR data collected by two radar
antennas placed on the Space Shuttle in 2000. SRTM DEMs cover over 80 percent of the
landmass of the Earth between 60° N and 56° S.
LiDAR:
The basic components of a LiDAR system include a laser scanner mounted in an aircraft,
GPS, and an Inertial Measurement Unit (IMU). The laser scanner has a pulse generator,
which emits rapid laser pulses (0.8 to 1.6 μm wavelength) over an area of interest, and a
receiver, which gets scattered and reflected pulses from targets. Using the time lapse of the
pulse, the distance (range) between the scanner and the target can be calculated. At the same
time, the location and orientation of the aircraft are measured by the GPS and IMU,
respectively. The target location in a three-dimensional space can therefore be determined by
using the information obtained by the LiDAR system. A major application of LiDAR
technology is the creation of high resolution DEMs, with a spatial resolution of 0.5 to 2
meters.
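The range calculation follows from the pulse's two-way travel time: the pulse goes to the target and back, so range = c × t / 2. This sketch uses the speed of light in a vacuum and ignores atmospheric effects; the `ground_elevation_nadir` helper is a hypothetical simplification assuming a pulse fired straight down, whereas a real system combines range with the GPS position and IMU orientation.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second, in a vacuum

def lidar_range(round_trip_seconds):
    """Scanner-to-target distance from the pulse's two-way travel time."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

def ground_elevation_nadir(aircraft_elevation_m, round_trip_seconds):
    """Hypothetical simplification: target elevation for a vertical pulse."""
    return aircraft_elevation_m - lidar_range(round_trip_seconds)

t = 2 * 1000.0 / SPEED_OF_LIGHT        # travel time for a target 1000 m away
print(round(t * 1e6, 3))               # about 6.671 microseconds
print(round(lidar_range(t), 6))        # 1000.0
print(round(ground_elevation_nadir(1500.0, t), 6))  # 500.0
```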
City Planning:
environmental and cultural results of activities along the coast. The right data makes all the
difference in sustainably performing operations like construction or excavation.
*****