10. Types of attribute data
Structure
11.1 Introduction
     Expected Learning Outcomes
11.2 Related Terminologies
11.3 Data Linkages
     Linking Non-spatial Data with Spatial Data
11.4 Attribute Data Management
     Non-spatial Data Structure
11.5 Data Integration
     Methods of Integration
11.6 Summary
11.7 Activity
11.8 Terminal Questions
11.9 References
11.10 Further/Suggested Readings
11.11 Answers
11.1 INTRODUCTION
A GIS comprises spatial and attribute data. Spatial data describe the geometries of spatial
features, whereas attribute data define the characteristics of those features. Spatial data are
incomplete without non-spatial (attribute) data, because any GIS application and its subsequent
analysis are generally based on attribute information. Therefore, no GIS can work in the absence of
non-spatial data. In addition, these databases need to be integrated for viewing, analysis and
obtaining results in a GIS environment.
In the previous unit, you read about types of data and their conversion, the creation of new data,
and the transformation of data. In this unit, we will discuss terminologies related to attribute data
management, and data linkages, including linking non-spatial data with spatial data. We will also
discuss attribute data management and data integration, covering various methods of data
integration.
Block 3 GIS Database Creation
……………………..…………………………..................…………………....…………….…………………………………
Fig. 11.1: Each arc representing a street segment in the polygon files has a set of
associated attributes. These attributes include street name, address ranges on
the left side and the right side, as well as PIN codes on both sides.
Feature Attribute Table: A feature attribute table has access to the spatial
vector data. In the georelational data model, the feature attribute table uses the
feature ID of each feature to link to the geometry of that feature (Table
11.1). In the object-based data model, each feature attribute table has a
defined field that stores the geometry of a feature (Table 11.2).
Table 11.1: An example of the georelational data model where the soils
coverage uses LU/LC Class to link to the spatial and attribute
data.
Table 11.2: The object-based data model uses the shape field to store the
geometry of building polygons. The table therefore contains
both spatial and attribute data.
Types of Attribute Data: There are two methods for classifying attribute data.
The first is by data type; the common data types are number, text (or
character), date, and binary large object (BLOB). The second is by
measurement scale, which groups attribute data into nominal, ordinal, interval,
and ratio data, with increasing degrees of sophistication.
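The two classification methods can be recorded side by side for each field. The following is a minimal sketch, not a definitive implementation: the field names, the example attributes in the comments and the helper allowed_summaries are all hypothetical. It only illustrates that each higher measurement scale supports every summary of the scales below it.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Measurement scales, ordered by increasing sophistication."""
    NOMINAL = 1   # categories without order, e.g. a land use class
    ORDINAL = 2   # ranked categories, e.g. slight < moderate < severe
    INTERVAL = 3  # numeric with an arbitrary zero, e.g. temperature in Celsius
    RATIO = 4     # numeric with a true zero, e.g. population


# Hypothetical fields: name -> (data type, measurement scale)
FIELDS = {
    "landuse": (str, Scale.NOMINAL),
    "erosion_rank": (int, Scale.ORDINAL),
    "temp_c": (float, Scale.INTERVAL),
    "population": (int, Scale.RATIO),
}


def allowed_summaries(scale):
    """Return the summaries that are meaningful at the given scale."""
    summaries = ["mode"]
    if scale >= Scale.ORDINAL:
        summaries.append("median")
    if scale >= Scale.INTERVAL:
        summaries.append("mean")
    if scale >= Scale.RATIO:
        summaries.append("ratio")
    return summaries
```

For example, allowed_summaries(Scale.NOMINAL) returns only the mode, because averaging land use codes has no meaning even when the codes are stored as numbers.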
Value Attribute Table: A value attribute table belongs to an integer raster. It
lists each cell value together with its frequency, i.e. the count of cells that
have that value (Table 11.3). For example, an integer raster of land use/land
cover has a value attribute table that lists the cell values and their counts. A
value attribute table differs from a feature attribute table. A feature attribute
table consists of rows and columns: each row represents a spatial feature, and
each column represents a property or characteristic of the spatial feature
(Fig. 11.1 and Table 11.4).
Table 11.3: A value attribute table lists the attributes of value and count.
The value field refers to the cell value, and the count field
refers to the number of cells.
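As a minimal sketch of how such a table can be derived, the function below counts the cells of each value in a small integer raster. The raster values and the function name value_attribute_table are assumptions made for illustration; they are not taken from Table 11.3.

```python
from collections import Counter


def value_attribute_table(raster):
    """Build value/count rows from an integer raster given as rows of cells."""
    counts = Counter(value for row in raster for value in row)
    # One row per distinct cell value, sorted by value.
    return [{"value": v, "count": counts[v]} for v in sorted(counts)]


# Hypothetical 3 x 3 integer raster; each number is a land use/land cover code.
raster = [
    [1, 1, 2],
    [1, 3, 2],
    [3, 3, 3],
]

vat = value_attribute_table(raster)
```

The counts in the resulting table always add up to the total number of cells in the raster, which is a quick check that no cell was missed.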
Data capture: There are different means of capturing data for a computer. Data
may be captured by collecting documents to be typed in, by making measurements
and keying them in, by asking people to fill in questionnaires, or by listing
information on measurable events. Data may also be collected directly by an
input device without using a keyboard, such as a bar code reader, a scanner for
pictures, or sensors for data logging. Attribute data can be entered by direct
data loggers, manual keyboard entry, optical character recognition (OCR) or,
increasingly, voice recognition (Fig. 11.2).
production for each state and another is on food grains export. You must
combine these two files to solve the problem. Once the files are combined, it
becomes simple for the computer to process them. The production and export
situation can be viewed spatially on a map once these data are linked to the
state boundary map. Hence, data linkage is very useful for determining
location, conditions, trends and patterns, and for modelling.
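Combining a production file and an export file can be sketched as follows. This is an illustration under stated assumptions: the state names and all figures are placeholders, not real statistics, and the function combine is hypothetical.

```python
# Placeholder tables keyed by state name; the numbers are invented.
production = {"State A": 120, "State B": 80}
export = {"State A": 30, "State B": 20}


def combine(production, export):
    """Merge the two tables for states present in both of them."""
    combined = {}
    for state in production.keys() & export.keys():
        combined[state] = {
            "production": production[state],
            "export": export[state],
            # Share of production that was exported.
            "export_share": export[state] / production[state],
        }
    return combined


combined = combine(production, export)
```

Once each state carries both values, a derived attribute such as the export share can be mapped by linking the combined table to the state boundary map.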
11.3.1 Linking Non-spatial Data with Spatial Data
Linking non-spatial (attribute) data with spatial data serves multiple purposes.
Firstly, keeping the non-spatial data separate and joining them only for analysis
and mapping makes the spatial data easier to handle. Secondly, many tables
containing non-spatial data may be joined together, which makes retrieval
faster and keeps the database free of redundancy.
In GIS language, the logical linking of attribute or external data is called ‘relate’,
and the appending of attribute data is termed ‘append’ or ‘join’
(http://webhelp.esri.com). When the data are permanently joined, the attributes
become part of the map data, so any change (e.g., a change in property) needs
to be updated in the map itself. A temporary join is saved only as a link in the
project file, whose format differs between software packages, for example .apr
and .mxd in the older and newer versions of ArcView, and .wor in MapInfo. In
this case, whenever the data in the table are updated following a change in
property, the map also gets updated.
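The practical difference between the two approaches can be sketched in a few lines. This is a simplified model, not how any particular GIS package implements it: the parcel features, the owners table and both function names are hypothetical.

```python
import copy

# Hypothetical map features and an external attribute table sharing an "id".
features = [{"id": 1, "name": "Parcel A"}, {"id": 2, "name": "Parcel B"}]
owners = {1: {"owner": "X"}, 2: {"owner": "Y"}}


def permanent_join(features, table):
    """Copy the table attributes into each feature (append or join)."""
    joined = copy.deepcopy(features)
    for feature in joined:
        feature.update(table.get(feature["id"], {}))
    return joined


def relate(table):
    """Keep only a link: attributes are read from the table on request."""
    def lookup(feature_id):
        return table.get(feature_id, {})
    return lookup


joined = permanent_join(features, owners)
lookup = relate(owners)

# The property changes hands and only the external table is edited.
owners[1]["owner"] = "Z"
```

After the edit, joined[0]["owner"] still holds the old value because it is a copy, while lookup(1) reads the updated table. This mirrors the text: a permanent join must be updated in the map data itself, whereas a link picks up table edits automatically.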
The basis for joining data between tables, or between spatial and non-spatial
data, is a common identifier known as the ‘primary key’ in one table and the
‘foreign key’ in the other. The identifier is present in both files and acts as a
unique code that is not repeated anywhere in the file. With a smaller database,
names are sometimes taken as the common identifier, but in a large database
there may be several identical names with different identities. Thus, a unique
code is assigned to each feature to avoid this kind of confusion. For example,
there may be two villages with the same name in a block or taluka. If one wants
to join the population data of all the villages with the village map, the data may
be joined to the wrong village of the same name. In a larger context, the same
district name exists in two states of India: for example, Bilaspur district exists in
Himachal Pradesh and Chhattisgarh, Hamirpur in Himachal Pradesh and Uttar
Pradesh, Aurangabad in Bihar and Maharashtra, and Pratapgarh in Uttar
Pradesh and Rajasthan. One example of this can be seen in Fig. 11.3.
At lower levels, such as block, panchayat and village, names are repeated in
large numbers. In such cases, the data of one spatial unit (district, block,
panchayat or village) may be linked to another unit with the same name. For
instance, after joining, the data of Aurangabad district of Bihar may go to
Aurangabad district of Maharashtra and vice versa, if each district does not
have a unique identity in both the spatial data (map) and the non-spatial data
(attribute table). For this reason, a unique identity is required for data joining.
An example with different spatial units is given in Fig. 11.4.
Fig. 11.3: Map of India showing districts with same name “Bilaspur” in Himachal
Pradesh and Chhattisgarh with different spatial locations.
Fig. 11.4: Non-spatial and spatial data joining based on unique identities.
In this way, various small attribute tables containing specific details of the
spatial features are joined with the spatial features using unique, common
identities. A common field, such as DIST_ID in the above example, is essential
in all attribute tables as the identifier for data joining. It relates each attribute
table (non-spatial data) to the exact feature in the spatial data.
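The name collision described above can be reproduced with a small sketch. The DIST_ID codes and the "value" entries are invented for illustration (they are not official codes or real data), and join_on is a hypothetical helper.

```python
# Two districts that share a name; the DIST_ID values are made up.
districts = [
    {"DIST_ID": "D01", "name": "Aurangabad", "state": "Bihar"},
    {"DIST_ID": "D02", "name": "Aurangabad", "state": "Maharashtra"},
]

# Attribute table; "value" stands in for any attribute, such as population.
attributes = [
    {"DIST_ID": "D01", "name": "Aurangabad", "value": "bihar-data"},
    {"DIST_ID": "D02", "name": "Aurangabad", "value": "maharashtra-data"},
]


def join_on(features, table, key):
    """Attach 'value' from the table to each feature using the given key."""
    # Rows with a duplicate key silently overwrite earlier rows here.
    lookup = {row[key]: row for row in table}
    return [{**f, "value": lookup[f[key]]["value"]} for f in features]


by_name = join_on(districts, attributes, "name")
by_id = join_on(districts, attributes, "DIST_ID")
```

Joining on the name gives both districts the Maharashtra record, because the second row replaces the first in the lookup. Joining on DIST_ID attaches each record to the correct district.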
SAQ I
a) Define DBMS.
b) What is attribute data?
c) What is the difference between data relate and join?
Data creation standards are still not uniform across the globe. As a
consequence, data are created to different standards of coverage or
boundaries, time period, data structure, format, type, accuracy level, etc., as
per need. Data are also created from various sources and by various methods.
In data integration, data from different sources or of different standards are
standardised so that they are compatible and can be placed one over the other
with matching locations and boundaries for spatial analysis and mapping. The
major conversions in this case concern the coordinate and projection system
and the scale. The other standards are conceptual or logical. For example, the
land use data for one part of a region may follow a three-level classification
while the data for another part follow a four-level classification. In this case,
both parts need to be brought to the same level, by either expanding or
reducing the classes of one part, so that they are uniform and can be
integrated and worked with.
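Reducing a finer classification to a coarser one can be sketched as below. The sketch assumes a hypothetical hierarchical coding scheme in which each level adds one digit, so a level-4 code such as "1121" belongs to the level-3 class "112"; real classification schemes may need an explicit lookup table instead.

```python
def to_level(code, level):
    """Truncate a hierarchical class code to the requested level."""
    if len(code) < level:
        # A coarser code cannot be expanded without extra information.
        raise ValueError(f"{code!r} is coarser than level {level}")
    return code[:level]


# Hypothetical land use codes for two parts of a region.
region_a = ["111", "112", "211"]              # three-level classification
region_b = ["1111", "1121", "1122", "2110"]   # four-level classification

common_level = 3
harmonised = sorted({to_level(c, common_level) for c in region_a + region_b})
```

Reduction is a simple lookup, whereas expanding the three-level data to level four would need additional field or image information, which is why the sketch raises an error in that direction.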
Contributor: Dr. Shashi Kumar and Prof. Vijay Kumar Baraik
The horizontal and vertical data integration described above is also done for
attribute data.
b) Spatial and Non-Spatial Data Integration: This is also referred to as data
linkage, which we have already discussed in sub-section 11.3.1 of this
Unit. Attaching attribute data to spatial data is also data integration. The
attribute data are made compatible with the spatial units, so that the number
of features equals the number of rows in the table and every unit matches a
row. A mismatch between units and rows in terms of the identifier will leave
the integration of non-spatial data into the spatial data incomplete.
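A check for complete matching can be sketched as follows, assuming each feature and each table row carries an identifier. The identifiers and the function check_match are hypothetical.

```python
def check_match(feature_ids, table_ids):
    """Report identifiers that appear on only one side of the join."""
    feature_ids, table_ids = set(feature_ids), set(table_ids)
    return {
        "features_without_rows": sorted(feature_ids - table_ids),
        "rows_without_features": sorted(table_ids - feature_ids),
        "complete": feature_ids == table_ids,
    }


# Hypothetical identifiers from a map layer and from an attribute table.
report = check_match(["D01", "D02", "D03"], ["D01", "D02", "D04"])
```

Here the feature D03 has no row and the row D04 has no feature, so the integration would be incomplete. Because the check uses sets, it does not detect duplicated identifiers, which would need a separate count.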
While doing integration, some conversion principles should be taken into
account. They are considered stepwise, as listed below:
1. Data are converted from analogue to digital form.
2. All the data are converted into the same data format, such as .shp of
   ArcView, before integration.
3. Coverage or boundaries and locations are matched for the integration, if the
   data are non-matching pairs.
4. Temporal matching is also taken care of before integration. For example, if
   one data set is of 1991 and another is of 2001, both need to be brought to
   an identical time, based on their nature and applications.
5. The projection and reference system are standardised, which is a
   prerequisite for all data sets to be integrated. If the various data layers of
   the same area are prepared in different projections and reference systems,
   seamless integration is not possible due to the different transformation
   models.
6. The scale and accuracy of the various data layers are also standardised for
   integration.
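Several of these principles can be checked from layer metadata before any integration is attempted. The sketch below is illustrative only: the Layer record, its fields and the sample layers are assumptions, and it does not cover boundary matching or accuracy.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    fmt: str    # data format, e.g. "shp"
    crs: str    # label of the projection and reference system
    year: int   # time period of the data
    scale: int  # scale denominator, e.g. 50000 for 1:50,000


def integration_issues(layers):
    """List the metadata attributes on which the layers disagree."""
    issues = []
    for attr in ("fmt", "crs", "year", "scale"):
        values = {getattr(layer, attr) for layer in layers}
        if len(values) > 1:
            issues.append(f"{attr} differs: {sorted(values)}")
    return issues


# Hypothetical layers of the same area.
layers = [
    Layer("landuse", "shp", "UTM zone 44N", 2001, 50000),
    Layer("census", "shp", "Geographic", 1991, 50000),
]

issues = integration_issues(layers)
```

For these two layers the function reports a difference in projection and reference system and a difference in year, which correspond to steps 5 and 4 of the list above.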
SAQ II
a) What do you mean by data integration?
b) Define vertical integration.
c) What are conventional data models?
d) Define relational data.
11.6 SUMMARY
Let us summarise what you have studied in this unit.
Data are observations or measurements (unprocessed or processed)
represented as text, numbers, or multimedia. A dataset is a structured
collection of data generally associated with a unique body of work. A
Database Management System (DBMS) is a software package that allows
for the creation, storage, maintenance, manipulation, and retrieval of large
datasets that are distributed over one or more files. Attribute data are data
stored in a table, which is organised in rows and columns.

In the GIS sense, a data link connects data from different sets. Linking
non-spatial (attribute) data with spatial data serves multiple purposes. In GIS
language, the logical linking of attribute or external data is called ‘relate’, and
the appending of attribute data is termed ‘append’ or ‘join’.

The basis for joining data between tables, or between spatial and non-spatial
data, is a common identifier known as the ‘primary key’ and ‘foreign key’.
This identifier is present in both files and acts as a unique code that is not
repeated anywhere in the file.
11.7 ACTIVITY
1. Check the hard disk of your computer and see how files are stored in
different folders and subfolders.
2. Visit any drug shop, observe how the medicines are arranged in different
stacks or shelves, and relate this arrangement to data in a GIS.
11.11 ANSWERS
SAQ I
a) DBMS is defined as a software package that allows for the creation, storage,
maintenance, manipulation, and retrieval of large datasets that are
distributed over one or more files.
b) Attribute data are the data stored in a table, which is organised in rows
and columns.
c) The difference between relate and join is that the temporary joining of
attribute data is called relate, whereas the permanent joining of attribute
data is termed append or join.
SAQ II
a) Data integration is a process of bringing all the data sets from various
sources into one platform.
b) Vertical integration is a process of placing all the databases of a common
area one over the other, with all points or locations matching across the
data layers.
c) The conventional database models are the models such as relational,
network, hierarchical and object-oriented.
d) Relational data are the data organised by records in relations that resemble
a table.
Terminal Questions
1. Please refer to section 11.2.
2. Please refer to section 11.3.
3. Please refer to subsection 11.4.1.
4. Please refer to section 11.5.