
Module-2

Spatial Data Models;

Database Structures – Relational, Object Oriented – Entities – ER diagram - data models -


conceptual, logical and physical models - spatial data models – Raster Data Structures –
Raster Data Compression - Vector Data Structures - Raster vs Vector Models- TIN and GRID
data models.

1.1. Database Structures


A Database Management System (DBMS) is software that allows users to define, store,
maintain, and manage data in a structured and efficient manner. It acts as an intermediary
between data and users, simplifying the complexity of data processing by providing tools to
organize data, ensure its integrity, and prevent unauthorized access or data loss.

Components of a DBMS

A DBMS consists of several key components:

1. Query Processor: Interprets and executes user requests. It includes:
 DML Compiler: Translates Data Manipulation Language (DML) statements into low-level instructions.
 DDL Interpreter: Processes Data Definition Language (DDL) statements into metadata.
 Embedded DML Pre-compiler: Converts DML statements embedded in an application program into procedural calls.
 Query Optimizer: Optimizes and executes the instructions generated by the DML Compiler.
2. Storage Manager: Provides an interface between the data stored in the database and the queries received. It includes:
 Authorization Manager: Ensures role-based access control.
 Integrity Manager: Checks integrity constraints when the database is modified.
 Transaction Manager: Controls concurrent access and ensures database consistency.
 File Manager: Manages file space and data structures.
 Buffer Manager: Manages cache memory and the transfer of data between secondary storage and main memory.
3. Disk Storage: Physical storage for data, the data dictionary, and indices. It includes:
 Data Files: Store the actual data.
 Data Dictionary: Contains information about the structure of database objects.
 Indices: Provide faster retrieval of data items.

Levels of DBMS Architecture

The structure of a DBMS can be divided into three main levels:


1. Internal Level: Represents the physical storage of data. It deals with data compression,
indexing, and storage allocation.
2. Conceptual Level: Represents the logical view of the database, defining the data schema,
tables, attributes, and their relationships.
3. External Level: Represents the user's view of the database, providing tailored views or
interfaces to meet the needs of specific user groups.

Types of Databases

Different types of databases have evolved to meet various needs:


1. Relational Databases: Use tables to organize well-structured data. Examples include
MySQL, PostgreSQL, and SQLite.
2. NoSQL Databases: Offer alternatives for data that doesn't fit the relational paradigm. Types include:
 Key-Value Stores: Store data as key-value pairs (e.g., Redis).
 Document Databases: Store data in structured formats like JSON (e.g., MongoDB).
 Graph Databases: Focus on relationships between data (e.g., Neo4j).
 Column-Family Databases: Use flexible columns to bridge the gap between relational and document databases (e.g., Cassandra).
3. NewSQL Databases: Combine the scalability of NoSQL with the consistency of
relational databases (e.g., CockroachDB).
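As a small illustration of the relational paradigm described above, the sketch below uses Python's built-in sqlite3 module. The student table and its columns are invented for this example, not taken from the text.

```python
import sqlite3

# Minimal relational example: a table with rows (records) and columns
# (attributes), queried with SQL. Names here are purely illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Asha')")
row = conn.execute("SELECT name FROM student WHERE id = 1").fetchone()
print(row[0])  # Asha
```

The same pattern carries over to any relational DBMS; only the connection call and minor SQL dialect details change.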

1.2. Relational Database vs. Object-Oriented Database

The realm of database management is a vast landscape with a myriad of methodologies, all
finely tuned to cater to specific demands of data storage and retrieval. At the forefront of this
intricate domain stand two prevailing paradigms: relational databases and object-oriented
databases, each carrying its own distinctive ethos. In this exploration, we unearth the
fundamental disparities that set these two models apart, aiming to illuminate the strategic
choices one must make when discerning between them.

Understanding Relational Databases

Relational databases are rooted in the relational model introduced by E.F. Codd.
Operating within the realm of tables, these databases use a meticulously organized
framework in which every table comprises rows, embodying individual records, and
columns, housing distinct attributes. Data is interwoven through carefully
orchestrated relationships, anchored by primary and foreign keys. Structured Query
Language (SQL), functioning as the lingua franca of data manipulation, enables the
seamless querying and manipulation of data within relational databases.
Embracing Object-Oriented Databases

Conversely, object-oriented databases trace their origins to the immersive object-oriented


paradigm ingrained within programming languages. In this dynamic schema, data finds its
abode in the form of objects, seamlessly fusing data attributes with the methods that
intricately manipulate them. This visionary approach resonates with the process of modeling
real-world entities and the intricate interplay of their relationships, mirroring the artistry of
object modeling in programming. Within the confines of these databases, the symphony
continues with object query languages, which deftly choreograph the dance of data retrieval,
rendering it an experience that resonates harmoniously with the object-oriented philosophy.

Aspect | Relational Databases | Object-Oriented Databases
------------------ | ------------------------------------- | -------------------------------------
Data Representation | Data represented as tables with fixed columns and data types | Data stored as objects with attributes and methods
Schema Flexibility | Schemas are rigid, requiring predefined structures for tables and relationships | Schemas are more flexible, allowing object modification without altering the schema
Complexity | Suited for structured data with well-defined relationships | Ideal for complex data structures with dynamic relationships
Query Language | Primarily use SQL for querying data | Use object query languages for navigating and querying objects
Scalability | Efficient for structured data and predefined queries | Excel in scenarios with complex relationships and inheritance hierarchies

Choosing between a relational database and an object-oriented database involves a thoughtful


assessment of your project’s unique demands. The following elaboration of use cases can
guide your decision-making process:
Relational Databases:

Best Suited For: Applications with well-defined data structures and a consistent need for
structured querying.

 Structured Data: When your data adheres to a clearly defined schema, and relationships
between entities are straightforward and relatively stable, a relational database shines;
 Transactional Systems: Relational databases excel in scenarios that involve frequent
transactions and require the ACID (Atomicity, Consistency, Isolation, Durability) properties
for data integrity;
 Reporting and Analytics: If your application demands comprehensive reporting and complex
analytical queries, relational databases offer robust support through SQL capabilities.
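The ACID point above can be illustrated with a minimal sketch using Python's sqlite3, whose connection context manager commits a transaction on success and rolls it back on error. The account table and the amounts are invented for illustration.

```python
import sqlite3

# Atomicity sketch: both updates of a transfer commit together or not at all.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    pass  # on failure, neither update would be visible

balances = [r[0] for r in conn.execute("SELECT balance FROM account ORDER BY id")]
print(balances)  # [70, 80]
```

If an exception were raised inside the `with conn:` block, the rollback would leave the balances at their original values, which is exactly the Atomicity guarantee.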
Object-Oriented Databases

Best Suited For: Complex data models, intricate relationships, and scenarios where data
structures evolve dynamically.

 Navigating Uncharted Data Territories: For data that defies conformity to rigid table
structures, object-oriented databases emerge as a haven. They gracefully accommodate
unstructured or semi-structured data, presenting an organic avenue to represent the intricacies
of multifaceted relationships and diverse attributes;
 Aligned with Object-Oriented Paradigms: Should your application be woven using the
threads of object-oriented programming, an object-oriented database effortlessly falls in step
with your design principles. The seamless harmony between these components fosters an
environment where your application’s architecture remains coherent and consistent;
 Capturing Elaborate Hierarchies and Networks: In the realm of data relationships
characterized by hierarchies, inheritance, or intricate networks, object-oriented databases
wield an innate prowess. Their design accommodates the symphony of complexities inherent
in these structures, offering an intuitive and nuanced representation;
 Dynamic Flexibility in a Shifting Landscape: When your application stands as a malleable
creation, constantly undergoing adaptation and metamorphosis in its data structures, object-
oriented databases emerge as a steadfast ally. Their elasticity accommodates the fluid nature
of your application, allowing modifications to weave seamlessly into the database fabric.
1.3. Entity in DBMS



Database management systems (DBMS) are large, integrated collections of data. They play
an important role in modern data management, helping organizations store, retrieve, and
manage data effectively. At the core of any DBMS is the concept of entities, a
basic concept that refers to real-world objects or ideas represented inside a database. This section
explores entities within a DBMS, providing an in-depth understanding of this
fundamental concept and its significance in database design.

Entity
An entity is a "thing" or "object" in the real world. An entity contains attributes, which
describe that entity. So anything about which we store information is called an entity.
Entities are recorded in the database and must be distinguishable, i.e., easily recognized
from the group.
For example: A student, An employee, or bank a/c, etc. all are entities.

Entity Set
An entity set is a collection of similar types of entities that share the same attributes.
For example: All students of a school form an entity set of Student entities.
Key Terminologies used in Entity Set:
 Attributes: Attributes are the properties or characteristics of an entity. They describe the data that may be associated with an entity.
 Entity Type: A category or class of entities that share the same attributes.
 Entity Instance: A specific occurrence or individual entity within an entity type. Each entity instance has a unique identity, usually given by its primary key.
 Primary Key: A unique identifier for every entity instance within an entity type.
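The terminology above can be sketched in Python, with a class standing in for an entity type and objects for entity instances. The Student entity and its attributes are assumptions made for this illustration.

```python
from dataclasses import dataclass

# Entity *type* as a class; entity *instances* as objects; the primary key
# as a unique identifier per instance. Names are illustrative only.
@dataclass(frozen=True)
class Student:      # entity type: a class of entities sharing attributes
    roll_no: int    # primary key: unique for each entity instance
    name: str       # descriptive attribute

entity_set = {Student(1, "Ravi"), Student(2, "Meena")}  # an entity set
keys = {s.roll_no for s in entity_set}
assert len(keys) == len(entity_set)  # primary keys must be unique
print(sorted(keys))  # [1, 2]
```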

It can be classified into two types:

Strong Entity Set


Strong entity sets exist independently, and each instance of a strong entity set has a unique
primary key.
Example of a strong entity: a Car entity, uniquely identified by its Registration Number,
with attributes such as Model and Name.

Weak Entity Set


A weak entity cannot exist on its own; it depends on a strong entity to identify it. A
weak entity does not have a primary key that uniquely identifies it on its own; instead, it has a
partial key.
Example of a weak entity: a Laptop entity identified only through its owner, with
attributes such as Colour and RAM.
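The weak-entity idea can be sketched in SQL via Python's sqlite3: the dependent rows are identified only by the owner's key plus a partial key. The employee/dependent tables and names are invented for illustration.

```python
import sqlite3

# Weak entity sketch: a dependent row has no primary key of its own; its
# identity is the owner's key (emp_id) combined with a partial key (dep_name).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dependent (
    emp_id   INTEGER REFERENCES employee(emp_id),
    dep_name TEXT,                       -- partial key
    PRIMARY KEY (emp_id, dep_name)       -- identity borrowed from the owner
);
INSERT INTO employee VALUES (1, 'Asha');
INSERT INTO dependent VALUES (1, 'Ravi');
""")
n = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
print(n)  # 1
```

Deleting the owning employee would normally cascade to its dependents in a full design; that detail is omitted here for brevity.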
Kinds of Entities

There are two types of Entities:

Tangible Entity
 A tangible entity is a physical object or a physical thing that can be physically
touched, seen or measured.
 It has a physical existence or can be seen directly.
 Examples of tangible entities are physical goods or physical products (for example,
"inventory items" in an inventory database) or people (for example, customers or
employees).

Intangible Entity
 Intangible entities are abstract or conceptual objects that are not physically present but
have meaning in the database.
 They are typically defined by attributes or properties that are not directly visible.
 Examples of intangible entities include concepts or categories (such as “Product
Categories” or “Service Types”) and events or occurrences (such as appointments or
transactions).

Entity Types in DBMS


 Strong Entity Types: Entities that exist independently and have a completely unique identifier.
 Weak Entity Types: Entities that depend on another entity for their existence and do not have a completely unique identifier of their own.
 Associative Entity Types: Represent relationships between two or more entities and may have attributes of their own.
 Derived Entity Types: Entities derived from other entities through a process or calculation.
 Multi-Valued Entity Types: Entities that can have more than one value for an attribute.

1.4. ER Diagram
An ER Diagram (Entity Relationship Diagram, or ERD) is a diagram that
displays the relationships of the entity sets stored in a database. In other words, ER diagrams help
to explain the logical structure of databases. ER diagrams are built from three basic
concepts: entities, attributes, and relationships.

ER Diagrams contain different symbols that use rectangles to represent entities, ovals to
define attributes and diamond shapes to represent relationships.

At first look, an ER diagram looks very similar to a flowchart. However, the ER diagram
includes many specialized symbols, and their meanings make this model unique. The purpose
of an ER diagram is to represent the structure of entities and their relationships.

What is ER Model?
The ER Model (Entity Relationship Model) is a high-level conceptual data model.
It helps to systematically analyze data requirements to produce a well-designed
database. The ER Model represents real-world entities and the relationships
between them. Creating an ER Model in DBMS is considered a best practice before
implementing your database.


History of ER models
ER diagrams are visual tools that are helpful for representing the ER model. Peter Chen proposed
the ER diagram in 1976 to create a uniform convention that could be used for relational databases
and networks. He aimed to use the ER model as a conceptual modeling approach.
Why use ER Diagrams?
Here, are prime reasons for using the ER Diagram

 Helps you to define terms related to entity relationship modeling


 Provides a preview of how all your tables should connect and what fields will be on each table
 Helps to describe entities, attributes, and relationships
 ER diagrams are translatable into relational tables, which allows you to build databases quickly
 ER diagrams can be used by database designers as a blueprint for implementing data in specific software applications
 The database designer gains a better understanding of the information to be contained in the database with the help of an ER diagram
 An ER diagram allows you to communicate the logical structure of the database to users

Facts about ER Diagram Model


Some interesting facts about the ER diagram model:

 The ER model allows you to draw a database design
 It is an easy-to-use graphical tool for modeling data
 It is widely used in database design
 It is a GUI representation of the logical structure of a database
 It helps you to identify the entities that exist in a system and the relationships between those entities

ER Diagrams Symbols & Notations

 Entity Relationship Diagram symbols and notations mainly comprise three basic
symbols: the rectangle, oval, and diamond, representing entities, attributes, and
relationships respectively. There are some sub-elements based on these main
elements. An ER diagram is a visual representation of data that describes how
data items are related to each other using the different ERD symbols and
notations.

1.5. Data models

What is Data Modelling?


Data modeling (data modelling) is the process of creating a data model for the data to be
stored in a database. This data model is a conceptual representation of the data objects, the
associations between different data objects, and the rules that govern them.
Data modeling gives a visual representation of the data and enforces business rules,
regulatory compliance, and government policies on the data. Data models ensure
consistency in naming conventions, default values, semantics, and security, while ensuring
the quality of the data.

Data Models in DBMS


A data model is an abstract model that organizes the description, semantics, and
consistency constraints of data. It emphasizes what data is needed and how it should be
organized, rather than what operations will be performed on the data. A data model is like
an architect's building plan: it helps to build conceptual models and set relationships
between data items.

The two types of Data Modeling Techniques are

1. Entity Relationship (E-R) Model


2. UML (Unified Modelling Language)

Why use Data Model?


The primary goals of using a data model are:

 Ensures that all data objects required by the database are accurately represented.
Omission of data will lead to faulty reports and produce incorrect results.
 A data model helps design the database at the conceptual, logical, and physical levels.
 The data model structure helps to define the relational tables, primary and foreign keys,
and stored procedures.
 It provides a clear picture of the base data and can be used by database developers to
create a physical database.
 It is also helpful for identifying missing and redundant data.
 Though the initial creation of a data model is labor- and time-intensive, in the long run
it makes upgrading and maintaining your IT infrastructure cheaper and faster.

1.5a. Types of Data Models in DBMS


Types of Data Models: There are mainly three different types of data models: conceptual
data models, logical data models, and physical data models, and each one has a specific
purpose. The data models are used to represent the data and how it is stored in the database
and to set the relationship between data items.

1. Conceptual Data Model: This Data Model defines WHAT the system contains. This
model is typically created by Business stakeholders and Data Architects. The purpose
is to organize, scope and define business concepts and rules.
2. Logical Data Model: Defines HOW the system should be implemented regardless of
the DBMS. This model is typically created by Data Architects and Business Analysts.
The purpose is to develop a technical map of rules and data structures.
3. Physical Data Model: Describes HOW the system will be implemented using a
specific DBMS. This model is typically created by DBAs and developers. The purpose
is the actual implementation of the database.

Conceptual Data Model


A Conceptual Data Model is an organized view of database concepts and their relationships.
The purpose of creating a conceptual data model is to establish entities, their attributes, and
relationships. In this data modeling level, there is hardly any detail available on the actual
database structure. Business stakeholders and data architects typically create a conceptual
data model.

The three basic tenets of the conceptual data model are

 Entity: A real-world thing


 Attribute: Characteristics or properties of an entity
 Relationship: Dependency or association between two entities

Data model example:

 Customer and Product are two entities. Customer number and name are attributes of
the Customer entity
 Product name and price are attributes of product entity
 Sale is the relationship between the customer and product
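The Customer-Product-Sale example above can be sketched in plain Python to show how the relationship references the keys of the two entities. The attribute values below are made up for illustration.

```python
# Two entities with their attributes, and one Sale relationship linking them
# by their identifying attributes. All values are invented for this sketch.
customer = {"customer_number": 101, "name": "Asha"}          # Customer entity
product = {"product_name": "Notebook", "price": 40}          # Product entity
sale = {"customer_number": 101, "product_name": "Notebook"}  # Sale relationship

# The relationship holds only references to the entities' keys, not copies
# of their attributes.
assert sale["customer_number"] == customer["customer_number"]
assert sale["product_name"] == product["product_name"]
print("Sale links", customer["name"], "to", product["product_name"])
```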
Characteristics of a conceptual data model

 Offers Organisation-wide coverage of the business concepts.


 This type of data model is designed and developed for a business audience.
 The conceptual model is developed independently of hardware specifications like data
storage capacity, location or software specifications like DBMS vendor and
technology. The focus is to represent data as a user will see it in the “real world.”

Conceptual data models, also known as domain models, create a common vocabulary for all
stakeholders by establishing basic concepts and scope.

Logical Data Model


The logical data model is used to define the structure of the data elements and to set
the relationships between them. It adds further information to the conceptual data model
elements. The advantage of using a logical data model is that it provides a foundation
for the physical model, while the modeling structure remains generic.

At this data modeling level, no primary or secondary key is defined, and you need to verify
and adjust the connector details that were set earlier for the relationships.

Characteristics of a Logical data model

 Describes data needs for a single project but could integrate with other logical data
models based on the scope of the project.
 Designed and developed independently from the DBMS.
 Data attributes will have datatypes with exact precisions and length.
 Normalization is typically applied to the model up to third normal form (3NF).

Physical Data Model


A Physical Data Model describes a database-specific implementation of the data model. It
offers database abstraction and helps generate the schema. This is because of the richness of
meta-data offered by a Physical Data Model. The physical data model also helps in
visualizing database structure by replicating database column keys, constraints, indexes,
triggers, and other RDBMS features.

Characteristics of a physical data model

 The physical data model describes data needs for a single project or application, though
it may be integrated with other physical data models based on project scope.
 The data model contains relationships between tables, addressing the cardinality and
nullability of the relationships.
 Developed for a specific version of a DBMS, location, data storage or technology to
be used in the project.
 Columns should have exact datatypes, lengths, and default values assigned.
 Primary and Foreign keys, views, indexes, access profiles, and authorizations, etc. are
defined.
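The physical-model details listed above can be sketched as SQLite DDL (via Python's sqlite3): exact datatypes, defaults, primary and foreign keys, and an index. The table and column names are assumptions, and the datatype syntax is SQLite's; other DBMSs differ.

```python
import sqlite3

# Physical data model sketch: concrete datatypes, defaults, keys, and indexes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        VARCHAR(50) NOT NULL,
    created_on  TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE sale (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      NUMERIC(10, 2) DEFAULT 0
);
CREATE INDEX idx_sale_customer ON sale(customer_id);
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['customer', 'sale']
```

A conceptual or logical model would omit these storage details; only the physical model commits to a particular DBMS's syntax and indexing.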

Advantages and Disadvantages of Data Model


Advantages of Data model:

 The main goal of designing a data model is to make certain that data objects offered
by the functional team are represented accurately.
 The data model should be detailed enough to be used for building the physical
database.
 The information in the data model can be used for defining the relationship between
tables, primary and foreign keys, and stored procedures.
 A data model helps the business communicate within and across organizations.
 A data model helps to document data mappings in the ETL process.
 It helps to identify the correct sources of data to populate the model.
Disadvantages of Data model:

 To develop a data model, one should know the characteristics of the physical data being stored.
 It is a navigational system, which makes application development and management
complex and requires detailed knowledge of how the data is physically stored.
 Even a small change made to the structure requires modification of the entire application.
 There is no set data manipulation language in a DBMS.

Conclusion

 Data modeling is the process of developing a data model for the data to be stored in a
database.
 Data models ensure consistency in naming conventions, default values, semantics, and
security, while ensuring the quality of the data.
 The data model structure helps to define the relational tables, primary and foreign keys,
and stored procedures.
 There are three types of data models: conceptual, logical, and physical.
 The main aim of the conceptual model is to establish the entities, their attributes, and their
relationships.
 The logical data model defines the structure of the data elements and sets the relationships
between them.
 A physical data model describes the database-specific implementation of the data
model.
 The main goal of designing a data model is to make certain that the data objects offered
by the functional team are represented accurately.
 The biggest drawback is that even a small change made to the structure requires
modification of the entire application.

1.6. Spatial Data Models

Spatial data is three-dimensional, but we usually project it into two dimensions for
simplicity. Because of the unique transformations that must be applied to spatial
data, it must be treated and represented differently from the non-spatial data that
describe what is happening and when.

We use spatial data models to organize our data and to link our spatial and non-spatial
data. Spatial data models store geographic data in a systematic way so that we can
effectively display, query, edit, and analyze our data within a GIS.

There are two main types of spatial data models:

a. Raster models
b. Vector models.
The raster data model represents spatial data as a grid of cells, where each cell has one non-
spatial attribute associated with it.

The vector data model represents spatial data as either points, lines, or polygons that are
each linked to one or more non-spatial attributes.

These two models represent the world in fundamentally different ways. One is not
inherently better than the other, but they are better suited for different circumstances.

The choice of which model to use is often dictated by three main factors:

a. The type of phenomena we are trying to represent.


b. The scale at which we plan to analyze our data.
c. How we plan to use the data.

Raster Data Model

The raster data model represents phenomena across space as a gridded set of cells (or
pixels). The cell size determines the resolution of the raster image, that is, the smallest
feature we can resolve with the raster. A 10 m resolution raster has cells that are 10 x 10 m
(100 m2); a 2 m resolution raster has cells that are 2 x 2 m (4 m2). Along with the cell size,
the number of rows and columns dictates the extent (or bounds) of a raster image. A raster
with a 1 m cell size, 5 rows, and 5 columns will cover an area of 5 m x 5 m (25 m2).
Because of the full coverage within their bounds, raster data models are very well suited
to representing continuous phenomena, where cell values correspond to the measured (or
estimated) value at a specific location. In GIS, rasters are commonly encountered as
satellite and drone imagery, elevation models, climate data, model outputs, and scanned
maps.
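The cell-size arithmetic above can be checked with a few lines of Python, using the 1 m, 5-row, 5-column example from the text.

```python
# Resolution and extent arithmetic for the 1 m, 5 x 5 raster example above.
resolution = 1.0   # cell size in metres
rows, cols = 5, 5

cell_area = resolution * resolution   # area of one cell: 1 m2
extent_x = cols * resolution          # width covered by the raster: 5 m
extent_y = rows * resolution          # height covered by the raster: 5 m
total_area = extent_x * extent_y      # total coverage: 25 m2

print(cell_area, extent_x, extent_y, total_area)  # 1.0 5.0 5.0 25.0
```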

Figure 3.19: Example of raster data. Skeeter, CC-BY-SA-4.0


Vector Data model:

The vector data model is much better suited than the raster data model to representing
discrete phenomena. A vector feature is a representation of a discrete object as a set of x,y
coordinate pairs (points) linked to a set of descriptive attributes about that object. A vector
feature's coordinates can consist of just one (x,y) pair, forming a single point feature, or
multiple points, which can be connected to form lines or polygons (see Figure 3.22). The
non-spatial attribute data is typically stored in a tabular format separate from the spatial
data and is linked to it using an index. One of the key advantages of the vector model is the
ability to store many attributes and retrieve them quickly. In GIS, vector data are
commonly encountered as political boundaries, census data, pathways (roads, trails, etc.),
point locations (stop signs, fire hydrants), and so on.
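The separation of spatial and attribute data described above can be sketched in plain Python, with a feature index linking the two. The coordinates and attribute values are invented for illustration.

```python
# Vector model sketch: geometry and attributes stored separately, linked by
# a shared feature index (id). All values below are made up.
geometry = {1: (48.2, -123.4), 2: (48.3, -123.5)}   # spatial data: (x, y) pairs
attributes = {1: {"kind": "fire hydrant"},          # non-spatial, tabular data
              2: {"kind": "stop sign"}}

# Retrieve a feature's geometry and attributes via its index, as a GIS would.
fid = 2
x, y = geometry[fid]
kind = attributes[fid]["kind"]
print(fid, (x, y), kind)  # 2 (48.3, -123.5) stop sign
```

Real vector formats (e.g. shapefiles) follow the same split, keeping geometry and the attribute table in separate files joined on a record index.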

Figure 3.22: Vector objects (points, lines, or polygons) are stored along with any number
of attributes. Point, line, and polygon data are typically stored in separate files. Skeeter,
CC-BY-SA-4.0.

Points are “zero-dimensional”: they have no length, width, or area. A point feature is just
an individual (x,y) coordinate pair representing a precise location, with some linked
attribute information. Points are great for representing a variety of objects, depending on
the scale. Fire hydrants, light poles, and trees are suitable to be represented as points in
almost any application.
Figure 3.23: An example of point data showing the locations of trees. The points are labeled
with their index (unique ID number), which corresponds to the attribute table below,
which stores more information about each tree. Skeeter, CC-BY-SA-4.0.

Lines are one-dimensional: they have length but no width, and thus no area. A line
consists of two or more points. Every line must have a start point and an end point; it may
also have any number of middle points, called vertices. A vertex is any point where
two or more line segments meet. Lines are also great for representing a variety of objects,
depending on the scale. Hiking trails, flight paths, coastlines, and power lines are suitable
to be represented as lines in almost all applications. When making smaller-scale maps, it
is often sufficient to represent rivers as lines, though at large scales we might elect to use a
polygon.
Polygons are two-dimensional: they have both length and width, and therefore we can
also calculate their area. Every polygon consists of a set of at least three points (vertices)
connected by line segments called “edges” that form an enclosed shape. All
polygons form an enclosed shape, but some can also have “holes” (think doughnuts!);
these holes are sometimes called interior rings. Each interior ring is a separate set of vertices
and edges that is wholly contained within the polygon, and no two interior rings can
overlap. Polygons are useful for representing many different objects depending on the
scale: political boundaries, Köppen climate zones, lakes, continents, etc. At large scales they
can represent things like buildings, which we might choose to represent as points at smaller
scales.
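As a sketch of the claim that a polygon's area can be computed from its vertices, the shoelace formula (a standard method, not mentioned in the text) works for any simple polygon:

```python
# Shoelace formula: area of a simple polygon from its (x, y) vertices.
def shoelace_area(vertices):
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]   # the last edge closes the shape
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

square = [(0, 0), (4, 0), (4, 4), (0, 4)]  # a 4 x 4 square
print(shoelace_area(square))  # 16.0
```

A polygon with interior rings would subtract the area of each ring from the outer boundary's area; that case is omitted here for brevity.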

1.7. Raster Data Structures

Raster structure refers to the organization of spatial data in the form of a grid or matrix.
Each cell in the grid represents a specific location on the Earth’s surface and contains a value
that represents a particular attribute or characteristic at that location.

Raster or grid data structure refers to the storage of the raster data for data processing and
analysis by the computer. There are mainly three commonly used data structures such as cell-
by-cell encoding, run-length encoding, and quadtree.

Cell-By-Cell Encoding Data Structure:

This is the simplest raster data structure and is characterised by subdividing geographic
space into grid cells. Each pixel or grid cell contains a value. A grid matrix and its cell
values are written to a file row by row. Fig. 5.8 shows the cell-by-cell encoding data
structure; Digital Elevation Models (DEMs) are the best example of this data structure.
In Fig. 5.8, the value 1 represents the gray cells and 0 represents no data. Cell-by-cell
encoding can also be used to store satellite images. Most satellite images consist of
multispectral bands, so each pixel has more than one value. Three formats are commonly
used to store data in multiband/multispectral imagery: Band Sequential (BSQ), Band
Interleaved by Line (BIL), and Band Interleaved by Pixel (BIP).
Run-Length Encoding Data Structure:

The Run-Length Encoding (RLE) algorithm was developed to handle the problem that a grid
often contains redundant or missing data; when a raster contains many repeated or missing
values, cell-by-cell encoding is wasteful. In the RLE method, adjacent cells along a row with
the same value are treated as a group called a run. If a whole row contains only one class, it
is stored as a single run. Instead of repeatedly storing the same value for each cell, the value
is stored once, together with the number of cells that make up the run. Fig. 5.9 illustrates the
run-length encoding of a polygon: in the figure, the starting cell and end cell of each row
delimit a run. RLE data compression is used in many GIS packages and in standard image
formats.
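A minimal sketch of this idea (generic code, not the exact layout used by any GIS package) encodes one raster row as (run length, value) pairs and expands them back:

```python
def rle_encode(row):
    """Encode a sequence of cell values as (run_length, value) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, value])   # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (run_length, value) pairs back to the original row."""
    row = []
    for length, value in runs:
        row.extend([value] * length)
    return row

row = [0, 0, 0, 1, 1, 1, 1, 0, 0]
runs = rle_encode(row)
print(runs)                      # [(3, 0), (4, 1), (2, 0)]
assert rle_decode(runs) == row   # lossless round trip
```

Nine cell values collapse to three pairs here; the gain grows with the length of the runs.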

Quadtree Data Structure:


To compress the data and save space relative to the original grid, the quadtree data structure
can be used (Fig. 5.10). A quadtree works by recursively dividing the grid into four quadrants.
Any quadrant whose cells do not all share the same value is split again into four half-size
quadrants, and so on, down to the individual pixel if necessary. A quadrant whose pixels all
carry the same attribute value is stored as a single leaf and is not divided further.
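The recursive subdivision can be sketched as follows (a simplified quadtree for a square grid whose side is a power of two; the grid values are invented):

```python
def build_quadtree(grid, r, c, size):
    """Return a value for a uniform leaf, or a list of the four child
    quadrants (NW, NE, SW, SE) when the quadrant holds mixed values."""
    first = grid[r][c]
    uniform = all(grid[r + i][c + j] == first
                  for i in range(size) for j in range(size))
    if uniform or size == 1:
        return first
    half = size // 2
    return [build_quadtree(grid, r,        c,        half),   # NW
            build_quadtree(grid, r,        c + half, half),   # NE
            build_quadtree(grid, r + half, c,        half),   # SW
            build_quadtree(grid, r + half, c + half, half)]   # SE

grid = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [1, 1, 0, 0]]
tree = build_quadtree(grid, 0, 0, 4)
print(tree)  # [1, 0, 1, [1, 0, 0, 0]]
```

Three of the four top-level quadrants are uniform and become single leaves; only the mixed south-east quadrant is subdivided down to individual pixels.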

Raster Data Compression


We can distinguish different ways of storing raster data, which basically vary in storage size
and consequently in the geometric organisation of the storage. The following types of
geometric elements are identified:

 Lines
 Stripes
 Tiles
 Areas (e.g. Quad-trees)
 Hierarchy of resolution
Raster data are managed easily in computers; all commonly used programming languages
support array handling well. However, a raster stored in a raw state with no compression
can be extremely inefficient in terms of computer storage space.

As already said, the way to improve raster space efficiency is data compression.

Illustrations and short texts are used to describe different methods of raster data storage and
raster data compression techniques.

Full raster coding (no-compression)


By convention, raster data is normally stored row by row from the top left corner.
Example: The Swiss Digital elevation model (DHM25-Matrixmodell in decimeters)
The header stores the following information:

 The dimensions of the matrix (number of columns and rows)
 The x-coordinate of the South-West (lower left) corner
 The y-coordinate of the South-West (lower left) corner
 The cell size
 The code used for no data (i.e. missing) values
ncols 480
nrows 450
xllcorner 878923
yllcorner 207345
cellsize 25
nodata_value -9999
6855 6855 6855 6851 6851 6837 6824 6815 6808
6855 6857 6858 6858 6850 6839 6826 6814 6809
6854 6863 6865 6865 6849 6840 6826 6812 6803
6853 6852 6873 6886 6886 6853 6822 6804 6748
6847 6848 6886 6902 6904 6855 6808 6762 6686
6850 6859 6903 6903 6881 6806 6739 6681 6615
6845 6857 6879 6856 6795 6706 6638 6589 6539
6801 6827 6825 6769 6670 6597 6562 6522 6497
6736 6760 6735 6661 6592 6546 6517 6492 6487 ...
PROBLEM: Big amount of data!
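A small reader for this ASCII grid layout can be sketched as follows (no GIS library assumed; the tiny sample grid below is invented, only the header keywords and the -9999 nodata convention follow the example):

```python
def read_ascii_grid(text):
    """Parse an ESRI-style ASCII grid: six header key/value pairs,
    then cell values row by row from the top left corner."""
    tokens = text.split()
    header, i = {}, 0
    for _ in range(6):
        key, value = tokens[i].lower(), tokens[i + 1]
        header[key] = float(value) if '.' in value else int(value)
        i += 2
    values = [int(t) for t in tokens[i:]]
    ncols = header['ncols']
    rows = [values[r * ncols:(r + 1) * ncols]
            for r in range(len(values) // ncols)]
    return header, rows

sample = """ncols 3
nrows 2
xllcorner 878923
yllcorner 207345
cellsize 25
nodata_value -9999
6855 6855 6851
6854 -9999 6849"""
header, rows = read_ascii_grid(sample)
print(header['cellsize'])  # 25
print(rows[1])             # [6854, -9999, 6849]
```

Note that the reader still has to materialise every cell value, which is exactly the "big amount of data" problem that the compression methods below address.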

Runlength coding (lossless)


Geographical data tends to be "spatially autocorrelated", meaning that objects which are close
to each other tend to have similar attributes:

"All things are related, but nearby things are more related than distant things" (Tobler 1970)

Because of this principle, we expect neighboring pixels to have similar values. Therefore,
instead of repeating pixel values, we can code the raster as pairs of numbers: (run length, value).
The runlength coding is a widely used compression technique for raster data. The primary
data elements are pairs of values or tuples, consisting of a pixel value and a repetition count
which specifies the number of pixels in the run. Data are built by reading successively row by
row through the raster, creating a new tuple every time the pixel value changes or the end of
the row is reached.

Run-length coding describes the interior of an area by runs, instead of describing its boundary.


In the multiple-attribute case there are more options available.

We can note in Codes - III that if a run is not required to break at the end of each line, we
can compress the data even further.
NOTE: run-length coding would be of little use for DEM data or any other type of data where
neighboring pixels almost always have different values.

Chain coding (lossless)


See Rasterising vector data and the Freeman coding

Blockwise coding (lossless)


This method is a generalization of run-length encoding to two dimensions. Instead of
sequences of 0s or 1s, square blocks are counted. For each square block, the position, the
size, and the contents of its pixels are stored.

Quadtree coding (lossless)


The quadtree compression technique is the most common compression method applied to
raster data. Quadtree coding stores the information by subdividing a square region into
quadrants, each of which may be further subdivided into squares until the cells of each
quadrant all have the same value.
Example 1: the position code of cell 10 is 3,2.

Example 2: the following figure shows how an area is represented on a map and the
corresponding quadtree representation. For more information on constructing and addressing
quadtrees, see Lesson "Spatial partitioning and indexing", Unit 2.
Huffman coding (lossless compression)
The Huffman coding compression technique involves a preliminary analysis of the frequency
of occurrence of symbols. The Huffman technique creates, for each symbol, a binary code
whose length is inversely related to the symbol's frequency of occurrence.
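A sketch of this construction (a generic Huffman builder for raster cell values, not the exact variant used by any particular image format) using Python's standard heapq:

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Return {symbol: bitstring}; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    # Each heap entry: (frequency, tie_breaker, {symbol: code_so_far})
    heap = [(f, i, {s: ''}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, i2, c2 = heapq.heappop(heap)
        # Prefix '0' onto one subtree's codes and '1' onto the other's.
        merged = {s: '0' + c for s, c in c1.items()}
        merged.update({s: '1' + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, i2, merged))
    return heap[0][2]

row = [0, 0, 0, 0, 0, 1, 1, 2]        # value 0 is most frequent
codes = huffman_codes(row)
assert len(codes[0]) < len(codes[2])  # shorter code for the common value
# The encoded row is just the concatenation of per-cell codes:
encoded = ''.join(codes[v] for v in row)
```

The frequency analysis (the Counter) is exactly the preliminary step described above; the most frequent cell value ends up with the one-bit code.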

LZ77 method (lossless compression)


LZ77 compression is a lossless compression method, meaning that the values in your raster
are not changed. Abraham Lempel and Jacob Ziv first introduced this compression method in
1977. The theory behind it is relatively simple: when you find a match (a data value that has
already been seen in the input file), instead of writing the actual value, you write the position
and length (number of bytes) of the earlier occurrence to the output (the offset and length:
where it is and how long it is).

Some image-compression methods are referred to as LZ (Lempel-Ziv) and its variants, such
as LZW (Lempel-Ziv-Welch). With this method, no preliminary analysis of the data is
required, which makes LZ77 applicable to all raster data types.
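Python's standard zlib module implements DEFLATE, which combines LZ77 matching with Huffman coding, so it can illustrate how this family of methods behaves on repetitive raster data (the raster below is invented for the example):

```python
import zlib

row = bytes([1] * 50 + [2] * 30 + [1] * 20)   # one repetitive raster row
raster = row * 100                             # 100 identical rows

compressed = zlib.compress(raster)
assert zlib.decompress(compressed) == raster   # lossless round trip
print(len(raster), '->', len(compressed))      # large size reduction
```

Because the same row repeats, the compressor replaces each repetition with a short back-reference, and the 10,000-byte raster shrinks to a few dozen bytes without losing a single value.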
JPEG-Compression (lossy compression)
The JPEG-compression process:

 The representation of the colors in the image is converted from RGB to YCbCr, consisting of
one luma component (Y), representing brightness, and two chroma components (Cb and Cr),
representing color. This step is sometimes skipped.
 The resolution of the chroma data is reduced, usually by a factor of 2. This reflects the fact
that the eye is less sensitive to fine color details than to fine brightness details.
 The image is split into blocks of 8×8 pixels, and for each block, each of the Y, Cb, and Cr
data undergoes a discrete cosine transform (DCT). A DCT is similar to a Fourier transform in
the sense that it produces a kind of spatial frequency spectrum.
 The amplitudes of the frequency components are quantized. Human vision is much more
sensitive to small variations in color or brightness over large areas than to the strength of
high-frequency brightness variations. Therefore, the magnitudes of the high-frequency
components are stored with a lower accuracy than the low-frequency components. The
quality setting of the encoder (for example 50 or 95 on a scale of 0–100 in the Independent
JPEG Group's library) affects to what extent the resolution of each frequency component is
reduced. If an excessively low quality setting is used, the high-frequency components are
discarded altogether.
 The resulting data for all 8×8 blocks is further compressed with a loss-less algorithm, a
variant of Huffman encoding.

JPEG-Compression (left to right: decreasing quality setting results in a 8x8 block generation
of pixels)
Text and graphic from: Wikipedia 2010

Example of runlength coding


In the following example, we can see the difference in data storage between an
uncompressed raster file and a compressed one:

Full raster coding: 256 values
Run-length coding: 132 values

1.8. Vector Data Structures


Three types of vector structures differ by the type of encoding of spatial vector files; the
type of encoding corresponds to a level of complexity of the vector files:

a. Spaghetti encoding
b. Feature-encoded
c. Topologically encoded
1. Spaghetti Encoding
 Simple lines, digitized as a sequence of points with x,y coordinates
 Visual effect of a polygon, but no polygon features are stored
 No attributes (no features to link to…)
 No neighborliness (spatial relationships) among features
 Used by cartographers in early automated cartography
 No analysis possible, display only
2. Feature Encoded
 Recognizes vector features as independent objects (points, lines, and polygons).
 Feature ID, feature type, and points (IDRISI vector files).
 Features exist but are not connected, no neighborliness (lines don't join in a network,
polygons are not neighbors).
 Attributes are stored in a relational table (linked to feature IDs)
 Polygons:
o created by making the first and last point the same.
o digitized independently from each other; boundaries of neighboring polygons
are digitized and stored twice.
o Problems: doubling of info, slivers, weak analysis.
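In the spirit of the feature-encoded structure described above (the field names and values below are invented for illustration, not the actual IDRISI file layout), a polygon feature and its attribute record might be sketched like this:

```python
# Each feature records an ID, a type, and its coordinate list; a polygon
# is closed by repeating its first point as the last point.
polygon = {
    'id': 1,
    'type': 'polygon',
    'points': [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)],  # first == last
}
assert polygon['points'][0] == polygon['points'][-1]

# Attributes live in a separate relational table keyed by feature ID:
attributes = {1: {'landuse': 'forest', 'area_ha': 12.0}}
print(attributes[polygon['id']]['landuse'])  # forest
```

Nothing here records which polygon borders which: a neighbouring polygon would simply repeat the shared boundary coordinates, which is exactly the doubling and sliver problem noted above.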
3. Topologically Encoded
 Recognizes vector features as spatially interconnected objects
 To capture relationships of connectivity and be able to use them in automated spatial
analysis (not just visually), we need to explicitly record topology of vector features.

Note: Topology (from geometry)

 Geometric properties of objects that remain constant when objects are stretched or
bended and are independent of any coordinate system and scale of measurement.
 In maps - connections between features expressed as relations of adjacency (of areas),
containment (islands), and connectivity (lines as roads)
 In vector data structures – a method of coding and storing the data such that the
connections between features are known to the GIS program.

Definitions
 Arc - uncrossed line that has a direction
 Nodes - points at which a line begins and ends
 Vertices (points) - points in an arc
 Lines consist of arcs (form a network)
 Polygons - consist of arcs + info about left and right polygon + unique ID attached to
a polygon locator.
 Attributes - in a relational attribute table
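The definitions above can be sketched as a tiny arc-node structure (IDs and coordinates are invented): each arc records its from/to nodes and its left and right polygons, so adjacency can be answered from the stored topology without comparing geometry.

```python
arcs = {
    'a1': {'from_node': 'n1', 'to_node': 'n2',
           'vertices': [(0, 0), (2, 1), (4, 0)],
           'left_poly': 'A', 'right_poly': 'B'},
    'a2': {'from_node': 'n2', 'to_node': 'n1',
           'vertices': [(4, 0), (2, -1), (0, 0)],
           'left_poly': 'B', 'right_poly': 'outside'},
}

def neighbors(poly_id):
    """Polygons sharing an arc with poly_id (adjacency read off topology)."""
    out = set()
    for arc in arcs.values():
        sides = {arc['left_poly'], arc['right_poly']}
        if poly_id in sides:
            out |= sides - {poly_id}
    return out

print(neighbors('A'))  # {'B'}
```

Because the shared boundary between A and B is stored once (arc a1), there is no doubled geometry to produce slivers, and a coherence check only needs to verify that every arc's left/right polygons exist.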
Advantages:
 No slivers – boundaries are digitized only once, each arc is stored only once.
 Allows for automatic error detection – can check for coherence of all arcs and
polygons.
 Can do analysis with neighborliness (complex computations are needed, e.g. for
buffering).
 Allows for full vector data analysis including polygon overlay and network analysis.
 Arc/Info (editing + analysis); Cartalinx (editing), ArcView (partially).

ArcView vector structure

 ArcView vector files are called shapefiles (based on shapes, a.k.a. features: points,
lines, polygons).
 It is a feature-encoded, non-topological vector format (e.g. polygon boundaries are
stored twice).
 It includes certain spatial information, such as a spatial index (more than the IDRISI
vector format).

o .shp - the file that stores the feature geometry.


o .shx - the file that stores the index of the feature geometry.
o .dbf - the dBASE file that stores the attribute information of features. When a
shapefile is added as a theme to a view, this file is displayed as a feature table.
o Other files are added that store more indexes incurred during analysis.
o All files must be stored together.

Two types of spatial analysis


 Attribute and spatial querying in ArcView GIS per se
 Topological analysis in special extensions (add-ons), e.g., Geoprocessing and
Network Analyst. This involves creating new features and changing topology.

1.9. Raster vs Vector Models-


vector data model: [data models] A representation of the world using points, lines, and
polygons. Vector models are useful for storing data that has discrete boundaries, such as
country borders, land parcels, and streets.

raster data model: [data models] A representation of the world as a surface divided into a
regular grid of cells. Raster models are useful for storing data that varies continuously, as
in an aerial photograph, a satellite image, a surface of chemical concentrations, or an
elevation surface.

Vector Data Model

A vector GIS is simply "a generic name to describe a class of GIS that use the vector data
structure to describe, represent and use spatial objects with a physical quantity that requires
both magnitude and direction for its description" (Korte, 1998).

Vector data and vector graphics comprise vertices and paths, with three basic feature types:
points, lines, and polygons (areas). A point is specified by an x, y location in the Cartesian
coordinate system, a line by a sequence of connected points known as vertices, and a
polygon by a closed polyline having the same starting and ending points. The choice of
geometric feature used to model a geographic entity depends to a large extent on the scale
and size of the map. For example, Uttar Pradesh may be represented as a point on a
small-scale map such as a world map, but on a map of India (larger scale) it should be
represented as a polygon, which may contain a number of points (spot heights, headquarters,
etc.), lines (highways, e.g. NH-1 and NH-24, and railway lines), and polygons (housing
complexes, lakes, industrial areas). Polygons in a spatial framework are of two types:
adjacent polygons and island polygons. Adjacent polygons can be visualized as neighbouring
countries on a world map, where the common boundaries of two or more polygons (e.g. plot
boundaries, administrative boundaries) are interconnected. In a rural space, adjacent plots
with common agricultural field boundaries are adjacent polygons. Island polygons, on the
other hand, occur in a number of situations, for example a pond inside a big lake or pasture
land within a forest.

Raster data model:

"The raster data model is one of the variants of the field-based model of geographic data
representation" (Burrough, 1990). Rasters are made up of grid cells, generally regularly
spaced and square in shape, and are usually defined by a set of uniform, adjacent pixels,
each assigned a separate pixel value. Each pixel has a value or values indicating the
characteristics of the phenomenon it represents. Rasters are of two kinds: discrete or
continuous.

Discrete rasters have distinct values, each representing a distinct category and/or theme. For
example, one grid cell represents a particular land use or land cover class. Each thematic
class can be distinguished and assigned a specific value. Discrete classes have a clear
beginning and end and are usually assigned integers, together with an interpretation key. For
example, the value 1 may represent vegetation, the value 2 a water body, and the value 3
urban areas.

Figure 2 representing Digital Images -DEM,DSM and DTM


Continuous rasters are non-discrete rasters whose data change gradually, as in a slope map,
elevation grades, soil distribution, etc. They are generally represented with fixed registration
points, each raster having its own registration points. A Digital Elevation Model (DEM), a
matrix of equal cells representing elevation, is depicted in Figure 2a; the digital surface
model is shown in Figure 2b and the digital terrain model in Figure 2c. Phenomena can vary
continuously across a raster from a specific source.

Figure 3: Representation of a Raster model

Raster data is also referred to as a lattice or tessellation model. A tessellation is an
infinitely repeatable pattern over space (Coxeter, 1961). "Raster data are tessellations (a
tessellation is a space-filling mesh, either with explicit boundaries as a mesh of polygons or
with an implicit mesh defined by a matrix of values in the logical model) and perform a
discretization of the geometric area of interest." A tessellation may be either regular (mesh
elements are all of the same size and shape) or irregular. Elements of a regular mesh are
generally squares (raster), rectangles, or hexagons.
1.10. Grid model in GIS:

A grid model is a raster data storage format native to Esri. There are two types of grids:
integer and floating-point. Integer grids are used to represent discrete data, and floating-point
grids to represent continuous data.

Grid in ArcGIS

A grid in ArcGIS is a network of evenly spaced horizontal and vertical lines used to identify
locations on a map. They are used to show location using projected coordinates.

Why is grid important in GIS

Grids are important in GIS because they help establish coordinate systems.
Coordinate systems are sets of numbers or letters used to identify specific locations on
a map.

Grid system in a map:

A grid system on a map is usually square and is represented by drawn lines on the map
creating squares. The purpose of the grid system is to give each point on the map an identifier
or address that can be used for reference.
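One simple way such a cell address can be computed is sketched below; the origin and cell size are made-up values in the style of the DHM25 header shown earlier, and the arithmetic is not tied to any particular GIS package:

```python
xllcorner, yllcorner, cellsize = 878923.0, 207345.0, 25.0

def cell_of(x, y):
    """Column/row of the grid cell containing projected point (x, y)."""
    col = int((x - xllcorner) // cellsize)
    row = int((y - yllcorner) // cellsize)
    return col, row

def cell_center(col, row):
    """Projected coordinates of the center of cell (col, row)."""
    return (xllcorner + (col + 0.5) * cellsize,
            yllcorner + (row + 0.5) * cellsize)

col, row = cell_of(879000.0, 207400.0)
print(col, row)               # 3 2
print(cell_center(col, row))  # (879010.5, 207407.5)
```

The integer (col, row) pair is exactly the "address" the grid system gives each point, and the inverse function recovers a representative map location for that address.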

Grid format:

A grid format is a structured layout with rows and columns that is used to efficiently organize
and present data. It is a fundamental concept in the organization and visualization of data.
The main purpose of a grid:

The main purpose of a grid is to create clarity and consistency on a page and improve design
comprehension. It serves as a skeleton that can be used to produce different looks on a page.

Different types of grids in Arcgis:

There are five types of grids that can be added to a map frame in ArcGIS: graticule, measured
grid, MGRS grid, reference grid, and custom grid. Multiple grid types can be combined to
display different coordinate systems on the map.

Purpose of using a grid system in graphic design:

In graphic design, a grid system helps align screen elements based on sequenced columns and
rows. It provides structure for text, images, and functions, making them consistent and
recognizable throughout a design.

The two components of a grid system in geography

The two components of a grid system in geography are rows and columns. Rows
accommodate the columns and together they make up the structure that contains the content.

Place attributes are found on a geographic grid

On a geographic grid, the place attributes found are latitude and longitude. Latitude and
longitude coordinates can determine the absolute and relative location of a place on Earth.

Advantages of grid pattern

– Grids create a fresh structure.


– Easy to create and navigate.
– Helps establish hierarchy.
– Can be used to create infographics effortlessly.
– Provides structure and consistency.
– Helps balance elements on a page.
– Aids in aligning objects accurately.
– Highly flexible for different designs.

The disadvantages of grid layout

– Grids can make a design feel rigid and inflexible.


– Strict adherence to a grid can limit creativity and unique layouts.
– May result in designs that are similar and lack distinction from competitors.
The most common grid layout

The most common grid layout used by graphical and web designers is a column grid. This
involves splitting a page into vertical fields and aligning objects within these columns.
Newspapers and magazines often use column grids extensively.

The most common grid system

The most common grid system used by designers is a column grid. This involves dividing the
design area into vertical columns and aligning elements within these columns. Column grids
are widely used in newspaper and magazine layouts.


1.11. A TIN model in GIS:


A Triangular Irregular Network (TIN) model is a type of spatial representation used in
Geographic Information Systems (GIS). It is a digital data structure that represents a surface
using interconnected triangles. TIN models are commonly used in GIS for terrain analysis
and visualization.

The purpose of TIN model:

TIN models are typically used for high-precision modeling of smaller areas, such as in
engineering applications. They allow calculations of planimetric area, surface area, and
volume. The maximum allowable size of a TIN varies based on available memory resources.

The principle of TIN data model:

A TIN data model is a way of storing continuous surfaces. It connects known data points with
straight lines to create triangles, or facets. Each facet is a plane that has a single slope and
aspect over its extent.
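To make the "single slope per facet" idea concrete, here is a small sketch (the surveyed coordinates are invented) that derives a facet's slope angle from its three corner points via the normal vector of the plane they define:

```python
import math

def facet_slope_deg(p1, p2, p3):
    """Slope angle (degrees) of the plane through three (x, y, z) points."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    # Normal = u x v; its tilt from the vertical is the slope angle.
    nx = u[1] * v[2] - u[2] * v[1]
    ny = u[2] * v[0] - u[0] * v[2]
    nz = u[0] * v[1] - u[1] * v[0]
    horiz = math.hypot(nx, ny)
    return math.degrees(math.atan2(horiz, abs(nz)))

# A facet rising 10 m over a 100 m run slopes at about 5.7 degrees:
print(round(facet_slope_deg((0, 0, 0), (100, 0, 10), (0, 100, 0)), 1))  # 5.7
```

Because the facet is a plane, this single angle (and likewise a single aspect direction) applies everywhere on the triangle, which is what makes TIN facets convenient for slope and aspect analysis.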

A TIN model is a vector-based alternative to the traditional raster representation of terrain
surfaces, which is known as a Digital Elevation Model (DEM). TINs are created by
interpolating between measured elevation values and allow for more flexible data
distribution. Unlike DEMs, which consist of a regular grid, TIN facets have different sizes
depending on data density.

How do I create a TIN in GIS?

To create a TIN in GIS, you can use the “Create TIN” tool. Simply type “Create TIN” in the
search box, double-click the tool to open it, specify the parameters, and click “Run” to build
the TIN surface.

TIN data model and comparisons with raster

A TIN data model is different from raster data models. TINs are vector-based representations,
while rasters consist of a grid of squares. TINs require fewer points than raster files to
represent a land surface accurately. They can be used to characterize linear features such as
coastlines and rivers more accurately.

How do you convert raster to TIN?

To convert raster data to TIN, you can use the “Raster To TIN” tool. Open the tool from the
3D Analyst toolbox, browse to the raster file you want to convert, specify the location to save
the TIN, and optionally set a z-tolerance value for the TIN.

Is TIN a raster or vector?

TIN is a vector-based representation, while raster data is represented as a grid of squares.


TINs use a network of connected triangles to represent a surface, while rasters use a grid of
cells to represent data.
Advantages of TIN data model over raster data model

TIN data models have several advantages over raster data models. TINs can be used to
generate digital surface models (DSMs) that include the height of objects on the surface,
while rasters are more suitable for creating digital elevation models (DEMs). TINs require
fewer points than rasters to represent the land surface accurately.

Why is a TIN better than a raster image for terrain representation

Compared to raster images, TINs offer several advantages for terrain representation. TINs
can accurately represent linear features such as coastlines and rivers, and they require fewer
points to represent the land surface accurately. Additionally, TINs can handle changes in
elevation more smoothly, providing a more realistic representation of terrain.

TIN dataset:

A TIN dataset is a collection of triangulated irregular network (TIN) models. It contains tools
for creating, modifying, and converting TIN datasets. TINs can be used to model surfaces
using measurements from point, line, and polygon features. They are often used for terrain
analysis and visualization in GIS.

TIN mean:

TIN stands for Triangular Irregular Network. It is a digital data structure used in Geographic
Information Systems (GIS) to represent surfaces. TINs consist of interconnected triangles,
which provide a flexible and efficient way to represent terrain and other continuous surfaces.

Raster data in GIS:

Raster data is a type of geographic data used in Geographic Information Systems (GIS). It is
represented as a grid of regularly sized pixels or cells, where each cell corresponds to a
specific physical location in the real world. Raster data is commonly used to represent
continuous phenomena such as elevation, temperature, and satellite imagery.

Vector data in GIS

Vector data is another type of geographic data used in Geographic Information Systems
(GIS). Unlike raster data, which is represented as a grid of pixels, vector data represents real-
world features using discrete points, lines, and polygons. Each feature has attribute data that
provides additional information about it, allowing for more detailed analysis and
visualization.

TIN in remote sensing


In remote sensing, TIN stands for Triangulated Irregular Network. It is a digital data structure
used to represent a surface in three-dimensional space. TINs are commonly used in remote
sensing applications for terrain modeling and analysis.

The Z tolerance of raster to TIN?

The Z tolerance is the maximum allowable difference in height (Z units) between the input
raster and the output TIN. The default Z tolerance is typically set to 1/10 of the Z range of the
input raster. The Z tolerance determines how smooth or detailed the resulting TIN surface
will be.

Import DEM into civil 3d

Yes, you can import a Digital Elevation Model (DEM) into Autodesk Civil 3D. In the
Toolspace, navigate to the surface definition collection, right-click, and select “Add DEM
File.” Then, enter the path and name of the DEM file or browse to its location.

The two types of DEM:

There are two main types of DEM: Digital Surface Models (DSM) and Digital Terrain
Models (DTM). DSMs represent the Earth’s surface, including natural and human-made
features, while DTMs represent only the bare Earth without any features or structures.

Different types of DEM models:

DEM models can be categorized into two main types: primary (measured) DEM and
secondary (computed) DEM. Primary DEMs are created from direct measurements, such as
LIDAR or survey data. Secondary DEMs are derived from primary DEMs or other elevation
sources, such as contour lines or satellite imagery.
