A Comparison of R, R+, R*, X and Hilbert R-Trees
Submitted By:
Ram Charan Baishya
CSI100022
R-trees:
R-trees are tree data structures that are similar to B-trees, but are used for
spatial access methods, i.e., for indexing multi-dimensional information; for
example, the (X, Y) coordinates of geographical data. A common real-world
usage for an R-tree might be: "Find all museums within 2 km of my current
location".
The data structure splits space with hierarchically nested, and possibly
overlapping, minimum bounding rectangles (MBRs, otherwise known as
bounding boxes; the "R" in R-tree stands for "rectangle").
Each node of an R-tree has a variable number of entries (up to some pre-
defined maximum). Each entry within a non-leaf node stores two pieces of
data: a way of identifying a child node, and the bounding box of all entries
within this child node.
The insertion and deletion algorithms use the bounding boxes from the
nodes to ensure that "nearby" elements are placed in the same leaf node (in
particular, a new element will go into the leaf node that requires the least
enlargement in its bounding box). Each entry within a leaf node stores two
pieces of information: a way of identifying the actual data element (which,
alternatively, may be placed directly in the node), and the bounding box of
the data element.
Different algorithms can be used to split nodes when they become too full,
resulting in the quadratic and linear R-tree sub-types.
Search:
The input is a search rectangle (query box). Searching is quite similar to
searching in a B+ tree. The search starts from the root node of the tree.
Every internal node contains a set of rectangles and pointers to the
corresponding child nodes, and every leaf node contains the rectangles of
spatial objects (possibly with a pointer to each object). For every
rectangle in a node, it has to be decided whether it overlaps the search
rectangle. If it does, the corresponding child node has to be searched as
well. Searching proceeds recursively in this way until all overlapping nodes
have been traversed. When a leaf node is reached, the contained bounding
boxes (rectangles) are tested against the search rectangle, and their
objects (if there are any) are put into the result set if they lie within the
search rectangle.
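The recursive search described above can be sketched in a few lines of Python. This is only a minimal illustration, not a full R-tree: the names Rect, Entry, Node, overlaps and search are hypothetical, rectangles are assumed to be (xmin, ymin, xmax, ymax) tuples, and the leaf test uses intersection with the query box (a strict containment test could be substituted).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed layout: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None  # used in internal nodes
    obj: Optional[object] = None    # used in leaf nodes


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def search(node: Node, query: Rect) -> list:
    """Collect the objects whose bounding boxes intersect the query box."""
    results = []
    for e in node.entries:
        if not overlaps(e.rect, query):
            continue  # prune: nothing below this entry can match
        if node.is_leaf:
            results.append(e.obj)
        else:
            results.extend(search(e.child, query))
    return results
```

Because bounding boxes of sibling nodes may overlap, more than one child can be visited at each level, which is why the search cost is not bounded by the tree height.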
Insertion:
To insert an object, the tree is traversed recursively from the root node. All
rectangles in the current internal node are examined. The constraint of least
coverage is employed to insert an object, i.e., the box that needs least
enlargement to enclose the new object is selected. In the case where there
is more than one rectangle that meets this criterion, the one with the
smallest area is chosen. Inserting continues recursively in the chosen node.
Once a leaf node is reached, a straightforward insertion is made if the leaf
node is not full. If the leaf node is full, it must be split before the insertion is
made. A few splitting algorithms have been proposed for good R-tree
performance.
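The branch selection rule described above (least enlargement, ties broken by smallest area) might look as follows. This is a sketch under the assumption that rectangles are (xmin, ymin, xmax, ymax) tuples; the helper names area, union, enlargement and choose_subtree are hypothetical.

```python
from typing import List, Tuple

# Assumed layout: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(box: Rect, new: Rect) -> float:
    """Extra area the box needs in order to enclose the new rectangle."""
    return area(union(box, new)) - area(box)


def choose_subtree(boxes: List[Rect], new: Rect) -> int:
    """Index of the box needing least enlargement; ties go to the smallest area."""
    return min(range(len(boxes)),
               key=lambda i: (enlargement(boxes[i], new), area(boxes[i])))
```

For example, with boxes (0, 0, 4, 4) and (5, 5, 6, 6), a new rectangle (6, 6, 7, 7) enlarges the first box by 33 area units and the second by 3, so the second box is chosen.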
Figure 1: A 2D view of an R-tree
R*-trees:
R*-trees are a variant of R-trees used for indexing spatial information.
R*-trees support point and spatial data at the same time, at a slightly higher
cost than other R-trees. The R*-tree was proposed by Norbert Beckmann,
Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger in 1990.
When a node overflows, a portion of its entries are removed from the node
and reinserted into the tree. (In order to avoid an indefinite cascade of
reinsertions caused by subsequent node overflow, the reinsertion routine
may be called only once in each level of the tree when inserting any one new
entry.) This has the effect of producing more well-clustered groups of entries
in nodes, reducing node coverage. Furthermore, actual node splits are often
postponed, causing average node occupancy to rise. Re-insertion can be
seen as a method of incremental tree optimization triggered on node
overflow.
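The choice of which entries to remove on overflow can be sketched as below. The text above does not say how the portion is selected. This sketch assumes the rule from the original R*-tree proposal: the entries whose centers lie farthest from the center of the node's bounding box are removed, with roughly 30% taken out. The function names and the fraction parameter are hypothetical, and rectangles are assumed to be (xmin, ymin, xmax, ymax) tuples.

```python
import math
from typing import List, Tuple

# Assumed layout: (xmin, ymin, xmax, ymax)
Rect = Tuple[float, float, float, float]


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def bounding_box(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pick_for_reinsert(rects: List[Rect], fraction: float = 0.3):
    """Split an overflowing node's entries into (kept, removed).

    The removed entries are the ones whose centers are farthest from the
    center of the node's bounding box; the caller would reinsert them
    from the root. The fraction of 0.3 is an assumption.
    """
    node_center = center(bounding_box(rects))

    def distance(r: Rect) -> float:
        c = center(r)
        return math.hypot(c[0] - node_center[0], c[1] - node_center[1])

    ordered = sorted(rects, key=distance, reverse=True)  # farthest first
    p = max(1, int(len(rects) * fraction))
    return ordered[p:], ordered[:p]
```

Removing the outlying entries shrinks the node's bounding box, and reinserting them gives each one a chance to land in a better-fitting node.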
Algorithm:
The R*-tree uses the same algorithm as the R-tree for query and delete
operations. The primary difference is the insert algorithm, specifically how it
chooses which branch to insert the new node into and the methodology for
splitting a node that is full.
R+ tree:
An R+ tree is a method for looking up data using a location, often (x, y)
coordinates, and often for locations on the surface of the earth. Searching on
one number is a solved problem; searching on two or more, and asking for
locations that are nearby in both x and y directions, requires craftier
algorithms.
Advantages:
Because nodes do not overlap each other, point query performance benefits:
every spatial region is covered by at most one node.
A single path is followed and fewer nodes are visited than with the R-tree.
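The single-path property can be illustrated with a small point query. This is a sketch under stated assumptions, not a complete R+ tree: the Node layout and the function names are hypothetical, node regions are treated as half-open so that adjacent regions never both claim a boundary point, and an object that straddles a region boundary is assumed to be stored in every leaf it intersects.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


@dataclass
class Node:
    is_leaf: bool
    # Internal node: (region, child Node). Leaf node: (rectangle, object id).
    entries: List[tuple] = field(default_factory=list)


def region_contains(r: Rect, p: Point) -> bool:
    """Half-open test so that adjacent, non-overlapping regions stay disjoint."""
    return r[0] <= p[0] < r[2] and r[1] <= p[1] < r[3]


def rect_contains(r: Rect, p: Point) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def point_query(root: Node, p: Point):
    """Return (matching object ids, number of nodes visited)."""
    node, visited = root, 1
    while not node.is_leaf:
        # Regions are disjoint, so at most one child can contain the point.
        nxt = next((child for region, child in node.entries
                    if region_contains(region, p)), None)
        if nxt is None:
            return [], visited
        node = nxt
        visited += 1
    return [obj for rect, obj in node.entries if rect_contains(rect, p)], visited
```

The number of visited nodes never exceeds the height of the tree, in contrast to an R-tree search where overlapping sibling boxes can force several branches to be explored.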
Disadvantages:
X-tree:
An X-tree is an index tree structure based on the R-tree, used for storing
data in many dimensions. It differs from R-trees, R+ trees and R*-trees in
that it emphasizes prevention of overlap in the bounding boxes, which
increasingly becomes a problem in high dimensions. In cases where nodes
cannot be split without introducing overlap, the node split is deferred,
resulting in super-nodes. In extreme cases, the tree will linearize, which
defends against worst-case behaviors observed in some other data structures.
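The decision between splitting a node and growing it into a super-node can be sketched roughly as follows. This is a heavily simplified illustration and not the actual X-tree split algorithm: it only tries a median split along each of two axes, and the function names and the max_overlap threshold of 0.2 are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed (xmin, ymin, xmax, ymax)


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def overlap_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def split_or_supernode(rects: List[Rect],
                       max_overlap: float = 0.2
                       ) -> Optional[Tuple[List[Rect], List[Rect]]]:
    """Return the two halves of the best median split, or None if every
    candidate split overlaps too much (the node then becomes a super-node)."""
    best = None
    for axis in (0, 1):  # sort by xmin, then by ymin
        ordered = sorted(rects, key=lambda r: r[axis])
        mid = len(ordered) // 2
        left, right = ordered[:mid], ordered[mid:]
        a, b = mbr(left), mbr(right)
        shared = overlap_area(a, b)
        total = area(a) + area(b) - shared
        ratio = shared / total if total > 0 else 0.0
        if best is None or ratio < best[0]:
            best = (ratio, left, right)
    if best[0] > max_overlap:
        return None  # defer the split: keep one enlarged super-node
    return best[1], best[2]
```

When None is returned the caller would keep all entries in a single enlarged node, which is scanned sequentially during search. If this happens at every level, the structure degenerates into the linear organization mentioned above.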
Hilbert R-tree:
There are two types of Hilbert R-tree, one for static databases and one for
dynamic databases. In both cases, space filling curves and specifically the
Hilbert curve are used to achieve better ordering of multidimensional objects
in the node. This ordering has to be ‘good’, in the sense that it should group
‘similar’ data rectangles together, to minimize the area and perimeter of the
resulting minimum bounding rectangles (MBRs). Packed Hilbert R-trees are
suitable for static databases in which updates are very rare or in which there
are no updates at all.
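The ordering step behind a packed Hilbert R-tree can be sketched as follows: each rectangle is mapped to the Hilbert curve position of its center, the rectangles are sorted by that value, and consecutive runs are grouped into leaves. This is an illustrative sketch only. The function names are hypothetical, the grid side n is assumed to be a power of two, and rectangle centers are assumed to fall on integer coordinates in the range 0 to n - 1.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # assumed (xmin, ymin, xmax, ymax)


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along the Hilbert curve of an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def pack_leaves(rects: List[Rect], capacity: int, n: int = 16) -> List[List[Rect]]:
    """Sort rectangles by the Hilbert value of their centers and fill leaves
    to capacity in that order."""
    def key(r: Rect) -> int:
        cx = int((r[0] + r[2]) / 2)
        cy = int((r[1] + r[3]) / 2)
        return hilbert_index(n, cx, cy)

    ordered = sorted(rects, key=key)
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]
```

Since neighbouring positions on the Hilbert curve are also neighbours in space, each leaf tends to hold rectangles that are close together, which keeps the resulting MBRs small. Upper levels of the tree could be built by applying the same grouping to the leaf MBRs.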
Reference:
www.wikipedia.com