B+ Tree Indexing


What is Indexing?

Indexing is a data structure technique that allows you to quickly retrieve records from a database file. An index is a small table with only two columns. The first column contains a copy of the primary or candidate key of a table. The second column contains a set of pointers holding the address of the disk block where that specific key value is stored.
An index –
• Takes a search key as input
• Efficiently returns a collection of matching records.
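As a minimal sketch of the two-column idea, the index below is an in-memory list of (key, block address) pairs kept sorted by key, so a lookup can binary-search the first column and return the matching block addresses. The function names and block numbers are hypothetical; a real DBMS keeps this table on disk.

```python
from bisect import bisect_left

def build_index(rows):
    """Two-column index: (search key, disk block address), sorted by key.

    `rows` holds (key, block_address) pairs; the block numbers are
    hypothetical integers standing in for real disk addresses.
    """
    return sorted(rows)

def lookup(index, key):
    """Return every block address stored for `key` (empty list if absent)."""
    # (key,) sorts before any (key, block) pair, so this finds the first match.
    i = bisect_left(index, (key,))
    matches = []
    while i < len(index) and index[i][0] == key:
        matches.append(index[i][1])   # second column: pointer to the disk block
        i += 1
    return matches

index = build_index([(42, 7), (17, 3), (99, 12), (42, 8)])
print(lookup(index, 42))  # [7, 8]
print(lookup(index, 5))   # []
```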
B-Tree Index
The B-tree index is one of the most widely used data structures for tree-based indexing in a DBMS. It is a multilevel, tree-based indexing technique built on balanced multiway (not binary) search trees. In the B+ tree variant used for most DBMS indexes, all leaf nodes hold the actual data pointers. Moreover, all leaf nodes are interlinked as a linked list, which allows the tree to support both random and sequential access.

• Leaf nodes must have between 2 and 4 values (for a tree of order 5).
• Every path from the root to a leaf has the same length.
• Non-leaf nodes apart from the root node have between 3 and 5 children nodes (for a tree of order 5).
• Every node that is not the root or a leaf has between $\lceil n/2 \rceil$ and $n$ children.
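Assuming the order n means the maximum number of children per node (so a node holds at most n - 1 keys), the bounds in the list above can be computed directly; for n = 5 this reproduces the 2 to 4 leaf values and the 3 to 5 children stated there. `occupancy_bounds` is a hypothetical helper, not a library function.

```python
import math

def occupancy_bounds(n):
    """Min/max occupancy for a B-tree of order n (n = maximum children per node).

    Assumes the common convention that every non-root node keeps at least
    ceil(n/2) children and that a node always holds one key fewer than it
    has children; leaves use the same key limits as internal nodes.
    """
    min_children = math.ceil(n / 2)
    return {
        "internal_children": (min_children, n),     # non-root internal nodes
        "internal_keys": (min_children - 1, n - 1),
        "leaf_keys": (min_children - 1, n - 1),
        "root_children": (2, n),                    # root, when it is not a leaf
    }

print(occupancy_bounds(5))
# {'internal_children': (3, 5), 'internal_keys': (2, 4), 'leaf_keys': (2, 4), 'root_children': (2, 5)}
```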
B+Tree vs. B-Tree: What’s the Difference?
A B+ tree is another data structure used to store data, and it looks almost the same as a B-tree. The key difference is that a B+ tree stores data only in the leaf nodes. This means that every key value in a non-leaf node is duplicated again in a leaf node. Below is a sample B+ tree.

[Figure: An illustration of a B+tree. Image: Dhanushka Madushan]
The non-leaf values 13, 30, 9, 11, 16 and 38 are repeated in the leaf nodes.
The leaf nodes contain all values, and all of the records are in sorted order. A B+ tree allows you to do the same search as a B-tree, but it also allows you to traverse all the values at the leaf level if each leaf node holds a pointer to the next leaf node, as follows.
[Figure: Illustration of a B+tree with leaf node referencing. Image: Dhanushka Madushan]

Structure of B+ Trees

[Figure: B+ Trees]
B+ trees contain two types of nodes:
• Internal nodes: nodes above the leaf level that hold tree pointers; every internal node except the root holds at least $\lceil n/2 \rceil$ of them.
• Leaf nodes: nodes at the bottom level that hold up to n pointers (data pointers to the records plus a pointer to the next leaf).
The Structure of the Internal Nodes of a B+ Tree of Order
‘a’ is as Follows
1. Each internal node is of the form $\langle P_1, K_1, P_2, K_2, \ldots, P_{c-1}, K_{c-1}, P_c \rangle$, where $c \le a$, each $P_i$ is a tree pointer (i.e., it points to another node of the tree), and each $K_i$ is a key value (see Diagram I for reference).
2. Every internal node has $K_1 < K_2 < \cdots < K_{c-1}$.
3. For each search field value $X$ in the subtree pointed at by $P_i$, the following condition holds: $X \le K_1$ for $i = 1$; $K_{i-1} < X \le K_i$ for $1 < i < c$; and $K_{i-1} < X$ for $i = c$ (see Diagram I for reference).
4. Each internal node has at most $a$ tree pointers.
5. The root node has at least two tree pointers, while the other internal nodes have at least $\lceil a/2 \rceil$ tree pointers each.
6. If an internal node has $c$ pointers, $c \le a$, then it has $c - 1$ key values.
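The routing rule in property 3 can be sketched in Python as follows. This assumes the node is held as two plain lists (keys and pointers) and uses the tie convention stated above, where a value equal to $K_i$ lives under $P_i$; `choose_pointer` is an illustrative name. It also checks properties 2, 4, and 6 before routing.

```python
from bisect import bisect_left

def choose_pointer(keys, pointers, x, a):
    """Return the 0-based index i of the tree pointer to follow for value x.

    keys = [K1, ..., K(c-1)] and pointers = [P1, ..., Pc] of one internal node;
    a is the order of the tree (maximum number of tree pointers per node).
    Convention from property 3: K(i-1) < x <= K(i), so a value equal to a
    key is found in the subtree to the LEFT of that key.
    """
    if any(k1 >= k2 for k1, k2 in zip(keys, keys[1:])):
        raise ValueError("property 2 violated: keys must be strictly increasing")
    if len(pointers) != len(keys) + 1 or len(pointers) > a:
        raise ValueError("properties 4/6 violated: need c <= a pointers and c - 1 keys")
    # The number of keys strictly smaller than x is exactly the subtree index.
    return bisect_left(keys, x)

keys, pointers = [10, 20], ["P1", "P2", "P3"]
for x in (5, 10, 15, 20, 25):
    print(x, pointers[choose_pointer(keys, pointers, x, a=3)])
```

This prints P1 for 5 and 10, P2 for 15 and 20, and P3 for 25.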

[Diagram I: Structure of Internal Node]

The Structure of the Leaf Nodes of a B+ Tree of Order ‘b’ is
as Follows
1. Each leaf node is of the form $\langle \langle K_1, D_1 \rangle, \langle K_2, D_2 \rangle, \ldots, \langle K_{c-1}, D_{c-1} \rangle, P_{\text{next}} \rangle$, where $c \le b$, each $D_i$ is a data pointer (i.e., it points to the actual record on disk whose key value is $K_i$, or to a disk file block containing that record), each $K_i$ is a key value, and $P_{\text{next}}$ points to the next leaf node in the B+ tree (see Diagram II for reference).
2. Every leaf node has $K_1 < K_2 < \cdots < K_{c-1}$, $c \le b$.
3. Each leaf node has at least $\lceil b/2 \rceil$ values.
4. All leaf nodes are at the same level.

[Diagram II: Structure of Leaf Node]

Using the Pnext pointer, it is possible to traverse all the leaf nodes just like a linked list, which provides ordered access to the records stored on disk.
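The Pnext traversal can be illustrated with a small range scan. This is a sketch under simple assumptions: each leaf is an in-memory `Leaf` object (a hypothetical class) with sorted `keys`, matching `data` entries, and a `next` reference playing the role of Pnext.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Leaf:
    keys: list                         # K1 < K2 < ... < K(c-1)
    data: list                         # D_i: record (or block pointer) for K_i
    next: Optional["Leaf"] = None      # P_next: the leaf to the right

def range_scan(start_leaf, low, high):
    """Yield (key, data) pairs with low <= key <= high by following P_next.

    Assumes `start_leaf` is the leaf where `low` would live; in a full tree
    it is found by an ordinary root-to-leaf search first.
    """
    leaf = start_leaf
    while leaf is not None:
        for key, data in zip(leaf.keys, leaf.data):
            if key > high:
                return                 # keys are sorted, so nothing further matches
            if key >= low:
                yield key, data
        leaf = leaf.next               # hop to the next leaf like a linked list

# Three hypothetical leaves chained left to right.
leaf3 = Leaf([50, 58, 60], ["r50", "r58", "r60"])
leaf2 = Leaf([20, 30], ["r20", "r30"], leaf3)
leaf1 = Leaf([5, 10], ["r5", "r10"], leaf2)
print(list(range_scan(leaf1, 10, 55)))
# [(10, 'r10'), (20, 'r20'), (30, 'r30'), (50, 'r50')]
```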
Searching a Record in B+ Trees

[Figure: Searching Record in B+ Trees]

Suppose we have to find 58 in the B+ tree. We start at the root node and move down to the leaf node that might contain a record with key 58. In the image above, 58 lies between 50 and 70, so we follow the pointer between them to the third leaf node, where we find 58. If the key is not present in that leaf, we return a "record not found" message.
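The search for 58 can be sketched as follows. The tree here is hypothetical (root separators 25, 50, and 70 over four leaves), chosen only so that 58 falls between 50 and 70 and ends up in the third leaf; the descent assumes that keys greater than or equal to a separator go to its right subtree.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    keys: list
    children: Optional[list] = None    # child nodes; None means this is a leaf
    records: Optional[list] = None     # records stored in a leaf, one per key

def search(root, key):
    node = root
    while node.children is not None:
        # Descend: keys >= a separator go to its right subtree.
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)    # binary search inside the leaf
    if i < len(node.keys) and node.keys[i] == key:
        return node.records[i]
    return "record not found"

# Hypothetical tree: root separators 25, 50, 70 over four leaves.
leaves = [
    Node([10, 20], records=["r10", "r20"]),
    Node([25, 30, 40], records=["r25", "r30", "r40"]),
    Node([50, 58, 60], records=["r50", "r58", "r60"]),
    Node([70, 80], records=["r70", "r80"]),
]
root = Node([25, 50, 70], children=leaves)
print(search(root, 58))  # r58  (found in the third leaf)
print(search(root, 55))  # record not found
```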
Insertion in B+ Trees
Insertion in B+ trees is done via the following steps.
• Every element in the tree has to be inserted into a leaf node, so the first step is to find the proper leaf node.
• If there is no overflow, insert the key into the leaf node in increasing order.
For more, refer to Insertion in B+ Trees.
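For the no-overflow case, inserting into a leaf only has to preserve increasing key order. A minimal sketch, assuming a leaf is stored as parallel `keys` and `values` lists and that `capacity` (a hypothetical parameter) is the maximum number of keys per leaf:

```python
from bisect import bisect_left

def insert_into_leaf(keys, values, key, value, capacity):
    """Insert (key, value) into a leaf kept in increasing key order.

    Returns False without changing anything when the leaf is already full,
    because inserting would overflow it (the leaf must be split instead).
    """
    if len(keys) >= capacity:
        return False                  # overflow case: handled by splitting
    i = bisect_left(keys, key)        # first position that keeps keys sorted
    keys.insert(i, key)
    values.insert(i, value)           # keep each value aligned with its key
    return True

keys, values = [10, 30], ["r10", "r30"]
print(insert_into_leaf(keys, values, 20, "r20", capacity=3), keys)  # True [10, 20, 30]
print(insert_into_leaf(keys, values, 40, "r40", capacity=3), keys)  # False [10, 20, 30]
```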
Deletion in B+Trees
Deletion in a B+ tree is not just a deletion: it is a combined process of searching, deleting, and balancing. The last step of the deletion process, balancing the B+ tree, is mandatory; otherwise, the tree may violate the B+ tree properties.
For more, refer to Deletion in B+ Trees.
Advantages of B+Trees
• A B+ tree with l levels can store more entries in its internal nodes than a B-tree with the same l levels, which significantly improves the search time for any given key. Having fewer levels, together with the Pnext pointers, makes B+ trees very quick and efficient at accessing records on disk.
• Data stored in a B+ tree can be accessed both sequentially and directly.
• Every record takes the same number of disk accesses to fetch, because all leaf nodes are at the same level.
• B+ trees have redundant search keys: a key stored in an internal node is repeated in a leaf node, whereas a B-tree cannot store a search key more than once.
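To see why internal nodes without data give a B+ tree more entries for the same number of levels, here is a back-of-the-envelope calculation. All sizes are assumptions for illustration (4096-byte pages, 8-byte keys, 8-byte tree and data pointers, no node headers), and both trees are assumed completely full.

```python
# Assumed sizes in bytes; node headers are ignored for simplicity.
PAGE, KEY, PTR = 4096, 8, 8

def bplus_capacity(levels):
    """Max keys in a full B+ tree with `levels` levels (the last one is leaves)."""
    # Internal node: c*PTR + (c - 1)*KEY <= PAGE  ->  c <= (PAGE + KEY) / (KEY + PTR).
    fanout = (PAGE + KEY) // (KEY + PTR)
    # Leaf: (key, data pointer) pairs plus one P_next pointer.
    leaf_entries = (PAGE - PTR) // (KEY + PTR)
    return fanout ** (levels - 1) * leaf_entries

def btree_capacity(levels):
    """Max keys in a full B-tree where every node also stores data pointers."""
    # c*PTR + (c - 1)*(KEY + PTR) <= PAGE  ->  c <= (PAGE + KEY + PTR) / (KEY + 2*PTR).
    fanout = (PAGE + KEY + PTR) // (KEY + PTR + PTR)
    # (fanout - 1) keys per node summed over all levels = fanout**levels - 1.
    return fanout ** levels - 1

for levels in (1, 2, 3):
    print(levels, bplus_capacity(levels), btree_capacity(levels))
# 1 255 170
# 2 65280 29240
# 3 16711680 5000210
```

Under these assumptions, three levels hold about 16.7 million keys in the B+ tree versus about 5.0 million in the B-tree, because each B+ tree internal node fans out to 256 children instead of 171.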
Disadvantages of B+Trees
• The major drawback of the B-tree is the difficulty of traversing its keys sequentially. The B+ tree retains the rapid random-access property of the B-tree while also allowing rapid sequential access.
Application of B+ Trees
• Multilevel Indexing
• Faster operations on the tree (insertion, deletion, search)
• Database indexing
FAQs on B+ Trees
1. What is a B+ Tree?
Answer:
A B+ tree is a balanced multiway search tree. It is a variant of the B-tree in which internal nodes store only keys, and the data (or data pointers) are stored only in the leaf nodes.
2. What is the advantage of the B+ Tree?
Answer:
A B+ tree is always height-balanced, and its height is usually lower than that of a B-tree holding the same keys.
3. Where are B+ Trees used?
Answer:
B+ Trees are often used for disk-based storage systems.
What is a B+ Tree?
A B+ tree is primarily used to implement dynamic multilevel indexing. Compared to a B-tree, the B+ tree stores the data pointers only at the leaf nodes of the tree, which makes the search process faster and gives every lookup the same root-to-leaf path length.

Rules for B+ Tree


Here are the essential rules for a B+ tree.
• Leaves are used to store data records.
• Internal nodes of the tree store only keys.
• If a target key value is less than a key in an internal node, the pointer just to its left is followed.
• If a target key value is greater than or equal to a key in an internal node, the pointer just to its right is followed.
• The root has a minimum of two children.
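The two pointer-following rules can be written as a single loop over an internal node's sorted keys. This is an illustrative sketch (`follow_pointer` is a hypothetical name); pointer index i is the one just to the left of `keys[i]`, and the last index is the pointer to the right of the final key.

```python
def follow_pointer(keys, target):
    """Return the index of the child pointer to follow in an internal node.

    keys: sorted keys of the node; pointer i sits just left of keys[i].
    Rule: target < key  -> take the pointer just to the left of that key;
          target >= key -> keep moving right past that key.
    """
    for i, k in enumerate(keys):
        if target < k:
            return i            # pointer immediately to the left of keys[i]
    return len(keys)            # target >= every key: rightmost pointer

print([follow_pointer([13, 30], t) for t in (5, 13, 20, 30, 40)])  # [0, 1, 1, 2, 2]
```

For sorted keys this gives the same result as Python's `bisect.bisect_right(keys, target)`, which is a convenient way to implement the rule in practice.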
Why use B+ Tree
Here are the reasons for using a B+ tree:
• Keys are primarily used to aid the search by directing it to the proper leaf.
• A B+ tree uses a "fill factor" to manage the growth and shrinkage of the tree.
• In B+ trees, many keys can be placed on a page of memory because interior nodes carry no associated data. Therefore, the data in the leaf nodes can be reached quickly.
• A full scan of all the elements in the tree needs just one linear pass, because all the leaf nodes of a B+ tree are linked with each other.
B+ Tree vs. B Tree
Here are the main differences between a B+ tree and a B tree:
B+ Tree | B Tree
--- | ---
Search keys can be repeated. | Search keys cannot be redundant.
Data is only saved on the leaf nodes. | Both leaf nodes and internal nodes can store data.
Data stored on the leaf nodes makes the search more accurate and faster. | Searching is slower because data is stored on both leaf and internal nodes.
Deletion is not difficult, as an element is only removed from a leaf node. | Deletion of elements is a complicated and time-consuming process.
Linked leaf nodes make the search efficient and quick. | Leaf nodes cannot be linked.
Search Operation
In a B+ tree, search is one of the easiest procedures to execute, and it returns fast and accurate results.
The following search algorithm is applicable:
• To find the required record, start at the root and execute a binary search on the keys of each node to choose the child pointer to follow, until you reach a leaf; then run a binary search on the records in that leaf.
• In case of an exact match with the search key, the corresponding record is returned to the user.
• In case the exact key is not found in the leaf node reached by the search, a "not found" message is displayed to the user.
Search Operation Algorithm
1. Starting at the root, call the binary search method on the keys of each node to descend to the leaf that may hold the key.
2. If the search parameters match the exact key:
   The accurate result is returned and displayed to the user.
   Else, if the leaf node has been searched and the exact key is not found by the algorithm:
   Display the statement "Recordset cannot be found."
Output:
The matched record set against the exact key is displayed to the user; otherwise, a failed attempt is shown to the user.
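Both steps rely on a binary search inside a node. A minimal sketch of the exact-match search within one leaf, assuming the leaf's keys and records are held in parallel sorted lists (`binary_search_node` is a hypothetical name):

```python
def binary_search_node(keys, records, target):
    """Exact-match binary search over the sorted keys of one B+ tree leaf."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if keys[mid] == target:
            return records[mid]          # exact key match: return the record
        if keys[mid] < target:
            lo = mid + 1                 # target can only be in the right half
        else:
            hi = mid - 1                 # target can only be in the left half
    return "Recordset cannot be found."

keys, records = [50, 58, 60], ["r50", "r58", "r60"]
print(binary_search_node(keys, records, 58))  # r58
print(binary_search_node(keys, records, 55))  # Recordset cannot be found.
```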
Insert Operation
The following algorithm is applicable for the insert operation when the target leaf is already full:
• 50 percent of the elements in the node are moved to a new leaf for storage.
• The minimum key of the new leaf, together with a pointer to the new leaf, is inserted into the parent.
• If the parent node becomes full, it is split as well.
• When an internal node splits, its center key is moved up into the next higher-level node.
• Keep repeating these steps until a higher-level node that does not need to be split is found.
Insert Operation Algorithm
1. If inserting the entry does not make the leaf node overflow, add the record and stop.
2. Else, split the node to make room for more records:
   a. Allocate a new leaf and transfer 50 percent of the node's elements to it.
   b. Insert the minimum key of the new leaf and the new leaf's address into the parent (top-level) node.
   c. If the parent node becomes full of keys and addresses, split it too.
      i. In that case, move the center key of the split node up to the next level of the tree.
   d. Continue executing the above steps until a top-level node is found that does not need to be split.
3. If the root itself was split, build a new root node with 1 key and 2 pointers.
Output:
The algorithm locates the correct leaf node and successfully inserts the element into it.
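The insertion algorithm can be sketched as a small in-memory B+ tree. Assumptions: `max_keys` (a hypothetical parameter) is the largest number of keys a node may hold before it splits, records are arbitrary Python values, and keys equal to a separator go to its right subtree. Leaf splits copy the separator up, internal splits push it up, and a root split creates a new root with 1 key and 2 pointers. The class and method names are illustrative, not from any library.

```python
from bisect import bisect_left, bisect_right

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # child nodes (internal) or records (leaf)
        self.next = None     # P_next, used by leaves only

class BPlusTree:
    def __init__(self, max_keys=3):
        self.max_keys = max_keys          # a node splits once it exceeds this
        self.root = Node(leaf=True)

    def insert(self, key, record):
        split = self._insert(self.root, key, record)
        if split:                         # root split: new root, 1 key, 2 pointers
            separator, right = split
            new_root = Node(leaf=False)
            new_root.keys = [separator]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key, record):
        if node.leaf:
            i = bisect_left(node.keys, key)      # keep the leaf in key order
            node.keys.insert(i, key)
            node.children.insert(i, record)
        else:
            i = bisect_right(node.keys, key)     # keys >= separator go right
            split = self._insert(node.children[i], key, record)
            if split:                            # child split: add its separator here
                separator, right = split
                node.keys.insert(i, separator)
                node.children.insert(i + 1, right)
        if len(node.keys) <= self.max_keys:
            return None                          # no overflow: stop propagating
        return self._split(node)

    def _split(self, node):
        mid = len(node.keys) // 2
        right = Node(leaf=node.leaf)
        if node.leaf:
            # Leaf: move half the entries to a new leaf; COPY its first key up.
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.children, node.children = node.children[mid:], node.children[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right
        # Internal node: PUSH the middle key up; it stays in neither half.
        separator = node.keys[mid]
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return separator, right

    def leaf_keys(self):
        """Walk P_next from the leftmost leaf and collect each leaf's keys."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:
            out.append(node.keys)
            node = node.next
        return out

tree = BPlusTree(max_keys=3)
for k in (1, 4, 6, 12):
    tree.insert(k, f"record-{k}")
print(tree.root.keys, tree.leaf_keys())  # [6] [[1, 4], [6, 12]]
```

With `max_keys=3`, inserting 1, 4, 6, and 12 overflows the leaf [1, 4, 6, 12], which splits into [1, 4] and [6, 12] with 6 as the new root key, the same outcome as the sample example that follows.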

The B+ tree sample example above is explained in the steps below:
• First, we have a leaf node with room for 3 elements, and the first 3 elements, 1, 4, and 6, are added at the appropriate locations in the node.
• The next value in the series of data is 12, which needs to be made part of the tree.
• To achieve this, split the node and add 6 to the new parent as a pointer (separator) element.
• Now, a right branch of the tree is created, and the remaining data values are placed according to the rule that values equal to or greater than a key go to the node on its right.
Delete Operation
The delete procedure in a B+ tree is more complex than the insert and search operations.
The following algorithm is applicable while deleting an element from the B+ tree:
• First, locate the leaf entry in the tree that holds the key and pointer, and delete that entry from the leaf.
• If the leaf node is still at least half full, the operation is complete; otherwise, the leaf node has fallen below the minimum number of entries and must be rebalanced.
• If a linked sibling on the right or left can spare an entry, move that entry into the leaf. If this is not possible, merge the leaf node with that sibling.
• After the leaf node is merged with its neighbor on the right or left, the entry in the parent (top-level) node that pointed to the removed node is deleted.
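The leaf-level steps above can be sketched for the simple case of a single parent whose children are leaves. This is a simplified illustration, not a complete B+ tree delete: leaves are plain sorted lists, `min_keys` is an assumed minimum occupancy, `delete_key` is a hypothetical name, and record pointers, Pnext links, and underflow of the parent itself are not handled.

```python
from bisect import bisect_right

def delete_key(parent_keys, leaves, key, min_keys):
    """Delete `key` from the leaves under one parent node, rebalancing if needed.

    parent_keys: the parent's separator keys (keys >= a separator go right).
    leaves:      the parent's leaf children, each a sorted list of keys.
    min_keys:    assumed minimum number of keys a leaf must keep.
    Simplifications: records, P_next links and parent underflow are ignored.
    """
    i = bisect_right(parent_keys, key)
    leaf = leaves[i]
    if key not in leaf:
        return False                                    # record not found
    leaf.remove(key)
    if len(leaf) >= min_keys or len(leaves) == 1:
        if i > 0 and leaf:
            parent_keys[i - 1] = leaf[0]                # refresh separator to new minimum
        return True
    left = leaves[i - 1] if i > 0 else None
    right = leaves[i + 1] if i + 1 < len(leaves) else None
    if left is not None and len(left) > min_keys:       # borrow from left sibling
        leaf.insert(0, left.pop())
        parent_keys[i - 1] = leaf[0]
    elif right is not None and len(right) > min_keys:   # borrow from right sibling
        leaf.append(right.pop(0))
        parent_keys[i] = right[0]
        if i > 0:
            parent_keys[i - 1] = leaf[0]
    elif left is not None:                              # merge into left sibling
        left.extend(leaf)
        del leaves[i]
        del parent_keys[i - 1]                          # drop the separator between them
    else:                                               # merge right sibling into leaf
        leaf.extend(right)
        del leaves[i + 1]
        del parent_keys[i]
    return True

parent, leaves = [25, 31], [[10, 20], [25, 28], [31, 42]]
delete_key(parent, leaves, 31, min_keys=1)
print(parent, leaves)  # [25, 42] [[10, 20], [25, 28], [42]]

parent, leaves = [25, 31], [[10, 20, 22], [25, 28], [31, 42]]
delete_key(parent, leaves, 28, min_keys=2)   # underflow: borrow 22 from the left
print(parent, leaves)  # [22, 31] [[10, 20], [22, 25], [31, 42]]
delete_key(parent, leaves, 10, min_keys=2)   # underflow, no spare key: merge
print(parent, leaves)  # [31] [[20, 22, 25], [31, 42]]
```

The first call mirrors the shape of the 31 example that follows, on a hypothetical tree: after 31 is removed, the separator is replaced by the new minimum of the right leaf, 42.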
The example above illustrates the procedure to remove an element from a B+ tree of a specific order.
• First, the exact locations of the element to be deleted are identified in the tree.
• The element to be deleted can only be reliably identified at the leaf level, not at the index level. It is deleted there in a way that respects the rules of deletion, that is, the minimum number of keys a node must keep.
• In the above example, we have to delete 31 from the tree.
• We need to locate the instances of 31 in the index and the leaf.
• We can see that 31 is present at both the index and the leaf node level. Hence, we delete it from both.
• But we still have to fill the index entry pointing to 42. We look at the right child under 25, take its minimum value, and place it in the index. Since 42 is the only value present, it becomes the index.
Delete Operation Algorithm
1) Start at the root and go down to the leaf node containing the key K.
2) Find the node n on the path from the root to the leaf node containing K (m below is the order of the tree).
   A. If n is the root, remove K.
      a. If the root has more than one key, done.
      b. If the root has only K:
         i) If any of its child nodes can lend a key, borrow a key from the child and adjust the child links.
         ii) Otherwise, merge the child nodes. The merged node becomes the new root.
   B. If n is an internal node, remove K.
      i) If n has at least ceil(m/2) keys, done!
      ii) If n has fewer than ceil(m/2) keys:
          If a sibling can lend a key, borrow a key from the sibling, adjust the keys in n and the parent node, and adjust the child links.
          Else, merge n with its sibling and adjust the child links.
   C. If n is a leaf node, remove K.
      i) If n has at least ceil(m/2) elements, done! In case the smallest key was deleted, push up the next key.
      ii) If n has fewer than ceil(m/2) elements:
          If a sibling can lend a key, borrow a key from the sibling and adjust the keys in n and its parent node.
          Else, merge n with its sibling and adjust the keys in the parent node.
Output:
The key K is deleted, and keys are borrowed from siblings to adjust the values in n and its parent node if needed.
Summary:
• A B+ tree is a self-balancing data structure for executing accurate and fast search, insert, and delete procedures on data.
• We can easily retrieve complete data or partial data, because traversing the linked leaf nodes is efficient.
• The B+ tree structure grows and shrinks as the number of stored records increases or decreases.
• Storing data in the leaf nodes and the resulting high branching of internal nodes shortens the tree height, which reduces disk input and output operations.
In the context of B+ tree index, "copy up" and "push up" are
operations that can occur during the insertion or deletion of a
key in the B+ tree.

1. Copy Up: When inserting a key into a leaf node makes the node overflow, the leaf is split and the first key of the new right leaf is copied up to the parent node. The key also stays in the leaf. This process is called "copying up" because a copy of the key is propagated up the tree to maintain the B+ tree's structure and balance.

2. Push Up: When inserting a key into a non-leaf node makes the node overflow, the node needs to split. In this case, a "push up" operation occurs: the median key of the full node is moved up to the parent node and is not kept in either half. This helps to maintain the B+ tree's properties and ensures that the tree remains balanced.

Both "copy up" and "push up" operations are important for
maintaining the integrity and structure of a B+ tree index, which
is commonly used in database systems for efficient data
retrieval.
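The difference between the two operations is easiest to see on the key lists alone. A minimal sketch (hypothetical function names; child pointers and records are omitted) of splitting an overflowing leaf versus an overflowing internal node:

```python
def split_leaf(keys):
    """Leaf split: the first key of the right half is COPIED up (it stays in the leaf)."""
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]

def split_internal(keys):
    """Internal split: the median key is PUSHED up (removed from both halves)."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]

print(split_leaf([1, 4, 6, 12]))          # ([1, 4], [6, 12], 6)
print(split_internal([10, 20, 30, 40]))   # ([10, 20], [40], 30)
```

In the leaf case, 6 appears both in the right leaf and in the returned separator; in the internal case, 30 leaves the node entirely and exists only in the parent.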

Advantages of Indexing
Important pros/advantages of indexing are:
• It helps you reduce the total number of I/O operations needed to retrieve data, because the data can be read from the index structure without separately accessing the row in the table.
• It offers faster search and retrieval of data to users.
• Indexing also helps you reduce tablespace: there is no need to link to a row in a table, so the ROWID does not have to be stored in the index.
• You can't sort the data in the leaf nodes yourself, as the value of the primary key determines their order.
Disadvantages of Indexing
Important drawbacks/cons of indexing are:
• To use indexing in a database management system, you need a primary key with unique values on the table.
• You can't create any other indexes in the database on the indexed data.
• You are not allowed to partition an index-organized table.
• SQL indexing decreases the performance of INSERT, DELETE, and UPDATE queries.
Summary:
• An index is a small table that consists of two columns.
• The two main types of indexing methods are 1) primary indexing and 2) secondary indexing.
• A primary index is an ordered file of fixed-length records with two fields.
• Primary indexing is further divided into two types: 1) dense index and 2) sparse index.
• In a dense index, a record is created for every search key value in the database.
• The sparse indexing method helps you resolve the issues of dense indexing.
• A secondary index in a DBMS is an indexing method whose search key specifies an order different from the sequential order of the file.
• A clustering index is defined on an ordered data file.
• Multilevel indexing is used when a primary index does not fit in memory.
• The biggest benefit of indexing is that it helps you reduce the total number of I/O operations needed to retrieve data.
• The biggest drawback of indexing in a database management system is that you need a primary key with unique values on the table.
