Hash Tables
15-122: Principles of Imperative Computation
Frank Pfenning, Rob Simmons
Lecture 13
February 28, 2013
Introduction
Associative Arrays
Chains
If we reach the end of a chain without finding the key, then no entry with the given key exists. If we keep the chain unsorted, this gives us O(n) worst-case complexity for finding a key in a chain of length n, assuming that computing and comparing keys is constant time.
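As an illustration (in C rather than C0, with names of our own invention, not the lecture's code), a lookup in an unsorted chain simply walks the list until it finds a matching key or falls off the end:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical chain node for string keys; the field and function
   names here are our own, not from the lecture's code. */
struct chain_node {
    const char *key;
    int value;
    struct chain_node *next;
};

/* Walks the chain front to back: O(n) key comparisons for a chain
   of length n. Returns a pointer to the value, or NULL if no entry
   with the given key exists. */
int *chain_lookup(struct chain_node *chain, const char *k) {
    for (struct chain_node *p = chain; p != NULL; p = p->next) {
        if (strcmp(p->key, k) == 0)
            return &p->value;
    }
    return NULL;  /* reached the end: no entry with the given key */
}

/* Tiny demonstration on a two-element chain; returns -1 for "absent". */
int chain_demo(const char *k) {
    struct chain_node snd = {"world", 2, NULL};
    struct chain_node fst = {"hello", 1, &snd};
    int *r = chain_lookup(&fst, k);
    return r == NULL ? -1 : *r;
}
```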
Given what we have seen so far in our search data structures, this seems
very poor behavior, but if we know our data collections will always be
small, it may in fact be reasonable on occasion.
Can we do better? One idea goes back to binary search. If keys are ordered, we may be able to arrange the elements in an array or in the form of a tree and then cut the search space roughly in half every time we make a comparison. We will begin thinking about this approach just before Spring Break, and it will occupy us for a few lectures after the break as well. Designing such data structures is a rich and interesting subject, but the best
we can hope for with this approach is O(log(n)), where n is the number of
entries. We have seen that this function grows very slowly, so this is quite
a practical approach.
Nevertheless, the question arises whether we can do better than O(log(n)), say, constant time O(1), to find an entry with a given key. We know that it can be done for arrays, indexed by integers, which allow constant-time access. Can we also do it, for example, for strings?
Hashing
The first idea behind hash tables is to exploit the efficiency of arrays. So:
to map a key to an entry, we first map a key to an integer and then use the
integer to index an array A. The first map is called a hash function. We write
it as hash( ). Given a key k, our access could then simply be A[hash(k)].
There is an immediate problem with this approach: there are 2^31 positive integers, so we would need a huge array, negating any possible performance advantages. But even if we were willing to allocate such a huge array, there are many more strings than ints, so there cannot be any hash function that always gives us different ints for different strings.
The solution is to allocate an array of smaller size, say m, and then look
up the result of the hash function modulo m, for example, A[hash(k)%m].
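One detail worth noting in a sketch (ours, not the lecture's code): in C, as in C0, the result of % can be negative when the hash value is negative, so the index must be normalized into the range [0, m) before using it to access the array:

```c
/* Maps an arbitrary (possibly negative) hash value into a valid
   array index in the range [0, m), assuming m > 0. The name
   index_of is our own, for illustration only. */
int index_of(int hashval, int m) {
    int i = hashval % m;       /* in C, this can be negative */
    return i < 0 ? i + m : i;  /* shift negative results into [0, m) */
}
```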
This creates a new problem: it is inevitable that multiple strings will map to the same array index. For example, if the array has size m and we have more than m elements, then at least two must map to the same index. In practice, collisions will happen much sooner than that.
If the hash function maps two different keys to the same integer value (modulo m), we say we have a collision. In general, we would like to avoid collisions, but we cannot rule them out entirely, so we need a strategy for dealing with them.
Separate Chaining
Randomness
The average-case analysis relies on the fact that the hash values of the keys are relatively evenly distributed. This can be restated as saying that the probability that a key maps to array index i should be about the same for each i, namely 1/m. In order to avoid systematically creating collisions, small changes in the input string should result in unpredictable changes in the output hash value, uniformly distributed over the range of C0 integers. We can achieve this with a pseudorandom number generator (PRNG).
A pseudorandom number generator is just a function that takes one number and obtains another in a way that is both unpredictable and easy to calculate. The C0 rand library is a pseudorandom number generator with a fairly simple interface:
/* library file rand.h0 */
typedef struct rand* rand_t;
rand_t init_rand(int seed);
int rand(rand_t gen);
One can create a random number generator (of type rand_t) by initializing it with an arbitrary seed. Then we can generate a sequence of pseudorandom numbers by repeatedly calling rand on such a generator.
The rand library in C0 is implemented as a linear congruential generator. A linear congruential generator takes a number x and finds the next number by calculating (a * x) + c modulo m. In C0, it's easiest to take m to be 2^32, since addition and multiplication in C0 are already defined modulo 2^32. The trick is finding a good multiplier a and summand c.
If we were using 4-bit numbers (from -8 to 7, where multiplication and addition are modulo 16), then we could set a to 5 and c to 7, and our pseudorandom number generator would generate the following series of numbers:

0, 7, -6, -7, 4, -5, -2, -3, -8, -1, 2, 1, -4, 3, 6, 5, 0, ...
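A quick sketch in C (our own illustration, not the lecture's code) of one step of this 4-bit generator, reducing modulo 16 and reinterpreting the result as a signed value in the range -8 to 7:

```c
/* One step of the 4-bit linear congruential generator with a = 5
   and c = 7: compute 5*x + 7 modulo 16, then reinterpret the 4-bit
   result as a signed value in -8..7. */
int next_4bit(int x) {
    int r = (5 * ((x % 16 + 16) % 16) + 7) % 16;  /* nonnegative residue mod 16 */
    return r >= 8 ? r - 16 : r;                   /* signed 4-bit interpretation */
}
```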
The PRNG used in C0's library sets a to 1664525 and c to 1013904223 and generates the following series of numbers starting from 0:

0, 1013904223, 1196435762, -775096599, -1426500812, ...
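The same step for the full generator can be sketched in C (the function name is ours); unsigned 32-bit arithmetic wraps around, which gives us the modulo-2^32 behavior for free:

```c
#include <stdint.h>

/* One step of the linear congruential generator with a = 1664525
   and c = 1013904223, modulo 2^32. Unsigned overflow is defined to
   wrap in C, and the result is then reinterpreted as a signed
   32-bit int, as in C0. */
int32_t lcg_step(int32_t x) {
    uint32_t u = 1664525u * (uint32_t)x + 1013904223u;  /* a*x + c mod 2^32 */
    return (int32_t)u;
}
```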
This kind of generator is fine for random testing or (indeed) as the basis for a hashing function, but the results are too predictable to use for cryptographic purposes such as encrypting a message. In particular, a linear
congruential generator will sometimes have repeating patterns in the lower
bits. If one wants numbers from a small range it is better to use the higher
bits of the generated results rather than just applying the modulus operation.
It is important to realize that these numbers just look random; they aren't really random. In particular, we can reproduce the exact same sequence if we give the generator the exact same seed. This property is important both for testing purposes and for hashing. If we discover a bug during testing with a particular seed, we can rerun the program with the same seed to reproduce the failure.
Exercises
Exercise 1 What happens when you replace the data structure for separate chaining by something other than a linked list? Discuss the changes and identify benefits and disadvantages when using a sorted list, a queue, a doubly-linked list, or
another hash table for separate chaining.
Exercise 2 Consider the situation of writing a hash function for strings of length two that use only the characters A to Z. There are 676 different such strings.
You were hoping to get away with implementing a hash table without collisions,
since you are only using 79 out of those 676 two-letter words. But you still see
collisions most of the time. Explain this phenomenon with the birthday problem.
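To see the numbers behind Exercise 2, here is a rough sketch (our own, not part of the exercise) that computes the birthday-problem probability of at least one collision when k keys are hashed uniformly into n slots:

```c
/* Birthday-problem estimate: the probability that k uniformly
   hashed keys land in k distinct slots out of n is the product of
   (n - i)/n for i = 0, ..., k-1; one minus that product is the
   probability of at least one collision. */
double collision_probability(int k, int n) {
    double all_distinct = 1.0;
    for (int i = 0; i < k; i++)
        all_distinct *= (double)(n - i) / (double)n;
    return 1.0 - all_distinct;
}
```

For 79 keys in 676 slots this probability is roughly 0.99, which is why collisions show up most of the time.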