Cryptographic Hash Functions
What are Hash functions?
• Mathematical functions: A mapping of items (values) in the domain
to items (values) in the range.
• Hash functions are special mathematical functions that satisfy the
following three properties:
• Inputs (or items in the domain) can be of any size (not fixed); in practice, input size is bounded (e.g., SHA-256 accepts messages shorter than 2^64 bits)
• Outputs (or items in the range) have a fixed size (e.g., SHA-256 produces a 256-bit output)
• Efficiently computable, i.e., the mapping can be computed in time polynomial in the input size
• A hash function is any function that can be used to map data of arbitrary size to fixed-size
values. The values returned by a hash function are called hash values, hash codes, digests,
or simply hashes. The values are usually used to index a fixed-size table called a hash table.
Use of a hash function to index a hash table is called hashing or scatter storage addressing.
• Hash functions and their associated hash tables are used in data storage and retrieval
applications
A hash function takes an input as a key, which is associated with a datum or
record and used to identify it to the data storage and retrieval application.
The keys may be fixed length, like an integer, or variable length, like a name.
In some cases, the key is the datum itself. The output is a hash code used to
index a hash table holding the data or records, or pointers to them.
A hash function may be considered to perform three functions:
• Convert variable-length keys into fixed-length values (usually machine word length or less) by folding them by words or other units using a parity-preserving operator like ADD or XOR.
• Scramble the bits of the key so that the resulting values are uniformly distributed over the key space.
• Map the key values into the range of table indices (values smaller than the size of the table).
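The three steps above can be sketched as a toy table-indexing hash. This is a minimal illustration under stated assumptions, not a cryptographic hash: the function name `table_hash`, the 4-byte XOR folding, and the multiplier `0x9E3779B1` (a common golden-ratio constant) are all hypothetical choices made for this example.

```python
def table_hash(key: str, table_size: int) -> int:
    """Hypothetical toy hash for indexing a hash table (NOT cryptographic)."""
    data = key.encode("utf-8")
    # 1) Fold the variable-length key into one 32-bit word, 4 bytes at a time, using XOR
    h = 0
    for i in range(0, len(data), 4):
        h ^= int.from_bytes(data[i:i + 4], "little")
    # 2) Scramble: multiply by an odd constant (assumed), then mix high bits into low bits
    h = (h * 0x9E3779B1) & 0xFFFFFFFF
    h ^= h >> 16
    # 3) Map the result into the index range 0 .. table_size - 1
    return h % table_size


# Usage: place a few keys into an 8-bucket table (chaining handles collisions)
table = [[] for _ in range(8)]
for name in ["alice", "bob", "carol", "dave"]:
    table[table_hash(name, len(table))].append(name)
```

Because folding happens before scrambling, keys that fold to the same word (e.g. "abcdabcd" folds to 0) always collide; real table hashes usually mix while folding.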
A good hash function satisfies two basic properties: 1) it should be very fast
to compute; 2) it should minimize duplication of output values (collisions).
Data Integrity and Source Authentication
• It is computationally infeasible to find any two different messages with the same message digest
• Collision resistance implies second preimage resistance
• Collisions, if we could find them, would give signatories a way to repudiate their signatures
Property 1: Collision Resistance
• How do we find a collision in a secure hash function with a 256-bit output?
• Strategy 1: Brute force: keep picking random inputs and computing their hashes until you
find a collision.
• How long does this take?
• Worst case: 2^256 + 1 inputs (there are only 2^256 possible outputs, so by the pigeonhole principle two inputs must collide)
• On average: about 2^128 inputs (birthday paradox); a 50% chance of a collision takes roughly 1.18 × 2^128 inputs
• More than 99.8% chance of a collision after 2^130 randomly chosen inputs
• Brute force always finds a collision, no matter what H is. However, it takes far too long to
matter (2^128 is a lot of tries!)
• Strategy 2: Find cryptographic or other weaknesses in hash functions
• Is the function H(x) = x mod 23 cryptographically secure?
• Many hash functions once considered cryptographically secure have turned out to have weaknesses. E.g., MD5 was considered secure until, after many years of research, collisions were found. SHA-256 (a widely used secure hash function) has no known practical attacks, but we don't know that it is secure!
No hash function has been proven collision-free or secure!
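To answer the x mod 23 question concretely, here is a small sketch that finds a collision by simply hashing 0, 1, 2, ... and remembering each output. The helper names `H` and `find_collision` are illustrative. Since H has only 23 possible outputs, a collision appears within 24 inputs, and preimages are trivial too (any x ≡ z mod 23 hashes to z), so H is not cryptographically secure.

```python
def H(x: int) -> int:
    """The example function from the slide: only 23 possible outputs."""
    return x % 23


def find_collision(h, start: int = 0):
    """Hash consecutive integers until two of them produce the same output."""
    seen = {}  # output -> first input that produced it
    x = start
    while True:
        y = h(x)
        if y in seen:
            return seen[y], x
        seen[y] = x
        x += 1


a, b = find_collision(H)
print(a, b, H(a), H(b))
```

Starting from 0, the first 23 inputs give 23 distinct outputs, so the search stops at x = 23, which collides with 0.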
• How to find a collision?
• Usually, a collision appears after about sqrt(N) tries, where N is the total number of possible outputs
• For example, for a 256-bit output, N = 2^256
• Try 2^130 randomly chosen inputs
• 99.8% chance that two of them will collide
• This works no matter what H is … but it takes too long to matter
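The generic birthday attack can be tried for real on a deliberately weakened hash. This sketch assumes a toy hash that keeps only the top `bits` bits of SHA-256 (`truncated_hash` and `birthday_collision` are hypothetical helpers); with N = 2^32 outputs, a collision is expected after roughly sqrt(N) = 2^16 random inputs instead of 2^32.

```python
import hashlib
import os


def truncated_hash(data: bytes, bits: int) -> int:
    """Toy hash: the top `bits` bits of SHA-256 (weakened on purpose)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - bits)


def birthday_collision(bits: int):
    """Hash random inputs until two distinct ones share a truncated hash."""
    seen = {}   # truncated hash -> input that produced it
    tries = 0
    while True:
        x = os.urandom(16)          # random 128-bit input
        tries += 1
        y = truncated_hash(x, bits)
        if y in seen and seen[y] != x:
            return seen[y], x, tries
        seen[y] = x


a, b, tries = birthday_collision(32)
print(f"collision after {tries} tries (sqrt(2^32) = {2 ** 16})")
```

The same code run on the full 256-bit output would need about 2^128 tries, which is why brute force "takes too long to matter".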
• How big is 2^256?
• 2^256 is about 10^77
• Suppose we compute 60 million hashes per second. The expected number of tries needed to find an input matching a given 256-bit hash is 2^255, so the search takes 2^255 / (60 × 10^6) s ≈ 10^69 s ≈ 3 × 10^61 years
• Even if we had 1 trillion computers running concurrently, it would take
about 3 × 10^49 years
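The estimate above is plain arithmetic and can be re-checked in a few lines. The rate (60 million hashes per second), the 2^255 expected tries, and the 10^12 computers come from the slide; the 365.25-day year is an assumption of this sketch.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumed Julian year

rate = 60e6                 # hashes per second, from the slide
tries = 2 ** 255            # expected tries for a 256-bit output
computers = 1e12            # one trillion machines in parallel

seconds = tries / rate
years = seconds / SECONDS_PER_YEAR
years_parallel = years / computers

print(f"{seconds:.1e} s, {years:.1e} years, {years_parallel:.1e} years in parallel")
```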
• Is there a faster way to find collisions?
• For some hash functions, yes.
• For others, we don't know of one.
• Finding two inputs with the same hash is infeasible, but not impossible
• No Hash Function has been proven collision-free.
What is the Birthday Paradox?
• Assuming every day of the year is equally likely to be a birthday, the chance that one particular other person shares your birthday is 1/365, about 0.27%.
• Yet if you gather 20-30 people in one room, the odds that some two of them share the exact same birthday rise sharply.
• In fact, with just 23 people there is about a 50-50 chance that two of them share a birthday!
• Simple rule in probability:
• Suppose an event has N equally likely outcomes; then you need on the order of sqrt(N) random samples (about 1.18 × sqrt(N)) for a 50% chance of a collision.
• Applying this to birthdays: there are 365 possible birthdays and sqrt(365) ≈ 19.1, so about 1.18 × 19.1 ≈ 22.5, i.e., 23 randomly chosen people, give a 50% chance that two of them share a birthday.
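The 23-person figure can be computed exactly rather than from the sqrt(N) rule of thumb. This sketch (with illustrative helper names `p_shared` and `people_for_half`) multiplies the probabilities that each new person avoids all earlier birthdays, assuming 365 equally likely days.

```python
import math


def p_shared(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (days - i) / days   # person i avoids the i earlier birthdays
    return 1 - p_distinct


def people_for_half(days: int = 365) -> int:
    """Smallest group size with at least a 50% chance of a shared birthday."""
    n = 1
    while p_shared(n, days) < 0.5:
        n += 1
    return n


print(people_for_half(), round(p_shared(23), 4), round(math.sqrt(365), 1))
```

Replacing 365 by N = 2^256 gives the same picture for hash outputs, which is where the 2^128 estimate comes from.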
Pre-image resistance
• Means that it should be computationally hard to reverse a hash
function.
• If a hash function h produces a hash value z, it should be computationally infeasible to find any input x such that h(x) = z.
• This property protects against an attacker who only has a hash value and
is trying to find the input.
Property 2: Hiding or Preimage Resistance
• This measures how difficult it is to devise a message that hashes to a known digest
• Roughly speaking, the hash function must be one-way.
• Hiding or pre-image resistance: Given H(x), it is infeasible to find x
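To contrast preimage search with the birthday attack, this sketch brute-forces a preimage of a weakened hash. It assumes the same hypothetical toy hash as before (the top `bits` bits of SHA-256); `find_preimage` is illustrative. For a `bits`-bit output the expected cost is about 2^bits tries, not 2^(bits/2), and the input found is usually not the original message, just some x with the same digest.

```python
import hashlib


def truncated_hash(data: bytes, bits: int) -> int:
    """Toy hash: the top `bits` bits of SHA-256 (weakened on purpose)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - bits)


def find_preimage(target: int, bits: int, limit: int):
    """Try inputs 0, 1, 2, ... until one hashes to `target`; give up after `limit`."""
    for counter in range(limit):
        x = counter.to_bytes(8, "big")
        if truncated_hash(x, bits) == target:
            return x, counter + 1
    return None


target = truncated_hash(b"my secret", 16)   # the attacker only sees this value
result = find_preimage(target, 16, 1 << 22)
print(result)
```

At 256 bits the same loop would need about 2^256 tries, which is what "infeasible" means here.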
Algorithm             SHA-1    SHA-224   SHA-256   SHA-384   SHA-512
Message size (bits)   < 2^64   < 2^64    < 2^64    < 2^128   < 2^128
Block size (bits)     512      512       512       1024      1024
Word size (bits)      32       32        32        64        64
Number of steps       80       64        64        80        80
Secure Hash Algorithm
• SHA-256 is used in several different parts of the Bitcoin network:
• Mining uses SHA-256 as the proof-of-work algorithm.
• SHA-256 is used in the creation of bitcoin addresses to improve security and
privacy.
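The proof-of-work idea can be sketched as a toy: search for a nonce that makes the hash of the data fall below a target. Bitcoin applies SHA-256 twice, which this sketch mirrors with `double_sha256`; everything else is a simplification and an assumption, not Bitcoin's real block-header format, byte order, or difficulty encoding (`mine` and `difficulty_bits` are hypothetical names).

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(block_data: bytes, difficulty_bits: int):
    """Toy proof of work: find a nonce whose hash, read as a number, is below the target."""
    target = 1 << (256 - difficulty_bits)   # hash must start with difficulty_bits zero bits
    nonce = 0
    while True:
        digest = double_sha256(block_data + nonce.to_bytes(8, "little"))
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
        nonce += 1


nonce, digest_hex = mine(b"toy block", 16)
print(nonce, digest_hex)
```

Each extra difficulty bit doubles the expected work, while anyone can verify a found nonce with a single hash.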
Construction of Hash functions
• Hash functions are typically constructed from fixed-input compression functions!
• Example: the construction of the SHA-256 hash function (the hash function used in Bitcoin)
• This construction is called the Merkle-Damgård transform
• Why does it work?
• Theorem: If the compression function c is collision-resistant, then SHA-256 is collision-resistant.
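[Figure: Merkle-Damgård construction of SHA-256. The padded message (padding 10* | length) is split into 512-bit blocks, block 1 to block n; starting from the IV, the compression function c is applied once per block, and the last output is the hash.]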
SHA-256…
• SHA-256 takes the message being hashed and breaks it into 512-bit blocks. In general, the message size is not a multiple of the block size, so the message is padded to make it one: a single 1 bit, then enough 0 bits, then the 64-bit message length (the 10* | length padding).
• You start with a 256-bit value called the IV (initialization vector), specified in the standard, and concatenate it with the first block. This 768-bit string goes through a special function c (the compression function) that outputs a 256-bit string.
• Then the compression function is applied to the concatenation of the first output and the second block.
• The process is repeated until all blocks are used; the hash is the final 256-bit output. This chaining of the compression function is the Merkle-Damgård transform.
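The padding and chaining steps above can be sketched independently of SHA-256's internals. In this sketch the compression function is passed in as a parameter; `toy_c` is a hypothetical, deliberately insecure stand-in that folds 96 bytes (768 bits) into 32 bytes (256 bits), and `TOY_IV` is a made-up IV. Only the padding rule follows SHA-256.

```python
def pad(message: bytes) -> bytes:
    """Pad to a multiple of 512 bits: a 1 bit, then 0 bits, then the 64-bit length."""
    bit_len = 8 * len(message)
    padded = message + b"\x80"                          # 0x80 = a 1 bit followed by seven 0 bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)  # zero-fill to 448 bits mod 512
    return padded + bit_len.to_bytes(8, "big")          # 64-bit big-endian message length


def merkle_damgard(message: bytes, iv: bytes, c) -> bytes:
    """Chain compression function c over the 512-bit (64-byte) blocks of the padded message."""
    state = iv                                  # 256-bit (32-byte) chaining value
    padded = pad(message)
    for i in range(0, len(padded), 64):
        state = c(state + padded[i:i + 64])     # 768-bit input -> 256-bit output
    return state                                # the last output is the hash


def toy_c(data: bytes) -> bytes:
    """HYPOTHETICAL, insecure stand-in for c: XOR-folds 96 bytes into 32 bytes."""
    out = bytearray(32)
    for i, byte in enumerate(data):
        out[i % 32] ^= (byte * 31 + i) & 0xFF
    return bytes(out)


TOY_IV = bytes(range(32))  # hypothetical IV; SHA-256's real IV is fixed by the standard
digest = merkle_damgard(b"hello world", TOY_IV, toy_c)
```

Keeping c as a parameter reflects the theorem on the slide: the chaining is generic, and the security argument rests on the compression function.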
One Compression function in SHA-256
• One compression function in SHA-256 comprises
• a 256-bit block cipher with 64 rounds,
• a key expansion mechanism from 512 to 2048 bits, and
• a final set of eight 32-bit additions.
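A teaching sketch of the compression function described above, chained over a padded message, shows the three parts: key expansion from 16 to 64 words (512 to 2048 bits), 64 rounds, and eight 32-bit additions. It follows the SHA-256 definition in FIPS 180-4 as closely as the author of this sketch can, and derives the IV and round constants from square and cube roots of primes, as the standard defines them, instead of listing the hex values. The helper names are illustrative, and the code should be compared against `hashlib.sha256` before being trusted; it is far too slow for real use.

```python
import hashlib
import math

MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words, i.e. modulo 2**32


def first_primes(n):
    """First n primes by trial division."""
    primes, k = [], 2
    while len(primes) < n:
        if all(k % p for p in primes):
            primes.append(k)
        k += 1
    return primes


def icbrt(n):
    """Largest integer r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
IV = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]  # fractional bits of sqrt(prime)
K = [icbrt(p << 96) & MASK for p in PRIMES]            # fractional bits of cbrt(prime)


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """256-bit state + 512-bit block -> 256-bit state."""
    # Key expansion: 16 words (512 bits) -> 64 words (2048 bits)
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    # 64 rounds of the block cipher, keyed by the expanded words w[i]
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        S1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + S1 + ch + K[i] + w[i]) & MASK
        S0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (S0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    # Final eight 32-bit additions of the input state and the cipher output
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(message: bytes) -> bytes:
    """Pad (1 bit, 0 bits, 64-bit length), then chain compress over 64-byte blocks."""
    padded = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    padded += (8 * len(message)).to_bytes(8, "big")
    state = IV
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return b"".join(x.to_bytes(4, "big") for x in state)


print(sha256(b"abc") == hashlib.sha256(b"abc").digest())
```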
Application of SHA-256 in bitcoin