Lecture 1


SRI KRISHNA ARTS AND SCIENCE COLLEGE

DEPARTMENT OF COMPUTER SCIENCE

Blockchain and Cryptocurrency

Course Code 22CSU25, 22SSU25

Class III B.Sc CS A, B & SS

UNIT 1 Lecture – 01

TOPIC Cryptographic hash functions

Facilitator J.Christy Andrews,


Assistant Professor,
Computer Science
CRYPTOGRAPHIC HASH FUNCTIONS

The first cryptographic primitive that we need to understand is a cryptographic hash function. A hash function is a mathematical function with the following three properties:

• Its input can be any string of any size.


• It produces a fixed-sized output. For the purpose of
making the discussion in this chapter concrete, we will
assume a 256-bit output size. However, our discussion
holds true for any output size, as long as it is sufficiently
large.
• It is efficiently computable. Intuitively this means that
for a given input string, you can figure out what the
output of the hash function is in a reasonable amount of
time. More technically, computing the hash of an n-bit
string should have a running time that is O(n).
These properties define a general hash function, one
that could be used to build a data structure, such as a
hash table. We’re going to focus exclusively on
cryptographic hash functions.
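
To make these properties concrete, here is a minimal Python sketch (the choice of Python and of the standard hashlib library is ours, not the lecture's): SHA-256 accepts inputs of very different sizes yet always produces a 256-bit output, and computes it quickly.

    import hashlib

    for message in [b"", b"hi", b"a" * 1_000_000]:
        digest = hashlib.sha256(message).hexdigest()
        # every digest is 32 bytes = 256 bits, regardless of how long the input is
        print(len(message), "bytes in ->", len(digest) * 4, "bits out")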

For a hash function to be cryptographically secure, we require that it has the following three additional properties: (1) collision resistance, (2) hiding, and (3) puzzle friendliness.
Property 1: Collision Resistance

The first property that we need from a cryptographic hash function is that it is collision resistant.

A collision occurs when two distinct inputs produce the same output.

A hash function H(·) is collision resistant if nobody can find a collision.

Collision resistance. A hash function H is said to be collision resistant if it is infeasible to find two values, x and y, such that x ≠ y, yet H(x) = H(y).

Notice that we said “nobody can find” a collision, but we did not say that no collisions exist.
Actually, collisions exist for any hash function,
and we can prove this by a simple counting
argument.
The input space to the hash function contains all
strings of all lengths, yet the output space
contains only strings of a specific fixed length.
Because the input space is larger than the
output space (indeed, the input space is infinite,
while the output space is finite), there must be
input strings that map to the same output string.
A hash collision. x and y are distinct values, yet
when input into hash function H, they produce
the same output.
Consider the following simple method for finding
a collision for a hash function with a 256-bit
output size:

pick 2^256 + 1 distinct values, compute the hashes of each of them, and check whether any two outputs are equal. Since we picked more inputs than possible outputs, some pair of them must collide when you apply the hash function.
The method above is guaranteed to find a collision. But if we pick random inputs and compute the hash values, we’ll find a collision with high probability long before examining 2^256 + 1 inputs.

In fact, if we randomly choose just 2^130 + 1 inputs, it turns out there’s a 99.8 percent chance that at least two of them are going to collide.
That we can find a collision by examining only
roughly the square root of the number of
possible outputs results from a phenomenon in
probability known as the birthday paradox.
Inevitability of collisions. Because the number of
inputs exceeds the number of outputs, we are
guaranteed that there must be at least one
output to which the hash function maps more
than one input.
This collision-detection algorithm works for
every hash function. But, of course, the problem
is that it takes a very long time to do. For a hash
function with a 256-bit output, you would have
to compute the hash function 2^256 + 1 times in the worst case, and about 2^128 times on average.
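
As an illustration of the birthday argument, the sketch below (our own example, not from the slides) runs the generic collision search against a hash truncated to 32 bits, so that the roughly square-root number of trials, about 2^16, finishes almost instantly; against the full 256-bit output the same loop would need on the order of 2^128 hashes.

    import hashlib, os

    def h32(data):
        # toy hash: first 32 bits (4 bytes) of SHA-256, truncated only to keep the search fast
        return hashlib.sha256(data).digest()[:4]

    seen = {}
    attempts = 0
    while True:
        x = os.urandom(16)              # a fresh random input
        y = h32(x)
        attempts += 1
        if y in seen and seen[y] != x:
            print("collision after", attempts, "hashes:", seen[y].hex(), "and", x.hex())
            break
        seen[y] = x
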
We have found a general but impractical algorithm to find a collision for any hash function. A more
difficult question is: Is there some other method
that could be used on a particular hash function
to find a collision? In other words, although the
generic collision detection algorithm is not
feasible to use, there may be some other
algorithm that can efficiently find a collision for a
specific hash function.
Consider, for example, the following hash function:
H(x) = x mod 2^256

This function meets our requirements of a hash function as it accepts inputs of any length,
returns a fixed-sized output (256 bits), and is
efficiently computable. But this function also has
an efficient method for finding a collision. Notice
that this function just returns the last 256 bits of
the input.
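
For example, a collision for this weak function can be written down directly rather than searched for; the short sketch below (our illustration) exploits the fact that any two inputs differing by a multiple of 2^256 share their last 256 bits.

    def H(x: int) -> int:
        # the weak hash from the text: keep only the last 256 bits of the input
        return x % 2**256

    x = 12345
    y = 12345 + 2**256      # distinct from x, but identical in its last 256 bits
    assert x != y and H(x) == H(y)
    print("collision found:", H(x) == H(y))
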
Property 2: Hiding

The second property that we want from our hash function is that it is hiding.

The hiding property asserts that if we’re given the output of the hash function y = H(x), there’s no feasible way to figure out what the input, x, was. The problem is that this property can’t be true in the form stated.
Consider the following simple example:
we’re going to do an experiment where we flip a
coin. If the result of the coin flip was heads,
we’re going to announce the hash of the string
“heads.” If the result was tails, we’re going to
announce the hash of the string “tails.”
We then ask someone, an adversary, who didn’t
see the coin flip, but only saw this hash output,
to figure out what the string was that was hashed
(we’ll soon see why we might want to play
games like this).
In response, they would simply compute both the hash of the string “heads” and the hash of the string “tails,” and they could see which one they were given. And so, in just a couple of steps, they can figure out what the input was.
The adversary was able to guess what the string
was because only two values of x were possible,
and it was easy for the adversary to just try both
of them.
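
The adversary’s strategy can be spelled out in a few lines of Python (SHA-256 via hashlib stands in for H here; the lecture does not fix a particular hash function):

    import hashlib

    def H(s: str) -> str:
        return hashlib.sha256(s.encode()).hexdigest()

    announced = H("tails")                 # the hash the committer publishes

    # With only two possible inputs, the adversary just tries both.
    for guess in ("heads", "tails"):
        if H(guess) == announced:
            print("the hidden value was:", guess)
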
To be able to achieve the hiding property, there
must be no value of x that is particularly likely.
That is, x has to be chosen from a set that is, in
some sense, very spread out. If x is chosen from
such a set, this method of trying a few values of x
that are especially likely will not work.
Hiding. A hash function H is said to be hiding if
when a secret value r is chosen from a
probability distribution that has high min-
entropy, then, given H(r ‖ x), it is infeasible to
find x.
In information theory, min-entropy is a measure
of how predictable an outcome is, and high min-
entropy captures the intuitive idea that the
distribution (i.e., of a random variable) is very
spread out. What that means specifically is that
when we sample from the distribution, there’s no
particular value that’s likely to occur.
So, for a concrete example, if r is chosen uniformly from among all strings that are 256 bits long, then any particular string is chosen with probability 1/2^256, which is an infinitesimally small value.
APPLICATION: COMMITMENTS

Now let’s look at an application of the hiding property. In particular, what we want to do is something called a commitment.

A commitment is the digital analog of taking a value, sealing it in an envelope, and putting that envelope out on the table where everyone can see it.
When you do that, you’ve committed yourself
to what’s inside the envelope. But you haven’t
opened it, so even though you’ve committed
to a value, the value remains a secret from
everyone else. Later, you can open the
envelope and reveal the value that you
committed to earlier.
Commitment scheme. A commitment scheme
consists of two algorithms:

com := commit(msg, nonce) The commit function takes a message and a secret random value, called a nonce, as input and returns a commitment.

verify(com, msg, nonce) The verify function takes a commitment, nonce, and message as input. It returns true if com == commit(msg, nonce) and false otherwise.
We require that the following two security
properties hold:

• Hiding: Given com, it is infeasible to find msg.

• Binding: It is infeasible to find two pairs (msg, nonce) and (msg′, nonce′) such that msg ≠ msg′ and commit(msg, nonce) == commit(msg′, nonce′).
To use a commitment scheme, we first need to
generate a random nonce.

We then apply the commit function to this nonce together with msg, the value being committed to, and we publish the commitment com.

This stage is analogous to putting the sealed envelope on the table. At a later point, if we want to reveal the value that we committed to earlier, we publish the random nonce that we used to create this commitment, and the message, msg.

Now anybody can verify that msg was indeed the message committed to earlier. This stage is analogous to opening the envelope.
We can instantiate this scheme using a cryptographic hash function. Consider the following commitment scheme:

commit(msg, nonce) := H(nonce ‖ msg),

where nonce is a random 256-bit value.
To commit to a message, we generate a random
256-bit nonce. Then we concatenate the nonce
and the message and return the hash of this
concatenated value as the commitment. To
verify, someone will compute this same hash of
the nonce they were given concatenated with
the message. And they will check whether the
result is equal to the commitment that they saw.
Take another look at the two properties required
of our commitment schemes. If we substitute the
instantiation of commit and verify as well as
H(nonce ‖ msg) for com, then these properties
become:
• Hiding: Given H(nonce ‖ msg), it is infeasible to
find msg.
• Binding: It is infeasible to find two pairs (msg, nonce) and (msg′, nonce′) such that msg ≠ msg′ and H(nonce ‖ msg) == H(nonce′ ‖ msg′).
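
A minimal sketch of this hash-based commitment scheme, again using SHA-256 and os.urandom as illustrative choices (the slides do not specify H or how the nonce is generated):

    import hashlib, os

    def commit(msg: bytes, nonce: bytes) -> bytes:
        # commit(msg, nonce) := H(nonce || msg)
        return hashlib.sha256(nonce + msg).digest()

    def verify(com: bytes, msg: bytes, nonce: bytes) -> bool:
        return com == commit(msg, nonce)

    nonce = os.urandom(32)                       # random 256-bit nonce
    com = commit(b"I predict heads", nonce)      # publish com; keep msg and nonce secret

    # Later, reveal (msg, nonce) so that anyone can check the commitment.
    print(verify(com, b"I predict heads", nonce))   # True
    print(verify(com, b"I predict tails", nonce))   # False
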
Property 3: Puzzle Friendliness

The third security property we’re going to need from hash functions is that they are puzzle friendly. This property is a bit complicated. We first explain what the technical requirements of this property are and then give an application that illustrates why this property is useful.
Puzzle friendliness. A hash function H is said to be puzzle friendly if for every possible n-bit output value y, if k is chosen from a distribution with high min-entropy, then it is infeasible to find x such that H(k ‖ x) = y in time significantly less than 2^n.
Intuitively, if someone wants to target the hash function to have some particular output value y, and if part of the input has been chosen in a suitably randomized way, then it’s very difficult to find another value that hits exactly that target.
APPLICATION: SEARCH PUZZLE

We’re going to build a search puzzle, a mathematical problem that requires searching a very large space to find the solution. In particular, a search puzzle has no shortcuts. That is, there’s no way to find a valid solution other than searching that large space.
Search puzzle. A search puzzle consists of

• a hash function, H,
• a value, id (which we call the puzzle-ID),
chosen from a high min-entropy distribution, and
• a target set Y.

A solution to this puzzle is a value, x, such that H(id ‖ x) ∈ Y.
The intuition is this: if H has an n-bit output, then it can take any of 2^n values. Solving the puzzle requires finding an input such that the output falls within the set Y, which is typically much smaller than the set of all outputs.
The size of Y determines how hard the puzzle is. If Y is the set of all n-bit strings, then the puzzle is trivial, whereas if Y has only one element, then the puzzle is maximally hard.
That the puzzle ID has high min-entropy
ensures that there are no shortcuts. On the
contrary, if a particular value of the ID were
likely, then someone could cheat, say, by
precomputing a solution to the puzzle with that
ID.
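
A brute-force puzzle solver might look like the following sketch; the target set Y here is our own concrete choice (outputs whose first 16 bits are zero), made small on purpose so the loop finishes quickly, whereas the slides leave Y abstract.

    import hashlib, os

    puzzle_id = os.urandom(32)      # id drawn from a high min-entropy distribution

    def in_target(digest: bytes) -> bool:
        # membership in Y: here, the first two bytes of the output must be zero
        return digest[:2] == b"\x00\x00"

    x = 0
    while True:
        attempt = x.to_bytes(8, "big")
        if in_target(hashlib.sha256(puzzle_id + attempt).digest()):
            print("solution x =", x, "found after", x + 1, "hashes")
            break
        x += 1
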
The Merkle-Damgård transform is quite simple.
Suppose that the compression function takes
inputs of length m and produces an output of a
smaller length n.
The input to the hash function, which can be of
any size, is divided into blocks of length m – n.
The construction works as follows: pass each
block together with the output of the previous
block into the compression function.
Notice that the input length will then be (m – n) + n = m, which is the input length to the compression function. For the first block, to which there is no previous block output, we instead use an initialization vector (IV in Figure 1.3).
This number is reused for every call to the
hash function, and in practice you can just look it
up in a standards document. The last block’s
output is the result that you return.
SHA-256 hash function (simplified). SHA-256
uses the Merkle-Damgård transform to turn a
fixed-length collision-resistant compression
function into a hash function that accepts
arbitrary-length inputs. The input is padded, so
that its length is a multiple of 512 bits. IV stands
for initialization vector.
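
The transform described above can be sketched as follows; the "compression function" here is improvised from SHA-256 and the padding is simplified, purely to illustrate the chaining (real SHA-256 uses its own dedicated compression function and padding rule).

    import hashlib

    N = 32                    # n: output/chaining length in bytes (256 bits)
    BLOCK = 64                # m - n: message bytes consumed per compression call
    IV = b"\x00" * N          # stand-in initialization vector (a real standard fixes its own value)

    def compress(chaining: bytes, block: bytes) -> bytes:
        # takes n + (m - n) = m bytes in, returns n bytes out
        return hashlib.sha256(chaining + block).digest()

    def md_hash(message: bytes) -> bytes:
        # simplified padding so the length becomes a multiple of the block size
        message += b"\x80" + b"\x00" * (-(len(message) + 1) % BLOCK)
        state = IV
        for i in range(0, len(message), BLOCK):
            state = compress(state, message[i:i + BLOCK])
        return state          # the last block's output is the hash

    print(md_hash(b"hello").hex())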
