
Global Information Assurance Certification Paper

Copyright SANS Institute


Author Retains Full Rights
This paper is taken from the GIAC directory of certified professionals. Reposting is not permitted without express written permission.

 

Base64 Can Get You Pwned

GIAC (GCIA) Gold Certification

Author: Kevin Fiscus, kevinfiscus@gmail.com
Advisor: David Shinberg

Accepted: April 13th 2011
Abstract

Base64 is an encoding scheme originally designed to allow binary data to be represented as ASCII text. Widespread in its use, base64 seems to provide a level of security by making sensitive information difficult to decipher. In reality, the use of base64 provides a significant advantage to attackers while providing minimal benefit to defenders. The use of base64 can result in the disclosure of passwords and the bypass of data leakage protection systems, and can even be used to create one-click, obfuscated and self-contained cross-site scripting attacks. Because of these risks, detecting base64 usage on a network should be an important part of any comprehensive security program. Unfortunately, there is a problem: base64 is almost impossible to detect accurately using traditional methods. This paper provides an overview of the base64 problem and, more importantly, outlines a methodology that can be used to promote base64 detection using the Snort intrusion detection system.

© 2011 The SANS Institute. Author retains full rights.
 

1. Introduction
Helix Pharmaceuticals is worried about security. In the cutthroat world of multi-billion-dollar pharmaceutical companies, industrial espionage is a significant concern. In addition, political and social activists continually attempt to disrupt business as retribution for perceived injustices. As a result, Helix takes information security extremely seriously. Their security program consists of numerous protective and detective controls, including the use of extremely strong passwords, data leakage protection (DLP) solutions, network intrusion detection systems (NIDS), web filtering and email security solutions. The Chief Security Officer deemed the controls in place to be adequate until the company discovered that its strong passwords had been compromised, its DLP and IDS had been evaded and its web security controls had been bypassed. After a thorough investigation, it was determined that one simple technology was the cause of it all: base64. This story is fictional, but the concepts are real and deserve the attention of every information security department.


Base64 is a commonly used encoding scheme originally designed as a way to represent binary data in an ASCII text format. Like almost every aspect of computer technology today, base64, if not used properly, can result in increased security risk. As mentioned in the story about Helix, attackers can also use it to obfuscate and/or execute their attacks, evade detection and bypass otherwise strong controls. To mitigate the risks associated with the use of base64, it is important to understand what base64 is, how it is used, how it is abused and how to detect its use in modern computing environments.
 

2. Base64 Overview
2.1. Encoding vs. Encryption

When it comes to obscuring data, there are really three different approaches commonly discussed: steganography, encryption and encoding. Steganography, or “stego”, is a process by which data is hidden from observers. Herodotus documented one of the earliest examples around 440 BC. He tells the story of Histiaeus, who shaved the head of his most trusted slave and tattooed a message on it. Once the slave’s hair had grown back, the message was hidden. When the messenger reached his final destination, his head would be shaved, thereby disclosing the message (Perera, 2011). In today’s modern age of computing, a similar effect is achieved by, for example, changing the least significant bits of each byte of an image file. In pure steganography, the data is not changed in any way, but is simply hidden.
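
The least-significant-bit idea mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by iSteg or any other tool: the function names `embed` and `extract` are hypothetical, and a short byte string stands in for real pixel data.

```python
def embed(cover: bytes, message: bytes) -> bytes:
    """Hide the message bits in the least significant bit of each cover byte."""
    # Flatten the message into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover data is too small for this message")
    stego = bytearray(cover)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)


def extract(stego: bytes, length: int) -> bytes:
    """Rebuild `length` message bytes from the lowest bit of each stego byte."""
    message = bytearray()
    for n in range(length):
        value = 0
        for offset in range(8):
            value = (value << 1) | (stego[n * 8 + offset] & 1)
        message.append(value)
    return bytes(message)


cover = bytes(range(200, 232))  # hypothetical stand-in for 32 bytes of pixel data
stego = embed(cover, b"hi!")
print(extract(stego, 3))
print(max(abs(a - b) for a, b in zip(cover, stego)))  # largest change to any byte
```

Because only the lowest bit of each byte is touched, no byte value changes by more than one, which is why the altered image looks the same to the naked eye.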

The following two pictures look similar. The one on the left is the original. The one on the right has had data injected into it using a program called iSteg. To the naked eye, there are few, if any, visible differences between these pictures; however, if the second picture were fed into the iSteg program, the original text would be revealed.

[Figure: Original Picture (left) and Stego’d Picture (right), with the Original Text and the Un-Stego’d Text]
 
Encryption is an entirely different method of obfuscation. Rather than hiding the fact that a message exists, as stego does, encryption attempts to hide the meaning of the message. One of the simplest forms of encryption is a rotational cipher, in which the letters of the alphabet are shifted. A rotation of 2, or ROT-2, results in two alphabets: the true alphabet, in which the original message is written, and the shifted alphabet. The following shows a typical ROT-2 scheme.

True:    ABCDEFGHIJKLMNOPQRSTUVWXYZ
Shifted: CDEFGHIJKLMNOPQRSTUVWXYZAB

Using this ROT-2 scheme, the letter C would be used in place of A, so the word CAR would be encrypted as ECT and the word HOUSE would be encrypted as JQWUG. This is, of course, a very basic encryption scheme. Modern cryptographic schemes use sophisticated combinations of substitution and transposition against blocks or streams of data to come up with ciphertext that is difficult, if not impossible, to convert back to the original plaintext without the proper key. Encryption is an effective way to protect the confidentiality of data.
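
A rotational cipher is easy to express in code. The sketch below is illustrative only; the function name `rotate` is hypothetical. The shifted alphabet shown above begins at C, which corresponds to moving every letter two positions, so the example uses a shift of 2.

```python
import string


def rotate(text: str, shift: int) -> str:
    """Encrypt upper-case text with a rotational (Caesar-style) cipher."""
    true_alphabet = string.ascii_uppercase
    shift %= 26  # a negative shift wraps around, which undoes a positive one
    shifted_alphabet = true_alphabet[shift:] + true_alphabet[:shift]
    return text.upper().translate(str.maketrans(true_alphabet, shifted_alphabet))


print(rotate("CAR", 2))
print(rotate("HOUSE", 2))
print(rotate(rotate("HOUSE", 2), -2))  # shifting back recovers the plaintext
```

The only secret in this scheme is the shift value, and with just 25 useful shifts an attacker can try them all by hand, which is why it serves as an illustration rather than as real protection.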


The following table shows the same text encrypted using the same encryption scheme but using different keys. Encrypting data results in binary rather than text output, so the binary results have been encoded using base64 to make them readable.

Clear Text   Algorithm   Key        Base64 Cipher Text
Hello!       DES         Test       OBfxMpyn7oY=
Hello!       DES         Test1      5Rcw8GZ+/QM=
Hello!       DES         TestTest   q2a0ZkvgMeM=
Hello!       DES         test       uy8XtiCOto0=

As you can see, other than the trailing equal sign (a result of base64 padding), different keys used to encrypt the same source text using the same algorithm result in vastly different encrypted or cipher texts. Decrypting the cipher text without the key ranges from difficult to virtually impossible depending on the strength of the encryption algorithm.
 
Encoding may seem like encryption in that data gets changed from one form to another and the encoded text does not look like the original. Encoding, however, does not use substitution and transposition based on a secret key. Rather, encoding is the process of displaying data in another format. In the world of computers, the most common form of display suitable for humans to read is the American Standard Code for Information Interchange, or ASCII. ASCII includes the letters and numbers we read every day plus some control characters such as backspace and tab. Thus all of the letters, spaces and punctuation written in this document so far are representations of ASCII text.

In the world of computers, however, ASCII is not the only way of encoding or representing data. In its most basic form, a single ASCII character is stored on the computer as a single byte of data that can also be represented as binary, octal, decimal or hexadecimal. The following table shows the various encodings of some common ASCII characters:

Glyph       Hex    Dec   Oct   Binary
A           0x41   65    101   100 0001
a           0x61   97    141   110 0001
!           0x21   33    041   010 0001
Backspace   0x08   8     010   000 1000


Based on this, a simple word like Cat can be represented as follows:

• ASCII: Cat
• Hexadecimal: 0x43 0x61 0x74
• Decimal: 67 97 116
• Octal: 103 141 164
• Binary: 01000011 01100001 01110100
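
The representations listed above can be reproduced with a short Python sketch. The variable names are illustrative; the point is that every form is derived from, and converts back to, the same three byte values.

```python
word = "Cat"
codes = [ord(character) for character in word]  # ASCII value of each letter

hex_form = " ".join(f"0x{code:02x}" for code in codes)
decimal_form = " ".join(str(code) for code in codes)
octal_form = " ".join(f"{code:o}" for code in codes)
binary_form = " ".join(f"{code:08b}" for code in codes)

for label, value in [("Hexadecimal", hex_form), ("Decimal", decimal_form),
                     ("Octal", octal_form), ("Binary", binary_form)]:
    print(f"{label}: {value}")

# Decoding needs no key, only knowledge of the format.
print(bytes(codes).decode("ascii"))
```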


All of these encodings spell Cat, and as long as a recipient knows enough to decode the message, they can read it. The fact that the message is encoded provides no assurance of confidentiality other than relying on the fact that any given attacker may not be able to determine the method of encoding. Unfortunately, as you can see from the example above, many types of encoding often used in the computing industry are fairly easy to identify.
 
Like ASCII, hex, octal and binary, base64 is an encoding scheme. Specifically, base64 was designed as a means to represent binary data as ASCII text using a numbering system consisting of 64 digits. This may seem difficult to understand, but it is fairly simple. We typically interact with a numbering system with 10 digits, 0 through 9. This is a base10 system. Binary, a base2 numbering system, has 2 digits, 0 and 1. Hexadecimal is a base16 system using 0 through 9 plus a, b, c, d, e and f for its digits. Base64 typically uses 0 through 9, a through z and A through Z for the first 62 digits of the system. Different variations of base64 use different characters for the final 2 digits.
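
To make the idea of 64 digits concrete, the following sketch encodes one three-byte group by hand and compares the result with Python's standard library. It assumes the common alphabet, which orders the digits A-Z, a-z, 0-9 and then + and /; the helper name `encode_three_bytes` is hypothetical. Three bytes are 24 bits, which split evenly into four 6-bit values, and each 6-bit value (0 to 63) selects one digit.

```python
import base64
import string

# Standard alphabet: digit values 0-25 are A-Z, 26-51 are a-z, 52-61 are 0-9.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_three_bytes(chunk: bytes) -> str:
    """Encode exactly three bytes into four base64 digits."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles complete 3-byte groups")
    value = int.from_bytes(chunk, "big")  # the 24 bits as a single integer
    return "".join(ALPHABET[(value >> shift) & 0x3F] for shift in (18, 12, 6, 0))


print(encode_three_bytes(b"Cat"))
print(base64.b64encode(b"Cat").decode("ascii"))  # library result for comparison
```

Input that is not a multiple of three bytes is where padding comes from: the final group is filled out with zero bits and the missing digits are written as equal signs.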

Just as ASCII and binary can be used to represent data, so can base64. The palindrome “Was it a car or a cat I saw” would be represented as “V2FzIGl0IGEgY2FyIG9yIGEgY2F0IEkgc2F3”. As you can see, the source phrase reads the same forwards as it does backwards, but this is not the case in the encoded text. While this may seem “secure”, the fact that you can simply paste this text into an online base64 decoder and recover the original text illustrates the weakness of base64 as a security mechanism.
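
An online decoder is not even required. As a minimal sketch using Python's standard `base64` module, the round trip takes two calls and no key:

```python
import base64

phrase = "Was it a car or a cat I saw"

encoded = base64.b64encode(phrase.encode("ascii")).decode("ascii")
decoded = base64.b64decode(encoded).decode("ascii")

print(encoded)
print(decoded)
```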

2.2. Common Use


Base64 is used virtually everywhere. The following are some common applications that make use of base64.

• Basic authentication to web sites. When this type of authentication is used, the username and the password are separated by a colon, concatenated and the results encoded using base64. (Franks, 1999)
• Transfer of binary data via mediums such as email, as a replacement for uuencode. (Freed, 1996)
• Evasion of basic anti-spamming tools. (Craig, 2007)
• Encoding character strings in LDAP LDIF files (Good, 2000)
• Embedding binary data in an XML file
• Encoding binary files, such as images, within scripts or HTML to avoid depending on external files. (Coyier, 2010)
• Communicating encrypted cookie information (Prabhakar, 2011)

Of these uses, only a few should be considered both legitimate and appropriate. Using basic web authentication, for example, should be avoided as it risks disclosing the username and password to an attacker. Malicious use of base64 to evade anti-spam technologies is obviously not recommended. The remaining use cases are legitimate but should be considered suspect for a variety of reasons that will be discussed in detail throughout this document.

2.3. Identification and Decoding

The characteristics that make up a base64 encoded string are fairly simple: it will typically contain letters (A-Z and a-z), numbers (0-9) and the characters “/”, “+” and “=”, where the equal sign, if found, will always be found at the end of the string. Base64 strings usually contain a multiple of 4 characters (e.g. 4, 8, 12, 16, etc.). In such cases, the minimum size for a base64-encoded string is 4 characters. If the source data is not long enough to fill the final group of 4 output characters, one or two equal signs are added as padding; thus you often see either one or two equal signs at the end of base64 encoded data. Based on this definition, however, the words “data”, “Data” and “Database” are all potentially valid base64 (although they decode to random binary data), making positive validation of base64 data difficult. Making things worse, base64 does not always use the special characters / and +. In some implementations of base64 a number of other special characters are used, including the dash (-), the underscore (_), the period (.), the colon (:) and the exclamation point (!). In addition, some implementations of base64 don’t use padding. As a result, base64 can contain any combination of letters (upper and lower case), numbers and various special characters (/ + - _ . : !) that may or may not have one or two equal signs at the end. Needless to say, detecting base64 in your organization can be difficult.
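
The ambiguity described above can be demonstrated directly. The sketch below is a plausibility check only, assuming the standard alphabet with padding; the pattern and the function name `looks_like_base64` are illustrative and do not cover the alternative alphabets or unpadded variants.

```python
import base64
import re

# Whole groups of four digits, optionally ending in one padded group.
STANDARD_B64 = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def looks_like_base64(candidate: str) -> bool:
    """True if the string is structurally valid standard base64."""
    return bool(candidate) and STANDARD_B64.fullmatch(candidate) is not None


for word in ("data", "Data", "Database", "Secret!"):
    if looks_like_base64(word):
        print(word, "->", base64.b64decode(word, validate=True))
    else:
        print(word, "-> not valid standard base64")
```

Ordinary English words pass the structural test and decode without error, so a successful decode says nothing about whether the sender actually intended base64.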

 
 
There are a number of methods to determine whether a specific set of data is a valid base64 encoded string, but determining whether it was actually the result of base64 encoding is virtually impossible by any means other than trying to decode it. Fortunately, in many cases, detecting all base64 encoding is not really desirable, as such encoding has numerous legitimate uses. What we are often concerned about is the use of base64 to “secure” authentication credentials, and that can be detected using, for example, Snort, as seen in the following Emerging Threats rule:

alert tcp $HOME_NET any -> any $HTTP_PORTS (msg:"ET POLICY Outgoing Basic Auth Base64 HTTP Password detected unencrypted"; flow:established,to_server; content:"|0d 0a|Authorization|3a 20|Basic"; nocase; content:!"YW5vbnltb3VzOg=="; within:32; classtype:policy-violation; reference:url,doc.emergingthreats.net/bin/view/Main/2006380; reference:url,www.emergingthreats.net/cgi-bin/cvsweb.cgi/sigs/POLICY/POLICY_Basic_HTTP_Auth; sid:2006380; rev:10;)

This rule is fairly straightforward, particularly when you remove the message, classification, references, ID number and revision information as follows:

alert tcp $HOME_NET any -> any $HTTP_PORTS (flow:established,to_server; content:"|0d 0a|Authorization|3a 20|Basic"; nocase; content:!"YW5vbnltb3VzOg=="; within:32;)
i
t
ns

This rule is looking at TCP traffic on $HTTP_PORTS (a variable used to define


I

the ports on which web traffic is expected) for specific content. In this case, it is looking
NS

for bytecode (hex representation of binary data) of “0d 0a”, the word “Authorization”,
bytecode of “3a 20” and the word “Basic”. None of the above is case sensitive. Adding
SA

further specificity, any communications with “YW5vbnltb3VzOg==” found within 32


11

byes of the previous match would be excluded. (The string starting with YW5 is base64
20

encoding for “anonymous:”. This approach identifies “basic web authentication”, one of
the most common uses for base64 and one that almost always involves usernames and
©

passwords.
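
The matching logic of the simplified rule can be approximated in Python to make its behaviour easier to follow. This is a rough sketch of the two content checks only, not a reimplementation of Snort: it ignores flow state and port selection, and the function name `would_alert`, the host name and the credentials are all hypothetical.

```python
import base64
import re

# First content match: CRLF, "Authorization", colon, space, "Basic" (case-insensitive).
HEADER = re.compile(rb"\r\nAuthorization: Basic", re.IGNORECASE)
ANONYMOUS = base64.b64encode(b"anonymous:")  # b"YW5vbnltb3VzOg=="


def would_alert(payload: bytes) -> bool:
    """Approximate the rule: header present, anonymous token absent within 32 bytes."""
    match = HEADER.search(payload)
    if match is None:
        return False
    window = payload[match.end():match.end() + 32]  # the bytes covered by within:32
    return ANONYMOUS not in window


def request_with(credentials: bytes) -> bytes:
    token = base64.b64encode(credentials)
    return (b"GET /admin HTTP/1.1\r\nHost: intranet.example\r\n"
            b"Authorization: Basic " + token + b"\r\n\r\n")


print(would_alert(request_with(b"alice:s3cr3t")))  # real credentials
print(would_alert(request_with(b"anonymous:")))    # excluded by the negated match
```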

Detecting basic web authentication may be interesting, but it is not always sufficient. User credentials are not the only pieces of sensitive information that can be encoded using base64. Consider the pharmaceutical company that deployed a complex Data Leakage Protection solution in an effort to protect their newest multi-billion-dollar drug. Their DLP solution is configured to watch for a specific string of characters: “super secret formula X+3(Y)/437*Q”. An insider seeking to bypass that system could simply send it out as “c3VwZXIgc2VjcmV0IGZvcm11bGEgWCszKFkpLzQzNypR”, which is the base64 encoded version of that same formula. Unless the DLP solution has been configured to look for the base64 encoded string, it will be missed.

As discussed previously, determining that a given data string is actually base64 is not possible without attempting to decode it. That said, identifying strings that are consistent with base64 encoding can be done using Perl Compatible Regular Expressions. This must be done carefully, as this approach is subject to significant false positive or false negative results. For example, the regular expression “[0-9a-zA-Z+/=]{20,}” could be used, as it looks for a string at least 20 characters long containing only letters, numbers or the special characters listed. When analyzing typical human-readable text, this approach may be reasonable, as 20-character words are uncommon; however, a long URL such as http://www.something.com/something/somethingelse/somethingmore would result in a positive match to the regex. Another problem with this approach is that it only looks for encoded text of 20 characters or more. This would fail to detect an encoded password (for example) of up to 12 characters, which encodes to at most 16 characters. While this approach has a role in an overall base64 detection scheme, because of its weakness, another, more specific approach is necessary.
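
Both failure modes of the simple expression are easy to reproduce. In the sketch below the password is a hypothetical 12-character value chosen only for illustration; the URL is the one used in the text.

```python
import base64
import re

LOOSE = re.compile(r"[0-9a-zA-Z+/=]{20,}")

url = "http://www.something.com/something/somethingelse/somethingmore"
password = "Tr0ub4dor&3x"  # hypothetical 12-character password
encoded_password = base64.b64encode(password.encode("ascii")).decode("ascii")

false_positive = LOOSE.search(url)               # ordinary URL, yet it matches
false_negative = LOOSE.search(encoded_password)  # real base64, yet it is missed

print("URL matched on:", false_positive.group(0))
print("Encoded password length:", len(encoded_password))
print("Encoded password detected:", false_negative is not None)
```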


The following regular expression is more complex but does a more comprehensive job of identifying base64:

• (?:[A-Za-z0-9+/]{4}){2,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)

This can be more easily understood by breaking it down into its individual parts. Basically it is looking for two groups of data, as identified by the two sets of beginning and ending parentheses. The first group, (?:[A-Za-z0-9+/]{4}){2,}, looks for two or more groups of 4 characters that match the listed letters, numbers or special characters. Note: the “?:” makes the group non-capturing, which optimizes the processing of the regex and doesn’t affect what the regex is looking for. The second group looks for either:

Two characters matching A-Z, a-z, 0-9, + or /, followed by one character from the set AEIMQUYcgkosw048, followed by an equal sign.

or

One character matching A-Z, a-z, 0-9, + or /, followed by an A, Q, g or w, followed by two equal signs.
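
The restricted character sets in the second group follow from how padding works. The digit before a single equal sign carries only four bits of data followed by two zero bits, so its value must be a multiple of 4 (A, E, I, M and so on in the standard alphabet); the digit before two equal signs carries two bits of data followed by four zero bits, so its value must be a multiple of 16 (A, Q, g or w). The sketch below, which assumes the standard alphabet, runs the expression against a few sample strings taken from this paper.

```python
import re

PADDED_B64 = re.compile(
    r"(?:[A-Za-z0-9+/]{4}){2,}"
    r"(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)"
)

samples = [
    "MTIzLTQ1LTY3ODk=",   # padded with one equal sign
    "YW5vbnltb3VzOg==",   # padded with two equal signs
    "U2VjcmV0",           # valid base64 of a 6-byte source, no padding
    "http://www.something.com/something/somethingelse/somethingmore",
]

for sample in samples:
    print(sample, "->", PADDED_B64.search(sample) is not None)
```

The unpadded sample is genuine base64 but goes undetected, while the URL that tripped the simpler expression no longer matches.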

The result will be at least a 12-character string, meaning the source data was at least 7 bytes in length. This approach results in very few false positives but does result in a significant number of false negatives, or missed base64. This is because not all base64-encoded data ends with either one or two equal signs. An equal sign only occurs in some implementations of base64 encoding and is used to pad the data to ensure output is in four-character blocks. Specifically, source data whose length is a multiple of three bytes (e.g. 3, 6, 9, 12, etc.) would result in base64 encoded data with no equal signs and would be missed by this regular expression. This also assumes the specific implementation of base64 actually uses padding. Also, there is no absolute standard for base64 ASCII character usage. All implementations of base64 use the characters 0-9, A-Z and a-z, but that only addresses the requirements for 62 of the 64 necessary characters. Most implementations of base64 use the forward slash (/) and the plus (+); however, this creates problems in certain circumstances. For example, if base64 were to be embedded in a URL, the forward slash would be interpreted as a URL path divider rather than part of the base64. As a result, other characters such as dash (-), underscore (_), period (.), colon (:) and exclamation point (!) are used in some implementations.
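
One of these alternative alphabets is available in Python's standard library, which makes the difference easy to see. The sketch below covers only the dash and underscore variant; the other characters mentioned above (period, colon, exclamation point) belong to implementations not shown here. The two input bytes are chosen purely because they produce both special characters.

```python
import base64

raw = bytes([0xFB, 0xFF])  # chosen so that digit values 62 and 63 both appear

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
unpadded = url_safe.rstrip("=")  # some implementations also drop the padding

print(standard)
print(url_safe)
print(unpadded)
```

A detection pattern written only for “+” and “/” with padding would miss the second and third forms even though all three carry the same two bytes.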

The concerns related to the use of different special characters are fairly easy to resolve using additional regular expressions in which other characters replace the slash and plus. Unfortunately, the problem associated with the missing equal sign is far more difficult. Modifying the regular expression to not require any equal signs creates a large number of false positive results and is thus virtually useless. As a result, we are left with an undesirable choice: we either generate false positives or we generate false negatives. The best approach depends on the business problem you are trying to solve.

3. Understanding the Problem

Understanding base64 and how it can be identified is interesting as an intellectual exercise. To be meaningful in a practical sense, it is also important to understand why base64 represents a problem. The use of base64 places businesses and other organizations at risk in a variety of ways. Base64 can be used to compromise environments passively, with attackers sniffing network traffic to identify sensitive information including usernames and passwords. Base64 can be used actively to bypass data leakage protection or other data-focused security controls. Base64 can even be used to directly attack many endpoints. This combination of threats makes base64 abuse both difficult to detect and significantly damaging to even well protected organizations.

3.1. Password Disclosure


Password disclosure may be the most obvious risk associated with base64. Consider the fictional pharmaceutical company discussed earlier. They require users to select complex passwords of at least 14 characters in length and require that they be changed every 30 days. Using the most sophisticated computing methods available, brute force cracking a 12-character password consisting of only lower case letters would take approximately 3 years (assuming the cracking environment can guess 1 billion passwords per second). Cracking a 15-character password consisting of only lower case letters using the same computing environment would take over 53,000 years. (Password Recovery Speeds, 2009)
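
The arithmetic behind these figures is straightforward to check. The sketch below assumes, as the text does, 26 possible lower case letters per position and one billion guesses per second, and it computes the time to exhaust the entire keyspace; the function name is illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
GUESSES_PER_SECOND = 1_000_000_000  # the rate assumed in the text


def years_to_exhaust(alphabet_size: int, length: int) -> float:
    """Years needed to try every password of the given length."""
    keyspace = alphabet_size ** length
    return keyspace / GUESSES_PER_SECOND / SECONDS_PER_YEAR


print(f"12 lower case letters: {years_to_exhaust(26, 12):,.1f} years")
print(f"15 lower case letters: {years_to_exhaust(26, 15):,.0f} years")
```

Each additional character multiplies the work by 26, which is why three more characters turn roughly 3 years into more than 53,000.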

Brute force cracking, however, isn’t always necessary. If the organization, out of ignorance for example, uses basic web authentication, or if the user uses their corporate password for a third party application that uses basic web authentication, the password can be disclosed by sniffing traffic on the wireless network of a local coffee shop or fast food restaurant. This is because the username and password in basic web authentication are encoded using base64, then passed to the server. There is no encryption involved.
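
To show how little work the attacker has to do, the following sketch recovers the credentials from a captured Authorization header value. The account name, the password and the function name `decode_basic_auth` are hypothetical; the decoding itself is just the reverse of what the browser does.

```python
import base64


def decode_basic_auth(header_value: str) -> tuple:
    """Return (username, password) from the value of an Authorization header."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")  # split at the first colon only
    return username, password


# What a browser would send for a hypothetical user with a strong 17-character password.
captured = "Basic " + base64.b64encode(b"jsmith:Correct-Horse-14!").decode("ascii")

print(captured)
print(decode_basic_auth(captured))
```

The length and complexity of the password make no difference here, because nothing has to be guessed.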

In addition, some “behind the scenes” applications, such as anti-virus solutions, use base64 to encode the authentication credentials passed between the client and the signature update server, allowing an attacker to “steal” licenses. Specifically, when testing the effectiveness of the Snort rules defined throughout this document, it was discovered that basic web authentication was used by a major anti-virus vendor to allow anti-virus clients to authenticate to signature update servers. The base64 used in the basic web authentication could be decoded, revealing both the user name and password. An attacker could also use this fact to identify signature updates, conduct a man-in-the-middle attack and provide malware to the target masquerading as the update.

3.2. Data Leakage Protection Bypass

Many organizations today use some type of data leakage protection or DLP solution. These come in many forms, ranging from those that are specific to one protocol (e.g. email) to those that “sniff” all network traffic. In virtually all cases, these technologies look for specific patterns of data such as an account number, a social security number or specific key words associated with other types of sensitive data. The use of base64 encoding can make this type of detection far more difficult.

Consider the relatively simple example of a social security number or SSN. An SSN is a 9-digit number that is often represented in the format 123-45-6789 but can also be represented as “123456789”, “123 45 6789” or a variety of other formats. Detecting SSNs effectively takes a fairly complex regular expression: ^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$. Unfortunately, after putting all of that work into the regex, an attacker can simply encode the SSN using base64 and wind up with MTIzLTQ1LTY3ODk=. The following table lists various SSNs encoded via base64.

1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
M   T   I   z   L   T   Q   1   L   T   Y   3   O   D   k   =
M   T   E   x   L   T   E   x   L   T   E   x   M   T   E   =
M   j   I   y   L   T   I   y   L   T   I   y   M   j   I   =
M   z   M   z   L   T   M   z   L   T   M   z   M   z   M   =
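
The effect on a pattern-based control can be sketched as follows. The four inputs are the values that the rows of the table above decode to; the regular expression is the one given in the text, and the loop stands in for a DLP engine only in the loosest sense.

```python
import base64
import re

SSN = re.compile(
    r"^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$"
)

for ssn in ("123-45-6789", "111-11-1111", "222-22-2222", "333-33-3333"):
    encoded = base64.b64encode(ssn.encode("ascii")).decode("ascii")
    # The plain form matches the pattern; the encoded form does not.
    print(ssn, bool(SSN.match(ssn)), encoded, bool(SSN.match(encoded)))
```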

As you can see, there are some commonalities that could be leveraged to create additional regular expressions to detect base64 encoded SSNs, but the use of base64 makes detection far more difficult. In the above example, the commonalities are the result of using a specific implementation of base64 encoding and SSNs in the ###-##-#### format. Using different formats, different encoding schemes or even adding some number of leading characters (e.g. spaces, periods, dashes, etc.) adds complexity, and this is only one example of the type of data a DLP solution looks for.

If the word “Secret” is encoded using base64, the result is U2VjcmV0. Adding trailing information to the source data (“Secret”, two spaces, “123”) results in U2VjcmV0ICAxMjM=. As you can see, the first 8 characters are the same. If you add even a single leading space, however, you get an encoded result, IFNlY3JldA==, that is significantly different. The same dramatic effect occurs when you make other fairly trivial changes to the source data, as shown in the following table:

Source                       Base64 Encoded
Secret                       U2VjcmV0
Secret (1 leading space)     IFNlY3JldA==
Secret (2 leading spaces)    ICBTZWNyZXQ=
Secret (3 leading spaces)    ICAgU2VjcmV0
SECRET                       U0VDUkVU
S E C R E T                  UyBFIEMgUiBFIFQ=
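
These variations are a consequence of base64 working on three-byte groups. Shifting the source by one or two bytes changes which bytes share a group, so every output character changes; shifting by exactly three bytes leaves the grouping intact, which is why the original encoding reappears after three leading spaces. The sketch below reproduces the encodings; the helper name `b64` is illustrative and the last entry is the spaced-out form that the final table row decodes to.

```python
import base64


def b64(text: str) -> str:
    return base64.b64encode(text.encode("ascii")).decode("ascii")


variants = ["Secret", " Secret", "  Secret", "   Secret", "SECRET", "S E C R E T"]

for source in variants:
    encoded = b64(source)
    # Report whether the encoding of the bare word survives inside the result.
    print(repr(source), encoded, b64("Secret") in encoded)
```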

These dramatic variations in output make configuring a DLP system to detect specific sensitive information extremely difficult, and the difficulty increases as the complexity of the sensitive data increases. While it may be possible to identify, and thus detect, the majority of possible combinations for a 6-letter word or even for something with a standard format, such as a social security number, it is virtually impossible to do so for complex intellectual property or business data.
 

3.3. End User Compromise


There are numerous ways that an end user can be compromised using base64, most of which rely on encoding to evade malware detection signatures, IDS systems and similar controls. The best examples of such an attack involve targeting an end user via their web browser.

Web browsers are interesting in that they do a lot of the “thinking” for us. Originally designed to display ASCII text according to a set of rules called HyperText Markup Language or HTML, the functionality of web browsers has expanded significantly. One of the things that most web browsers will do automatically is decode encoded data. ASCII text can be encoded in hexadecimal (base16), decimal (base10) and, of course, base64. This allows an attacker to embed malicious content such as JavaScript in a web site or a URL. Because the JavaScript is decoded by the browser, the JavaScript itself is never transmitted across the “wire” in clear text and thus is unlikely to be detected by IDS or other controls.

Consider a simple JavaScript “attack”: <SCRIPT>alert("Pwned");</SCRIPT>. Detecting this type of script is easy using a typical IDS; however, it can be encoded using base64, resulting in PFNDUklQVD5hbGVydCgiUHduZWQiKTs8L1NDUklQVD4=, making detection far more difficult. This approach can be exploited by creating a very simple web page:

<html>
<body>
<h1>Heading</h1>
<p>Paragraph.</p>
<META HTTP-EQUIV="refresh" CONTENT="0;url=data:text/html;base64,PFNDUklQVD5hbGVydCgiUHduZWQiKTs8L1NDUklQVD4=">
</body>
</html>

A user visiting this web page would see an alert box with the word “Pwned” pop up in their browser, but the JavaScript will never have been sent across the network in clear text, thereby evading network based detection.

This same approach can be used by pasting a link directly into a web browser’s URL entry field; specifically, the text “data:text/html;base64,PFNDUklQVD5hbGVydCgiUHduZWQiKTs8L1NDUklQVD4=” (without the quotes) will result in the JavaScript executing in a web browser, as shown in the following image.

[Image: the JavaScript alert executing in a web browser]
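
From a defender's point of view, the useful observation is that a base64 data URL is trivial to unpack once it has been spotted. The sketch below builds the harmless alert example used in this section and then decodes it again; the function name `decode_data_url` is hypothetical and the parsing handles only the simple `data:<type>;base64,<payload>` form shown here.

```python
import base64

script = '<SCRIPT>alert("Pwned");</SCRIPT>'
payload = base64.b64encode(script.encode("ascii")).decode("ascii")
data_url = "data:text/html;base64," + payload


def decode_data_url(url: str) -> str:
    """Return the decoded content of a simple base64 data URL."""
    header, _, body = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(body).decode("utf-8", errors="replace")


print(data_url)
print(decode_data_url(data_url))
```

An analyst who finds such a URL in proxy logs or a captured page can recover the embedded script this way and inspect it without opening it in a browser.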

Getting a user to click on such an unusual URL is also not particularly difficult. The data URL scheme (as it is known) can be appended to a legitimate looking URL; however, there is an easier method: simply use a URL shortener such as TinyURL (http://www.tinyurl.com). Shrinking the text using TinyURL results in http://tinyurl.com/6bddyun. Given that users are familiar with the shortened URLs associated with Twitter and Facebook, it is likely that they would not give such a URL a second thought. While this same attack vector could be used with JavaScript directly, sending JavaScript across the wire could be detected by an IDS or similar control, while sending base64 would be less likely to be seen or blocked. Furthermore, using a data URL can allow the attacker to bypass certain protective controls. Specifically, the NoScript Firefox extension (http://noscript.net) is designed to block the execution of scripts; however, presenting the script as a data URL bypasses this control, resulting in the execution of a script in a browser that should block that type of activity.

This use of base64 to evade detection is particularly concerning. Using the combination of the data URL scheme, base64, JavaScript and URL shorteners, it is trivially easy to execute arbitrary code on a victim’s computer. The code would execute under the context of the web browser, but this still provides the attacker with significant latitude in terms of attack options, including the ability to establish an outbound, SSL encrypted communications channel. As most organizations have stateful inspection firewalls, the response traffic to an established outbound session is allowed, thereby allowing the attacker to bypass many different types of perimeter controls. This type of attack is not, in any way, sophisticated or difficult, requiring only a basic understanding of JavaScript, access to a base64 encoder and access to a URL shortener.

The attack, however, is limited by the fact that it won’t work in some web browsers. Modern versions of Internet Explorer do not decode most base64 and, while Google Chrome will, it will not follow the 302 redirect from TinyURL. Because Google Chrome will decode the base64 and execute the resulting JavaScript, simply hiding the data URL behind “Click Here” or similar innocuous text would likely be successful. Many of the web browser options for the Android platform will also not execute the script. As a result, while this type of attack may not work in purely Microsoft/Internet Explorer environments, it will be effective against Linux, Mac OS X, iPhones, iPads and some Android-based phones/tablets, making it an effective threat against most corporate environments. In fact, according to data compiled by statcounter.com, the combination of Firefox, Chrome, Opera and Safari makes up a total of 54% of the web browser usage throughout the world, making this type of attack particularly concerning (Usage Share of Web Browsers, 2011). Furthermore, while Windows computers running only Internet Explorer would be immune from this threat vector, Windows computers running Chrome, Firefox or Opera are still susceptible. As it is typically the more technical employees (e.g. IT personnel) who install alternate web browsers, when this attack vector is successful, it is likely to provide more value to the attacker.



3.4. Web Application Attacks


A variation of the browser attacks against end users involves using base64 to attempt to bypass web application security controls such as data input validation and web application firewall technology. While many such controls are configured to detect obvious JavaScript as part of their cross-site scripting prevention capabilities, some may not detect a similar attack expressed in base64 such as <META HTTP-EQUIV="refresh" CONTENT="0;url=data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4K">. This type of attack is of particular concern as the target of cross-site scripting is often not the vulnerable application but the users of that application. Thus, while an organization’s own web applications may be completely secure, other applications used by its users may not be, resulting in the potential for compromise.
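As a rough illustration of what an input validation routine would have to do to catch this, the Python sketch below looks for base64 data URIs in submitted input, decodes them and checks the decoded text for script-like markers. The function name `find_encoded_scripts` and the short `SUSPICIOUS` marker list are assumptions made for this example only; they are not taken from any particular web application firewall.

```python
import base64
import re

# Matches "data:<type>;base64,<payload>" anywhere in a string.
DATA_URI = re.compile(r"data:[\w/+.-]*;base64,([A-Za-z0-9+/]+={0,2})", re.IGNORECASE)

# Illustrative markers only; a real filter would need a far richer check.
SUSPICIOUS = ("<script", "javascript:", "onerror=")


def find_encoded_scripts(user_input):
    """Return decoded data-URI payloads that contain script-like markers."""
    hits = []
    for match in DATA_URI.finditer(user_input):
        try:
            decoded = base64.b64decode(match.group(1), validate=True)
        except ValueError:
            continue  # not valid base64, so nothing to inspect
        text = decoded.decode("latin-1")
        if any(marker in text.lower() for marker in SUSPICIOUS):
            hits.append(text)
    return hits


sample = ('<META HTTP-EQUIV="refresh" CONTENT="0;url=data:text/html;base64,'
          'PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4K">')
print(find_encoded_scripts(sample))
```

Note that this particular payload decodes to a script tag followed by a newline, 30 bytes in total, so its base64 form carries no trailing equal sign.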

3.5. Malware
Botnets are one of the more common forms of malware. They consist of many (often thousands or more) slaves or zombies that are centrally controlled by one or more masters. Originally, IRC was used for control as it allowed many slaves to join a specific IRC channel to receive commands. As IRC is not often used in corporate environments, it was fairly easy to simply block outbound IRC access to mitigate the botnet risk. As a result, malware authors moved to HTTP for command and control. This is often done by placing HTML comments on a web page. These comments are not visible when casually browsing the page but can be seen when viewing the page’s source code. The malware on the infected hosts is configured to periodically look for commands “hidden” as these HTML comments. The individual in control of the botnet simply updates the hidden comments to send new instructions to their zombies (Team Cymru, 2008).

If these instructions were passed “in the clear”, with no obfuscation, it would be easy for IDS/IPS systems to detect them. This would increase the likelihood of detection and make it much easier for malware analysts or incident responders to combat the problem. As a result, the instructions are often encoded using base64. The zombie has a built-in base64 decoder that can be used to translate the instructions into commands that can be understood and executed by the zombie. While base64 is not the only encoding used, it is common, likely because it is fairly difficult to detect using automated means while not suffering from the processing overhead involved with true encryption.
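From an analyst's point of view, the technique described above can be examined with a few lines of Python: pull the HTML comments out of a page, keep the ones that look like base64 and decode them. This is a sketch under stated assumptions; the function name, the 16-character minimum and the sample command string are all invented for illustration and do not describe any specific botnet.

```python
import base64
import binascii
import re

HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
BASE64_BODY = re.compile(r"^[A-Za-z0-9+/]{16,}={0,2}$")


def decode_comment_payloads(html):
    """Return decoded text for HTML comments that look like base64."""
    decoded = []
    for comment in HTML_COMMENT.findall(html):
        candidate = "".join(comment.split())  # drop whitespace and line breaks
        # Padded base64 is always a multiple of four characters long.
        if len(candidate) % 4 != 0 or not BASE64_BODY.match(candidate):
            continue
        try:
            raw = base64.b64decode(candidate, validate=True)
        except binascii.Error:
            continue
        decoded.append(raw.decode("utf-8", errors="replace"))
    return decoded


# Hypothetical page: the command string is invented for illustration.
command = base64.b64encode(b"sleep 3600; update http://example.invalid/x").decode()
page = f"<html><!-- navigation --><body>Welcome</body><!--{command}--></html>"
print(decode_comment_payloads(page))
```

Ordinary comments such as "navigation" are skipped by the length and character checks, while the encoded comment is decoded back to the original instruction.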

4. Base64 Auditing

Given the risks associated with base64, having no program for detecting its use leaves an organization vulnerable to a variety of direct and indirect attacks. Given the complexities of detecting base64, however, such a program is an exercise in risk management and compromise. Detection systems must find a balance between excessive false positives and excessive false negatives but, unlike some other types of detection, the elimination of both false positives and false negatives is not possible. In fact, any base64 detection solution is likely to include both. The goal is to reduce them to the extent possible.

In addition, the detection system must be tuned such that the most critical and/or accurate detection signatures “fire” first. Signatures that detect more than the presence of base64, such as the Emerging Threats rule for detecting basic web authentication, should be configured to alert first, followed by signatures that detect only the presence of base64.



4.1. Compromise
As discussed previously, planning base64 detection is an exercise in compromise. While regular expressions such as “[0-9a-zA-Z+/=_]{20,}” will detect virtually all base64 over 20 characters in length, they will also result in significant false positives and should only be used in specific circumstances. More targeted regular expressions such as “(?:[A-Za-z0-9+/]{4}){2,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)” will have fewer false positive results but will miss approximately one third of the base64 they see, as they are looking for trailing equal signs. To address these concerns, an active program of base64 detection must be employed as follows:

• Application specific base64 detection, such as the basic web authentication rule, should be used whenever possible.

• Targeted rules such as those using regular expressions that look for trailing equal signs should be used as high-level alerts.

• IDS operators should review alerts and add specificity to signatures as possible, thereby creating additional application specific base64 detection signatures.

• In the event that data exfiltration or targeted attacks are suspected, signatures using regular expressions that result in high false positive results should be employed but should be made as specific to source and destination IP address and port, traffic direction, etc. as possible.
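The trade-off between the two regular expressions quoted in this section can be seen by running both against a few strings. The Python sketch below is only an illustration: the sample inputs are invented, and Python's `re` module stands in for the PCRE engine an IDS would use.

```python
import base64
import re

# The two expressions quoted in the text above.
BROAD = re.compile(r"[0-9a-zA-Z+/=_]{20,}")
TARGETED = re.compile(
    r"(?:[A-Za-z0-9+/]{4}){2,}"
    r"(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)"
)

samples = {
    "30 bytes, no padding": base64.b64encode(b"A" * 30).decode(),
    "31 bytes, '==' padding": base64.b64encode(b"A" * 31).decode(),
    "32 bytes, '=' padding": base64.b64encode(b"A" * 32).decode(),
    # Not base64 at all: a long path made of base64-legal characters.
    "plain URL path": "/static/assets/javascript/application/main_bundle",
}

results = {}
for label, text in samples.items():
    results[label] = (bool(BROAD.search(text)), bool(TARGETED.search(text)))
    print(f"{label:24} broad={results[label][0]} targeted={results[label][1]}")
```

The broad expression fires on all four strings, including the harmless path, while the targeted expression fires only on the two padded encodings and misses the unpadded one.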


Application specific, targeted base64 detection would include any signatures designed to look for protocols, such as basic web authentication, that utilize base64. While few signatures of this type may be available to begin with, as base64 is detected crossing the network, the circumstances involving its use can be investigated and categorized as “known good” or “known bad”. Signatures for “known good” base64 usage can be created to simply allow or ignore that traffic, reducing the noise generated by the system. Similarly, specific signatures looking for unique characteristics (e.g. source address, destination address, source port, destination port or packet payload) of known bad traffic can be created. Thus, over time, the base64 detection solution will become more accurate as it gets tuned to the specifics of its environment.

In addition to application specific rules, a base64 detection solution will need to have general rules for detecting base64 anywhere in any packet, regardless of protocol, such as those discussed previously in this document. Ideally, these detection signatures would be geared towards reducing false positive results. During this stage of the detection process, it is better to miss some base64 than to be overwhelmed with alerts. The rules provided previously in this document fit this pattern. They will detect base64 with either one or two trailing equal signs. Because the number of trailing equal signs depends only on the length of the encoded input modulo three, this means that roughly two thirds of all base64 crossing the network will be detected. The goal at this phase is simply to broadly detect the use of base64 either entering or leaving the network. Any instances of base64 detected by these signatures should be investigated. The techniques for addressing known good and known bad base64 would then be used to create additional application specific rules.
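The "two thirds" figure follows from how base64 pads its output: an input whose length is a multiple of three needs no padding, while the other two remainders need one or two equal signs. The short Python sketch below checks this arithmetic against the standard library encoder. It assumes, purely for illustration, that input lengths are spread evenly across the three remainders; real traffic may not be.

```python
import base64
from collections import Counter


def padding_length(data: bytes) -> int:
    """Number of '=' characters base64 appends for this input."""
    return (3 - len(data) % 3) % 3


counts = Counter()
for size in range(1, 301):  # 300 consecutive input lengths
    encoded = base64.b64encode(b"x" * size).decode()
    # The formula above should agree with the real encoder.
    assert encoded.count("=") == padding_length(b"x" * size)
    counts[encoded.count("=")] += 1

print(dict(counts))
```

Each padding length (zero, one or two equal signs) occurs for exactly one third of the input lengths, so a rule that requires at least one trailing equal sign sees two of the three cases.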

If the use of base64 to circumvent DLP or to conduct specific attacks is detected, broad detection rules should then be implemented. These rules should leverage regular expressions such as [0-9a-zA-Z+/=_]{20,} that are highly subject to false positives but that would result in few, if any, false negatives. The regular expression should be used in an IDS rule that is specific in terms of traffic direction, source address, destination address, port and any other detail that can be used to reduce the volume of alerts. The goal at this phase is to catch everything related to the potential incident. The results should be investigated thoroughly and used, as appropriate, to pursue criminal, civil or administrative action and to update the application specific signatures. The following diagram provides a high level overview of this process:



This approach to detection is extremely active and requires knowledgeable responders and IDS administrators, but it is only appropriate for environments where some reasonable level of risk related to base64 is acceptable. Using this approach, an attacker who is aware of the detection methods in place could plan their “attack” such that the input data would result in base64 without trailing equal signs. Also, when dealing with end user targeted attacks, missing one out of three base64 communications means that a significant compromise could occur without detection. This “accepted risk” approach to detecting base64 is, however, far better than simply ignoring the problem. In extremely high security environments, broad detection rules could be used in place of the regular expressions that require the trailing equal signs. This approach would result in a high number of false positives but would only miss base64 shorter than 20 characters.
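The weakness described above is easy to reproduce for testing a rule set. The Python sketch below takes the 32-byte script from the data URL example earlier in this paper, appends a single space so that its length becomes a multiple of three, and shows that the padded-only expression no longer matches while the broad expression still does. The helper `pad_to_multiple_of_three` is a hypothetical name introduced for this illustration.

```python
import base64
import re

BROAD = re.compile(r"[0-9a-zA-Z+/=_]{20,}")
TARGETED = re.compile(
    r"(?:[A-Za-z0-9+/]{4}){2,}"
    r"(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)"
)


def pad_to_multiple_of_three(payload: bytes, filler: bytes = b" ") -> bytes:
    """Append filler bytes until the length is divisible by three."""
    return payload + filler * ((3 - len(payload) % 3) % 3)


script = b'<SCRIPT>alert("Pwned");</SCRIPT>'  # 32 bytes
original = base64.b64encode(script).decode()
adjusted = base64.b64encode(pad_to_multiple_of_three(script)).decode()

for label, text in (("original", original), ("adjusted", adjusted)):
    print(f"{label}: ends_with_equals={text.endswith('=')} "
          f"targeted={bool(TARGETED.search(text))} broad={bool(BROAD.search(text))}")
```

Trailing whitespace does not change how a browser treats the script, which is why relying on trailing equal signs alone leaves a predictable gap.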

Using these techniques, the detection of base64 can be customized to any organization and can be used to detect the majority of base64 threats regardless of source, application or protocol. Over time, these techniques can also result in a significant decrease in false positive and false negative results.

4.2. Snort Rules

In order to fully detect base64 using Snort, multiple rules are required, each designed for a specific purpose. A number of these rules are shown in the following table:

Use: Used to detect base64 as part of basic web authentication.
Rule: alert tcp $HOME_NET any -> any $HTTP_PORTS (msg:"ET POLICY Outgoing Basic Auth Base64 HTTP Password detected unencrypted"; flow:established,to_server; content:"|0d 0a|Authorization|3a 20|Basic"; nocase; content:!"YW5vbnltb3VzOg=="; within:32; classtype:policy-violation; reference:url,doc.emergingthreats.net/bin/view/Main/2006380; reference:url,www.emergingthreats.net/cgi-bin/cvsweb.cgi/sigs/POLICY/POLICY_Basic_HTTP_Auth; sid:2006380; rev:10;) (Emerging Threats, 2011)
False Alert Description: Low false positives but will miss all base64 not associated with basic web authentication.

Use: Used to detect “standard” base64 as well as base64 used for privacy-enhanced mail, MIME and Radix-64 encoding for OpenPGP; requires a trailing equal sign.
Rule: alert tcp $HOME_NET any -> any any (msg:"Possible standard base64 detected"; pcre:"/(?:[A-Za-z0-9+/]{4}){2,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)/"; classtype:policy-violation; sid: XXXXXXXX;)
False Alert Description: Minimal false positives but will miss all base64 without a trailing equal sign.

Use: Used to detect “standard” base64 as well as base64 used for privacy-enhanced mail, MIME and Radix-64 encoding for OpenPGP, with no trailing equal sign required.
Rule: alert tcp $HOME_NET any -> any any (msg:"Possible standard base64 detected"; pcre:"/(?:[A-Za-z0-9+/]{4}){2,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]|[A-Za-z0-9+/][AQgw])/"; classtype:policy-violation; sid: XXXXXXXX;)
False Alert Description: High false positives.

Use: Used to detect a modified version of base64 used for URL applications.
Rule: alert tcp $HOME_NET any -> any any (msg:"Possible non-standard base64 detected"; pcre:"/(?:[A-Za-z0-9\-_]{4}){2,}(?:[A-Za-z0-9\-_]{2}[AEIMQUYcgkosw048]|[A-Za-z0-9\-_][AQgw])/"; classtype:policy-violation; sid: XXXXXXXX;)
False Alert Description: High false positives.

Use: Used to detect long ASCII strings with base64 compliant characters.
Rule: alert tcp $HOME_NET any -> any any (msg:"Possible standard base64 detected"; pcre:"/[0-9a-zA-Z+/=_]{20,}/"; classtype:policy-violation; sid: XXXXXXXX;)
False Alert Description: Extremely high false positive results.
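To show what the first rule in the table is protecting against, the following Python sketch approximates its logic outside of Snort: find an "Authorization: Basic" header, skip the anonymous credential the rule excludes and decode whatever else is there. This is an approximation for illustration, not a translation of the Snort rule; the function name and the sample request (including its credentials) are invented.

```python
import base64
import binascii
import re

AUTH_HEADER = re.compile(
    rb"\r\nAuthorization: Basic ([A-Za-z0-9+/]+={0,2})", re.IGNORECASE
)
ANONYMOUS = b"YW5vbnltb3VzOg=="  # base64 of "anonymous:", excluded by the rule


def extract_basic_credentials(request: bytes):
    """Return (user, password) from an HTTP request, or None."""
    match = AUTH_HEADER.search(request)
    if match is None or match.group(1) == ANONYMOUS:
        return None
    try:
        decoded = base64.b64decode(match.group(1), validate=True)
    except binascii.Error:
        return None
    user, sep, password = decoded.decode("latin-1").partition(":")
    return (user, password) if sep else None


# Hypothetical request; the credentials are invented for illustration.
request = (b"GET /admin HTTP/1.1\r\nHost: example.invalid\r\n"
           b"Authorization: Basic " + base64.b64encode(b"alice:s3cret") + b"\r\n\r\n")
print(extract_basic_credentials(request))
```

Anyone who can observe the unencrypted request can recover the username and password this way, which is why the rule is classified as a policy violation rather than as an attack.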


5. Conclusion
For a variety of reasons, base64 represents a very real risk to organizations that rely on computers, networking and the Internet. Base64 is often used in place of encryption to transmit sensitive information, including usernames and passwords, which can result in unauthorized disclosure. Base64 can also be used to obfuscate attacks in an attempt to bypass detection and protection technologies. Unfortunately, the detection of base64 is extremely difficult as base64 is simply ASCII text that just happens to decode into something else. While the detection of base64 should be part of any monitoring program, it is always going to be an act of compromise involving reducing but not eliminating false positive and false negative results. To achieve the highest overall detection fidelity, organizations must implement an active program of detection that involves continual reviewing of alerts and tuning of the system. If done properly, however, base64 detection can become an effective component of an overall information security program.

6. References

2006380. (2011, May 25). Emerging Threats. Retrieved August 16, 2011, from doc.emergingthreats.net/2006380

Coyier, C. (2010, March 25). Data URIs | CSS-Tricks. CSS-Tricks. Retrieved August 16, 2011, from http://css-tricks.com/5970-data-uris

Craig. (2007, August 7). Filtering base64 encoded spam | Small Dropbear. Small Drop Bear. Retrieved August 16, 2011, from http://enc.com.au/2007/08/filtering-base64-encoded-spam

Franks. (n.d.). RFC2617 - HTTP Authentication. Internet Engineering Task Force. Retrieved August 16, 2011, from tools.ietf.org/html/rfc2617

Freed, N. (n.d.). Multipurpose Internet Mail Extensions. Internet Engineering Task Force. Retrieved August 16, 2011, from tools.ietf.org/html/rfc2045

Good, G. (n.d.). The LDAP Data Interchange Format (LDIF) - Technical Specifications. Internet Engineering Task Force. Retrieved August 16, 2011, from www.ietf.org/rfc/rfc2849.txt

Password Recovery Speeds. (2009, July 10). Lockdown.co.uk - The Home Computer Security Center. Retrieved August 16, 2011, from http://www.lockdown.co.uk/?pg=combi

Perera, H. (2011, May 1). History of Steganography. Hareendra's Blog. Retrieved August 16, 2011, from http://hareenlaks.blogspot.com/2011/04/history-of-steganography.html

Prabhakar, A. (2011, January 11). the Digital me: Base 64 Encoding. the Digital me. Retrieved August 16, 2011, from http://digitalpbk.blogspot.com/2006/12/base-64-encoding.html

Team Cymru. (2008). A Taste of HTTP Botnets. Team Cymru. Retrieved August 16, 2011, from www.team-cymru.com/ReadingRoom/Whitepapers/2008/http-botnets.pdf

Usage share of web browsers - Wikipedia, the free encyclopedia. (2011, July 26). Wikipedia, the free encyclopedia. Retrieved July 26, 2011, from http://en.wikipedia.org/wiki/Usage_share_of_web_browsers
