Base64 Usecases
GIAC (GCIA) Gold Certification
Author: Kevin Fiscus, kevinfiscus@gmail.com
Advisor: David Shinberg
Accepted: April 13th 2011

Abstract
Base64 is an encoding scheme originally designed to allow binary data to be represented as
ASCII text. Widespread in its use, base64 seems to provide a level of security by making
sensitive information difficult to decipher. In reality, the use of base64 provides a
significant advantage to attackers while providing minimal benefit to defenders. The use of
base64 can result in the disclosure of passwords and the bypass of data leakage protection
systems, and can even be used to create one-click, obfuscated and self-contained cross-site
scripting attacks. Because of these risks, detecting base64 usage on a network should be an
important part of any comprehensive security program. Unfortunately, there is a problem:
base64 is almost impossible to detect accurately using traditional methods. This paper
provides an overview of the base64 problem and, more importantly, outlines a methodology
that can be used to promote base64 detection using Snort.

© 2011 The SANS Institute, Author retains full rights.

Base64 Can Get You Pwned
1. Introduction
Helix Pharmaceuticals is worried about security. In the cutthroat world of multi-billion
dollar pharmaceutical companies, industrial espionage is a significant concern. In addition,
political and social activists continually attempt to disrupt business as retribution for
perceived injustices. As a result, Helix takes information security extremely seriously.
Their security program consists of numerous protective and detective controls including the
use of extremely strong passwords, data leakage protection (DLP) solutions, network
intrusion detection systems (NIDS), web filtering and email security solutions. The controls
in place were deemed, by the Chief Security Officer, to be adequate until they discovered
that their strong passwords were compromised, their DLP and IDS were evaded and their web
security controls were bypassed. After a thorough investigation, it was determined that one
simple technology was the cause of it all – base64. This story is fictional but the concepts
are real and [...] represent binary data in an ASCII text format. Like almost every aspect
of computer technology today, base64, if not used properly, can result in increased security
risk. As mentioned in the story about Helix, attackers can also use it as a method to
obfuscate and/or execute their attacks, evade detection and bypass otherwise strong
controls. To mitigate the risks associated with the use of base64, it is important to
understand what base64 is, how it is used, how it is abused and how to detect its use in
modern computing environments.

2. Base64 Overview
2.1. Encoding vs. Encryption
When it comes to obscuring data, there are really three different approaches commonly
discussed: steganography, encryption and encoding. Steganography, or “stego”, is a process
by which data is hidden from observers. Herodotus documented one of the earliest examples
around 440 BC. He tells the story of Histiaeus, who shaved the head of his most trusted
slave and tattooed a message on it. Once the slave’s hair had grown back, the message was
hidden (Perera, 2011). When the messenger reached his final destination, his head would be
shaved, thereby disclosing the message. In today’s modern age of computing, a similar effect
is achieved by changing, for example, the least significant bits of each byte of an image
file. In pure steganography, the data is not changed in any way, but is simply hidden.

The following two pictures look similar. The one on the left is the original. The one on
the right has had data injected into it using a program called iSteg. To the naked eye,
there are few, if any, visible differences between these pictures; however, if the second
picture were fed into the iSteg program, the original text would be revealed.

[...] of the alphabet are shifted. A rotation of 2, or ROT-2, would result in two alphabets:
the true alphabet, in which the original message is written, and the shifted alphabet. The
following shows a typical ROT-2 scheme.

True:    ABCDEFGHIJKLMNOPQRSTUVWXYZ
Shifted: CDEFGHIJKLMNOPQRSTUVWXYZAB

Using this ROT-2 scheme, the letter C would be used in place of A, so the word CAR would be
encrypted as ECT and the word HOUSE would be encrypted as JQWUG. This is, of course, a very
basic encryption scheme. Modern cryptographic schemes use sophisticated combinations of
substitution and transposition against blocks or streams of data to come up with ciphertext
that is difficult, if not impossible, to convert back to the original plaintext without the
proper key. Encryption is an effective way to [...]

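The substitution table above, in which C stands in for A, amounts to moving every letter two
places along the alphabet. A minimal Python sketch of that idea follows; the function names
`shift_encrypt` and `shift_decrypt` are illustrative only and do not come from any tool
discussed in this paper.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_encrypt(plaintext: str, shift: int) -> str:
    """Replace each letter with the one `shift` places later, wrapping at Z."""
    shift %= len(ALPHABET)
    shifted = ALPHABET[shift:] + ALPHABET[:shift]
    return plaintext.upper().translate(str.maketrans(ALPHABET, shifted))


def shift_decrypt(ciphertext: str, shift: int) -> str:
    """Undo shift_encrypt by rotating the alphabet back again."""
    return shift_encrypt(ciphertext, -shift)


print(shift_encrypt("CAR", 2))    # ECT
print(shift_encrypt("HOUSE", 2))  # JQWUG
print(shift_decrypt("ECT", 2))    # CAR
```

The only secret here is the shift value, which is why a scheme like this is trivially broken
by trying all 25 possible rotations.
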
The following table shows the same text encrypted using the same encryption scheme but
using different keys. Encrypting data results in a binary, rather than a text, file; thus
the binary results have been encoded using base64 to make them readable.

Source Text   Algorithm   Key        Encrypted Result (base64)
Hello!        DES         Test1      5Rcw8GZ+/QM=
Hello!        DES         TestTest   q2a0ZkvgMeM=
Hello!        DES         test       uy8XtiCOto0=

As you can see, other than the trailing equal sign (a result of base64 padding), different
keys used to encrypt the same source text using the same algorithm result in vastly
different encrypted or cipher texts. Decrypting the cipher text without the key ranges from
difficult to virtually impossible depending on the strength of the encryption algorithm.

[...] process of displaying data in another format. In the world of computers, the most
common form of display suitable for humans to read is the American Standard Code for
Information Interchange, or ASCII. ASCII includes the letters and numbers we read every day
plus some control characters such as backspace and tab. Thus all of the letters, spaces and
punctuation written in this document so far are representations of ASCII text.

In the world of computers, however, ASCII is not the only way of encoding or representing
data. In its most basic form, a single ASCII character is stored on the computer as a single
byte of data that can also be represented as binary, octal, decimal or hexadecimal. The
following table shows the various encodings of some common ASCII characters:

Glyph   Hex    Dec   Oct   Binary
A       0x41   65    101   100 0001

• ASCII: Cat
• Hexadecimal: 0x43 61 74
• Decimal: 67 97 116
• Octal: 103 141 164

All of these encodings spell Cat, and as long as a recipient knows enough to decode the
message, they can read it. The fact that the message may be encoded provides no assurance of
confidentiality other than relying on the fact that any given attacker may not be able to
determine the method of encoding. Unfortunately, as you can see from the example above, many
types of encoding often used in the computing industry are fairly easy to identify.

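The point that these are all views of the same bytes can be checked with a few lines of
Python. This is only a sketch; the helper name `encodings` is made up for the illustration.

```python
from typing import Dict


def encodings(text: str) -> Dict[str, str]:
    """Show the same ASCII bytes as hexadecimal, decimal, octal and binary."""
    data = text.encode("ascii")
    return {
        "hex": " ".join(f"0x{b:02x}" for b in data),
        "decimal": " ".join(str(b) for b in data),
        "octal": " ".join(f"{b:o}" for b in data),
        "binary": " ".join(f"{b:08b}" for b in data),
    }


for name, value in encodings("Cat").items():
    print(f"{name:8} {value}")
```

Going the other way is just as easy, for example `bytes([67, 97, 116]).decode("ascii")`
returns `Cat`, which is why none of these representations offers any confidentiality.
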
[...] fairly simple. We typically interact with numbering systems with 10 digits: 0 through
9. This is a base10 system. Binary, a base2 numbering system, has 2 digits: 0 and 1.
Hexadecimal is a base16 system using 0 through 9 plus a, b, c, d, e and f for its digits.
Base64 typically uses 0 through 9, a through z and A through Z for the first 62 digits of
the system. Different variations of base64 use different characters for the final 2 digits.

Just as ASCII and binary can be used to represent data, so can base64. The palindrome “Was
it a car or a cat I saw” would be represented as “V2FzIGl0IGEgY2FyIG9yIGEgY2F0IEkgc2F3”. As
you can see, the source phrase reads the same forwards as it does backwards, but this is not
the case in the encoded text. While this may seem “secure”, the fact that you can simply
paste this text into an online base64 decoder and recover the original text illustrates the
weakness of base64 as a security mechanism.

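No online decoder is even needed. As a small sketch using Python's standard `base64`
module, the palindrome can be encoded and recovered in a few lines:

```python
import base64

phrase = "Was it a car or a cat I saw"

# Encoding and decoding need no key, only knowledge of the scheme.
encoded = base64.b64encode(phrase.encode("ascii")).decode("ascii")
decoded = base64.b64decode(encoded).decode("ascii")

print(encoded)                   # V2FzIGl0IGEgY2FyIG9yIGEgY2F0IEkgc2F3
print(decoded)                   # Was it a car or a cat I saw
print(encoded == encoded[::-1])  # False: the encoded text is not a palindrome
```
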
[...] the username and the password are separated by a colon, concatenated and the results
encoded using base64. (Franks, 1999)

[...] technologies is obviously not recommended. The remaining use cases should still be
considered suspect for a variety of reasons that will be discussed in detail throughout this
document.

2.3. Identification and Decoding
The characteristics that make up a base64-encoded string are fairly simple: it will
typically contain letters (A-Z and a-z), numbers (0-9) and the characters “/”, “+” and “=”,
where the equal sign, if found, will always be found at the end of the string. Base64
strings usually contain a multiple of 4 characters (e.g. 4, 8, 12, 16, etc.). In such cases,
the minimum size for a base64-encoded string is 4 characters. If the source string is not
long enough to generate an output of 4 characters, one or two equal signs will be added for
padding. This padding is found in most base64-encoded strings where the encoding does not
generate a number of characters that is divisible by 4; thus you often see either one or two
equal signs at the end of base64-encoded data. Based on this definition, however, the words
“data”, “Data” and “Database” are all potentially valid base64 (although they decode to
random binary data), making positive validation of base64 data difficult. Making things
worse, base64 does not always use the special characters / and +. Some implementations use
other characters, including the dash (-), the underscore (_), the period (.), the colon (:),
and the exclamation point (!). In addition, some implementations of base64 don’t use
padding. As a result, base64 can contain any combination of letters (upper and lower case),
numbers and various special characters (/+-_.:!) that may or may not have one or two equal
signs at the end.

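The identification problem described above can be seen directly with Python's standard
`base64` module. The sketch below is only an illustration (the helper name `try_decode` is
invented here): ordinary words whose length is a multiple of four decode without error, and
the amount of padding depends only on the source length modulo 3.

```python
import base64
import binascii
from typing import Optional


def try_decode(candidate: str) -> Optional[bytes]:
    """Return the decoded bytes if `candidate` is well-formed padded base64, else None."""
    try:
        return base64.b64decode(candidate, validate=True)
    except binascii.Error:
        return None


# Plain English words can be structurally valid base64.
for word in ("data", "Data", "Database", "hello"):
    print(word, try_decode(word))

# One, two or three source bytes give two, one or zero equal signs.
for source in (b"a", b"ab", b"abc"):
    print(len(source), base64.b64encode(source).decode("ascii"))
```

A successful decode therefore says little on its own; whether the result is meaningful still
has to be judged from context.
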
[...] in many cases, detecting base64 encoding is not really desirable, as such encoding has
numerous legitimate uses. What we are often concerned about is the use of base64 to “secure”
authentication credentials, and that can be detected using, for example, Snort, as seen in
the following Emerging Threats rule:

alert tcp $HOME_NET any -> any $HTTP_PORTS
(msg:"ET POLICY Outgoing Basic Auth Base64 HTTP Password detected unencrypted";
flow:established,to_server; content:"|0d 0a|Authorization|3a 20|Basic"; nocase;
content:!"YW5vbnltb3VzOg=="; within:32; classtype:policy-violation;
reference:url,doc.emergingthreats.net/bin/view/Main/2006380;
reference:url,www.emergingthreats.net/cgi-bin/cvsweb.cgi/sigs/POLICY/POLICY_Basic_HTTP_Auth;
sid:2006380; rev:10;)

This rule is fairly straightforward, particularly when you remove the messages, ID [...]

[...] content:!"YW5vbnltb3VzOg=="; within:32;)

[...] the ports on which web traffic is expected) for specific content. In this case, it is
looking for bytecode (hex representation of binary data) of “0d 0a”, the word
“Authorization”, bytecode of “3a 20” and the word “Basic”. None of the above is case
sensitive. Adding [...] bytes of the previous match would be excluded. (The string starting
with YW5 is the base64 encoding of “anonymous:”.) This approach identifies “basic web
authentication”, one of the most common uses for base64 and one that almost always involves
usernames and passwords.

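To show how little protection the encoding gives once such a header has been captured, the
following Python sketch recovers the credentials from an Authorization header value. The
function name `decode_basic_auth` is hypothetical, and the sample value is the “anonymous:”
string excluded by the rule above.

```python
import base64
from typing import Tuple


def decode_basic_auth(header_value: str) -> Tuple[str, str]:
    """Split 'Basic <base64>' into the username and password it carries."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    credentials = base64.b64decode(token).decode("utf-8")
    # The username ends at the first colon; the password may contain more colons.
    username, _, password = credentials.partition(":")
    return username, password


print(decode_basic_auth("Basic YW5vbnltb3VzOg=="))  # ('anonymous', '')
```
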
[...] Data Leakage Protection solution in an effort to protect their newest multi-billion
dollar drug. Their DLP solution is configured to watch for a specific string of characters:
“super secret formula X+3(Y)/437*Q”. An insider seeking to bypass that system could simply
send it out as “c3VwZXIgc2VjcmV0IGZvcm11bGEgWCszKFkpLzQzNypR”, which is the base64-encoded
version of that same formula. Unless the DLP solution has been configured to look for the
base64-encoded string, it will be missed.

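The bypass can be illustrated with a deliberately naive keyword check. This is only a sketch
of the idea; `naive_dlp_match` is an invented stand-in for a literal-string DLP rule and does
not represent any particular product.

```python
import base64

SENSITIVE = "super secret formula X+3(Y)/437*Q"


def naive_dlp_match(payload: str) -> bool:
    """Flag a payload only if it contains the sensitive string verbatim."""
    return SENSITIVE in payload


encoded = base64.b64encode(SENSITIVE.encode("ascii")).decode("ascii")

print(encoded)                     # c3VwZXIgc2VjcmV0IGZvcm11bGEgWCszKFkpLzQzNypR
print(naive_dlp_match(SENSITIVE))  # True: the plain text is caught
print(naive_dlp_match(encoded))    # False: the encoded form slips through
```
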
As discussed previously, determining that a given data string is actually base64 is not
possible without attempting to decode it. That said, identifying strings that are consistent
with base64 encoding can be done using Perl Compatible Regular Expressions. This must be
done carefully, as this approach is subject to significant false positive or false negative
results. For example, the regular expression “[0-9a-zA-Z+/=]{20,}” could be used, as it
looks for a string of at least 20 characters containing letters, numbers or the special
characters listed. When analyzing typical human-readable text, this approach may be
reasonable, as 20-character words are uncommon; however a [...] would result in a positive
match to the regex. Another problem with this approach is that it only looks for encoded
text of 20 characters or more. This would fail to detect an encoded password (for example)
that is as long as 12 characters. While this approach has a role in an overall base64
detection scheme, because of its weakness, another, more [...]

• “(?:[A-Za-z0-9+/]{4}){2,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)”

This can be more easily understood by breaking it down into its individual parts. Basically
it is looking for two groups of data as identified by the two sets of beginning and ending
parentheses. The first group, (?:[A-Za-z0-9+/]{4}){2,}, looks for two or more [...]

• Two characters matching A-Z, a-z, 0-9, + or / followed by one character
  (AEIMQUYcgkosw048), followed by an equal sign.

or

• One character matching A-Z, a-z, 0-9, + or / followed by an A, Q, g or w followed by two
  equal signs.

The result will be at least a 12-character string, meaning the source data was at least 7
bytes in length. This approach results in very few false positives; however, it does result
in a significant number of false negative results, or missed base64. This is because not all
base64-encoded data ends with either one or two equal signs. An equal sign only occurs in
some implementations of base64 encoding and is used to pad the data to ensure output is in
four-byte blocks. Specifically, source data whose length is a multiple of three bytes (e.g.
3, 6, 9, 12, etc.) would result in base64-encoded data with no equal signs and would be
missed by this regular expression. This also assumes the specific implementation of base64
actually uses padding. Also, there is no absolute standard for base64 ASCII character usage.
All implementations of base64 use the characters 0 – 9, A – Z and a – z, but that only
addresses the requirements for 62 of 64 necessary characters. Most implementations of base64
use the forward slash (/) and the plus (+); however, this creates problems in certain
circumstances. For example, if base64 were to be embedded in a URL, the forward slash would
be interpreted as a URL divider rather than part of the base64. As a result, other
characters such as dash (-), underscore (_), period (.), colon (:) and exclamation point (!)
are used in some implementations.

The concerns related to the use of different special characters are fairly easy to resolve
using additional regular expressions in which other characters replace the slash and plus.
Unfortunately, the problem associated with the missing equal sign is far more difficult.
Modifying the regular expression to not require any equal signs creates a large number of
false positive results and is thus virtually useless. As a result, we are left with [...]

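The trade-off between the two expressions can be tried out with Python's `re` module, which
accepts both patterns unchanged. The sketch below is illustrative only: the names `BROAD` and
`PADDED` are invented here, and the file path is a made-up example of ordinary text that
happens to fit the broad character class.

```python
import re

# Broad pattern: any run of 20 or more base64-alphabet characters.
BROAD = re.compile(r"[0-9a-zA-Z+/=]{20,}")

# Targeted pattern: at least two full 4-character groups plus a padded final group.
PADDED = re.compile(
    r"(?:[A-Za-z0-9+/]{4}){2,}"
    r"(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)"
)

samples = [
    "MTIzLTQ1LTY3ODk=",                      # padded base64, only 16 characters
    "V2FzIGl0IGEgY2FyIG9yIGEgY2F0IEkgc2F3",  # unpadded base64, 36 characters
    "/usr/share/doc/python3/examples",       # hypothetical path, not base64
]

for sample in samples:
    broad_hit = BROAD.search(sample) is not None
    padded_hit = PADDED.search(sample) is not None
    print(f"{sample:40} broad={broad_hit} padded={padded_hit}")
```

The first sample is found only by the targeted pattern, the second only by the broad one,
and the third is a false positive for the broad pattern, which matches the behaviour
described above.
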
3. Understanding the Problem
Understanding base64 and how it can be identified is interesting as an intellectual
exercise. To be meaningful in a practical sense, it is also important to understand why
base64 represents a problem. The use of base64 places businesses and other organizations at
risk in a variety of ways. Base64 can be used to compromise environments passively, with
attackers sniffing network traffic to identify sensitive information including usernames and
passwords. Base64 can be used actively to bypass data leakage protection or other
data-focused security controls. Base64 can even be used to directly attack many endpoints.
This combination of threats makes it both difficult to detect and significantly damaging to
even well protected organizations.

Consider the fictional pharmaceutical company discussed earlier. They require users to
select complex passwords of at least 14 characters in length and require that they be
changed every 30 days. Using the most sophisticated computing methods available, brute force
cracking a 12-character password consisting of only lower case letters would take
approximately 3 years (assuming the cracking environment can guess 1 billion passwords per
second). Cracking a 15-character password consisting of only lower case letters using the
same computing environment would take over 53,000 years. (Password Recovery Speeds, 2009)

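These figures follow from simple arithmetic: the number of candidate passwords divided by
the guessing rate. A short Python check, assuming an exhaustive search of the whole keyspace
and a 365-day year (the function name is illustrative):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def brute_force_years(length: int, alphabet_size: int = 26,
                      guesses_per_second: float = 1e9) -> float:
    """Years needed to try every password of `length` over the given alphabet."""
    keyspace = alphabet_size ** length
    return keyspace / guesses_per_second / SECONDS_PER_YEAR


print(f"12 lower case letters: {brute_force_years(12):,.1f} years")
print(f"15 lower case letters: {brute_force_years(15):,.0f} years")
```

Each additional lower case letter multiplies the effort by 26, so three more characters
raise it by a factor of 17,576.
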
Brute force cracking, however, isn’t always necessary. If the organization, out of
ignorance for example, uses basic web authentication, or if the user uses their corporate
password for a third party application that uses basic web authentication, the password can
be disclosed by sniffing traffic on a local coffee shop or fast food restaurant’s wireless
network. This is because the username and password in basic web authentication are encoded
using base64, then passed to the server. There is no encryption involved.

[...] effectiveness of the Snort rules defined throughout this document, it was discovered
that basic web authentication was used by a major anti-virus vendor to allow anti-virus
clients to authenticate to signature update servers. The base64 used in the basic web
authentication could be decoded, revealing both the user name and password. An attacker
could also use this fact to identify signature updates, conduct a man-in-the-middle attack
and provide malware to the target masquerading as the update.

3.2. Data Leakage Protection Bypass
Many organizations today use some type of data leakage protection or DLP solution. These
come in many forms, ranging from those that are specific to one protocol (e.g. email) to
those that “sniff” all network traffic. In virtually all cases, these technologies look for
specific patterns of data such as an account number, a social security number or specific
key words associated with other types of sensitive data. The use of base64 encoding can make
this type of detection far more difficult.

[...] SSN is a 9-digit number that is often represented in the format of 123-45-6789 but can
[...] all of that work into the regex, an attacker can simply encode the SSN using base64
and wind up with MTIzLTQ1LTY3ODk=. The following table lists various SSNs encoded via
base64.

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
 M  T  I  z  L  T  Q  1  L  T  Y  3  O  D  k  =
 M  T  E  x  L  T  E  x  L  T  E  x  M  T  E  =
 M  j  I  y  L  T  I  y  L  T  I  y  M  j  I  =
 M  z  M  z  L  T  M  z  L  T  M  z  M  z  M  =

[...] result of using a specific implementation of base64 encoding and SSNs in the
###-##-#### format. Using different formats, different encoding schemes or even adding some
number of leading characters (e.g. spaces, periods, dashes, etc.) adds complexity, and this
is only one example of the type of data a DLP solution looks for.

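The rows of the table can be reproduced with a short Python sketch. It also prints the
characters at positions 5-6, 9-10 and 16, which stay constant for SSNs written in the
###-##-#### format under standard base64; the helper name `encode_ssn` is illustrative.

```python
import base64


def encode_ssn(ssn: str) -> str:
    """Base64-encode an SSN written in the ###-##-#### format."""
    return base64.b64encode(ssn.encode("ascii")).decode("ascii")


for ssn in ("123-45-6789", "111-11-1111", "222-22-2222", "333-33-3333"):
    encoded = encode_ssn(ssn)
    # Positions 5-6 and 9-10 (1-based) encode a dash followed by a digit.
    print(ssn, encoded, encoded[4:6], encoded[8:10], encoded[15])
```

Any change to the format, such as a leading space or dropping the dashes, moves the byte
boundaries and this regularity disappears.
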
If the word “Secret” is encoded using base64, the result is U2VjcmV0. Adding trailing
information to the source data (“Secret  123”, with two spaces) results in
U2VjcmV0ICAxMjM=. As you can see, the first 8 characters are the same. If you add even a
single leading space, however, you get an encoded result, IFNlY3JldA==, that is
significantly different. The same dramatic effect occurs when you make other fairly trivial
changes to the source data, as shown in the following table:

Source        Base64 Encoded
Secret        U2VjcmV0
SECRET        U0VDUkVU
S E C R E T   UyBFIEMgUiBFIFQ=

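The effect of such small changes is easy to reproduce. The following Python sketch encodes
several variants of the same word; the list of variants is chosen here purely for
illustration.

```python
import base64

variants = [
    "Secret",
    "Secret  123",   # trailing data after two spaces: the first 8 characters survive
    " Secret",       # one leading space shifts every 3-byte group
    "SECRET",
    "S E C R E T",
]

for text in variants:
    encoded = base64.b64encode(text.encode("ascii")).decode("ascii")
    print(f"{text!r:16} {encoded}")
```

A signature written for U2VjcmV0 would match only the first two variants, which is why
literal matching on encoded keywords scales so poorly.
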
[...] specific sensitive information extremely difficult, and the complexities increase as
the complexity of the sensitive data increases. While it may be possible to identify, and
thus detect, the majority of possible combinations for a 6-character word or even for
something [...]

[...] controls. The best examples of such an attack involve targeting an end user via their
web browser.

Web browsers are interesting in that they do a lot of the “thinking” for us. Originally
designed to display ASCII text according to a set of rules called HyperText Markup Language
or HTML, the functionality of web browsers has expanded significantly. One of the functions
that most web browsers will perform automatically is decoding encoded data. ASCII text can
be encoded in hexadecimal (base16), decimal (base10) and, of course, base64. This allows an
attacker to embed malicious content such as JavaScript in a web site or a URL. Because the
JavaScript is decoded by the browser, the actual JavaScript is not transmitted across the
“wire” and thus is not likely to be detected by IDS or other controls.

Detecting this type of script is easy using a typical IDS; however, it can be encoded using
[...] making detection far more difficult. This approach can be exploited by creating a very
[...]

<html>
<body>
<h1>Heading</h1>
<p>Paragraph.</p>
</body>
</html>

This same approach can be used by pasting a link directly in a web browser’s URL entry
field; specifically, the text
“data:text/html;base64,PFNDUklQVD5hbGVydCgiUHduZWQiKTs8L1NDUklQVD4=” (without the quotes)
will result in the JavaScript executing in a web browser, as shown in the following image.

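For analysts, the useful direction is usually the reverse: taking a suspicious data URL and
seeing what it contains. The Python sketch below builds the URL quoted above from its
harmless alert script and then decodes it again; the function names are illustrative, and
only the base64 form of text/html data URLs is handled.

```python
import base64

PREFIX = "data:text/html;base64,"


def make_data_url(html: str) -> str:
    """Wrap an HTML fragment in a base64 data URL."""
    return PREFIX + base64.b64encode(html.encode("utf-8")).decode("ascii")


def inspect_data_url(url: str) -> str:
    """Recover the HTML carried by a base64 text/html data URL."""
    if not url.startswith(PREFIX):
        raise ValueError("not a base64 text/html data URL")
    return base64.b64decode(url[len(PREFIX):]).decode("utf-8")


url = make_data_url('<SCRIPT>alert("Pwned");</SCRIPT>')
print(url)
print(inspect_data_url(url))
```
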
Getting a user to click on such an unusual URL is also not particularly difficult. The data
URL scheme (as it is known) can be appended to a legitimate looking URL; however, there is
an easier method – simply use a URL shortener such as TinyURL [...] associated with Twitter
and Facebook, it is likely that they would not give such a URL a second thought. While this
same attack vector could be used with JavaScript directly, sending JavaScript across the
wire could be detected by an IDS or similar control, while sending base64 would be less
likely to be seen or blocked. Furthermore, using a data URL can allow the attacker to bypass
certain protective controls. Specifically, the [...] scripts; however, presenting the script
as a data URL bypasses this control, resulting in the execution of a script in a browser
that should block that type of activity.

[...] combination of the data URL scheme, base64, JavaScript and URL shorteners, it is
trivially easy to execute arbitrary code on a victim’s computer. The code would execute
under the context of the web browser, but this still provides the attacker with significant
latitude in terms of attack options, including the ability to establish an outbound, SSL
encrypted communications channel. As most organizations have stateful inspection [...]
JavaScript, access to a base64 encoder and access to a URL shortener.

The attack, however, is limited by the fact that it won’t work in some web browsers. Modern
versions of Internet Explorer do not decode most base64, and while Google Chrome will, it
will not execute the 302 redirect from TinyURL. Google Chrome will decode the base64 and
will execute the resulting JavaScript; thus simply hiding the data URL information behind
“Click Here” or similar innocuous text would likely be successful. Many of the web browser
options for the Android platform will also not execute the script. As a result, while this
type of attack may not work in purely Microsoft/Internet Explorer environments, it will be
effective against Linux, Mac OS X, iPhones, iPads and some Android-based phones/tablets,
making it an effective threat against most corporate environments. In fact, according to
data compiled by statcounter.com, the combination of Firefox, Chrome, Opera and Safari makes
up a total of 54% of web browser usage throughout the world, making this type of attack
particularly concerning. (Usage Share of Web Browsers, 2011) Furthermore, while Windows
computers running only Internet Explorer would be immune from this threat vector, Windows
computers running Chrome, Firefox or Opera are still susceptible. As it is typically the
more technical employees (e.g. IT personnel) who install alternate web browsers, when this
[...]

[...] application firewall technology. While many such controls are configured to detect
obvious JavaScript as part of their cross site scripting prevention capabilities, some may
not detect a similar attack expressed in base64 such as <META HTTP-EQUIV="refresh"
CONTENT="0;url=data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4K">. This type
of attack is of particular concern as the target of cross-site scripting is often not the
vulnerable application but the users of that application. Thus, while an organization’s web
applications may be completely secure, other applications used by their users may not be,
resulting in the potential for compromise.

3.5. Malware
Botnets are one of the more common forms of malware. They consist of many (often thousands
or more) slaves or zombies that are centrally controlled by one or more master(s).
Originally, IRC was used for control, as it allowed many slaves to join a specific IRC
channel to receive commands. As IRC is not often used in corporate environments, it was
fairly easy to simply block outbound IRC access to mitigate the botnet risk. As a result,
malware authors moved to HTTP for command and control. This is often done by placing HTML
comments on a web page. These comments are not visible when casually browsing the page but
can be seen when viewing the page’s source code. The malware on the infected hosts is
configured to periodically look for commands “hidden” as these HTML comments. The individual
in control of the botnet simply updates the hidden comments to send new instructions to
their zombies. (Team Cymru, 2008)

If these instructions were passed “in the clear”, with no obfuscation, it would be easy for
IDS/IPS systems to detect them. This would increase the likelihood of detection and make it
much easier for malware analysts or incident responders to combat the problem. As a result,
the instructions are often encoded using base64. The zombie has a built-in base64 decoder
that can be used to translate the instructions into commands that can be understood and
executed by the zombie. While base64 is not the only encoding [...]

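An incident responder reviewing a suspect page can apply the same decoding step the zombie
would. The following Python sketch pulls HTML comments out of a page and keeps those whose
content decodes as padded base64. Everything in it is an assumption made for illustration:
the function name, the sample page and the “sleep 600” instruction are invented, and real
botnets may wrap or split their commands differently.

```python
import base64
import binascii
import re
from typing import List

COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def decode_comment_payloads(html: str) -> List[bytes]:
    """Return the decoded content of every HTML comment that is well-formed base64."""
    decoded = []
    for body in COMMENT.findall(html):
        candidate = body.strip()
        if not candidate:
            continue
        try:
            decoded.append(base64.b64decode(candidate, validate=True))
        except binascii.Error:
            continue  # ordinary comment text, not base64
    return decoded


# Hypothetical page carrying one ordinary comment and one encoded instruction.
hidden = base64.b64encode(b"sleep 600").decode("ascii")
page = f"<html><!-- navigation --><body>Welcome</body><!-- {hidden} --></html>"
print(decode_comment_payloads(page))  # [b'sleep 600']
```

Because short ordinary words can also decode cleanly, results from a helper like this still
need to be reviewed by a person.
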
4. Base64 Auditing
Given the risks associated with base64, having no program for detecting its use leaves an
organization vulnerable to a variety of direct and indirect attacks. Given the complexities
of detecting base64, however, such a program is an exercise in risk management and
compromise. Detection systems must find a balance between excessive false positives and
excessive false negatives, but unlike some other types of detection, the elimination of both
false positives and false negatives is not possible. In fact, any base64 detection solution
is likely to include both. The goal is to reduce them to the extent possible.

In addition, the detection system must be tuned such that the most critical and/or accurate
detection signatures “fire” first. Signatures that detect more than the presence of base64,
such as the Emerging Threats rule for detecting basic web authentication, should [...]

4.1. Compromise
As discussed previously, planning base64 detection is an exercise in compromise. While
regular expressions such as “[0-9a-zA-Z+/=_]{20,}” will detect virtually all base64 over 20
bytes in length, they will also result in significant false positives and should only be
used in specific circumstances. More targeted regular expressions such as
“(?:[A-Za-z0-9+/]{4}){2,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)”
will have fewer false positive results but will miss approximately one third of the base64
they see, as they are looking for trailing equal signs. To address these concerns, an active
program of base64 detection must be employed as follows:

• Application specific base64 detection, such as the basic web authentication rule, should
  be used whenever possible.
• Targeted rules such as those using regular expressions that look for trailing equal signs
  should be used as high-level alerts.
• [...] signatures.

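The ordering described above, in which the most specific signature is tried first and the
broadest last, can be sketched outside of Snort in a few lines of Python. This is not how
Snort evaluates rules; it is only an illustration of the tiering, and the function name
`classify`, the tier labels and the sample payloads are all assumptions made here.

```python
import re
from typing import Optional

# Tier 1: application specific, mirroring the basic web authentication content match.
BASIC_AUTH = re.compile(r"\r\nAuthorization: Basic", re.IGNORECASE)

# Tier 2: targeted, requires one or two trailing equal signs.
PADDED = re.compile(
    r"(?:[A-Za-z0-9+/]{4}){2,}"
    r"(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)"
)

# Tier 3: broad, any run of 20 or more candidate characters.
BROAD = re.compile(r"[0-9a-zA-Z+/=_]{20,}")


def classify(payload: str) -> Optional[str]:
    """Return the most specific tier whose pattern matches, or None."""
    if BASIC_AUTH.search(payload):
        return "application-specific"
    if PADDED.search(payload):
        return "targeted"
    if BROAD.search(payload):
        return "broad"
    return None


print(classify("GET / HTTP/1.1\r\nAuthorization: Basic YW5vbnltb3VzOg==\r\n"))
print(classify("ssn=MTIzLTQ1LTY3ODk="))
print(classify("V2FzIGl0IGEgY2FyIG9yIGEgY2F0IEkgc2F3"))
print(classify("hello world"))
```
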
While few signatures of this type may be available to begin with, as base64 is detected
crossing the network, the circumstances involving its use can be investigated and
categorized as “known good” and “known bad”. Signatures for “known good” base64 usage can be
created to simply allow or ignore that traffic, reducing the noise generated by the system.
Similarly, specific signatures looking for unique characteristics (e.g. source address,
destination address, source port, destination port or packet payload) of [...]

[...] have general rules for detecting base64 anywhere in any packet, regardless of
protocol, such as those discussed previously in this document. Ideally, these detection
signatures would be geared towards reducing false positive results. During this stage of the
detection process, it is better to miss some base64 than to be overwhelmed with alerts. The
rules provided previously in this document fit this pattern. They will detect base64 with
either one or two trailing equal signs. This means that roughly two thirds of all base64
crossing the network will be detected. The goal at this phase is simply to broadly detect
the use of base64 either entering or leaving the network. Any instances of base64 detected
by these signatures should be investigated. The techniques for addressing known good and
known bad base64 would then be used to create additional application specific rules.

[...] broad detection rules should then be implemented. These rules should leverage regular
expressions such as [0-9a-zA-Z+/=_]{20,} that are highly subject to false positives but that
would result in few, if any, false negatives. The regular expression should be used in an
IDS rule that is specific in terms of traffic direction, source address, destination
address, port and any other detail that can be used to reduce the volume of alerts. The goal
at this phase is to catch everything related to the potential incident. The results should
be [...]

[...] responders and IDS administrators but is only appropriate for environments where some
reasonable level of risk related to base64 is acceptable. Using this approach, an attacker
who is aware of the detection methods in place could plan their “attack” such that the input
data would result in base64 without trailing equal signs. Also, when dealing with end user
targeted attacks, missing one out of three base64 communications means that a significant
compromise could occur without detection. This “accepted risk” approach to detecting base64
is, however, far better than simply ignoring the problem. In extremely high security
environments, broad detection rules could be used in place of the regular expressions that
require the trailing equal signs. This approach would result in a high number of false
positives but would only miss base64 smaller than 20 bytes.

[...] decrease in false positive and false negative results.

4.2. Snort Rules
In order to fully detect base64 using Snort, multiple rules are required, each designed for
a specific purpose. A number of these rules are shown in the following table:

Use: Used to detect base64 as part of basic web authentication.
Rule: alert tcp $HOME_NET any -> any $HTTP_PORTS (msg:"ET POLICY Outgoing Basic Auth Base64
HTTP Password detected unencrypted"; flow:established,to_server;
content:"|0d 0a|Authorization|3a 20|Basic"; nocase; content:!"YW5vbnltb3VzOg==";
within:32; classtype:policy-violation;
reference:url,doc.emergingthreats.net/bin/view/Main/2006380;
reference:url,www.emergingthreats.net/cgi-bin/cvsweb.cgi/sigs/POLICY/POLICY_Basic_HTTP_Auth;
[...]
False Alert Description: Low false positives but will miss all base64 not associated with
basic web authentication.

Use: [...] base64 as well as base64 used for privacy-[...]
Rule: [...] standard base64 detected"; pcre:"/ (?:[A-Za-z0-9+/]{4}){2,}(?:[A-Za-z0-[...]
False Alert Description: [...] positives but will miss all [...]

Use: Used to detect “standard” base64 as well as base64 used for privacy-[...]
Rule: alert tcp $HOME_NET any -> any any (msg:"Possible standard base64 detected";
pcre:"/ (?:[A-Za-z0-9+/]{4}){2,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]|[A-[...]
False Alert Description: High false positives.

Use: Used to detect a modified version of base64 used for URL [...]
Rule: alert tcp $HOME_NET any -> any any (msg:"Possible non-standard base64 detected";
pcre:"/ (?:[A-Za-z0-9\-_]{4}){2,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]|[A-[...]
False Alert Description: High false positives.

5. Conclusion
Base64 represents a very real risk to organizations that rely on computers, networking and
the Internet, for a variety of reasons. Base64 is often used in place of encryption to
transmit sensitive information including usernames and passwords, which can result in
unauthorized disclosure. Base64 can also be used to obfuscate attacks in an attempt to
bypass detection and protection technologies. Unfortunately, the detection of base64 is
extremely difficult, as base64 is simply ASCII text that just happens to decode into
something else. While the detection of base64 should be part of any monitoring program, it
is always going to be an act of compromise involving reducing but not eliminating false
positive and false negative results. To achieve the highest overall detection fidelity,
organizations must implement an active program of detection that involves continual
reviewing of alerts and tuning of the system. If done properly, however, base64 detection
can become an effective component of an overall information security program.

6. References
2006380. (2011, May 25). Emerging Threats. Retrieved August 16, 2011, from
    doc.emergingthreats.net/2006380

Coyier, C. (2010, March 25). Data URIs | CSS-Tricks. CSS-Tricks. Retrieved August 16, 2011,
    from http://css-tricks.com/5970-data-uris

Craig. (2007, August 7). Filtering base64 encoded spam | Small Dropbear. Small Drop Bear.
    Retrieved August 16, 2011, from http://enc.com.au/2007/08/filtering-base64-encoded-spam

Franks. (1999). RFC2617 - HTTP Authentication. Internet Engineering Task Force. Retrieved
    August 16, 2011, from tools.ietf.org/html/rfc2617

Good, G. (n.d.). The LDAP Data Interchange Format (LDIF) - Technical Specifications. [...]
    www.ietf.org/rfc/rfc2849.txt

Password Recovery Speeds. (2009, July 10). Lockdown.co.uk - The Home Computer [...]
    http://www.lockdown.co.uk/?pg=combi

Perera. (2011). [...] steganography.html

Prabhakar, A. (2011, January 11). the Digital me: Base 64 Encoding. the Digital me. [...]
    64-encoding.html

Team Cymru. (2008). A Taste of HTTP Botnets. Team Cymru. Retrieved August 16, 2011, from
    www.team-cymru.com/ReadingRoom/Whitepapers/2008/http-botnets.pdf

Usage share of web browsers - Wikipedia, the free encyclopedia. (2011, July 26). Wikipedia,
    the free encyclopedia. Retrieved July 26, 2011, from
    http://en.wikipedia.org/wiki/Usage_share_of_web_browsers