ssrf-usenix-2024

This study investigates the prevalence and defenses against Server-Side Request Forgery (SSRF) vulnerabilities in PHP applications, highlighting that many developers remain unaware of these risks. An analysis of 27,078 open-source PHP projects reveals that most lack adequate defenses, with only a few implementing secure SSR features. The paper contributes by providing a comprehensive survey of SSRF attacks, existing mitigation techniques, and a new static analysis tool for identifying vulnerabilities in PHP code.


SSRF vs. Developers: A Study of SSRF-Defenses in PHP Applications

Malte Wessels∗†, Simon Koch∗†, Giancarlo Pellegrino‡, Martin Johns†


† Technische Universität Braunschweig
‡ CISPA Helmholtz Center for Information Security

{malte.wessels, simon.koch, m.johns}@tu-braunschweig.de, pellegrino@cispa.de

∗ Both authors contributed equally to this research.

Abstract

Server-side requests (SSR) are a potent and important tool for modern web applications, as they enable features such as link previews and webhooks. Unfortunately, naive usage of SSR opens the underlying application up to Server-Side Request Forgery, an underappreciated vulnerability risk. To shed light on this vulnerability class, we conduct an in-depth analysis of known exploitation methods as well as defenses and mitigations across PHP. We then proceed to study the prevalence of the vulnerability and defenses across 27,078 open-source PHP applications. For this, we perform an initial data flow analysis, identifying attacker-controlled inputs into known SSR functions, followed up by a manual analysis of our results to gain a detailed understanding of the involved vulnerabilities and present defenses. Our results show that defenses are sparse. The vast majority of our 237 detected data flows are vulnerable. Only two analyzed applications implement safe SSR features.

Since known defenses are not used and detected attacker-controlled flows are almost always vulnerable, we can only conclude that developers are still unaware of SSR abuses and the need to defend against them. Consequently, SSRF is a present and underappreciated danger in modern web applications.

1 Introduction

Server-side requests (SSRs) are a convenient service-to-service communication pattern in which a web service sends HTTP requests to external entities. Modern web applications use SSRs to implement many user-facing functionalities, such as URL previews [36] or integration of third-party content, such as external calendars [16]. While essential in practice, if not implemented correctly, SSRs can be abused, introducing a wide variety of security risks, ranging from attacks such as network reconnaissance [32] to high severity ones such as remote code execution (RCE) attacks.

SSRF ranks among the OWASP Top 10 [31] and CWE Top 25 [40] security risks. Recent incidents suggest that web applications are still exposed to SSR vulnerabilities. For example, an attacker recently elevated an attacker-controlled SSR vulnerability affecting Microsoft Exchange servers into an RCE vulnerability, bypassing authentication and firewalls, resulting in email theft [3, 17]. Most recently, PyTorch's TorchServe suffered from an arbitrary code execution flaw caused by an SSRF flaw, which allowed the attacker to load a malicious model [4]. Even applications that attempt to defend themselves frequently get it wrong, such as vSphere, which employed URL validation but suffered from an information disclosure attack through SSRF [2].

Unfortunately, we know little about the prevalence of SSR vulnerabilities and the effectiveness of the existing defenses, as the research community has given SSRs limited attention. Up to this point, prior work has been conducted only on a small scale and was focused on either understanding and exploring the security risks (i.e., Pellegrino et al. [32]) or novel defenses (i.e., Jabiyev et al. [21]). This paper addresses this gap by providing a large-scale analysis of SSR threats in PHP web applications.

Analyzing PHP SSRF vulnerabilities at scale, together with the code patterns associated with defenses, is particularly challenging. The lack of reliable, scalable and flexible investigation tools hinders such analysis because it requires collecting, analyzing, and reasoning on millions of lines of code. Tools such as PHPJoern [9], a Code Property Graph Generator, have previously been employed in large-scale analysis of PHP programs. However, our evaluation has revealed significant inaccuracies, such as an incorrect control flow graph and consequently inaccurate data flow edges, thereby increasing the likelihood of false positive results. Thus, besides our SSRF study, we also present an up-to-date version of the CPG generator for PHP web applications aiding us in our static analysis.

In this paper, we present the first static security measurement of SSR threats in PHP source code. Starting from a thorough survey of existing academic and non-academic work on SSR vulnerabilities, attacks, and defenses, we create an up-to-date taxonomy of SSR threats. Then, we use our taxonomy to
study the prevalence of SSR vulnerabilities and defenses in real web applications by downloading and analyzing 30,870 popular open-source PHP web applications on GitHub.

To infer if protection mechanisms against SSR vulnerabilities are present, we study these applications with SURFER, a flexible static analysis PHP framework tailored to support exploratory analysis of PHP vulnerabilities. SURFER works on top of our novel PHP bytecode code property graph (CPG) [47]. SURFER successfully analyzed 27,078 repositories. In this set, 15,308 applications utilize API sinks, such as file_get_contents, that are potentially susceptible to SSR problems. For 1,040 of these applications, SURFER was able to identify at least one potentially suspicious flow into such sinks. To further narrow down the set of vulnerability candidates, we only consider cases in which a direct data flow from user-provided input into the sinks exists, and the adversary controls the significant parts of the passed URL value. After applying these processing steps, we end up with 141 PHP projects that utilize server-side request sinks in a potentially insecure manner.

To thoroughly examine the existence of potential defensive measures and their robustness, we follow up with a manual inspection of these applications. The results of this analysis paint a somber picture: More than half of the identified applications use no defensive measures against SSRF and, thus, are trivial to exploit. Only three projects leverage a dedicated existing SSRF-secure request mechanism, and only two of the remaining PHP projects deployed robust countermeasures against sophisticated attack techniques, such as DNS rebinding. Thus, our analysis demonstrates a widespread ignorance by developers with respect to SSRF vulnerabilities.

In summary, this paper makes the following contributions:

• A survey and usage study of existing open source SSR abuse mitigation techniques in PHP.

• A PHP Code Property Graph generator based on the modern CPG framework.

• A CPG based static analysis tool chain (SURFER) to identify SSRF vulnerable code.

• A large scale study of 27,078 PHP projects for SSRF vulnerabilities and mitigation techniques.

Organization of the paper: We first discuss the difference between SSR and SSRF as well as the challenges of defending against it, deriving our research questions (Section 2). After having established the required background, we detail the current state of SSRF mitigation techniques in popular open-source frameworks (Section 3). We then lay the groundwork for our tooling and present SURFER, as well as our Manual Analysis to perform our large-scale static analysis study (Section 4). The results of our analysis are given next (Section 5), followed by a discussion of the implications, including a case study and the answer to our research questions (Section 6). Finally, we provide an overview of related work (Section 7) and conclude with a summary of our key takeaways (Section 8).

2 A Primer on SSRF

We briefly introduced the SSR feature and its evil variation SSRF. In this section, we provide a primer on why server-side requests exist (Section 2.1) and how they can turn into SSRF (Section 2.2), arriving at our research questions (Section 2.3).

2.1 Server-Side Requests

An application requires a server-side request as soon as it needs information that is not stored within the application components themselves. Common examples are webhooks, previews of links, and interaction with external APIs like payment or authentication providers.

The chain of events leading to a server-side request starts with a (user) request for an application resource that depends on remote data. For example, if the application wants to display a link preview, it must perform a request and retrieve the data required for the preview to answer the client's request. Figure 1 visualizes this event chain.

[Figure 1: A Server-Side Request. Diagram labels: Client, Server, External Server; req(URL), URL, res.]

User-input-controlled SSRs are allowed for several reasons. For instance, webhooks and link previews target user-provided URLs by design. Other SSR use cases allow the user to affect parts of the target URL, such as URL parameters.

[Figure 2: An SSRF attack that accesses local resources. Diagram labels: Client, Server; req(localhost), SSRF, res.]

2.2 SSRF: Server-Side Requests Going Rogue

Suppose an SSR request can be influenced by user input beyond the scope envisioned by the developer. In that case, attackers can leverage this as an attack vector and guide the request to malicious hosts and services. The developer is
responsible for ensuring that user-provided data cannot unintentionally influence a request, e.g., by changing the target to network-internal resources. How an attacker can exploit their ability to influence the request depends on the deployment context, including the network topology, the host machine's configuration, and the influence's scope.

If a server with a vulnerable application runs localhost-accessible services, an attacker can use the SSRF to access normally denied resources. Figure 2 visualizes such an attack. As in our introductory examples, this enables an attacker to conduct network reconnaissance, secret stealing, or even RCE. Cloud services often serve HTTP APIs on the local network, providing configuration and metadata [7, 12, 21]. An SSRF vulnerability exposes this internal API to attackers.

Even if all local resources are adequately protected, an attacker can still abuse an SSRF vulnerability. Vulnerable public access points can be attacked via SSR requests from SSRF-vulnerable servers. As the vulnerable server performs the request, the attacker hides their identity, impeding investigations into attacks or laundering the presumed host to an unsuspecting third party.

Sometimes, vulnerabilities such as SSRF occur in authenticated areas of an application. They should not be ignored. Firstly, not all authenticated users should possess power over the whole system, e.g., regular users vs. admins. Secondly, superuser rights inside an application, such as admin rights in a CMS, do not equal any rights on the host OS. This applies especially in managed hosting environments, where users might have superuser rights inside their application but not on the underlying machine. However, in both cases, a malicious user or web admin could exploit an SSRF vulnerability to gain access to sections of the application they are not authenticated for or the underlying host OS.

2.3 Research Questions

SSRF poses a risk to anybody using SSR features, but the state of SSRF is an orphaned domain in current research. To address this, we formulate four research questions that each provide a distinct insight:

RQ1: What is the current state-of-the-art of SSRF defenses?
RQ2: Are web application developers using existing SSR mitigations?
RQ3: Are web application developers using homegrown SSR mitigations?
RQ4: How many web applications are prone to SSR abuses?

3 Survey Of Attacks and Defenses

SSRF is a multifaceted vulnerability class that allows for varied exploitation options. The same holds for defending against SSRF abuses. This warrants a detailed discussion of known attacks and defenses according to the literature to frame our search later on.

We start our systematization from Pellegrino et al. [32]'s work on SSR attacks and amend it with results from an academic literature survey. Additionally, we explored non-academic sources systematically by searching the web for 'SSRF' with keywords such as 'Defense' and 'best practice'. This resulted in the non-academic sources [8, 18, 25, 26, 30, 39, 44] as well as the academic sources [21, 28, 32]. A systematized overview of our results on SSR attacks and application-level defenses is given in Table 1. Table 2 augments it with details about possible defense evasions and the respective fixes.

3.1 Attacks

Across the literature, we identified six distinct classes of attacks that leverage SSRF vulnerabilities to perform malicious and unintended activities:

(A1) Recon Attack: The first class of attack tries to gain information about a server's network using the SSRF vulnerability to reach behind the firewall and gain access to network internals. An attacker can identify deployed services and available machines using return values or timing-based side channels.

(A2) Origin Laundering: In the second class of attack, the attacker uses the SSRF vulnerability to misuse the server as a proxy to serve malicious data from another website. This can be used to circumvent block lists implemented by the browser. An SSRF-vulnerable website can be misused as a proxy serving otherwise blocked malicious content.

(A3) Denial of Service: The next attack category is Denial of Service (DoS) attacks, usually split into three distinct subtypes across the literature.

Consider an SSR service using a GET parameter as input and reflecting the SSR's response. An attacker can craft a URL directing the service to a domain hosting illegal or known malicious content. They then provide this prepared link to a web scanner. When the scanner requests the prepared link, the SSR service requests the embedded target and mirrors the content. The scanner then flags the SSR service due to malicious content. Consequently, an attacker can use an SSRF vulnerability to put the victim server on block lists and thus achieve a denial of service. Such a DoS is also possible on a non-technical level. For example, if the prepared link is reported to the authorities, the SSR provider could face legal action.

The remaining two kinds leverage amplification of the request. If the SSRF triggers not a single request but multiple requests to a target, the vulnerable service can be abused as
an amplifier to conduct DoS attacks. The same is possible the other way around if a targeted service serves large or multiple responses, leading to a DoS of the vulnerable server.

(A4) Bridging Attack: The fourth kind of attack is bridging attacks. Bridging attacks can happen when an attacker can control the protocol of a request, allowing the attacker to bridge between different protocols. A notorious example is the Gopher protocol. Since Gopher is simplistic, HTTP requests can be interpreted as Gopher requests. If the SSR client supports Gopher, an SSR can quickly turn into Remote Code Execution via bridging attacks, as shown by Gupta [18].

(A5) Exploiting SSB: The fifth class of attack exploits the client used to perform SSR. Musch et al. [28] established the risk of using a full browser as a client for conducting SSRs. Given the constant struggle of browser vendors to keep up with the most recent exploits, an outdated browser executing arbitrary requests can easily become an attacker's gateway to the server.

(A6) Local Resource Leak: The sixth and final class of SSRF covers exploitation accessing otherwise non-reachable local resources via a request to an internal IP or localhost. This is the most commonly known variation of SSRF and was featured in Figure 2.

Column groups: Attacks; D1 URL Valid.; D2 DNS Valid.; D3 Secure Conf.; D4 Response Modification
Sub-columns: Scheme, Domain, Port, Path, IP input validation, HTTP(s) only, No OOP, wrap result, C-D header, rate limit, fixed RT

A1 Recon attack, Port scan       • ◦
A1 Recon attack, Network scan    △ • ◦        [21, 30, 31, 32, 49]
A2 Origin Laundering             △ △ • • •    [32]
A3 DoS attack, by blocklist      △ • △        [30]
A3 DoS attack, ES                △ ◦          [32]
A3 DoS attack, Ampl.             △ ◦          [32]
A4 Bridging Attack               • •          [32, 44]
A5 Exploiting SSBs               △ △          [28]
A6 Local Res. Leak               △ ◦ △        [30, 39]

Table 1: Overview over SSR attacks and application-level defenses. •: Defense successfully prevents the attack. ◦: Defense mitigates some version of the attack. △: Defense works in an allow list scenario.

Def.   Evasion Technique              Evaded Def. Technique   Fix                                                   Sources
D1     URL parser confusion attacks   insufficient parsers    well-established, hardened parser                     [8, 21, 29, 31, 39, 44]
D2     DNS rebinding                  IP validation           IP pinning                                            [8, 21, 31]
D3     redirects                      singular checks         rechecking on each redirect or disabling redirects    [30]

Table 2: Defense evasion and respective fixes.

3.2 Defending Against SSRF

Across the literature, we identified four distinct approaches to defend against SSRF, each with its caveats and drawbacks.

(D1) URL Validation: The most immediate solution to ensure that only intended targets are used for the SSR is validation based on the string on which the request is based. This validation can cover parts of a URL or all of them (scheme, domain, port, path, and query), using either an allowlisting or a denylisting approach.

While this is the most apparent defense, it has severe limitations. The correctness of the allow/deny procedure is paramount, and past events have shown that validating URLs is not trivial [1]. This results in Parser Confusion attacks, a well-documented attack on insufficient parsers. These attacks bypass an implemented mitigation via parser bugs or unexpected encodings. Therefore, to parse user input, well-established correct parsers should be used [21, 31]. Consequently, the whole URL has to be validated: merely prefixing protocol and domain in front of the user input might not be sufficient, as the user input can contain characters that confuse the parser [44].

A robust and complete URL allow-listing can defend against SSR abuses, as it limits the requests to well-known benign targets [30]. Denylisting is insufficient since attackers can always register or take over new domains.

(D2) DNS Validation: Using DNS validation expands on checking the destination URL by validating the actual target, as it ensures that only a predetermined set of IPs can be requested. As the URL is resolved, the corresponding IP is validated against an allow or deny list, ensuring only intended targets. However, this defense has its challenges, as a cunning
attacker could change the targeted IP between the check and the actual request time.

A proper implementation has to resolve the domain only once and then keep using the resolved and validated IP. This advanced defense is called IP pinning and provides the only reliable defense against attacks targeting local resources without unduly restricting the versatility of the SSR feature. IP Pinning protects against the DNS Rebinding validation bypass attack.

In a variant of the DNS validation approach, the target is validated against a denylist of unwanted targets, such as localhost. This can be used as a defense for applications that take arbitrary user input as a target by design, e.g., URL previews. However, this technique is limited by the ability of the developer to ascertain the deployment context. If they miss targets in complex deployment scenarios, this technique is insufficient. For example, suppose only typical local targets are blocked, but the application is deployed in a cloud environment. In that case, attackers can still access cloud-internal IPs hosting meta-data and configurations.

(D3) Secure Configurations: Shifting the focus from actual application code to feature flags and configurations of application components, the third type of defense covers secure configurations. To avoid bypasses resulting from HTTP redirects, the used HTTP client should either reject redirects in general or apply the target validation after each redirect [39]. Additional important security settings are restricting requests to HTTP(S) URLs, which mitigates A4, and not exposing an Open Origin Policy (OOP): applications should not set the Access-Control-Allow-Origin header to *; otherwise, they are susceptible to being used for Origin Laundering.

Because XML External Entities can fetch external resources, previous work included disabling XML External Entities as an SSRF mitigation. Still, since this is a) a general security issue and b) off by default since PHP 8 [41], we will not discuss it further in this work.

(D4) Response Modification: The fourth and final approach mitigates the effects of SSRF by modifying the response an SSR provides. The response can be wrapped to not directly reflect the accessed content, or the Content-Disposition header can be used to prevent a browser from directly rendering the response [27]. Either approach reduces the ability to abuse the SSR feature as a proxy (A2). Finally, imposing a rate limit and fixing the response time would severely hamper an attacker's ability to conduct a recon attack (A1) or a denial-of-service attack (A3). However, no response modification prevents A4 to A6.

3.3 Further Defenses and Threat Model

Multiple mitigation-level approaches that expand past the application level and touch the SSR's configuration have been pitched. On a network level, the SSR can be routed through a proxy [21, 39], ensuring requests can neither access local resources nor devices on the internal network. Furthermore, a robust network segmentation can ensure that no local resources can be accessed. Authentication should be enabled for all services [39]. This approach only affects A6 and potentially impacts A1, leaving all other attacks feasible.

Tennant [39] proposed the concept of an SSRF Jail, where DNS and networking calls are hooked on the OS level. However, they note that it is not suitable as a practical solution since applications have to request internal, i.e., deny listed, resources for valid reasons, which a naive hooking solution would prevent.

Threat Model: In this work, we study the SSRF mitigations that developers deploy. Therefore, we assume a bug- and vulnerability-free PHP standard library.

If we mention allow-listing approaches, we assume developers have a complete, up-to-date list of IPs or domains they control and trust. To summarize our exploration of the SSRF attacks and defenses and the resulting threat model: We assume an attacker who can access a service that affords the capability to trigger and influence the target of a request.

3.4 Existing SSRF Defense Implementations

We now understand the attacks an SSR feature must be defended against and a set of working defenses and mitigations. The remaining question is whether libraries used to make SSRs provide developers with the means to do so securely. To answer this question, we survey existing PHP frameworks, libraries, and HTTP clients with SSR capabilities.

Table 3 lists all PHP Frameworks included in our survey and the standalone PHP HTTP clients. We compiled this list by searching the web for the most used and popular PHP frameworks and clients. Additionally, we included all clients listed by the HTTPlug project [43]. The first column indicates if a framework offers an HTTP client and, therefore, SSR capabilities.

To evaluate SSR capabilities as well as SSRF defenses, we manually checked the documentation for mentions of Server-Side Request Forgery. We found that only five HTTP clients offer any defense:

Symfony: Since version 5.1, Symfony provides the NoPrivateNetworkHttpClient decorator, which ships a mitigation against SSRF that blocks requests to internal networks (D2) [37, 38]. Technically, the decorator hooks into the request process and is called after the DNS resolution to check if the resolved IP address is on a predefined denylist of internal IP addresses.

SafeCurl: SafeCurl [14] blocks requests to internal IPs (D2) and pins the protocol to HTTP(S) (D3). Optionally, it provides DNS Rebinding protection by implementing a check of the resolved IP vs. a denylist (D2).
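The checks on the resolved IP described above, combined with IP pinning (D2) and an HTTP(S)-only restriction (D3), can be sketched as follows. This is a minimal illustration in Python rather than PHP, and it is not taken from any of the surveyed libraries: the names check_and_pin, is_internal, and default_resolver are hypothetical, and the set of address ranges treated as internal is an assumption that a real deployment would have to extend, for example with cloud-internal ranges.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # D3: no gopher://, file://, ...


def default_resolver(host):
    """Resolve a host name to a list of IP address strings."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_internal(ip_text):
    """Assumed denylist: loopback, private, link-local and similar ranges."""
    ip = ipaddress.ip_address(ip_text)
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def check_and_pin(url, resolver=default_resolver):
    """Validate the URL and return the single IP the request must use.

    The host is resolved exactly once. The caller is expected to connect
    to the returned address instead of resolving the name again, which is
    what closes the DNS rebinding window.
    """
    parts = urlsplit(url)  # parse with a standard parser, not string tricks
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError("scheme not allowed")
    if not parts.hostname:
        raise ValueError("missing host")
    addresses = resolver(parts.hostname)
    if not addresses:
        raise ValueError("host did not resolve")
    # Reject the request if any resolved address is internal.
    for address in addresses:
        if is_internal(address):
            raise ValueError("internal address")
    return addresses[0]  # pinned IP
```

A caller would send the request to the returned IP address (keeping the original host name in the Host header) and would either disable redirects or repeat the same check for every redirect target.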
Framework            SSR capability   Defense
WordPress            ✓                †
Laravel              ✓ via Guzzle     ✗
Symfony              ✓                D2
Yii                  ✗                ✗
CakePHP              ✓                ✗
FuelPHP              ✓                ✗
Windwalker           ✓                ✗
Zend                 ✓                ✗
Laminas              ✓                ✗
CodeIgniter          ✓ via Curl       ✗
PHPixie              ✓                ✗

HTTP Client          SSR capability   Defense
Guzzle               ✓                ✗
SafeCurl             ✓                D2, D3
SafeURL              ✓                D2, D3
HTTPlug              ✓                D2, D3 (via plugin)
PECL HTTP            ✓                ✗
ReactPHP Sockets     ✓                ✗
WordPress Requests   ✓                ✗
Buzz                 ✓                ✗
Httpful              ✓                ✗
PHP stdlib           ✓                ✗
PHP curl ext.        ✓                ✗

Table 3: PHP frameworks and HTTP clients included in our survey. The two columns indicate if they offer SSR capabilities and implement SSRF mitigations. ✓ = exists, ✗ = non-existent, † = exists, but is vulnerable.

SafeURL: The IncludeSec team [20] presented a set of safer HTTP clients in 2016, including SafeURL for PHP. It is a fork of SafeCurl.

HTTPlug plugin: The HTTP client abstraction HTTPlug can be used with plugins to extend its functionality. Benoist [10] published a plugin that introduces SSRF mitigation capabilities for HTTPlug. It is 'inspired by SafeCurl' and has the same capabilities as SafeCurl.

Guzzle: Guzzle is a popular third-party HTTP library. It does not provide an SSR defense. The issue was already raised in the issue tracker [34] and a domain allow list was proposed, but the issue was closed due to inactivity.

WordPress

We will now present our findings on WordPress.

wp_http_validate_url vulnerabilities: WordPress provides functions such as wp_safe_remote_get that promise safe requests via the wp_http_validate_url function: 'The URL is validated to avoid redirection and request forgery attacks.' [46]. We tested the first part of this statement with a status 307 redirection, but all functions of the function family followed the redirection except the _head variation. We checked the second part of the statement (request forgery protection): WordPress attempts to sanitize the URL with respect to internal IPs but does not pin the resolved IP address. It is, therefore, vulnerable to DNS Rebinding attacks. Since we could bypass both promised safety features, we do not consider it a safe client for this work and included it as a regular sink for our later study.

Disclosure: We constructed a Proof-of-Concept and disclosed the DNS Rebinding issue to WordPress. However, the issue was closed as a duplicate of a report from 2017. We also reported the issue of unexpected redirection to WordPress.

Safe to use Sanitizer? Additionally, WordPress provides the function esc_url_raw. The documentation states, 'The resulting URL is safe to use in [. . . ] HTTP requests.' [45]. The last statement is wrong and dangerously misleading since the function does not defend against SSRF and SSR abuses. In our later studies, we found data flows solely relying on this function to sanitize the input of an SSR sink; refer to Section 5.3. Disclosure: We reported the issue to WordPress.

Overall, only one framework and three clients support protection against SSRF, while several others support SSR capabilities without warning users about their risks or providing SSR defenses. Note that since most HTTP clients are supported by the HTTPlug system, they could be combined with the plugin to make them safe.

3.5 Usage Study

To evaluate if applications in our data set use the HTTP clients with some form of offered protection, we used ripgrep [15] to conduct a string-based search for usage of the corresponding HTTP clients. We constructed search expressions from usage examples, i.e., we searched for use statements, new statements, and fully qualified function names.

We found four repositories using safer HTTP clients in our dataset of 30,870 applications. One of them is the Composer backend Packagist, which uses a safe SSR request in its GitHub migration code. Another application with safe HTTP client usage is authored by one of the maintainers of one of the HTTP clients. This means that effectively, only three third-party open-source applications are using an HTTP client with a defense available.

Only a fraction of the frameworks that provide SSR functionality discuss SSRF in their documentation, and only Symfony provides an actual SSRF defense. Out of all the pure HTTP clients, only the SafeCurl family provides an SSRF defense. If used through the HTTPlug framework, most clients could be secured using a plugin inspired by SafeCurl. But nobody is actually using these defenses.

This study shows that developers do not use existing countermeasures against SSRF. This raises the question of whether they implement SSR defenses themselves. Or are they not using any defense at all? We need to inspect the source code of the applications for homegrown SSRF mitigations to answer this question.
(a) Computation of a max value. Highlighted statements might influence the slicing criterion echo max.

    a := 1
    c := 2
    if a < b
        echo a
    else
        echo c

(b) The CPG representation of Figure 3a with the nodes that might influence the slicing node echo max. The black edges represent the AST, the blue edges the CFG, and the green edges the DDG.

(c) The extracted program slice for echo a of Figure 3b. The green lines represent data dependency and the blue line represents control dependency.

Figure 3: A sequence of visualizations showing the transformation of a computer program (3a) into a Code Property Graph (3b) and finally an extracted program slice (3c).

4 Identifying SSRF Vulnerable Code

Given the complexity of SSRF, we want to study its prevalence and its expression in common software. For this, we develop a static analysis methodology that leverages a Code Property Graph (CPG) code representation [47] and interprocedural data flow analysis to identify interfunctional data flow from user-controlled sources into a known SSRF sink. Subsequently, we manually analyze identified data flows to gain an in-depth understanding of vulnerable flows and any present attempts to defend against SSRF exploitation.

We first lay the theoretical groundwork establishing our static analysis methodology (4.1), which is followed by describing our subsequent manual analysis of any identified data flow (4.2).

4.1 Automatic Static Analysis

To present our static analysis methodology, we start by giving a brief introduction to CPGs (4.1.1) and Data Flow Analysis (4.1.2), followed by an explanation of how we leverage those techniques to detect SSRF vulnerabilities (4.1.3).

We present an up-to-date PHP CPG generator that is not based on the source code but utilizes the PHP interpreter's internal bytecode representation. The PHP interpreter provides a debug function that dumps the bytecode representation of syntactically valid PHP source code. Our CPG generator parses this bytecode dump into an AST structure. The main advantage of using the bytecode representation is that the dump comes with a CFG, which we use to add our CFG edges. This leads to a high degree of CFG correctness as the CFG is taken directly from PHP. Based on the CFG, we generate the data dependency graph using a standard algorithm [5]. Another advantage of using the bytecode representation provided by the interpreter is that each function and method reference is represented with a fully qualified name, including possible namespaces or class names in the case of static methods. Therefore, we create the call graph by matching the qualified names of functions and methods with their definitions if they are unique. In cases where the names are not unique, e.g., method names shared across class definitions, we do not create a call edge. Our tool is implemented against the publicly available specifications and framework for the Code Property Graph [48] and is publicly available¹.
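
The name-matching rule for call edges can be illustrated with a short PHP sketch. It is not the authors' implementation: the function `build_call_edges`, the array shapes, and the numeric node ids are assumptions made for this example. The only behaviour taken from the text is that an edge is created when exactly one definition carries the called name, and that no edge is created otherwise.

```php
<?php
declare(strict_types=1);

/**
 * Create call edges by matching callee names against definitions.
 * An edge is only created if exactly one definition has that name.
 *
 * @param array<int, array{name: string, id: int}> $definitions
 * @param array<int, array{caller: int, callee: string}> $calls
 * @return array<int, array{from: int, to: int}>
 */
function build_call_edges(array $definitions, array $calls): array
{
    // Index definition ids by (qualified) name.
    $byName = [];
    foreach ($definitions as $definition) {
        $byName[$definition['name']][] = $definition['id'];
    }

    $edges = [];
    foreach ($calls as $call) {
        $targets = $byName[$call['callee']] ?? [];
        if (count($targets) === 1) { // ambiguous or unknown names yield no edge
            $edges[] = ['from' => $call['caller'], 'to' => $targets[0]];
        }
    }
    return $edges;
}

// Hypothetical input: one uniquely named function and one shared method name.
$definitions = [
    ['name' => 'App\\Http\\fetch', 'id' => 1],
    ['name' => 'save', 'id' => 2], // e.g. a method name defined in two classes
    ['name' => 'save', 'id' => 3],
];
$calls = [
    ['caller' => 10, 'callee' => 'App\\Http\\fetch'],
    ['caller' => 11, 'callee' => 'save'],
];

$edges = build_call_edges($definitions, $calls);
```

Skipping ambiguous names trades completeness for precision: the call to `save` produces no edge, so a flow through it would be missed rather than attributed to the wrong definition.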

4.1.1 PHP Code Property Graphs

A CPG is a combination of multiple different graph representations of program source code [47] – most prominently the Abstract Syntax Tree (AST), Control Flow Graph (CFG), Data Dependency Graph (DDG), and Call Graph (CG). The ASTs of each method of the analyzed program are the forest that forms the basis of the CPG. Each remaining graph (CFG, DDG, CG) is layered into the ASTs by reusing the existing AST nodes and adding corresponding edges. For example, CFG edges are added between subsequently executed statements. Figure 3b provides a visualization of a CPG representing the code in Figure 3a.

PHPJoern. We decided against using the existing PHPJoern ([9]) due to shortcomings we encountered. Besides it being unmaintained and deprecated by its authors, we discuss its two major shortcomings:

PHP Compatibility: PHPJoern can only generate CPGs for code compatible with PHP 7.1. PHP 7.1 has been deprecated and superseded by several minor and major releases introducing new (syntax) incompatible features. As a result, PHPJoern is unable to analyze modern PHP code and skips the corresponding files, which can thus not be analyzed.

CFG: PHPJoern generates the control flow graph itself, which leads to inconsistencies. exit() and die(), commonly

¹ https://github.com/SSRF-vs-Developers
used by PHP developers to stop a script prematurely if a check fails, are the culprits for the inconsistent behavior. Implemented checks are commonly access controls, exit on failures, or user input validations. Figure 4 provides a minimal code example. If a condition (line 3) is met, the process ends with an exit call (line 4). Otherwise, line 6 is executed. The echo is unreachable if the condition is satisfied due to the exit call. PHPJoern generates a control flow edge between the exit()-node (4) and the echo (6). This is wrong.

Our bytecode CPG does not suffer from this issue because we get the CFG directly from the PHP interpreter. Therefore, we eliminate the potential for deviations between the actual PHP control flow as interpreted by the PHP engine and our CFG.

    1 <?php
    2 echo "init";
    3 if($condition) {
    4     exit();
    5 }
    6 echo "code";

Figure 4: PHP code that triggers the CFG bug in PHPJoern.

4.1.2 Data Flow Analysis

To perform the data flow analysis, we extract the subgraph that forms the input value to a given sink. Figure 3c provides a visualization of a data flow extracted from the CPG in Figure 3b.

Our data flow implementation starts by following back the data dependency edges leading into the sink call and collecting the involved nodes and edges recursively until all subsequent data dependency edges are consumed. After finishing the intrafunctional analysis, we identify each function call and function parameter usage and follow the call edges to the target and source nodes. The algorithm is then restarted from those nodes again. Our procedure stops as soon as there are no further nodes and edges to be added. The result is a subgraph containing all nodes involved in forming the input value for a given start call, i.e., a program slice.

4.1.3 SURFER – Detecting SSR(F) with Program Slicing

To identify potential SSRF vulnerabilities, we start out by searching for common SSRF sinks (Appendix 8.1) within a given CPG. Each such statement then serves as a starting point for our program slicing.

We reconstruct the possible input strings to the detected sinks based on the extracted program slice by traversing the slice across its data dependency edges. Starting at the slice leaves, i.e., the initial input values, we collect the applied transformations for each step until we reach the sink. This process results in a tree structure. Leaves represent the input values, e.g., constants and variables. Nodes represent the transformations, and the root is the sink. We traverse the tree, starting at the leaves, and apply the transformations we encounter. This procedure depends on the semantics of each transformation and leverages the individual bytecode instructions stored in the nodes. Thus, if multiple leaves lead to a concatenation, we combine them. If they lead to an assignment, we create a list of multiple outcomes, as multiple incoming dataflows indicate that multiple values are possible.

As this procedure is only used as a filter for our later manual analysis, we are liberal in the transformations we apply, and if we do not know the semantics of an instruction, e.g., an unknown function call, we pass the arguments through so as not to miss a possible vulnerability. Global variables, e.g., the superglobals $_REQUEST, $_POST, and $_GET, are marked in the output. This allows us to recognize user-controlled sections of the input values passed into the sink and reason about a possible SSRF vulnerability due to user-controlled input.

As discussed in 2.2, we assume that existing sources and sinks are reachable, e.g., if they are behind some access control system, we assume that an attacker has access to them. This attack scenario is interesting in more complex deployment scenarios such as managed hosted services where regular users might have admin rights in the web app but not on the barebone machine.

4.2 Manual Investigation of Candidates

Applying SURFER reduces the initial set of projects of web applications to those also containing a data flow into a sink with an abstract representation of the value passed into the sink. However, a simple data flow analysis is not sufficient to achieve our goal of identifying the prevalence of SSRF and accompanying insufficient defenses. A defense does not necessarily reside on the data flow but can be contained in the direct (conditionals) or indirect (assertions) control flow associated with it. Consequently, we performed a manual in-depth analysis of each identified flow.

We take the candidates found by the previous static analysis and analyze them manually for possible defenses against SSR attacks and to determine if they are vulnerable. We start by filtering the apps into three distinct categories to remove any applications that are vulnerable by design or cannot be considered a proper web application, to begin with (Section 4.2). Next, we analyze whether a detected data flow is trivially exploitable (e.g., direct application of a sink on user inputs) or if a data flow is a false positive. Finally, we perform a manual in-depth exploitation and defense analysis to establish how a detected data flow can be exploited and whether any form of defense is present (Section 4.3).

Filtering for Apps. Our overarching goal is to answer RQ4 and RQ3, i.e., are developers of applications using home-
grown defenses, and how many applications are vulnerable to SSR abuses? This entails filtering any flow that is either irrelevant to our attacks or not representative of real applications.

Based on our reconstructed inputs, we filter out any flow for which an attacker does not have control over the domain. We approximate this by only taking these candidates into consideration where attacker-controlled input is at the start of the reversed string. All of the discussed attacks require the attacker to control the domain; consequently, any flow for which this is not the case is not interesting for our overarching research questions.

Next, we survey all remaining flows and their corresponding applications to determine the type of project they belong to. We discard flows sourced from examples or test code. But most importantly, we remove any flows that are detected in 'hacking tools'. Hacking tools are projects that do not contain a real application but are used for red and blue teaming, such as Capture-the-Flag tasks and solutions, applications that are vulnerable by design for educational purposes, and malicious applications such as web shells. These are not representative of the regular developer and application and are thus out of the scope of our research.

4.3 Detailed Analysis

The remaining flows are inspected in-depth in a manual analysis for all known defense techniques detailed in Section 3.2 by manually retracing the data and control flow.

D1 URL Validation. We analyze if any validation of the input URL takes place and distinguish between allow-list and deny-list approaches. Additionally, we categorize its implementation (e.g., via a regular expression) and which parts of the URL are checked, e.g., if only non-HTTPS requests are prevented. Additionally, we note any usage of a proper URL parser or homegrown solution. Finally, we assess whether breaking any present URL validation is possible.

D2 DNS. We check for the presence of DNS validation and distinguish between DNS allow and denylisting, as well as if IP pinning is used.

D3 Configuration. If we detect any configuration of the used sink, we analyze how it affects redirection (i.e., enabled or disabled), if only http(s) requests are possible, and if there is an Open-Origin-Policy configured.

D4 Response Modification. Finally, we analyze how the response of the SSR is used and, more significantly, if it is returned. If it is returned, we check if the result is wrapped or reflected as-is and if a Content-Disposition header is set. Additionally, we check for rate-limiting routines and fixed response time mechanisms.

5 Results

We have established SSRF as an intricate and relevant security issue and proposed a methodology and an implementation – SURFER – to detect SSRF vulnerable code and its possible protections. In this section, we describe the data set we searched with SURFER (5.1). We then discuss the parameter as well as the metadata surrounding our search (5.2) and present the raw results of SURFER, as well as the result from our manual analysis (5.3).

5.1 Data Set

[Plot omitted. x-axis: Stars (10^3 to 10^4); y-axis: Count; series: Repo With Sink, CPG With Sink, Surfer Success, Flow Detected.]

Figure 5: Stars of the repositories available, converted to a CPG, successfully analyzed, and with a flow detected. Only repositories that contain a sink are included.

[Plot omitted. x-axis: Lines of Code; y-axis: Count; series: Repo With Sinks, CPG With Sink, Surfer Success, Flow Detected.]

Figure 6: Lines of Code of the repositories available, converted to a CPG, successfully analyzed, and with a flow detected. Only repositories that contain a sink are included.

Our collection of open-source PHP applications was retrieved from GitHub – the largest code hosting platform. We used the public API to acquire all PHP applications with 26 or more stars² and started the download on July 31st, 2023. The

² this occurred organically through the limitations of the GitHub API
process took 6 hours and 40 minutes and occupied 411 GB of RAID SSD storage. Our final data set consists of 30,870 repositories.

We applied our CPG generator to each project and successfully created 28,325 CPGs. The CPG generation was run on an AMD EPYC 7702P with 504 GB RAM with 10 parallel processes. Each generation was allocated 20 GB of RAM and forcefully terminated after 10 minutes. To determine the coverage of relevant repositories in our dataset, we conducted another ripgrep-based search for calls of our SSR sinks. This is a rough over-approximation. 15,308 of our 30,870 repositories had at least one match. The distribution of stars and lines of code of the analyzed projects is displayed in Figure 5 and Figure 6, respectively. We have only included those that contain some SSR sink.

5.2 Applying SURFER

We compiled PHP functions that can trigger network requests in default configurations and included the popular 'curl' extension [42]. Additionally, we amended the list with functions from WordPress, as it is the most-used PHP framework. Sinks requiring a specific configuration and protocol were excluded, e.g., if allow_url_include is set to true, the include and require functions of PHP can trigger network requests. Our final list includes 15 function calls we use as sinks. The calls are listed in Appendix 8.1.

We ran SURFER with a timeout of 40 minutes and 20 GB of JVM heap space per analysis on an AMD EPYC 7702P machine. SURFER was done after 11 hours. SURFER successfully analyzed repositories that sum up to 107.4 mil. lines of PHP source code in 1.4 mil. files³. On average, each repository had 17.8k lines of code and 50.5 files.

³ all lines of code and num. of files are measured via cloc [11]

5.3 Manual Analysis

In this section, we will discuss the results of our manual analysis. A visualization of the first few steps can be found in Figure 7 as a Sankey plot.

After the first pass of our manual analysis, we categorized our dataset. The dataset consists of 1,040 apps with some data flow. After filtering for dataflows with user input at the start of the string, we obtained a dataset of 237 flows. It consists of 158 flows we categorized as an app, 71 as a hacking tool, 7 as test code, and 1 as an example.

Of all the flows labeled as belonging to an application, 39 are trivially exploitable, 18 were false positives, and 101 required further analysis. The false positives were mostly due to over-approximations in our static analysis for unknown function calls and objects. In 65 flows, the host part of the URL was attacker-controllable; in 36, it was not. A full overview of the first steps of our manual analysis pipeline can be found in the Sankey diagram in Figure 7.

    flows (237)
        app (158)
            manual analysis (101)
                domain attacker-controlled (65)
                domain not attacker-controlled (36)
            false positive (18)
            trivially exploitable (39)
        hackingtool (71)
        test (7)
        example (1)

Figure 7: Sankey plot of analyzed flows.

Table 4 provides an overview of our findings regarding defense techniques 3.2 and 3.2. Potentially vulnerable flows fall into one of three categories: those without URL validation, allow-list style validation, or deny-list style validation. The most used implementation type was regular expressions. Additionally, five flows were found that use a good URL parser, and 14 validations were identified as broken. 216 flows validated no special part of the URL, while 10 did validate the scheme.

(D2) DNS Defenses. No DNS resolution or pinning was used.

(D3) Configuration. Four candidates were found that changed the redirection behavior. They all disabled it. No candidate set an Open Origin Policy. Table 5 provides an overview of the configuration defaults of the sinks. Table 6 provides an overview of the frequency of sinks. The core PHP provided sinks, i.e., file_get_contents, getimagesize, and get_headers accept non-HTTP(S) URLs by default. As discussed in Section 3.4, the WordPress functions do not disable redirects, except for the head function. Unsurprisingly, the WordPress HEAD functions and the PHP built-in get_headers function do not follow redirections. As their primary purpose is to return headers, redirection headers could never be polled by these functions if they were executed instead of returned.

(D4) Response Modification. 15 flows returned the SSR result, and 13 did so in a wrapped manner, i.e., they wrapped the result in a JSON object and returned that or used some other processing step. No flow was found using a Content-Disposition header, rate-limiting, or fixing the response time to a constant value.
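
To make the defense categories D1, D3, and D4 more concrete, the following PHP sketch shows what such measures could look like around a sink like file_get_contents. It is a minimal illustration under stated assumptions, not code from any analyzed project: the helper names `is_allowed_url`, `no_redirect_context`, and `wrap_response` as well as the host `example.com` are hypothetical. It uses only PHP built-ins (`parse_url`, `stream_context_create`, `json_encode`), and it does not address DNS-based defenses (D2).

```php
<?php
declare(strict_types=1);

/**
 * D1 (allow-list): accept a URL only if it uses http(s) and its host,
 * as extracted by PHP's built-in parse_url(), is explicitly allowed.
 *
 * @param string[] $allowedHosts lower-case host names
 */
function is_allowed_url(string $url, array $allowedHosts): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // malformed or relative URL
    }
    $scheme = strtolower($parts['scheme']);
    $host = strtolower($parts['host']);

    return in_array($scheme, ['http', 'https'], true)
        && in_array($host, $allowedHosts, true);
}

/**
 * D3 (configuration): stream context that tells the HTTP wrapper
 * not to follow redirects.
 *
 * @return resource
 */
function no_redirect_context()
{
    return stream_context_create([
        'http' => ['follow_location' => 0, 'max_redirects' => 0, 'timeout' => 5],
    ]);
}

/** D4 (response modification): wrap the fetched body instead of reflecting it as-is. */
function wrap_response(string $body): string
{
    return json_encode(
        ['length' => strlen($body), 'body_base64' => base64_encode($body)],
        JSON_THROW_ON_ERROR
    );
}
```

A caller would check `is_allowed_url($url, ['example.com'])` first, then pass `no_redirect_context()` as the third argument of `file_get_contents($url, false, ...)`, and finally return `wrap_response($body)`. Comparing the parsed host for exact equality avoids prefix and suffix tricks such as `example.com.evil.test` or `example.com@evil.test`.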
                                                              #
category             no validation                           38
                     allowlisting                            26
                     denylisting                             10
implem. type         Regex                                   12
                     is_file                                 10
                     strpos                                   6
                     substr                                   5
                     WP's esc_url_raw                         4
                     opendir                                  4
                     in_array                                 2
                     WP's clean_url                           1
                     is_readable                              1
                     wp_http_validate_url                     1
Other                broken validation                       14
                     Secure, well-established URL parser      5
validated URL parts  none                                   216
                     only scheme                             10
                     only host                                2
                     only path                                3
                     only query                               1
                     scheme and host                          4
                     scheme and query                         1
                     . . . all other combinations             0

Table 4: Analysis results for URL validation (D1).

                         Default Configuration
Sink                     redir. disabled   HTTP(s) only
file_get_contents        ✗                 ✗
get_headers              ✓                 ✗
getimagesize             ✗                 ✗
curl (init and setopt)   ✓                 ✓
wp_* exc. head           ✗                 ✓
wp_(safe_)remote_head    ✓                 ✓

Table 5: Default configuration behavior of the SSR sinks.

Sink                  #
file_get_contents    85
curl_init            24
getimagesize         24
get_headers          11
wp_remote_get        10
wp_remote_post        2
requests::get         1
wp_remote_head        1

Table 6: Frequency of sinks in candidates classified as 'app.'

Proper Defenses. We found two apps (three flows) that implement a proper defense. Both use an allow-listing approach. One app uses parse_url to extract and compare the host against an allow-list. The other app defines a regular expression that only allows user input in parts of the query, like this: https://example.com/path/.*/foo.txt.

6 Discussion

This section will discuss our results, including valuable insights for other work. We will present two case studies, discuss our limitations, and finally answer our Research Questions.

Survey Result I: Community Knowledge. We have established that the techniques to exploit SSRs are widely discussed in the literature and community. SSRF can be used as a stepping stone to deliver payloads for other vulnerabilities, circumvent firewalls, etc. The range of SSR abuses is exorbitant.

Survey Result II: SSRF Defenses. Ready-to-be-used defenses against SSRF exist in Symfony's HTTPClient and the SafeCurl family of tools. WordPress tries to implement some defense but fails to do so. Since Symfony is a major framework and SafeCurl is effectively a drop-in replacement, we expected to find some usage. Surprisingly, they are barely used at all.

We developed a novel PHP CPG and SURFER to answer whether developers use home-grown defenses. There are two ways of defending against such attacks: deny and allow listing.

Manual Analysis: Deny Listing. To implement a proper deny list-style approach, DNS resolution and DNS pinning are required. Otherwise, an attacker could either register a new domain not on the denylist to attack a target with an SSR or provide a short-lived DNS entry to perform a DNS rebinding attack.

We did not encounter any DNS-based defenses. Since properly defending against SSRF attacks in a denylist scenario requires DNS resolution, we were surprised by the absence of DNS requests. It is not entirely unexpected that DNS pinning routines are missing since it is a complex technique, but we find it noteworthy that not a single DNS request call was found.

Manual Analysis: Allow Listing. SSR abuses can also be prevented using proper URL parsing and a complete allow list (compare our threat model 3.3). Most applications (38) did not validate the URL in any way, but 26 did so in an allow list manner. However, since rarely any flow uses a proper URL parser, and most flows don't validate the arguably most important part of the URL, the domain, allow-list-based defenses do not seem to reflect the status quo either. We could only find two apps that properly implement allow-list-based defenses.

Technically, prefixing user input with a qualified URL can be interpreted as a type of allow listing. However, it cannot be
considered a deliberate countermeasure. Instead, it is a fundamental design choice to realize a specific functionality of the application. Therefore, we argue that this is not a sufficient indicator of SSRF awareness.

Developers. We have established that there is prior work available on the dire consequences of SSR abuses. But nobody is using existing defense solutions or proper deny list-based defenses. We could only find two apps using proper allow-list-based defenses. From this, we can only conclude that SSRF is still not present enough in developers' minds and applications.

PHP: SSR by Accident? PHP is a language in which it is easy to implement SSR functionality 'by accident'. Many functions that deal with local files can also request remote resources. Most prominently, file_get_contents, but even more obscure functions such as getimagesize can trigger requests.

The implementation types is_file and opendir – both PHP builtins – indicate that developers try to limit the SSR sinks to their local features only. But both functions allow FTP requests. Takeaway: We propose that APIs should be designed so that SSR functionality has to be explicitly requested. Otherwise, developers introduce unexpected SSR features or have to implement checks to stop SSRs, introducing risk and complexity.

GitHub as a Data Source. We want to provide insights into using GitHub as a data source for security research and surveys to help future work. One interesting result of our manual analysis is that many of our findings are in repositories that are not real applications but hacking tools, such as web shells or programs that are vulnerable by design, e.g., for CTF competitions. It is important to disregard those when reasoning about developer awareness since they do not reflect the status quo of real applications and developer awareness. Researchers must exclude these repositories when using GitHub as a data source.

We also encountered archived repositories as well as abandoned projects (19). Although this is not necessarily a sign of vulnerable applications, we expect that unsupported applications do not represent the current state of attacks and defenses.

Case Study: Insufficient Domain Validation. We encountered a vulnerability in LibreX during our work⁴ that serves as a fitting example for an insufficient attempt at protecting against SSRF. The vulnerable code is shown in Figure 8. The code defines a domain allowlist in line 4. It then attempts to check the user input from line 2 against the allow list before passing it to the custom sink-wrapper request() in line 7. To do so, it first calls the userland function get_root_domain. Thus, it is implementing D2: Domain URL Validation.

However, the implementation of get_root_domain is defective. The developers utilize a home-grown solution instead of using a well-established URL parser, such as PHP's built-in parse_url(). It splits the provided URL using the forward slash as the delimiter. The third element is reversed and split again, this time at the dots. The second and first elements of the resulting array are reversed and concatenated. For example, http://www.example.org is split into ["http:","","www.example.org"]. The third element is reversed to gro.elpmaxe.www and split into ["gro", "elpmaxe", "www"]. Then, the second and first elements are reversed and concatenated: example.org.

To bypass the allowlist, an attacker can append ?.example.org to an arbitrary scheme, port, and domain: http://evil.com:22?.example.org. Since the last step only takes the first and second element in the reversed input, the function returns example.org, which is accepted by the allowlist. Thus, an attacker can control the scheme, domain, and port of the SSR request, making the application vulnerable to A1 (port scanning, network scanning) and A4 Bridging Attacks.

     1 // we simplified this snippet.
     2 $url = $_REQUEST["url"];
     3 $requested_root_domain = get_root_domain($url);
     4 $allowed_domains=["qwant.com", "wikimedia.org"];
     5 if (in_array($requested_root_domain, $allowed_domains)) {
     6     $image = $url;
     7     $image_src = request($image);
     8     header("Content-Type: image/png");
     9     echo $image_src;
    10 }
    11 function get_root_domain($url){
    12     $split_url = explode("/", $url);
    13     $base_url = $split_url[2];
    14     $base_url_main_split = explode(".", strrev($base_url));
    15     $root_domain = strrev($base_url_main_split[1]) . "." . strrev($base_url_main_split[0]);
    16     return $root_domain;
    17 }

Figure 8: Vulnerable code snippet of the metasearch engine.

⁴ https://github.com/hnhx/librex. We notified the developers of the problem. The repository was deleted from GitHub.

Case Study: unsafe esc_url_raw. We found a vulnerable WordPress plugin that depends on the broken sanitizer function esc_url_raw. Figure 9 shows its sanitization function, which uses some user input as a source. The input is passed through sanitize_text_field from WordPress, which has no special effect on URLs. The plugin only depends on esc_url_raw as a sanitizer for SSR abuses. As established in Section 3.4, esc_url_raw's documentation promises that
its result is safe for HTTP requests, but this is not true since it is not performing any SSR validation. This underlines the importance of clear and correct documentation.

    // We simplified this snippet.
    function get_response( $url ) {
        $result = wp_remote_get(esc_url_raw( $url ));
        return $result;
    }

Figure 9: Vulnerable sanitizer function of a WP plugin.

6.1 Limitations

While we aim at being complete, our approach contains limitations:

Static Analysis. Static analysis suffers from inherent limitations that can lead to missed data flows due to programming patterns that are inherently difficult, if not impossible, to reconstruct. These patterns are a well-known limitation of static analysis that is under active research [e.g. 23, 24]. Static analysis is also vulnerable to false positives; however, as we manually verify each of our detected flows, we eliminate this risk reliably. Additionally, our static analysis failed for very complex applications as CPG creation can experience exponential growth, which may lead to time-outs or memory exhaustion depending on the structure of the application, limiting our data set (ref. Figure 6). Finally, we had to decide on one of the different and only partially compatible PHP major versions against which to implement our CPG creation. We decided on the, at the time, most recent PHP 8.2. This will inevitably lead to projects with legacy code that are only partially translatable into bytecode, with the files containing the legacy code left out. Using static analysis potentially reduces the overall amount of analyzed and reported SSRF-related data points.

Missed Second Order Vulnerabilities. We do not consider second-order data flows when performing our manual analysis, as we filter out any flow without attacker control. Second-order vulnerabilities are notoriously challenging to detect [13] and, given the typical usage pattern of SSR, are of minor relevance to our overarching research question.

Focus on Common Sinks and Exploits. We use popular HTTP sinks with protocol-agnostic exploitation patterns as our starting point for the static analysis. This excludes more exotic attacks involving technically complex and situational exploitations, e.g., leveraging opendir via ftp. While those exploits are technically feasible, they do not represent this research's common and focused exploit scenario and should be considered for future work.

6.2 PHPJoern

We have discussed PHPJoern's shortcomings we identified before conducting our study (4.1.1). We, a posteriori, evaluated if they would have impacted our results if we had used PHPJoern. Since we mitigate false positives through our subsequent manual review, we focus on the version mismatch.

To estimate the impact of errors due to version mismatches, we conducted an experiment: Using PHP's syntax check feature, we measured the PHP 7.1 compatibility of our dataset. We found 4,990 repositories that use modern PHPJoern-incompatible features, i.e., they pass PHP 8.2's syntax check but not PHP 7.1's. We manually cross-checked these with our vulnerable flows and identified one vulnerable code path missing from PHPJoern's CPG. Therefore, we would be unable to find it if we based SURFER on PHPJoern instead of our new CPG generator. The breaking feature used by the vulnerable code is DNF Type Declarations.

6.3 Research Questions Answered

We will now answer our research questions from Section 2.3 and summarize our learnings.

How Popular are SSRs? 49.6 % of the repositories in our dataset contained at least one SSR sink. Since this number is string-search-based, it is a rough approximation. Filtering results for only those with user input in the data flow to the SSR sink, we get 141. This shows that application developers are using SSR sinks mostly with static targets.

Current State of SSRF defenses (RQ1). From an academic and documentation standpoint, SSR abuses are well documented. The literature provides enough sources on defenses, and OWASP is providing cheat sheets. The common framework Symfony implemented a safer HTTP client, and there are drop-in replacements for safer curl usage in PHP. WordPress attempts to provide a defense, but it is flawed and vulnerable to DNS rebinding. Other frameworks supporting SSR lack defense capabilities and do not discuss the risks of SSRs in their documentation.

Web developers are not using readily available defenses (RQ2). We discussed the availability of ready-to-be-used safer HTTP clients that defend against certain SSR abuses in Section 3.4. However, only a negligible number of PHP applications on GitHub are using them.

Applications are lacking custom defenses (RQ3). Our static analysis results, in combination with an in-depth manual analysis, showed that proper defense and mitigation attempts are rare. No application is equipped with DNS validation, which is essential for a safe and proper deny-list-based defense against SSR abuses. Only two applications properly used allow listing.

State of SSR vulnerabilities in applications (RQ4). Considering that open-source web application developers are not using existing SSR defenses and are not implementing proper
SSR defenses, we conclude that SSR awareness has not ar- PHP Static Analysis Previous work covered information
rived in the mainstream of application developers. flow and taint-style vulnerabilities in PHP applications.
Huang et al. [19] detected vulnerabilities via information
flow analysis in PHP applications and were the first to do
7 Related Work so. Jovanovic et al. [22] proposed a static taint-flow analysis
for PHP applications. Kassar et al. [23, 24] are working on
SSR Studies Previous academic work introduced dynamic pushing the coverage of PHP static analysis tools but over-
scanners to detect vulnerable SSR, Pellegrino et al. [32] pro- look SSRF sinks and vulnerabilities in their work. Backes
posed a black-box testing tool and scanned 68 services. We et al. [9] were the first to leverage code property graphs to
leverage the benefits of static code analysis, enabling us to analyze PHP applications. Alhuzali et al. [6] combined it with
cover all possible code paths without any input except the ap- dynamic analysis to generate exploits more precisely. Shezan
plication itself. This allows us to conduct a large-scale study et al. [35] augmented it with cross-language capabilities to
on 27,078 applications. Jabiyev et al. [21] proposed defenses search for GDPR violations. Our novel PHP code property
against SSRF and benchmarked them against known SSRF graph converter, written against the modern CPG standard
vulnerabilities. Musch et al. [28] studied the prevalence and [48], works on the CFG provided by PHP itself, making it
security implications of Server-Side-Browsers as SSR clients. more reliable than the one proposed by Backes et al. [9].
Sahin et al. [33] have conducted a CTF experiment to study
developers’ awareness of different web attack types. SSRF
was exploited the least, showing that it is still an unknown
vulnerability class. Previous work focused on detecting SSR 8 Conclusion
vulnerabilities or defending against them — we are the first
to evaluate existing defenses in the wild. SSRF is a complex and multifaceted vulnerability class, and
our survey of the current state of the art shows that developers
SSR Systematization Pellegrino et al. [32] introduced a must consider multiple attack vectors. However, the exper-
classification of SSRs along the axes of Flaw, Behavior, Con- iments in this paper reveal that developers do not properly
trol, and Target. Additionally, they presented some mitigation defend against SSRF:
techniques they encountered. i) Our analysis of popular PHP frameworks and SSR li-
In contrast, our systematization, presented in section 3, in- braries shows that even if SSR capabilities are offered, de-
cludes defenses as a first-class citizen. Consequently, attacks fenses in any form are commonly missing or defective. In
are explicitly mapped to suitable defenses. They are classified particular, dedicated defenses against SSRF are either broken
as full, partial, or allow-list only protections. Additionally, we (WordPress) or are simply not used by the vast majority of
systematized known defense evasion techniques and linked PHP applications on GitHub (Symfony, SafeCurl, etc.) – Only
them with our previous efforts. four applications are using existing safe countermeasures, as
Furthermore, our systematization distinguishes between shown by our usage study.
allow listing and deny list cases, which Pellegrino et al. did not ii) As dedicated defensive measures are not used, we
cover. However, this differentiation is essential to a complete investigated if homegrown countermeasures are implemented
understanding of SSR defenses. Some defenses are sufficient instead. For this purpose, we examined 27,078 software
in the allow list case, e.g., complete URL validation is an projects sourced from GitHub using our CPG-based tool
adequate host validation. At the same time, more technical S URFER and a subsequent rigorous manual analysis. Our
effort and knowledge are required in the deny list case, i.e., investigation into the resulting flows and deployed defenses
DNS Rebinding protection is needed. revealed that only two applications employ their own safe
Additionally, while our systematization contains the attacks allow-list defense. Furthermore, we did not find any secure
from Pellegrino et al., it presents a current and updated picture. deny-list defense since protection against DNS rebinding was
It includes recent developments; for example, it encompasses absent in all cases.
the new attack surface of browsers as HTTP clients [28].
Additionally, we split the Probe class into the more suited In a somber conclusion, our results show that, while being
Port scan and Network scan categories to better reflect the comparatively infrequent, SSRF is widespread in the applica-
impact of different defense techniques. ble subset of software projects: Almost all applications that
Therefore, our defense-encompassing systematization can might be susceptible to SSRF due to their application logic
be used by both researchers and practitioners. Security re- (i.e., they utilize at least one functionality that requires the
searchers can easily classify their potential findings alongside retrieval of external HTTP resources based on user input) are
existing mitigations using our systematization. Similarly, de- indeed vulnerable to such attacks. Hence, our results suggest
velopers can leverage it to check if their implementation is that developers either are unaware of SSRF’s dangers or are
vulnerable and are provided with better options. unwilling/unable to implement effective defenses.
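The allow-list versus deny-list distinction drawn in the SSR Systematization paragraph of Section 7 can be illustrated with a minimal sketch. It is written in Python rather than PHP purely for illustration and is not the paper's tooling. The host names, the function names, and the "resolve once and pin the address" strategy are assumptions chosen for this example; a real deny-list defense would also have to make the HTTP client connect to the returned address.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical allow list; the host names are placeholders for this sketch.
ALLOWED_HOSTS = {"api.example.com", "cdn.example.com"}


def allow_list_ok(url: str) -> bool:
    """Allow-list case: validating the parsed URL is adequate host validation."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS


def is_public_ip(ip: str) -> bool:
    """Deny-list building block: reject loopback, private, link-local, etc."""
    return ipaddress.ip_address(ip).is_global


def deny_list_resolve(url: str, resolver=socket.gethostbyname) -> str:
    """Deny-list case: resolve once, check the address, and return it.

    The caller is expected to connect to exactly this address (pinning)
    instead of resolving the host name a second time, which is the step
    a DNS rebinding attack would otherwise exploit.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("unsupported URL")
    ip = resolver(parts.hostname)
    if not is_public_ip(ip):
        raise ValueError(f"{parts.hostname} resolves to non-public address {ip}")
    return ip
```

The `resolver` parameter exists only so the check can be exercised without network access; it is not part of any library API.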
Acknowledgments

We are thankful for the valuable feedback of our anonymous reviewers and shepherd. This work has received funding from the European Union's Horizon 2020 research and innovation programme under project TESTABLE, grant agreement No 101019206. Additionally, it was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy - EXC 2092 CASA - 390781972.

Availability

Our tooling is available as open-source software at https://github.com/SSRF-vs-Developers.

Disclosure

We contacted the developers of affected repositories that were not deprecated or archived. We preferred the contact information from security policies to disclose the issues responsibly. If no security policy was present, we filed issues asking for their preferred way of disclosure or tried to contact the developers via email.

References

[1] Cve-2016-4029. Online https://www.cve.org/CVERecord?id=CVE-2016-4029, 2016. visited 2023-06-02.

[2] Cve-2021-21973. Online https://www.cve.org/CVERecord?id=CVE-2021-21973, 2021. visited 2023-06-02.

[3] Cve-2021-26855. Online https://www.cve.org/CVERecord?id=CVE-2021-26855, 2021.

[4] NVD - CVE-2023-43654. Online https://nvd.nist.gov/vuln/detail/CVE-2023-43654, 2023.

[5] Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman. Compilerbau, Teil 2, Compilerbau. Oldenbourg Wissenschaftsverlag, 2016.

[6] Abeer Alhuzali, Rigel Gjomemo, Birhanu Eshete, and V.N. Venkatakrishnan. NAVEX: Precise and scalable exploit generation for dynamic web applications. In 27th USENIX Security Symposium (USENIX Security 18), pages 377–392. USENIX Association, August 2018. ISBN 978-1-939133-04-5. URL https://www.usenix.org/conference/usenixsecurity18/presentation/alhuzali.

[7] Amazon Web Services, Inc. Instance metadata and user data - amazon elastic compute cloud. Online https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html, March 2022.

[8] Arr0way. SSRF Cheat Sheet & Bypass Techniques. Online https://highon.coffee/blog/ssrf-cheat-sheet/, 2021.

[9] Michael Backes, Konrad Rieck, Malte Skoruppa, Ben Stock, and Fabian Yamaguchi. Efficient and flexible discovery of php application vulnerabilities. In 2017 IEEE European Symposium on Security and Privacy (EuroS&P), pages 334–349. IEEE, 2017.

[10] Jérémy Benoist. Server-Side Request Forgery (SSRF) protection plugin for HTTPlug. Online https://github.com/j0k3r/httplug-ssrf-plugin, July 2022.

[11] Albert Danial. cloc: v1.81. Online https://github.com/AlDanial/cloc, 2019.

[12] DigitalOcean, LLC. How to access droplet metadata. Online https://docs.digitalocean.com/products/droplets/how-to/retrieve-droplet-metadata, March 2022.

[13] Benjamin Eriksson, Giancarlo Pellegrino, and Andrei Sabelfeld. Black widow: Blackbox data-driven web scanning. In 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24-27 May 2021, pages 1125–1142. IEEE, 2021. doi: 10.1109/SP40001.2021.00022. URL https://doi.org/10.1109/SP40001.2021.00022.

[14] fin1te. SafeCurl: SSRF Protection, and a "Capture the Bitcoins". Online https://whitton.io/articles/safecurl-ssrf-protection-and-a-capture-the-bitcoins/, May 2014.

[15] Andrew Gallant. ripgrep (rg). Online https://github.com/BurntSushi/ripgrep, 2021.

[16] Google. Subscribe to someone's Google Calendar - Computer - Google Calendar Help. Online https://support.google.com/calendar/answer/37100, 2023.

[17] Josh Grunzweig, Matthew Meltzer, Sean Koessel, Steven Adair, and Thomas Lancaster. Operation exchange marauder: Active exploitation of multiple zero-day microsoft exchange vulnerabilities. Online https://www.volexity.com/blog/2021/03/02/active-exploitation-of-microsoft-exchange-zero-day-vulnerabilities/, 2021. visited 2023-06-02.

[18] Tarunkant Gupta. Blog on Gopherus Tool. Online https://tarunkant.github.io/2018/08/14/2018-08-14-blog-on-gopherus/index.html, 2018.
[19] Yao-Wen Huang, Fang Yu, Christian Hang, Chung-Hung Tsai, Der-Tsai Lee, and Sy-Yen Kuo. Securing web application code by static analysis and runtime protection. In Proceedings of the 13th international conference on World Wide Web, WWW 2004, WWW '04, pages 40–52. Association for Computing Machinery, 2004. doi: 10.1145/988672.988679.

[20] IncludeSec team. Introducing: SafeURL – A set of SSRF Protection Libraries. Online https://blog.includesecurity.com/2016/08/introducing-safeurl-a-set-of-ssrf-protection-libraries/, 2016.

[21] Bahruz Jabiyev, Omid Mirzaei, Amin Kharraz, and Engin Kirda. Preventing server-side request forgery attacks. In Proceedings of the 36th Annual ACM Symposium on Applied Computing, SAC '21, pages 1626–1635. Association for Computing Machinery, 2021. doi: 10.1145/3412841.3442036.

[22] Nenad Jovanovic, Christopher Kruegel, and Engin Kirda. Static analysis for detecting taint-style vulnerabilities in web applications. Journal of Computer Security, 18(5): 861–907, August 2010. doi: http://dx.doi.org/10.3233/JCS-2009-0385.

[23] Feras Al Kassar, Giulia Clerici, Luca Compagna, Davide Balzarotti, and Fabian Yamaguchi. Testability tarpits: the impact of code patterns on the security testing of web applications. In 29th Annual Network and Distributed System Security Symposium, NDSS 2022, San Diego, California, USA, April 24-28, 2022. The Internet Society, 2022. URL https://www.ndss-symposium.org/ndss-paper/auto-draft-206/.

[24] Feras Al Kassar, Luca Compagna, and Davide Balzarotti. WHIP: improving static vulnerability detection in web application by forcing tools to collaborate. In Joseph A. Calandrino and Carmela Troncoso, editors, 32nd USENIX Security Symposium, USENIX Security 2023, Anaheim, CA, USA, August 9-11, 2023. USENIX Association, 2023. URL https://www.usenix.org/conference/usenixsecurity23/presentation/al-kassar.

[25] Vickie Li. Bypassing SSRF Protection. Online https://vickieli.medium.com/bypassing-ssrf-protection-e111ae70727b, 2019. visited 2023-06-02.

[26] Colm MacCarthaigh. Add defense in depth against open firewalls, reverse proxies, and SSRF vulnerabilities with enhancements to the EC2 Instance Metadata Service. Online https://aws.amazon.com/de/blogs/security/defense-in-depth-open-firewalls-reverse-proxies-ssrf-vulnerabilities-ec2-instance-metadata-service/, 2019.

[27] mozilla.org contributors. Content-Disposition - HTTP | MDN. Online https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition, 2023.

[28] Marius Musch, Robin Kirchner, Max Boll, and Martin Johns. Server-Side Browsers: Exploring the Web's Hidden Attack Surface. In Proc. of the 17th ACM Asia Conference on Computer and Communications Security (AsiaCCS'22), May 2022.

[29] Ivan Novikov. SSRF bible. Cheatsheet. Online https://cheatsheetseries.owasp.org/assets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet_SSRF_Bible.pdf, Jan 2017.

[30] OWASP Contributors. Server Side Request Forgery Prevention - OWASP Cheat Sheet Series. Online https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html, 2022.

[31] OWASP Top 10 team. A10:2021 – server-side request forgery (SSRF). Online https://owasp.org/Top10/A10_2021-Server-Side_Request_Forgery_(SSRF)/, September 2021.

[32] Giancarlo Pellegrino, Onur Catakoglu, Davide Balzarotti, and Christian Rossow. Uses and abuses of server-side requests. In Research in Attacks, Intrusions, and Defenses - 18th International Symposium, RAID 2016, January 2016.

[33] Merve Sahin, Tolga Ünlü, Cédric Hébert, Lynsay A. Shepherd, Natalie Coull, and Colin Mc Lean. Measuring Developers' Web Security Awareness from Attack and Defense Perspectives. In 2022 IEEE Security and Privacy Workshops (SPW), pages 31–43, 2022. doi: 10.1109/SPW54247.2022.9833858.

[34] sbani. New Option 'pin_base_uri' to Prevent Potential SSRF · Issue #2859 · guzzle/guzzle. Online https://github.com/guzzle/guzzle/issues/2859, 2021.

[35] Faysal Hossain Shezan, Zihao Su, Mingqing Kang, Nicholas Phair, Patrick William Thomas, Michelangelo van Dam, Yinzhi Cao, and Yuan Tian. CHKPLUG: Checking GDPR Compliance of WordPress Plugins via Cross-language Code Property Graph. In NDSS, 2023.

[36] Giada Stivala and Giancarlo Pellegrino. Deceptive previews: A study of the link preview trustworthiness in social platforms. In 27th Annual Network and Distributed System Security symposium, February 2020. URL https://publications.cispa.saarland/3029/.

[37] Symfony. HTTP Client (Symfony Docs). Online https://symfony.com/doc/current/http_client.html, 2023.
[38] Symfony. New in Symfony 5.1: Server-side request forgery protection (Symfony Blog). Online https://symfony.com/blog/new-in-symfony-5-1-server-side-request-forgery-protection, 2023.

[39] Laurence Tennant. Mitigating SSRF in 2023. Online https://blog.includesecurity.com/2023/03/mitigating-ssrf-in-2023/, 2023.

[40] The MITRE Corporation. 2021 CWE top 25 most dangerous software weaknesses. Online https://cwe.mitre.org/data/definitions/1387.html, 2021.

[41] The PHP Group. PHP: Deprecated Features - Manual. Online https://www.php.net/manual/en/migration80.deprecated.php#migration80.deprecated.libxml, 2020.

[42] The PHP Group. PHP: cURL. Online https://www.php.net/manual/en/book.curl.php, 2024.

[43] The PHP HTTP group. HTTPlug. Online https://httplug.io/, 2023.

[44] Cheng-Da Tsai. A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages! Online https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf, 2017.

[45] WordPress. esc_url_raw() | Function, 2016. URL https://developer.wordpress.org/reference/functions/esc_url_raw/.

[46] WordPress. wp_safe_remote_get() | Function. Online https://developer.wordpress.org/reference/functions/wp_safe_remote_get/, 2017.

[47] Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. Modeling and discovering vulnerabilities with code property graphs. In Proceedings of the 2014 IEEE Symposium on Security and Privacy, pages 590–604. IEEE Computer Society, 2014. doi: 10.1109/SP.2014.44.

[48] Fabian Yamaguchi, Markus Lottmann, Niko Schmidt, Michael Pollmeier, Suchakra Sharma, and Claudiu-Vlad Ursache. Github code property graph. Online https://github.com/ShiftLeftSecurity/codepropertygraph, 2023.

[49] Amar Zlojic. Server Side Request Forgery (SSRF) Attacks & How to Prevent Them, 04 2022. https://brightsec.com/blog/ssrf-server-side-request-forgery/.

Appendix

8.1 List of Supported PHP SSR sinks

We compiled PHP functions that can trigger HTTP requests in default configurations. We included the popular 'curl' extension. We amended the list with request sinks from WordPress, since it is the most-used PHP framework. Please note that get_headers performs a GET and not a HEAD request.
We chose not to include sinks in this work that require a configuration change to trigger an SSR, e.g., if allow_url_include is set to true, the include and require functions of PHP are able to trigger network requests. We include file_get_contents due to a similar reasoning. It requires allow_url_fopen to be set to true, which is the default.
However, since our methodology is general, the list can be easily modified to broaden the scope of sinks.

• file_get_contents
• curl_init
• curl_set_opt
• getimagesize
• get_headers
• wp_http::get
• requests::get
• wp_remote_request
• wp_remote_get
• wp_remote_post
• wp_remote_head
• wp_safe_remote_request
• wp_safe_remote_get
• wp_safe_remote_post
• wp_safe_remote_head

8.2 Findings

Table 7 lists the apps we identified as vulnerable. We contacted the developers if the repository was not archived or explicitly deprecated.
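To make the role of the sink list in Section 8.1 concrete, the following is a deliberately naive sketch that searches PHP source text for calls to a subset of the listed sinks. It is a plain textual scan written in Python for illustration, not the CPG-based analysis described in the paper; the function name, the regular expression, and the chosen subset of sinks are assumptions of this example. Unlike the paper's approach, it cannot tell whether user input actually reaches the sink.

```python
import re

# A subset of the sink names listed in Section 8.1 (free functions only).
SINKS = [
    "file_get_contents",
    "curl_init",
    "getimagesize",
    "get_headers",
    "wp_remote_get",
    "wp_safe_remote_get",
]

# Match a sink name used as a plain function call. The lookbehind skips
# longer identifiers (my_curl_init), method calls (->, ::), and variables ($).
# PHP function names are case-insensitive, hence re.IGNORECASE.
SINK_CALL = re.compile(
    r"(?<![\w>:$])(" + "|".join(map(re.escape, SINKS)) + r")\s*\(",
    re.IGNORECASE,
)


def find_sink_calls(php_source: str):
    """Return (line_number, sink_name) pairs for every textual sink call."""
    hits = []
    for lineno, line in enumerate(php_source.splitlines(), start=1):
        for match in SINK_CALL.finditer(line):
            hits.append((lineno, match.group(1).lower()))
    return hits


if __name__ == "__main__":
    example = "<?php\n$x = file_get_contents($_GET['x']);\n"
    for lineno, sink in find_sink_calls(example):
        print(f"line {lineno}: {sink}")
```

A scan like this only yields starting points; deciding whether a hit is an SSRF candidate still requires the data-flow reasoning the paper performs on the code property graph.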
Name | Stars | Exploitability | Note
10up/safe-svg | 193 | non-trivial |
A5hleyRich/delightful-downloads | 27 | non-trivial |
akirk/friends | 68 | non-trivial |
amzik/officemanage | 27 | non-trivial |
Arsenal21/all-in-one-wordpress-security | 40 | non-trivial | †
bigbignerd/WxCrawler | 31 | trivial |
blindsidenetworks/wordpress-plugin_bigbluebutton | 26 | non-trivial |
captn3m0/jqaas | 32 | trivial |
chyrp/chyrp | 203 | both |
Codiad/Codiad | 2823 | non-trivial | †
codingeverybody/makewebapp | 33 | trivial |
csev/dj4e | 90 | non-trivial |
Cvolton/GMDprivateServer | 313 | both |
cw1997/Tieba-Posting-Frequency | 31 | trivial |
d3y4n/instagraph | 326 | trivial |
dave-p/TVHadmin | 26 | non-trivial |
diegolamonica/EUCookieLaw | 50 | trivial |
DSJAS/DSJAS | 42 | non-trivial |
factmaven/xml-to-json | 48 | non-trivial |
Feathur/Feathur | 72 | non-trivial |
fingerQin/Yaf-Server-Admin | 51 | trivial |
Frecuencio/sqlbuddy-php7 | 40 | trivial |
friend-nicen/nicen-localize-image | 63 | non-trivial |
GeSHi/geshi-1.0 | 162 | non-trivial |
greenido/backbone-bira | 26 | both |
hnhx/librex | 652 | trivial | ✗
iandevlin/resimagecrop | 41 | non-trivial |
inclusive-design/AChecker | 68 | non-trivial | †
jadijadi/techninjatheme | 34 | non-trivial |
kodejuice/localGoogoo | 41 | non-trivial |
lfiore/upld | 42 | non-trivial |
Licoy/wordpress-theme-puock | 1740 | non-trivial |
LyLme/lylme_spage | 267 | trivial |
marekrei/encode-explorer | 221 | trivial |
markjaquith/WordPress-Plugin-Readme-Parser | 42 | non-trivial |
mojeda/QuickGallery | 42 | non-trivial |
mokecc/VideoUrlParser | 30 | non-trivial |
MonstaApps/Monsta-FTP | 129 | trivial | †
mpeshev/DX-Plugin-Base | 115 | trivial |
mwt/apfollow | 34 | non-trivial |
nangge/webchat-robot | 41 | non-trivial |
naofode/naofo.de | 100 | non-trivial |
nbhr/php-reverse-proxy | 81 | trivial |
nk932714/yify-movies-php | 26 | non-trivial |
norbusan/piwigopress | 426 | non-trivial |
onigetoc/m3u8-PHP-Parser | 55 | trivial |
OpenGamePanel/OGP-Agent-Linux | 86 | trivial |
PhiRhythmus/Tanx | 134 | non-trivial |
photonstorm/AS3toTypeScript | 70 | trivial |
PHPAuth/PHPAuth | 872 | non-trivial |
phucvo0709/Clone-Google-Search-Engine | 33 | non-trivial |
plidezus/aimozhen | 39 | non-trivial |
qakcn/qchan | 207 | non-trivial |
quicksketch/timezonepicker | 53 | trivial |
rmorse/Open-Manager | 61 | non-trivial | †
s3131212/allendisk | 37 | non-trivial | †
segler-alex/radiobrowser-api | 71 | non-trivial | †
shiflett/unveil | 28 | non-trivial |
Simsso/Online-Tools | 54 | trivial |
sixty-nine/PHP_Word_Cloud | 39 | trivial | †
splitbrain/php-epub-meta | 58 | trivial |
su18/Stitch | 161 | trivial |
uksb/vqgen | 48 | trivial |
vedees/wcms | 250 | trivial |
vito/chyrp | 232 | trivial | †
WolfieZero/Markdown-Viewer-PHP | 50 | trivial |
wp-sync-db/wp-sync-db-media-files | 520 | non-trivial |
wujunze/onlineDisk_search | 26 | both |
xb2016/kratos-pjax | 949 | non-trivial | †
Average stars | 190.55 | |
Median stars | 54 | |

Table 7: Applications we identified as vulnerable. Exploitability: "trivial" means the repo contained a trivially exploitable flow; "non-trivial" means the repo contained a non-trivial exploitable flow; "both" means the repo contained both trivial and non-trivial flows. Note: †: Deprecated or archived, ✗: Repo was deleted.
USENIX Security '24 Artifact Appendix: SSRF vs. Developers: A Study of SSRF-Defenses in PHP Applications

Malte Wessels*†, Simon Koch*†, Giancarlo Pellegrino‡, Martin Johns†

† Technische Universität Braunschweig
‡ CISPA Helmholtz Center for Information Security

{malte.wessels, simon.koch, m.johns}@tu-braunschweig.de, pellegrino@cispa.de

* Both authors contributed equally to this research.

A Artifact Appendix

A.1 Abstract

A new PHP code property graph (CPG) generator. It is based on PHP bytecode lifted directly from the PHP interpreter. Additionally, we supply a static analysis pipeline (slicing and string reconstruction) to find SSRF candidates. We are applying for the functional badge.

A.2 Description & Requirements

A.2.1 Security, privacy, and ethical concerns

Security and Ethics Concerns: We use a publicly available PHP web shell with an SSRF candidate flow to demonstrate the functionality. Since a web shell is vulnerable by design, it should not be hosted. Privacy: We pull dependencies from GitHub, Docker, and via sbt. This requires connections to these servers.

A.2.2 How to access

An overview is available at: https://github.com/SSRF-vs-Developers.
Version-fixed docker files for our code property graph generation and experiment runner are available at: https://github.com/SSRF-vs-Developers/CpgGeneration/tree/artifact.
SURFER, an SSRF candidate detection tool to run on the generated PHP CPG, is available at: https://github.com/SSRF-vs-Developers/surfer/tree/artifact.
The GitHub organization contains an overview, which we used as an entry point for this artifact evaluation¹.

A.2.3 Hardware dependencies

None, for the sake of demonstration we use small examples.

A.2.4 Software dependencies

A recent Linux system with docker and git installed.

A.2.5 Benchmarks

None.

A.3 Set-Up

We provide several tools which are automatically built by the docker containers.

PHP-src: We patched the PHP interpreter to output more data in its bytecode debug output.

CPG: A code property graph generator based on PHP bytecode.

Slicer: A slicing utility for this CPG.

Plotter: A plotting utility for slices.

SURFER: Our tool to identify SSRF candidates in these CPGs.

Optional: Install Joern

Install Joern² via its README or docker. We can confirm that Joern version 2.0.223 is suitable, but we recommend trying the latest version first. Pulling the docker image might require a GitHub login³.

    # pull docker image
    docker pull ghcr.io/joernio/joern:nightly

¹ https://github.com/SSRF-vs-Developers/.github/tree/ed093a0443fefd4a8a2d8c134df813e80a6dfa5a/profile
² https://github.com/joernio/joern/
³ https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry
A.3.1 Installation

Our toolchains are docker-based. We will provide recent toolchain versions in the GitHub organization at https://github.com/PHP-CPG. However, for the purpose of this artifact evaluation, we created dockerfiles that pin the versions. To build these, clone their repository: https://github.com/SSRF-vs-Developers/CpgGeneration. We provide scripts that create (build and test) these docker containers. Change the directory to their folders and run them in this order:

1. CPG/resources/docker/PHP-StringPatched/create.sh
2. CPG/resources/docker/multilayer-php-cpg/create.sh
3. ExperimentRunner/template/create.sh

Each dockerfile will build the toolchain, including pulling dependencies and running test cases. Finally, we build our tool SURFER by running the following commands:

• Clone https://github.com/SSRF-vs-Developers/surfer
• cd resources/docker
• ./create.sh

A.3.2 Basic Test

The repository ships with the most basic SSRF example to test the pipeline. Run this command:

    ./surfer_docker_run.sh ./basictestfiles/in/ ./basictestfiles/cpg ./basictestfiles/out

This creates a CPG in the cpg folder. Additionally, a testproject.json file is created in the out folder. This JSON contains one candidate:

    "reversedString": ["<G:_GET>[x]"],
    "sink": ["/in/testproject/test.php", "1"],
    "sinkName": "file_get_contents",
    "sources": [["/in/testproject/test.php", "1"]]

Additionally, a dot file is generated, which visualizes our slice. It can be rendered with utilities like Graphviz.

A.4 Evaluation workflow

A.4.1 Major Claims

(C1): We provide a bytecode PHP code property graph (CPG) generator.
(C2): SURFER can find SSRF candidates in these CPGs.

A.4.2 Experiments

We will use one of the repositories from our dataset to show the functionality. For the sake of demonstration, instead of using a productive application as an analysis subject, we use a publicly available PHP reverse shell, which we found in our dataset and classified as a 'hacking tool'⁴.
E1 addresses C1, and E2 addresses C2, respectively.

(E1): [∼ 1 compute-minute + < 100 MB disk]: Creating a CPG from a project: We fetch the source code and create a CPG. Additionally, this runs SURFER on the created CPG.
Preparation: Install docker, wget, and a tool to unpack zip files, e.g., unzip.
How to: First, cd into the 'in' folder: 'surfer/basictestfiles/in'. Then download and extract https://github.com/ivan-sincek/php-reverse-shell/archive/refs/tags/v2.6.zip.
Execution: Rerun the command from A.3.2:

    ./surfer_docker_run.sh ./basictestfiles/in/ ./basictestfiles/cpg ./basictestfiles/out

Results: Cd into the 'basictestfiles/cpg' folder and observe that a .cpg file was created. Optionally: Load it into joern⁵ via 'joern file.cpg' and run queries, e.g., 'cpg.call.size'. If using dockerized joern:

    docker run --rm -it -v /tmp:/tmp -v $(pwd):/app:rw -w /app -t ghcr.io/joernio/joern:nightly joern /app/file.cpg
    joern> cpg.call.size

(E2): [SURFER] [1 min]: The previous experiment also ran SURFER automatically. Navigate to the 'out' folder and confirm that a JSON file with the project's name was created. It should have 2 SSRF candidate flows under the 'candidates' key. Each candidate contains a reversed string. Additionally, .dot files are created to visualize the slice through our CPG.

A.5 Version

Based on the LaTeX template for Artifact Evaluation V20231005. Submission, reviewing and badging methodology followed for the evaluation of this artifact can be found at https://secartifacts.github.io/usenixsec2024/.

⁴ https://github.com/ivan-sincek/php-reverse-shell/
⁵ https://github.com/joernio/joern/
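The candidate output described in A.3.2 and checked in E2 could be inspected with a small script such as the following sketch. The field names are taken from the candidate shown in A.3.2 and the 'candidates' key from E2; the exact top-level layout of the JSON file, the sample content, and the function name are assumptions of this example, so the script may need adjusting against a real output file.

```python
import json

# Hypothetical file content modeled on the candidate shown in A.3.2.
# Only the "candidates" key and the four field names come from the text;
# the overall file layout is assumed.
SAMPLE = '''
{"candidates": [
  {"reversedString": ["<G:_GET>[x]"],
   "sink": ["/in/testproject/test.php", "1"],
   "sinkName": "file_get_contents",
   "sources": [["/in/testproject/test.php", "1"]]}
]}
'''


def summarize(report_text: str):
    """Return one 'sinkName @ file:line <- reversed string' line per candidate."""
    report = json.loads(report_text)
    lines = []
    for cand in report.get("candidates", []):
        sink_file, sink_line = cand["sink"]
        reconstructed = " | ".join(cand["reversedString"])
        lines.append(f'{cand["sinkName"]} @ {sink_file}:{sink_line} <- {reconstructed}')
    return lines


if __name__ == "__main__":
    for line in summarize(SAMPLE):
        print(line)
```

For a real run, the text passed to `summarize` would be read from the JSON file in the 'out' folder instead of the embedded sample.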
