Programming OpenResty
Table of Contents
Introduction 1.1
Automated Testing 1.2
Introduction 1.2.1
Test::Nginx 1.2.2
Test Suite Layout 1.2.3
Test File Layout 1.2.4
Running Tests 1.2.5
Preparing Tests 1.2.6
Testing Erroneous Cases 1.2.7
Test Modes 1.2.8
Advanced Topics 1.2.9
Introduction
Programming OpenResty
This is an official guide on OpenResty programming written by the OpenResty creator. This
book is still in preparation. Please check back often for updates.
The entire Programming OpenResty book, written by Yichun Zhang, is available here. All
content is licensed under the Creative Commons Attribution Non Commercial Share Alike
3.0 license. You can download or browse the rendered book in various different formats on
the GitBook website below.
https://www.gitbook.com/book/openresty/programming-openresty/
The latest source of the book can be found in the following GitHub repository:
https://github.com/openresty/programming-openresty
Automated Testing
Automated testing plays a critical role in software development and maintenance.
OpenResty provides a data-driven test scaffold for writing declarative test cases for NGINX
C modules, Lua libraries, and even OpenResty applications. The test cases are written in a
specification-like format, which is both intuitive to read and write for humans and also easy
to handle for machines. The data-driven approach makes it easy to run the same tests in
wildly different ways that can help expose issues in different scenarios or with different kinds
of external tools.
This chapter introduces the Test::Nginx test scaffold that has been widely used to organize
test suites for almost all the OpenResty components, including the ngx_http_lua module,
most of the lua-resty-* Lua libraries, as well as full-blown business applications like
CloudFlare’s Lua CDN and Lua SSL.
Introduction
OpenResty itself has been relying on automated testing to remain high quality over the
years. As OpenResty core developers, we embrace the test driven development (TDD)
process all the time. An excellent result of our TDD practices over the years is a huge set of
test suites for all the OpenResty components. These test suites are so large as a whole, so
it is impractical to run all the tests thoroughly on a single machine. A relatively large test
cluster is often run on Amazon EC2 to run all these tests in all existing test modes. Lying at
the heart of these test suites is usually the Test::Nginx test scaffold module developed by
the OpenResty team.
The Test::Nginx scaffold provides a generic simple specification language for expressing
and organizing test cases in an intuitive way. It also provides various powerful testing modes
or "engines" to run the tests in various different ways in the hope of exposing bugs in
different settings. The test specification language can also be extended with custom
abstractions for advanced testing needs, which usually arise in application-level regression
testing.
Conceptual Roadmap
Overview
Test::Nginx
Test::Nginx is a test framework that drives test cases written for any code running atop
NGINX, and also, naturally, the NGINX core itself. It is written in Perl because of the rich
testing facilities and toolchain already accumulated in the Perl world for years. Fortunately,
the user does not really need to know Perl for writing test cases atop this scaffold since
Test::Nginx provides a very simple notation to present the test cases in a specification-like
format.
The simple test specification format, or language, used in Test::Nginx is just a dialect of
the more general testing language provided by the Test::Base testing module in the Perl
world. In fact, Test::Nginx is just a subclass of Test::Base in the sense of object-oriented
programming. This means that all the features offered by Test::Base are available in
Test::Nginx , and Test::Nginx just provides handy primitives and notations that simplify
testing in the NGINX and OpenResty context. The core idea of Test::Base is so useful that
we have been using testing scaffolds based on Test::Base in many different projects even
including Haskell programs and Linux kernel modules. Test::Nginx is such an example we
created for the NGINX and OpenResty world. Detailed discussion of the Test::Base
framework itself is beyond the scope of this book, but we will introduce the important
features of Test::Base that are inherited by Test::Nginx in the later sections.
Test::Nginx is distributed via CPAN, the Comprehensive Perl Archive Network, just like
most of the other Perl libraries. If you already have perl installed in your system (many
Linux distributions ship with perl by default), then you can install Test::Nginx with the
following simple command:
cpan Test::Nginx
The first time the cpan utility is run, you may be prompted to configure the cpan
utility to fit your requirements. If you are unsure about those options, just choose the
automatic configuration option (if available) or simply accept all the default settings.
Test::Nginx provides several different testing classes for different user requirements. The
most frequently used one is Test::Nginx::Socket . The rest of this chapter will focus on this
testing class and its subclasses. We will use the names Test::Nginx and
Test::Nginx::Socket interchangeably from now on to mean the Test::Nginx::Socket test class.
Test Suite Layout
By convention, projects using Test::Nginx have a t/ directory at the root of their source tree where the test
files reside. Each test file contains test cases that are closely related in some way and uses
the file extension .t so that it can easily be identified as a "test file". Below is the directory tree
structure of a real-world test suite inside the headers-more-nginx-module project:
└── t
├── bug.t
├── builtin.t
├── eval.t
├── input-conn.t
├── input-cookie.t
├── input-ua.t
├── input.t
├── phase.t
├── sanity.t
├── subrequest.t
├── unused.t
└── vars.t
When you have many test files, you can also group them further with sub-directories under
t/ . For example, in the lua-nginx-module project, we have sub-directories like 023-rewrite/ and 024-access/ .
In essence, each .t file is a Perl script file runnable by either perl or Perl’s universal test
harness tool named prove. We usually use the prove command-line utility to run such .t
files to obtain test results. Although .t files are Perl scripts per se, they usually do not
have much Perl code at all. Instead, all of the test cases are declared as cleanly formatted
"data" in these .t files.
Note: The test suite layout convention described here has also been used by the Perl
community for many years. Because Test::Nginx is written in Perl and reuses
Perl's testing toolchain, it makes sense for us to simply follow that convention in
the NGINX and OpenResty world as well.
Test File Layout
Each .t test file consists of two parts: the prologue part, which is Perl code, and the data
part, which holds the test case specifications. The two parts are separated by the following special line:

__DATA__

The perl interpreter or the prove utility stops interpreting the file content as Perl source
code once it sees this special line. Everything after this line is treated as data in plain text
that is reachable by the Perl code above this line. The most interesting part of each .t test
file is the stuff after this line, i.e., the data part.
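In its simplest form, the prologue part consists of just two lines of Perl code:

use Test::Nginx::Socket 'no_plan';

run_tests();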
The first line just loads the Perl module (or class) Test::Nginx::Socket and passes the
option 'no_plan' to it to disable test plans (we will talk more about test plans in later
chapters and do not bother worrying about them here). Test::Nginx::Socket is one of the
most popular classes in the Test::Nginx test framework. The second line just calls the
run_tests Perl function imported automatically from the Test::Nginx::Socket module to
run all the test cases defined in the data part of the test file (i.e., the things coming after the
__DATA__ line).
There are, however, more complicated prologue parts in many real-world test suites. Such
prologues usually define some special environment variables or Perl variables that can be
shared and referenced in the test cases defined in the "data part", or just call some other
Perl functions imported by the Test::Nginx::Socket module to customize the testing
configurations and behaviors for the current test file. We will return to such fancier prologues
in later sections. They can be very helpful in some cases.
Note: Perl allows function calls to omit the parentheses if the context is unambiguous. So we
may see Perl function calls without parentheses in real-world test files' prologue parts, like
run_tests; . We may use such forms in examples presented in later sections because they
are more compact.
The test case specification in the data part is composed of a series of test blocks. Each test
block usually corresponds to a single test case, which has a title, an optional description,
and a series of data sections. The structure of a test block is described by the following
template.
=== title
optional description
goes here...
--- section1
value1 goes
here
--- section2
value2 is
here
--- section3
value3
Block Titles
As we can see, each test block starts with a title line prefixed by three equals signs ( === ). It is
important to avoid any leading spaces at the beginning of the line. The title is mandatory and
is important to describe the intention of the current test case in the most concise form, and
also to identify the test block in the test report when test failures happen. By convention we
put a TEST N: prefix in this title, for instance, TEST 3: test the simplest form . Don't worry
about maintaining the test ordinal numbers in these titles yourself; we will introduce a
command-line utility called reindex in a later section that can automatically update the
ordinal numbers in the block titles for you.
Block Descriptions
Each test block can carry an optional description right after the block title line. This
description can span multiple lines if needed. It is a more detailed description of the intention
of the test block than the block title and may also give some background information about
the current test. Many test cases just omit this part for convenience.
Data Sections
Every test block carries one or more data sections right after the block description (if any).
Data sections always have a name and a value, which specify any input data fields and the
expected output data fields.
The name of a data section is the word after the line prefix --- . Spaces are allowed though
not syntactically required after --- . We usually use a single space between the prefix and
the section name for aesthetic considerations and we hope that you follow this convention
as well. The section names usually contain just alphanumeric letters and underscore
characters.
Section values are specified in two forms. One is all the lines after the section name line,
before the next section or the next block. The other form is more concise and specifies the
value directly on the same line as the section name, but right after the first colon character
( : ). The latter form requires that the value contains no line-breaks. Any spaces around the
colon are always discarded and never count as a part of the section value; furthermore, the
trailing line-break character in the one-line form does not count either.
If no visible values come after the section name in either form, then the section takes an
empty string value, which is still a defined value, however. On the other hand, omitting the
section name (and value) altogether makes that section undefined.
Test::Nginx offers various pre-defined data section names that can be used in the test
blocks for different purposes. Some data sections are for specifying input data, some are for
expected output, and some for controlling whether the current test block should be run at all.
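Consider the following sample test block, taken from the complete hello-world test file given at the end of this section:

=== TEST 1: hello, world
--- config
location = /t {
    echo "hello, world!";
}

--- request
GET /t

--- response_body
hello, world!

--- error_code: 200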
Here we have two input data sections, config and request , for specifying a custom
NGINX configuration snippet in the default server {} and the HTTP request sent by the
test scaffold to the test NGINX server, respectively. In addition, we have one output data
section, response_body , for specifying the expected response body output by the test
NGINX server. If the actual response body data is different from what we specify under the
response_body section, this test case fails. We have another output data section,
error_code , which specifies its value on the same line of the section name. We see that a
colon character is used to separate the section name and values. Obviously, the
error_code section specifies the expected HTTP response status code, which is 200.
Empty lines around data sections are always discarded by Test::Nginx::Socket . Thus the
test block above can be rewritten as below without changing its meaning.
--- config
location = /t {
echo "hello, world!";
}
--- request
GET /t
--- response_body
hello, world!
Some users prefer this style for aesthetic reasons. You are free to choose whichever form you
like.
There are also some special data sections that specify neither input nor output. They are just
used to control how test blocks are run. For example, the ONLY section makes only the
current test block in the current test file run and all the other test blocks are skipped. This is
extremely useful for running an individual test block in any given file, which is a common
requirement while debugging a particular test failure. Also, the special SKIP section can
skip running the containing test block unconditionally, handy for preparing test cases for
future features without introducing any expected test failures. We will visit more such "control
sections" in later sections.
We shall see, in a later section, that the user can define her own data sections or extend
existing ones by writing a little bit of custom Perl code to satisfy her more complicated
testing requirements.
Section Filters
Data sections can take one or more filters. Filters are handy when you want to adjust or
convert the section values in certain ways.
Syntactically, filters are specified right after the section name with at least one space
character as the separator. Multiple filters are also separated by spaces and are applied in
the order they are written.
Test::Nginx::Socket provides many filters for your convenience. Consider the error_code
section from the earlier example, which specifies its value, 200, on the same line as the
section name. If we want to place the section value, 200, on a separate line, like below,
--- error_code
200
then the section value would contain a trailing new line, which leads to a test failure. This is
because the one-line form always excludes the trailing new-line character while the multi-line
form always includes one. To explicitly exclude the trailing new-line in the multi-line form, we
can employ the chomp filter, as in
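--- error_code chomp
200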
Now it has exactly the same semantics as the previous one-line form.
Some filters have more dramatic effect on the section values. For instance, the eval filter
evaluates the section value as arbitrary Perl code, and the Perl value resulted from the
execution will be used as the final section value. The following section demonstrates using
the eval filter to produce 4096 a’s:
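--- response_body eval
"a" x 4096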
The original value of the response_body section above is a Perl expression in which the x
symbol is a Perl operator used to construct a string that repeats the left-hand-side string N
times, where N is given by the right-hand side. The resulting 4096-byte Perl string produced
by evaluating this expression (as dictated by the eval filter) will be used as the final section
value for comparison with the actual response body data. It is obvious that the use of the
eval filter and a Perl expression here is much more readable and manageable than directly
pasting that 4096-byte string into the test block.
As with data sections, the user can also define her own filters, as we shall see in a later
section.
A Complete Example
We conclude this section with a complete test file example, given below with both the
prologue part and the data part:

use Test::Nginx::Socket 'no_plan';

run_tests();

__DATA__
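
=== TEST 1: hello, world
--- config
location = /t {
    echo "hello, world!";
}
--- request
GET /t
--- response_body
hello, world!
--- error_code: 200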
We will see how to actually run such test files in the next section.
Note: The test file layout described in this section is exactly the same as for test files
based on other test frameworks derived from Test::Base , the superclass of
Test::Nginx::Socket , except for those specialized data sections and specialized
Perl functions defined only in Test::Nginx::Socket . All the Test::Base
derivatives share the same basic layout and syntax. They proudly inherit the
same veins of blood.
Running Tests
Like most Perl-based testing frameworks, Test::Nginx relies on Perl's prove command-line
utility to run the test files. The prove utility is usually shipped with the standard perl
distribution so we should already have it when we have perl installed.
Test::Nginx always invokes a real NGINX server and a real socket client to run the tests. It
automatically uses the nginx program found in the system environment PATH . It is your
responsibility to specify the right nginx in your PATH environment for the test suite. Usually
we just specify the path of the nginx program inside the OpenResty installation tree. For
example,
export PATH=/usr/local/openresty/nginx/sbin:$PATH
You can always use the which command to verify if the PATH environment is indeed set
properly:
$ which nginx
/usr/local/openresty/nginx/sbin/nginx
For convenience, we usually wrap such environment settings in a custom shell script so that
we do not risk polluting the system-wide or account-wide environment settings, nor take on
the burden of setting the environment manually for every shell session. For
example, I usually have a local bash script named go in each project I work on. A typical
go script might look like below:

#!/usr/bin/env bash

export PATH=/usr/local/openresty/nginx/sbin:$PATH

exec prove "$@"
Then we can use this ./go script to substitute the prove utility in any of the subsequent
commands involving prove .
Because Test::Nginx makes heavy use of environment variables for the callers to fine-tune
the testing behaviors (as we shall see in later sections), such shell wrapper scripts also
make it easy to manage all these environment variable settings and harder to get things
wrong.
Note: Please do not confuse the name of this bash script with Google's Go
programming language. It has nothing to do with the Go language in any way.
To run a single test file, we simply pass its path to the prove command:

prove t/foo.t
Here inside t/foo.t we employ the simple test file example presented in the previous
section. We repeat the content below for the reader’s convenience.
t/foo.t
use Test::Nginx::Socket 'no_plan';

run_tests();

__DATA__
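
=== TEST 1: hello, world
--- config
location = /t {
    echo "hello, world!";
}
--- request
GET /t
--- response_body
hello, world!
--- error_code: 200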
It is worth mentioning that we could run the following command instead if we have a custom
wrapper script called ./go for prove (as mentioned earlier in this section):
./go foo.t
Either way, a successful run produces a concise summary like the following:

t/foo.t .. ok
All tests successful.
Files=1, Tests=2, 0 wallclock secs (0.02 usr 0.01 sys + 0.08
cusr 0.03 csys = 0.14 CPU)
Result: PASS
This is a very concise summary. The first line tells you all tests were passed while the
second line gives you a summary of the number of test files (1 in this case), the number of
tests (2 in this case), and the wallclock and CPU times used to run all the tests.
It is interesting to see that we have only one test block in the sample test file but in the test
summary output by prove we see that the number of tests is 2. Why the difference? We
can easily find out by asking prove to generate a detailed test report for all the individual
tests. This is achieved by passing the -v option (meaning "verbose") to the prove
command we used earlier:
prove -v t/foo.t
Now the output shows all the individual tests performed in that test file:
t/foo.t ..
ok 1 - TEST 1: hello, world - status code ok
ok 2 - TEST 1: hello, world - response_body - response is
expected (req 0)
1..2
ok
All tests successful.
Files=1, Tests=2, 0 wallclock secs (0.01 usr 0.01 sys + 0.07
cusr 0.03 csys = 0.12 CPU)
Result: PASS
Obviously, the first test is doing the status code check, which is dictated by the error_code
data section in the test block, and the second test is doing the response body check,
required by the response_body section. Now the mystery is solved.
It is worth mentioning that the --- error_code: 200 section is automatically assumed when
no error_code section is explicitly provided in the test block. So our test block above can be
simplified by removing the --- error_code: 200 line without affecting the number of tests.
This is because checking for a 200 response status code is so common that Test::Nginx
makes it the default. If you expect a different status code, like 500, then just add an explicit
error_code section.
From this example, we can see that one test block can contain multiple tests and the
number of tests for any given test block can be determined or predicted by looking at the
data sections performing output checks. This is important when we provide a "test plan"
ourselves to the test file where a "test plan" is the exact number of tests we expect the
current test file to run. If a different number of tests than planned were actually run, then the
test result would be considered dubious even when all the tests themselves passed successfully.
Thus, a test plan adds a strong constraint on the total number of tests expected to be run.
For our t/foo.t file here, however, we intentionally avoid providing any test plans by
passing the 'no_plan' argument to the use statement that loads the Test::Nginx::Socket
module. We will revisit the "test plan" feature and explain how to provide one in a later
section.
If you want to run all the test files directly under the t/ directory, then using a shell wildcard
can be handy:
prove -v t/*.t
In the case that you have sub-directories under t/ , you can specify the -r option to ask
prove to recursively traverse the whole directory tree rooted at t/ to find test files:
prove -r t/
This command is also the standard way to run the whole test suite of a project.
To run an individual test block in a given test file, we can temporarily add the special
data section ONLY to that test block, and prove will skip all the
other test blocks while running that test file.
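For example, we can add an ONLY section to the hello-world test block shown earlier:

=== TEST 1: hello, world
--- config
location = /t {
    echo "hello, world!";
}
--- request
GET /t
--- response_body
hello, world!
--- ONLY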
Now prove won’t run any other test blocks (if any) in the same test file.
This is very handy while debugging a particular test block. You can focus on one test case at
a time without worrying about other unrelated test cases stepping in your way.
When using the Vim editor, we can quickly insert a --- ONLY line to the test block we are
viewing in the vim file buffer, and then type :!prove % in the command mode of vim without
leaving the editor window. This works because vim automatically expands the special %
placeholder with the path of the current active file being edited. This workflow is great since
you never leave your editor window and you never have to type the title (or other IDs) of
your test block nor the path of the containing test file. You can quickly jump between test
blocks even across different files. Test-driven development usually demands very frequent
interactions and iterations, and Test::Nginx is particularly optimized to speed up this
process.
Sometimes you may forget to remove the --- ONLY line from some test files even after
debugging, which will incorrectly skip all the other tests in those files. To catch such mistakes,
Test::Nginx always reports a warning for files using the ONLY special section, as in
$ prove t/foo.t
t/foo.t .. # I found ONLY: maybe you're debugging?
t/foo.t .. ok
All tests successful.
Files=1, Tests=2, 0 wallclock secs (0.01 usr 0.00 sys + 0.09 cusr 0.03 csys = 0.13 CPU
)
Result: PASS
This way it is much easier to identify any leftover --- ONLY lines.
Similar to ONLY , Test::Nginx also provides the LAST data section to make the containing
test block become the last test block being run in that test file.
Note: The special data sections ONLY and LAST are actually features inherited from
the Test::Base module.
Skipping Tests
We can specify the special SKIP data section to skip running the containing test block
unconditionally. This is handy when we write a test case that is for a future feature or a test
case for a known bug that we haven’t had the time to fix right now. For example,
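a sketch of such a block, simply marking the hello-world example from earlier as skipped (the block content here is only an illustration):

=== TEST 1: test for a future feature
--- config
location = /t {
    echo "hello, world!";
}
--- request
GET /t
--- response_body
hello, world!
--- SKIP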
It is also possible to skip a whole test file in the prologue part. Just replace the use
statement with the following form.
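A sketch of that form, where the reason string is only an illustration:

use Test::Nginx::Socket skip_all => 'temporarily disabled until the new feature lands';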
Note: It is also possible to conditionally skip a whole test file, but that requires a little bit of
Perl programming. Interested readers can try using a BEGIN {} block before the
use statement to calculate the value of the skip_all option on the fly.
Many test suites prefix their test file names with ordinal numbers so that the files are always
run in a predictable order. The test suite of the ngx_http_lua module follows this practice, for
example, with test file names like below:
t/000-sanity.t
t/001-set.t
t/002-content.t
t/003-errors.t
...
t/139-ssl-cert-by.t
Although the prove utility supports running test files in multiple parallel jobs via the -jN
option, Test::Nginx does not really support this mode since all the test cases share exactly
the same test server directory, t/servroot/ , and the same listening ports, as we have
already seen, while parallel running requires strictly isolated running environments for each
individual thread of execution. One can still manually split the test files into different groups
and run each group on a different (virtual) machine or an isolated environment like a Linux
container.
By default, Test::Nginx::Socket runs the test blocks in each file in a random order, in the hope
of exposing unwanted dependencies among the test blocks. We can always disable this test
block shuffling behavior by calling the Perl function, no_shuffle() , imported by the
Test::Nginx::Socket module, before the run_tests() call in the test file prologue:
no_shuffle();
run_tests();
__DATA__
...
With the no_shuffle() call in place, the test blocks are run in the exact same order as their
appearance in the test file.
Preparing Tests
As we have seen in the previous sections, Test::Nginx provides a simple declarative
format to express test cases. Each test case is represented by a test block. A test block
consists of a title, an optional description, and several data sections for specifying inputs and
expected outputs. In this section, we will have a close look at how to prepare such test
cases for different test requirements.
Designing test cases is an art, in many ways. It may, sometimes, take even more time and
effort than implementing the feature to be tested, according to our own experience.
Test::Nginx tries hard to make writing tests as simple as possible but it still cannot
automate the whole test case design process. Only you know exactly what to test and how it
can be tested anyway. This section will focus on the basic primitives provided by
Test::Nginx that you can take advantage of to devise clever and effective test cases.
Preparing NGINX Configuration
Test::Nginx provides several data sections for specifying custom snippets in the NGINX
configuration used by the test server. The most common one is the config section, which is
used to insert custom snippets inside the server {} configuration block for the default test
server. We can also use the http_config section to insert our custom content into the
http {} configuration block of nginx.conf . The main_config section can be used to insert
content into the top-level scope of the NGINX configuration. For example,
=== TEST 1:
--- main_config
env MY_ENVIRONMENT;
--- http_config
init_worker_by_lua_block {
print("init")
}
--- config
location = /t {
echo ok;
}
--- request
GET /t
--- response_body
ok
This test block will generate an nginx.conf file with the following basic structure:
...
env MY_ENVIRONMENT;
http {
...
init_worker_by_lua_block {
print("init")
}
server {
...
location = /t {
echo ok;
}
}
}
Please pay attention to how the main_config , http_config , and config data sections'
values are mapped into different locations in the NGINX configuration file.
When in doubt, we can always check out the actual nginx.conf file generated by the test
scaffold at the location t/servroot/conf/nginx.conf in the current working directory (usually
just being the root directory of the current project).
Test::Nginx generates a new nginx.conf file for each test block, which makes it possible
for each test block to become self-contained. By default, the test scaffold automatically starts
a new NGINX server before running each test block and shuts down the server immediately
after running the block. Fortunately, NGINX is a lightweight server and it is usually very fast
to start and stop. Thus, the test blocks are not as slow to run as it might look.
Preparing Requests
The simplest way to prepare a request is to use the request data section, as in
--- request
GET /t?a=1&b=2
The HTTP/1.1 protocol is used by default. You can explicitly make it use the HTTP/1.0
protocol if desired:
--- request
GET /t?a=1&b=2 HTTP/1.0
Leading spaces or empty lines in the value of the request section are automatically
discarded. You can even add comments by leading them with a # character, as in
--- request
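# this is a comment line
GET /t?a=1&b=2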
You can add some additional request headers at the same time through the more_headers
section as below.
--- request
GET /t
--- more_headers
Foo: bar
Bar: baz
Pipelined Requests
Preparing pipelined HTTP requests is also possible. But you need to use the
pipelined_requests section instead of request . For instance,
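a test block might look like the following sketch (the request target and the echoed body are just illustrative):

=== TEST 1: pipelined requests
--- config
location = /t {
    echo "hello, world!";
}
--- pipelined_requests eval
["GET /t", "GET /t"]
--- response_body eval
["hello, world!\n", "hello, world!\n"]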
It is worth noting that we use the eval filter with the pipelined_requests section to treat
the literal value of that section as Perl code. This way we can construct a Perl array of the
request strings, which is the expected data format for the pipelined_requests section.
We need a similar trick for the response_body section when checking outputs. With
an array of expected response body data, we can expect and check different values for
different individual requests in the pipeline. Note, however, that not every data section supports
the same array-typed value semantics as response_body .
Checking Responses
We have already visited the response_body and error_code data sections for checking the
response body data and response status code, respectively.
When a test block checking a relatively long response body fails, prove gives a failure report like the following:

t/foo.t .. 1/?
# Failed test 'TEST 1: long string test - response_body -
response is expected (req 0)'
# at .../test-nginx/lib/Test/Nginx/Socket.pm line 1282.
# got: ..."IT 2.x is enabled.\x{0a}\x{0a}"...
# length: 409
# expected: ..."IT 2.x is not enabled.\x{0a}"...
# length: 412
# strings begin to differ at char 400 (line 1 column 400)
# Looks like you failed 1 test of 2.
/tmp/foo.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/2 subtests
From this report we can see that

1. it is the test block with the title TEST 1: long string test that is failing,
2. it is the response_body data section check that is failing,
3. the actual response body data is 409 bytes long while the expected value is 412 bytes,
and
4. the expected value has an additional not word in the string fragment IT 2.x is
enabled and the difference starts at the offset 400 in the long string.
Behind the scene, Test::Nginx uses the Perl module Test::LongString to do the long string
comparisons. It is also particularly useful while checking response body data in binary
formats.
If your response body data is in a multi-line textual format, then you may also want to use a
diff -style output when the data does not match. To achieve this, we can call the
no_long_string() Perl function before the run_tests() function call in the prologue part of
the test file, as in
no_long_string();
run_tests();
__DATA__
=== TEST 1:
--- config
location = /t {
echo "Life is short.";
echo "Moon is bright.";
echo "Sun is shining.";
}
--- request
GET /t
--- response_body
Life is short.
Moon is deem.
Sun is shining.
Note the no_long_string() call in the prologue part. It is important to place it before the
run_tests() call otherwise it would be too late for it to take effect, obviously.
Invoking the prove utility (or any shell wrappers for it) to run this test file gives the following
details about the test failure:
It is obvious that the second line of the response body output is different.
You can even further disable the diff -style comparison mode by adding a no_diff() Perl
function call in the prologue part. Then the failure report will look like this:
That is, Test::Nginx just gives full listing of the actual response body data and the expected
one without any abbreviations or hand-holding.
Test::Nginx also supports checking the response body against a Perl regular expression, via
the response_body_like data section. Be careful when you are using the multi-line data
section value form: a trailing newline character appended to your section value may make
your pattern never match. In this case the chomp filter we introduced in an earlier section
can be very helpful here. For example,
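--- response_body_like chomp
^hello, .*!$

Here the particular pattern is only an illustration.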
You can also use the eval filter to construct a Perl regular expression object with a Perl
expression, as in
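--- response_body_like eval
qr/\bhello, world\b/

Again, the regex used here is just an illustration; any Perl regex constructed via the qr/.../ syntax works.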
Expected response headers can be checked with the response_headers data section, where
each line of the section value specifies one header check, as in

--- response_headers
Foo: bar
Bar: baz
!Blah
1. The response header Foo must appear and must take the value bar ;
2. The response header Bar must appear and must take the value baz ; and
3. The response header Blah must not appear or take an empty value.
True-False Tests
One immediate testing requirement is to check whether or not a piece of text appears in any
error log messages. Such checks can be done via the data sections error_log and
no_error_log , respectively. The former ensures that some lines in the error log file contain
the string specified as the section value while the latter tests the opposite: ensuring that no
line contains the pattern.
For example,
--- error_log
Hello world from my server
Then the string Hello world from my server (without the trailing new-line) must appear in at
least one line of the NGINX error log. You can specify multiple strings in separate lines of the
section value to perform different checks, for instance,
--- error_log
This is a dog!
Is it a cat?
Then it performs two error log checks: one is to ensure that the string This is a dog!
appears in at least one error log line, and the other checks the line Is it a cat? similarly.
The order of these two string patterns does not matter at all.
If one of the string patterns fails to match any lines in the error log file, then we would get a
test failure report from prove like below.
If you want to specify a Perl regular expression (regex) as one of the patterns, then you
should use the eval section filter to construct a Perl-array as the section value, as in
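--- error_log eval
["This is a dog!", qr/\bcat\b/]

Here the two patterns are only illustrations: the first is a plain Perl string and the second is a Perl regex.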
As we have seen earlier, Perl regexes can be constructed via the qr/…/ quoting syntax.
Perl string patterns in the Perl array specified by double quotes or single quotes are still
treated as plain string patterns, as usual. If the array contains only one regex pattern, then
you can omit the array itself, as in
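--- error_log eval
qr/dog|cat/

Again, the regex here is only for illustration.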
Test::Nginx puts the error log file of the test NGINX server in the file path
t/servroot/logs/error.log . As a test writer, we frequently check out this file directly when
things go wrong. For example, it is common to make mistakes or typos in the patterns we
specify for the error_log section. Also, scanning the raw log file can give us insight into
the details of NGINX's internal workings when the NGINX debugging logs are enabled in
the NGINX build.
The no_error_log section is very similar to error_log but it checks the nonexistence of the
string patterns in the NGINX error log file. One of the most frequent uses of the
no_error_log section is to ensure that there are no error level messages in the log file.
--- no_error_log
[error]
If, however, there is a line in the nginx error log file that contains the string [error] , then
the test fails. Below is such an example.
This is a great way to find the details of the error quickly by just looking at the test report.
Like error_log , this section also supports Perl array values and Perl regex values through
the eval filter.
Grep Tests
The error_log and no_error_log sections are very handy for quickly checking the
appearance of certain patterns in the NGINX error log file. But they have serious limitations
in that it is impossible to impose stronger constraints on the relative order of the messages
containing the patterns, or on the number of their occurrences. To address this, Test::Nginx
provides the grep_error_log section for specifying a pattern with which the test framework
scans through the NGINX error log file and collects all the matched parts of the log file lines
along the way, forming a final result. This aggregated log data result is then matched against
the expected value specified as the value of the grep_error_log_out section, in a similar way
as with the response_body section discussed above.
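A sketch of such a test block; the number of print() calls is chosen arbitrarily here:

=== TEST 1: grep the error logs
--- config
location = /t {
    content_by_lua_block {
        print("it is matched!")
        print("it is matched!")
        print("it is matched!")
        ngx.say("ok")
    }
}
--- request
GET /t
--- grep_error_log: it is matched!
--- grep_error_log_out
it is matched!
it is matched!
it is matched!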
Here we use the Lua function print() provided by the ngx_http_lua module to generate
NGINX error log messages at the notice level. This test case tests the number of the log
messages containing the string it is matched! . It is important to note that only the
matched part of the log file lines are collected in the final result instead of the whole log
lines. This simplifies the comparison a lot since NGINX error log messages can contain
varying details like timestamps and connection numbers.
A more useful form of this test is to specify a Perl regex pattern in the grep_error_log
section. Consider the following example.
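A sketch of such a test block (the ngx.sleep call is assumed here, just to separate the two log messages in time):

=== TEST 1: grep the error logs with a regex pattern
--- config
location = /t {
    content_by_lua_block {
        print("test: before sleeping...")
        ngx.sleep(0.001)
        print("test: after sleeping...")
        ngx.say("ok")
    }
}
--- request
GET /t
--- grep_error_log eval
qr/test: .*?\.\.\./
--- grep_error_log_out
test: before sleeping...
test: after sleeping...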
We specify a Perl regex pattern, test: .*?\.\.\. , here to filter out all the error log
messages starting with test: and ending with … . And naturally in this test we also require
the relative order of these two messages, that is, before sleeping must appear before
after sleeping . Otherwise, we shall see failure reports like below:
As with the response_body section, we can also call the no_long_string() Perl function
before run_tests() in the test file prologue, so as to disable the long string output mode
and enable the diff mode. Then the test failure would look like this:
Obviously, for this test case, the diff format looks better.
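Sometimes the log messages we want to check are only generated asynchronously, after the response has already been sent, for example from a timer handler. In such cases the wait data section tells the test scaffold to wait for the specified number of seconds before checking the error logs. A sketch of such a test block follows; the section values here are illustrative:

=== TEST 1: check logs from a timer handler
--- config
location = /t {
    content_by_lua_block {
        local function f()
            print("HERE!")
        end
        local ok, err = ngx.timer.at(0.1, f)
        if not ok then
            ngx.log(ngx.ERR, "failed to create timer: ", err)
            return
        end
    }
}
--- request
GET /t
--- response_body
--- wait: 0.12
--- error_log
HERE!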
Here we create a timer via the ngx.timer.at Lua function, which expires after 0.1 seconds.
Due to the asynchronous nature of timers, the request handler does not wait for the timer to
expire and immediately finishes processing the current request and sends out a response
with an empty body. To check for the log message HERE! generated by the timer handler
f , we have to specify an extra delay for the test scaffold to wait via the wait section. A
delay of 0.12 seconds is specified in this example, but any value larger than 0.1 would
suffice. Without the wait section, this test case would fail with the following output:
Obviously the test scaffold checks the error log too soon, even before the timer handler runs.
Section Review
Test::Nginx::Socket offers a rich set of data sections for specifying various different input
data and expected output data, ranging from NGINX configuration file snippets, test
requests, to expected responses and error log messages. We have already demonstrated
the power of data driven testing and declarative test case crafting. We want to achieve
multiple goals at the same time, that is, not only to make the tests self-contained and highly
readable, but also to make the test report easy to interpret and analyze when some of the
tests fail. Raw files automatically generated by the test scaffold, like
t/servroot/conf/nginx.conf and t/servroot/logs/error.log , should be checked frequently
when manually debugging the test cases. The next section extends the discussion of this
section with a focus on testing erroneous cases.
Testing Erroneous Cases
The following example tests the case of throwing a Lua exception in the context of
init_by_lua_block of the ngx_http_lua module.
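A sketch of such a test block (the empty config section is just a placeholder here):

=== TEST 1: dying in init_by_lua_block
--- http_config
init_by_lua_block {
    error("I am dying!")
}
--- config
--- must_die
--- error_log
I am dying!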
The Lua code in init_by_lua_block runs in the NGINX master process during the NGINX
configuration file loading process. Throwing out a Lua exception there aborts the NGINX
startup process immediately. The presence of the must_die section tells the test scaffold
to treat an NGINX server startup failure as a test pass and a successful startup as a test
failure. The error_log section there ensures that the server fails in the expected way, that
is, due to the "I am dying!" exception.
If we remove the --- must_die line from the test block above, then the test file won’t even
run to completion:
By default the test scaffold treats NGINX server startup failures as fatal errors in running the
tests. The must_die section, however, turns such a failure into a normal test checkup.
Consider the following example that closes the downstream connection immediately after
sending out the first part of the response body.
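A sketch of such a test block, assuming a content_by_lua_block handler that flushes out the first part of the body and then calls ngx.exit(444) to close the connection immediately:

=== TEST 1: closing the connection prematurely
--- config
location = /t {
    content_by_lua_block {
        ngx.say("hi bob!")
        local ok, err = ngx.flush(true)
        if not ok then
            ngx.log(ngx.ERR, "flush failed: ", err)
            return
        end
        ngx.exit(444)
    }
}
--- request
GET /t
--- response_body
hi bob!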
Obviously, the test scaffold complains about the lack of the "last chunk" used to indicate the
end of the chunked encoded data stream. Because the server aborts the connection in the
middle of response body data sending, there is no chance for the server to properly send
well-formed response bodies in the chunked encoding.
Testing and emulating timeout errors is often tricky in a self-contained unit test framework
since most of the network traffic initiated by the test cases is local only, that is, going
through the local "loopback" device that has perfect latency and throughput. We will examine
some of the tricks that can be used to reliably emulate various different kinds of timeout
errors in the test suite.
Connecting Timeouts
Connecting timeouts in the context of the TCP protocol are easiest to emulate. Just point the
connecting target to a remote address that always drops any incoming ( SYN ) packets via a
firewall rule or something similar. We provide such a "black-hole service" at the port 12345
of the agentzh.org host. You can make use of it if your test running environment allows
public network access. Consider the following test case.
location = /t {
    resolver 8.8.8.8;  # any working DNS nameserver will do; the actual address is environment-specific
    content_by_lua_block {
        local sock = ngx.socket.tcp()
        sock:settimeout(100) -- ms
        local ok, err = sock:connect("agentzh.org", 12345)
        if not ok then
            ngx.log(ngx.ERR, "failed to connect: ", err)
            return ngx.exit(500)
        end
        ngx.say("ok")
    }
}
--- request
GET /t
--- response_body_like: 500 Internal Server Error
--- error_code: 500
--- error_log
failed to connect: timeout
We have to configure the resolver directive here because we need to resolve the domain
name agentzh.org at request time (in Lua). We check the NGINX error log via the
error_log section for the error string returned by the cosocket object’s connect() method.
It is important to use a relatively small timeout threshold in the test cases so that we do not
have to wait for too long to complete the test run. Tests are meant to be run very often. The
more frequently we run the tests, the more value we may gain from automating the tests.
It is worth mentioning that the test scaffold’s HTTP client does have a timeout threshold as
well, which is 3 seconds by default. If your test request takes more than 3 seconds, you get
an error message in the test report:
This message is what we would get if we commented out the settimeout call and relied on
the default 60 second timeout threshold in cosockets.
We could change this default timeout threshold used by the test scaffold client by setting a
value to the timeout data section, as in
--- timeout: 10
Reading Timeouts
Emulating reading timeouts is also easy. Just try reading from a wire where the other end
never writes anything but still keeps the connection alive. Consider the following example:
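Below is a sketch of such a test block; the mock server port 5678 and the 100ms client-side read timeout match the description that follows, while the remaining details are illustrative:

=== TEST 1: read timeout
--- main_config
stream {
    server {
        listen 5678;
        content_by_lua_block {
            ngx.sleep(10)  -- sec
        }
    }
}
--- config
location = /t {
    content_by_lua_block {
        local sock = ngx.socket.tcp()
        sock:settimeout(100) -- ms
        local ok, err = sock:connect("127.0.0.1", 5678)
        if not ok then
            ngx.log(ngx.ERR, "failed to connect: ", err)
            return ngx.exit(500)
        end
        local data, err = sock:receive()  -- try to read a line
        if not data then
            ngx.log(ngx.ERR, "failed to receive: ", err)
            return ngx.exit(500)
        end
        ngx.say("received: ", data)
    }
}
--- request
GET /t
--- response_body_like: 500 Internal Server Error
--- error_code: 500
--- error_log
failed to receive: timeout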
Here we use the main_config data section to define a TCP server of our own, listening at
the port of 5678 on the local host. This is a mocked-up server that can establish new TCP
connections but never writes out anything and just sleeps for 10 seconds before closing the
session. Note that we are using the ngx_stream_lua module in the stream {} configuration
block. Our location = /t , which is the main target of this test case, connects to our mock
server and tries to read a line from the wire. Apparently the 100ms timeout threshold on the
client side is reached first and we can successfully exercise the error handling code for the
reading timeout error.
Sending Timeouts
Triggering sending timeouts is much harder than connecting and reading timeouts. This is
due to the asynchronous nature of writing.
For performance reasons, there exist at least two layers of buffers for writes:

1. the userland send buffers inside the NGINX core (such as the buffers used by the NGINX output filter chain), and
2. the socket send buffers in the operating system kernel's TCP/IP stack implementation.

To make the situation even worse, there is also at least a system-level receive buffer
layer on the other end of the connection.
To make a send timeout error happen, the most naive way is to fill up all these buffers along
the data sending chain while ensuring that the other end never actually reads anything on
the application level. Thus, buffering makes a sending timeout particularly hard to reproduce
and emulate in a typical testing and development environment with a small amount of (test)
payload.
Fortunately there is a userland trick that can intercept the libc wrappers for the actual system
calls for socket I/O and do funny things that could otherwise be very difficult to achieve. Our
mockeagain library implements such a trick and supports emulating timeout errors at user-
specified precise positions in the output data stream.
The following example triggers a sending timeout right after sending out the "hello, world"
string as the response body.
send_timeout 100ms;
postpone_output 1;
location = /t {
    content_by_lua_block {
        ngx.say("hi bob!")
        local ok, err = ngx.flush(true)
        if not ok then
            ngx.log(ngx.ERR, "flush #1 failed: ", err)
            return
        end
        ngx.say("hello, world!")
        local ok, err = ngx.flush(true)
        if not ok then
            ngx.log(ngx.ERR, "flush #2 failed: ", err)
            return
        end
    }
}
--- request
GET /t
--- ignore_response
--- error_log
flush #2 failed: timeout
--- no_error_log
flush #1 failed
Note the send_timeout directive that is used to configure the sending timeout for NGINX
downstream writing operations. Here we use a small threshold, 100ms , to ensure our test
case runs fast and never hits the default 3 seconds timeout threshold of the test scaffold
client. The postpone_output 1 directive effectively turns off the "postpone output buffer" of
NGINX, which may hold our output data before even reaching the libc system call wrappers.
Finally, the ngx.flush() call in Lua ensures that no buffers along the NGINX output filter
chain hold our data without sending it downward.
Before running this test case, we have to set the following system environment variables (in
the bash syntax):
export LD_PRELOAD="mockeagain.so"
export MOCKEAGAIN="w"
export MOCKEAGAIN_WRITE_TIMEOUT_PATTERN='hello, world'
export TEST_NGINX_EVENT_TYPE='poll'
These settings deserve some explanation:

1. The LD_PRELOAD="mockeagain.so" setting preloads the mockeagain library into the NGINX
server process (the directory containing the library must be in LD_LIBRARY_PATH , as
discussed below).
2. The MOCKEAGAIN="w" setting makes mockeagain intercept and mock the writing
operations only.
3. The MOCKEAGAIN_WRITE_TIMEOUT_PATTERN setting makes mockeagain trigger a timeout
right after the specified string pattern, hello, world , is seen in the output data stream.
4. The TEST_NGINX_EVENT_TYPE='poll' setting makes the NGINX server use the poll event
API instead of the system default (being epoll on Linux, for example). This is because
mockeagain only supports poll events for now. Behind the scenes, this environment variable
just makes the test scaffold generate the following nginx.conf snippet.
events {
use poll;
}
You need to ensure, however, that your NGINX or OpenResty build has the poll
support compiled in. Basically, the build should have the ./configure option --with-
poll_module .
Ideally, we could set these environment variables directly inside the test file because this test case
will never pass without them anyway. We could add the following Perl code
snippet to the very beginning of the test file prologue (yes, even before the use statement):
BEGIN {
$ENV{LD_PRELOAD} = "mockeagain.so";
$ENV{MOCKEAGAIN} = "w";
$ENV{MOCKEAGAIN_WRITE_TIMEOUT_PATTERN} = 'hello, world';
$ENV{TEST_NGINX_EVENT_TYPE} = 'poll';
}
The BEGIN {} block is required here because it runs before Perl loads any modules,
especially Test::Nginx::Socket , in which we want these environments to take effect.
It is a bad idea, however, to hard-code the path of the mockeagain.so file in the test file itself
since different test runners might put mockeagain in different places in the file system. It is
better to let the test runner configure the LD_LIBRARY_PATH environment variable containing
the actual library path from outside.
Mockeagain Troubleshooting
If you are seeing the following error while running the test case above,
then you should check whether you have added the directory path of your mockeagain.so
library to the LD_LIBRARY_PATH environment. On my system, for example, I have
export LD_LIBRARY_PATH=$HOME/git/mockeagain:$LD_LIBRARY_PATH
If, on the other hand, you are seeing an error complaining about the use poll configuration
directive, then your NGINX or OpenResty build does not have the poll module compiled in, and
you should rebuild your NGINX or OpenResty by passing the --with-poll_module option to the
./configure command line.
We will revisit the mockeagain library in the Test Modes section soon.
Mocked-up servers are another powerful tool for emulating erroneous cases. For example,
while testing a Memcached client, it would be pretty hard to emulate erroneous
error responses or ill-formed responses with a real Memcached server. Now it is trivial with
mocking:
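The mock server part of such a test might look like the following sketch, reusing the stream {} approach shown in the reading-timeout example earlier (the port number 1921 matches the client code below, while the bogus reply line is only an illustration):

--- main_config
stream {
    server {
        listen 1921;
        content_by_lua_block {
            -- send back something that is not a valid memcached reply
            ngx.say("HELLO WORLD")
        }
    }
}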
The client side of the test then simply connects to this mock server instead of a real Memcached server:

assert(memc:connect("127.0.0.1", 1921))

Our mocked-up Memcached server can behave in any way that we like. Hooray!
The Test::Nginx::Socket test framework provides special data sections to help emulate
ill-behaved HTTP clients.
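One such section is raw_request , which sends its value to the server literally. A sketch, using the eval filter so that the CRLF line terminators can be spelled out explicitly (the request line itself is only an illustration):

=== TEST 1: request without the Host header
--- config
location = /t {
    echo ok;
}
--- raw_request eval
"GET /t HTTP/1.1\r\nConnection: close\r\n\r\n"
--- error_code: 400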
This way we can easily construct a malformed request that does not have a Host header, which
results in a 400 response from the NGINX server, as expected.
The request data section we have been using so far, on the other hand, always ensures
that a well-formed HTTP request is sent to the test server.
We have already discussed the timeout data section that can be used to adjust the default
timeout protection threshold used by the test scaffold client. We could also use it to abort the
connection prematurely. A small timeout threshold is often desired for this purpose. To
prevent the test scaffold from printing out an error on such a client timeout, we can specify the
abort data section to signal the test scaffold. Let's put these together in a simple test case.
location = /t {
    lua_check_client_abort on;
    content_by_lua_block {
        local ok, err = ngx.on_abort(function ()
            ngx.log(ngx.NOTICE, "on abort handler called!")
            ngx.exit(444)
        end)
        if not ok then
            error("cannot set on_abort: " .. err)
        end
        ngx.sleep(0.7) -- sec
        ngx.log(ngx.NOTICE, "main handler done")
    }
}
--- request
GET /t
--- timeout: 0.2
--- abort
--- ignore_response
--- no_error_log
[error]
main handler done
--- error_log
client prematurely closed connection
on abort handler called!
In this example, we make the test scaffold client abort the connection after 0.2 seconds via
the timeout section. Also, we prevent the test scaffold from printing out the client timeout
error by specifying the abort section. Finally, in the Lua application code, we check for
client abort events by turning on the lua_check_client_abort directive and abort the server
processing by calling ngx.exit(444) in our Lua callback function registered by the
ngx.on_abort API.
The test scaffold client also waits for the NGINX server to close the current connection when
it is expected to, and reports an error if the server does not do so in time
(exceeding the timeout threshold as specified by the --- timeout section). This can ensure
that the NGINX server always actually closes the connection when the request specifies the
"Connection: close" request header.
When the server does not close the connection, there is a "connection leak" bug on the
server side. For example, NGINX uses reference counting (in r->main->count ) in its HTTP
subsystem to determine whether a request can be closed and freed. When there is an error
in this reference counting, NGINX may never close the request, leading to resource leaks. In
such cases, the corresponding test cases always fail with a client-side timeout error, for
instance,
Test Modes
One unique feature of Test::Nginx is that it allows running the same test suite in wildly
different ways, or test modes, by just configuring some system environment variables.
Different test modes have different focuses and may find different categories of bugs or
performance issues in the applications being tested. The data driven nature of the test
framework makes it easy to add new test modes without changing the user test files at all.
And it is also possible to combine different test modes to form new (hybrid) test modes. The
capability of running the same test suite in many different ways helps squeeze more value
out of the tests we already have.
This section will iterate through various different test modes supported by
Test::Nginx::Socket and their corresponding system environment variables used to enable
or control them.
Benchmark Mode
Test::Nginx has built-in support for performance testing or benchmarking. It can invoke
external load testing tools like ab and weighttp to load each test case as hard as
possible.
To enable this benchmark testing mode, you can specify the TEST_NGINX_BENCHMARK system
environment variable before running the prove command. For example,
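export TEST_NGINX_BENCHMARK='2000 2'
prove t/foo.t

Here the bash syntax is used; the value '2000 2' carries the two numbers explained below.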
This will run all the test cases in t/foo.t in benchmark mode. In particular, the first number,
2000 , in the environment variable value indicates the total number of requests used to flood
the server, while the second number, 2 , means the number of concurrent connections
the client will use.
If the test case uses an HTTP 1.1 request (which is the default), then the test scaffold will
invoke the weighttp tool. If it is an HTTP 1.0 request, then the test scaffold invokes the ab
tool.
This test mode requires the unbuffer command-line utility from the expect package, as
well as the ab and weighttp load testing tools. On Ubuntu/Debian systems, we can install
most of the dependencies with the command
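sudo apt-get install expect apache2-utils

Here the package names expect (for unbuffer ) and apache2-utils (for ab ) are assumed.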
You may need to build and install weighttp from source on Ubuntu/Debian yourself due to
the lack of the Debian package.
For the Mac OS X system, on the other hand, we can use homebrew to install it like this:
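brew install weighttp

Here a Homebrew formula named weighttp is assumed to exist.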
t/hello.t
run_tests();
__DATA__
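
=== TEST 1: hello world
--- config
location = /hello {
    echo "hello world";
}
--- request
GET /hello
--- response_body
hello world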
Then we run this test file in the benchmark mode, like this:
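Assuming the bash syntax again:

export TEST_NGINX_BENCHMARK='200000 2'
prove t/hello.t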
starting benchmark...
spawning thread #1: 2 concurrent requests, 200000 total requests
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done
We can see that this test case can achieve 75393 requests per second and 12218 KB per
second. Not bad for a single NGINX worker process!
It is also important to keep an eye on failed requests. We surely do not care about the
performance of error pages. We can get the number of error responses by checking the
following output lines:
We are glad to see that all our requests succeeded in this run.
To benchmark with multiple NGINX worker processes, we can call the master_on() and
workers() Perl functions in the test file prologue, before the run_tests() call:

master_on();
workers(4);

This way we can have 4 NGINX worker processes sharing the load.
Behind the scenes, the test scaffold assembles the command line involving weighttp from
the test block specification. In this case, the command line looks roughly like this:
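weighttp -c2 -k -n200000 http://127.0.0.1:1984/hello

Here the default test server port, 1984, and the exact option spelling are assumptions; the point is that the total request count, the concurrency level, and the target URL all come from the test block and the TEST_NGINX_BENCHMARK value.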
There exist complicated cases, however, where the test scaffold fails to derive the exact
command line equivalent.
We can also enforce HTTP 1.0 requests in our test block by appending the "HTTP/1.0" string
to the value of the --- request section:
--- request
GET /hello HTTP/1.0
In this case, the test scaffold will invoke the ab tool to flood the matching HTTP 1.0
request. The output might look like this:
Concurrency Level: 2
Time taken for tests: 3.001 seconds
Complete requests: 200000
Failed requests: 0
Keep-Alive requests: 198000
Total transferred: 33190000 bytes
HTML transferred: 2400000 bytes
Requests per second: 66633.75 [#/sec] (mean)
Time per request: 0.030 [ms] (mean)
Time per request: 0.015 [ms] (mean, across all concurrent
requests)
Transfer rate: 10798.70 [Kbytes/sec] received
Processing: 0 0 132
Waiting: 0 0 132
Total: 0 0 132
t/hello.t .. ok
All tests successful.
Files=1, Tests=2, 4 wallclock secs ( 0.02 usr 0.00 sys + 0.51
cusr 1.39 csys = 1.92 CPU)
Result: PASS
Again, the most interesting lines in this output are the following:

Failed requests: 0
Requests per second: 66633.75 [#/sec] (mean)
Transfer rate: 10798.70 [Kbytes/sec] received
Different hardware and operating systems may lead to very different results. Therefore, it
generally does not make sense at all to directly compare numbers obtained from different
machines and systems.
Clever users can write some external scripts to record and compare these numbers across
different runs, so as to keep track of performance changes in the web server or application.
Such comparison scripts must take into account any measurement errors and any
disturbances from other processes running in the same system.
HUP Reload Mode
One example of an OpenResty feature that behaves differently upon HUP reload than upon server
restart is the shared dictionary mechanism (lua_shared_dict), which does not wipe out any
existing data in the shared memory storage during HUP reload. When testing this feature or
application code relying on this feature, it is wise to test how it behaves upon HUP reload.
We saw in the past that some 3rd-party NGINX C modules dealing with shared memory, for
example, have bugs across HUP reloads, like nasty memory leaks.
Test::Nginx has built-in support for the HUP reload test mode, which can be enabled by
export TEST_NGINX_USE_HUP=1
Then we can run our existing test suite as usual, but now the HUP signal is used by the test scaffold to reload the NGINX configuration specified by different test blocks. The NGINX server will only be automatically shut down when the test harness finishes running each test file.
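For example, assuming the test files live under the t/ directory:

export TEST_NGINX_USE_HUP=1
prove -r t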
Note: We can even avoid the automatic server shutdown upon test file completion by specifying the TEST_NGINX_NO_CLEAN=1 environment. See the later section Manual Debugging Mode for more details.
UNIX signals like HUP are usually delivered asynchronously. Thus, there is a delay between the moment the test scaffold finishes sending the HUP signal to the NGINX server and the moment the NGINX server forks off a new worker process using the newly loaded configuration and starts accepting new connections with that worker. For this reason, there is a (small) chance that the request of a test block is served by an NGINX worker process still using the configuration specified by the previous test block. Although Test::Nginx tries hard to wait as long as it can with some simple heuristics, some test blocks may still experience intermittent test failures due to the mismatch of the NGINX configuration. Be prepared for such false positives when using the HUP reload testing mode. This is also one of the reasons why the HUP reload mode is not the default. We hope this issue can be further improved in the future.
Another limitation of the HUP reload mode is that HUP reloads only happen on test block boundaries. There are cases where it is desirable to issue a HUP reload in the middle of a test block. We can achieve that by using some custom Lua code in the test block to send a HUP signal to the NGINX master process ourselves.
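A minimal sketch of that approach is given below; the nginx.pid file location under the server prefix and the use of os.execute to deliver the signal are assumptions, and any equivalent mechanism will do:

content_by_lua_block {
    -- read the master process PID from the pid file under the server prefix
    local pid_file = ngx.config.prefix() .. "/logs/nginx.pid"
    local f = assert(io.open(pid_file, "r"))
    local master_pid = assert(f:read("*l"))
    f:close()

    -- deliver a HUP signal to the master process to trigger a reload
    os.execute("kill -HUP " .. master_pid)
    ngx.say("sent HUP to master process ", master_pid)
}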
Valgrind Mode
One of the biggest enemies of web servers or web applications that are supposed to run in a 24x7 manner is memory issues. Memory issues include memory leaks, invalid memory reads (like reading beyond a buffer boundary), and invalid memory writes (like buffer overflows). In the case of memory leaks, the processes can take up more and more memory in the system and eventually exhaust all the physical memory available, leading to unresponsive systems or forcing the system to start killing processes. Invalid memory accesses, on the other hand, can lead to process crashes (like segmentation faults), or worse, to nondeterministic behavior in the process (like giving out wrong results).
Valgrind is a powerful tool for programmers to detect a wide range of memory issues, including many memory leaks and many invalid memory accesses. It is usually used for debugging lower-level code like the OpenResty core (including the NGINX core), the Lua or LuaJIT VM, as well as Lua libraries involving C and/or FFI code. Plain Lua code without FFI is considered "safe" and is not subject to most of these memory issues.
Note: Plain Lua code without using FFI can still contain bugs that result in memory leaks, like inserting new keys into a globally shared Lua table without control, or endlessly appending to a Lua string stored in a global variable. Such memory leaks, however, cannot be detected by Valgrind since the leaked memory is managed by Lua or LuaJIT's garbage collector.
Test::Nginx provides a testing mode that can automatically use Valgrind to run the existing tests and check whether there are any memory issues that Valgrind can catch. This test mode is called the "Valgrind mode". To enable this mode, just set the environment TEST_NGINX_USE_VALGRIND , as in
export TEST_NGINX_USE_VALGRIND=1
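The test file t/a.t used here is along the following lines (a sketch with the usual preamble omitted; the test block title and the response text format are assumptions, while the buffer contents follow the description below):

=== TEST 1: strlen on a small C string buffer
--- config
location = /t {
    content_by_lua_block {
        local ffi = require "ffi"

        -- declare the C function prototype before calling it via ffi.C
        ffi.cdef[[
            size_t strlen(const char *s);
        ]]

        -- a 3-byte C buffer holding the bytes 48, 49, and 0 ("01" plus NUL)
        local buf = ffi.new("char[3]", {48, 49, 0})
        ngx.say("strlen: ", tonumber(ffi.C.strlen(buf)))
    }
}
--- request
GET /t
--- response_body
strlen: 2
--- no_error_log
[error]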
Here we use the ffi.new API to allocate a C string buffer 3 bytes long and initialize it with the bytes 48, 49, and 0 (in decimal ASCII codes). Then we call the standard C function strlen via the ffi.C API on our C string buffer.
It is worth noting that we need to first declare the strlen function prototype via the ffi.cdef API before calling it through ffi.C . Since we declare the C function in the request handler, the declaration is executed on every request; in real-world code it is usually better to put such ffi.cdef calls at the top level of a Lua module so that they only run once.
This example contains no memory issues since we properly initialize our C string buffer by setting the null terminator character ( \0 ) at the end of our C string. The C function strlen should correctly report back the length of the string, which is 2 , without reading beyond our buffer boundary. Now we run this test file with the Valgrind mode enabled, using the default OpenResty installation's nginx :
export TEST_NGINX_USE_VALGRIND=1
export PATH=/usr/local/openresty/nginx/sbin:$PATH
prove t/a.t
There should be a lot of output. Near the beginning of it, Valgrind reports a memory invalid read error. Ouch! Fortunately, it is just a false positive, caused by an optimization inside the LuaJIT VM when it is trying to create a new Lua string. The
LuaJIT code repository maintains a file named lj.supp that lists all the known Valgrind false
positives that can be used to suppress these messages. We can simply copy that file over
and rename it to valgrind.suppress in the current working directory. Then Test::Nginx will
automatically feed this valgrind.suppress file into Valgrind while running the tests in
Valgrind mode. Let’s try that:
cp -i /path/to/luajit-2.0/src/lj.supp ./valgrind.suppress
prove t/a.t
We might encounter other Valgrind false positives, like some of those in the NGINX core or the OpenSSL library. We can add those to the valgrind.suppress file as needed. The Test::Nginx test scaffold always outputs suppression rules that can be added directly to the suppression file. For the example above, the last few lines of the output look like below.
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:str_fastcmp
fun:lj_str_new
fun:lua_setfield
fun:ngx_http_lua_cache_store_code
fun:ngx_http_lua_cache_loadbuffer
fun:ngx_http_lua_content_handler_inline
fun:ngx_http_core_content_phase
fun:ngx_http_core_run_phases
fun:ngx_http_process_request
fun:ngx_http_process_request_line
fun:ngx_epoll_process_events
fun:ngx_process_events_and_timers
fun:ngx_single_process_cycle
fun:main
}
t/a.t .. ok
All tests successful.
Files=1, Tests=3, 2 wallclock secs ( 0.01 usr 0.00 sys + 1.47 cusr 0.07 csys = 1.55 CPU)
Result: PASS
The suppression rule generated is the stuff between the curly braces (including the curly
braces themselves):
{
<insert_a_suppression_name_here>
Memcheck:Addr4
fun:str_fastcmp
fun:lj_str_new
fun:lua_setfield
fun:ngx_http_lua_cache_store_code
fun:ngx_http_lua_cache_loadbuffer
fun:ngx_http_lua_content_handler_inline
fun:ngx_http_core_content_phase
fun:ngx_http_core_run_phases
fun:ngx_http_process_request
fun:ngx_http_process_request_line
fun:ngx_epoll_process_events
fun:ngx_process_events_and_timers
fun:ngx_single_process_cycle
fun:main
}
We could simply copy and paste this rule into the valgrind.suppress file. It is worth mentioning, however, that we can make this rule more general by excluding the C function frames belonging to the NGINX core and the ngx_lua module (near the bottom of the rule), since this false positive is related to LuaJIT only.
Let's continue the experiment with our current example. Now we edit our test case and change the last byte in the buffer initialization. That is, we replace the null character (with ASCII code 0) with a non-null character whose ASCII code is 50. This change leaves our C string buffer without any null terminator, so calling strlen on it will result in memory reads beyond our buffer boundary.
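In terms of the sketch above, the edit amounts to changing

local buf = ffi.new("char[3]", {48, 49, 0})

into

local buf = ffi.new("char[3]", {48, 49, 50})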
Unfortunately, running this edited test file fails to yield any Valgrind error reports regarding this memory issue.
The response body check fails as expected: this time strlen returns 4, which is larger than our buffer size, 3. This is a clear indication of a memory buffer over-read. So why does Valgrind fail to catch it?
To answer this question, we need some knowledge about how LuaJIT allocates memory. By default, LuaJIT uses its own memory allocator atop the system allocator (usually provided by the standard C library). For performance reasons, LuaJIT pre-allocates larger memory blocks than actually requested. Because Valgrind has no knowledge about LuaJIT's own allocator nor about Lua user-level buffer boundaries, it can be cheated and get confused.
To remove this limitation, we can force LuaJIT to use the system allocator instead of its own. To achieve this, we need to build LuaJIT with special compilation options like the ones below.
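For a standalone LuaJIT build, these options can be passed through LuaJIT's standard make variables; a sketch (the exact set of additional flags you need may vary):

make CCDEBUG=-g XCFLAGS='-DLUAJIT_USE_VALGRIND -DLUAJIT_USE_SYSMALLOC'
sudo make install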
The most important option is -DLUAJIT_USE_SYSMALLOC , which forces LuaJIT to use the system allocator. The other options are important for our debugging as well; for example, the CCDEBUG=-g option enables debug symbols in the LuaJIT binary, while -DLUAJIT_USE_VALGRIND enables some additional collaboration with Valgrind inside the LuaJIT VM.
If we are using the OpenResty bundle, we can simply build another special version of
OpenResty like below:
./configure \
    --prefix=/opt/openresty-valgrind \
    --with-luajit-xcflags='-DLUAJIT_USE_VALGRIND -DLUAJIT_USE_SYSMALLOC' \
    --with-debug \
    -j4
make -j4
sudo make install
This will build and install a special debug version of OpenResty, suitable for Valgrind checks, under the file system location /opt/openresty-valgrind .
Note: There are some other special LuaJIT build options that can further help us, like -DLUA_USE_APICHECK and -DLUA_USE_ASSERT , but they are beyond the scope of our current example.
Now let’s try running our previous buggy example with this special OpenResty and Valgrind:
export TEST_NGINX_USE_VALGRIND=1
export PATH=/opt/openresty-valgrind/nginx/sbin:$PATH
prove t/a.t
We omit the rest of the output for brevity. Here Valgrind reports an invalid read of one byte of
memory in the C function strlen , which is exactly what we’d expect. Mission
accomplished!
Note: LuaJIT built with the system allocator should be used with Valgrind only. On computer architectures like x86_64, such a LuaJIT build may not even start up.
From this example, we can see how application-level memory allocation optimizations and management can compromise the effectiveness of Valgrind's memory issue detection. Similarly, the NGINX core also comes with its own memory allocator via "memory pools". Such memory pools tend to allocate page-sized memory blocks even for small allocations and thus can also adversely affect Valgrind's detection. OpenResty provides a patch for the NGINX core to disable the memory pool optimizations altogether. The easiest way to use the patch is to specify the --with-no-pool-patch option when running the ./configure script while building OpenResty.
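For example, a dedicated no-pool build might be configured like this (the installation prefix here is just an illustration):

./configure \
    --prefix=/opt/openresty-nopool \
    --with-no-pool-patch \
    --with-debug \
    -j4
make -j4
sudo make install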
This Valgrind mode is used by OpenResty developers on a daily basis and has helped locate countless memory management bugs in the OpenResty C and Lua/FFI code base. Interestingly, this test mode has also located memory issues in the official NGINX core and the official LuaJIT core. Unlike analyzing core dumps, Valgrind can almost always find the first scene of the memory offense, and studying the memory error reports usually leads to immediate code fixes.
As with all other tools, Valgrind has its own limitations and cannot find all memory issues, even when we carefully disable application-level memory allocators as demonstrated above. For example,
1. memory issues on the C runtime stack cannot be caught by Valgrind (at least not by Valgrind's default memcheck tool).
Check Leak Mode

Consider the following example, which leaks memory in a way that Valgrind cannot see:
=== TEST 1:
--- config
location = /t {
content_by_lua_block {
package.path = "/path/to/some/lib/?.lua;" .. package.path
ngx.say("ok")
}
}
--- request
GET /t
--- response_body
ok
--- no_error_log
[error]
This example demonstrates a common mistake made by many OpenResty beginners. The package.path field specifies the search paths used by the require builtin function for loading pure Lua modules. This string value is hooked up in the global Lua table package , which has the same lifetime as the current Lua virtual machine (VM) instance. Since Lua VM instances usually have the same lifetime as NGINX worker processes (unless the lua_code_cache directive is turned off in nginx.conf ), prepending a new string to the value of package.path upon every request makes that value grow longer and longer, effectively leaking memory.
Unfortunately, Valgrind cannot find this leak at all since the leak happens in GC-managed memory inside the Lua VM: all such leaked memory will always get released upon GC destruction (or VM destruction) before the current process exits, which fools Valgrind into thinking that there are no leaks at all. Interested readers can try running this example with the "Valgrind test mode" as explained in the previous section.
To catch such leaks, Test::Nginx provides the "check leak" testing mode, which
1. loads the NGINX server with many of the test requests specified in the test block, in a way similar to the "benchmark test mode" we discussed earlier,
2. and at the same time, periodically polls and records the memory footprint of the NGINX worker process with the system command ps ,
3. and finally analyzes the memory usage data points collected in 2) by finding the slope ( k ) of a line that best fits those data points.
To make use of this mode, just specify the TEST_NGINX_CHECK_LEAK=1 environment before running existing test files, as in
export TEST_NGINX_CHECK_LEAK=1
prove t/a.t
Assuming the t/a.t test file contains the test block example given above, we should get an
output similar to the following.
t/a.t .. TEST 1:
LeakTest: [3740 3756 3620 3624 4180 3808 4044 4240 4272 4888
3876 3520 4516
4368 5216 4796 4420 4508 4068 5336 5220 3888 4196 4544 4100
3696 5028 5080
4580 3936 5236 4308 5320 4748 5464 4032 5492 4996 4588 4932
4632 6388 5228
5516 4680 5348 5420 5964 5436 5128 5720 6324 5700 4948 4312
6208 5192 5268
5600 4144 6556 4248 5648 6612 4044 5408 5120 5120 5740 6048
6412 5636 6488
5184 6036 5436 5808 4904 4980 6772 5148 7160 6576 6724 5024
6768 7264 5540
5700 5284 5244 4512 5752 6752 6868 6064 4940 5636 6388 7468]
LeakTest: k=22.6
t/e.t .. ok
All tests successful.
Files=1, Tests=3, 6 wallclock secs ( 0.01 usr 0.01 sys + 0.61 cusr 1.68 csys = 2.31 CPU)
Result: PASS
The special output lines from this test mode have the prefix LeakTest: . The first such line lists all the data points for the memory footprint size, in kilobytes (KB), collected every 0.02 seconds. The second line gives the slope ( k ) of the line that best fits these data points; in this case, k equals 22.6.
The slope of this line can usually serve as an indicator of the speed of memory leaking: the larger the slope, the faster the leak. A 2-digit slope here is very likely an indication of a memory leak. To be sure, we can plot these data points in a graph using the gnuplot tool.
There are quite some fluctuations in the graph. This is due to how the garbage collector normally behaves: it usually allocates page-sized or even larger memory blocks than actually requested for performance reasons, and it delays the release of unused memory blocks because of the sweep phase or other internal bookkeeping. Still, it is clear that the memory usage is going up overall.
We can try forcing a full garbage collection cycle upon entry of our request handler, like this:
content_by_lua_block {
collectgarbage()
package.path = "/path/to/some/lib/?.lua;" .. package.path
ngx.say("ok")
}
This way we can ensure that there is no memory garbage hanging around after the point we
call the Lua builtin function collectgarbage() .
t/e.t .. TEST 1:
LeakTest: [2464 2464 2360 2464 2232 2520 2380 2536 2440 2320
2300 2464
2576 2584 2540 2408 2608 2420 2596 2596 2332 2648 2660 2460
2680 2320
2688 2616 2332 2628 2408 2728 2716 2380 2752 2360 2768 2376
2372 2376
2732 2800 2808 2816 2464 2396 2668 2688 2848 2672 2412 2416
2536 2420
2424 2632 2904 2668 2912 2564 2724 2448 2932 2944 2856 2960
2616 2672
2976 2620 2984 2600 2808 2980 3004 2996 3236 3012 2724 3168
3072 3536
3260 3412 3028 2700 2480 3188 2808 3536 2640 3056 2764 3052
3440 3308
3064 2680 2828 3372]
LeakTest: k=7.4
t/e.t .. ok
All tests successful.
Files=1, Tests=3, 6 wallclock secs ( 0.02 usr 0.00 sys + 0.62 cusr 1.75 csys = 2.39 CPU)
Result: PASS
We can see that this time the slope of the best-fitting line is much smaller, but still much larger than 0.
And we can see that the line is still going upward relatively steadily over time.
Large fluctuations and variations in the memory footprint may create noise in our data samples and even result in false positives. We already saw how big fluctuations can produce large slopes for the fitted line. It is usually a good idea to force full garbage collection cycles frequently to reduce such noise, at least in GC-managed memory. The collectgarbage() function, however, is quite expensive in terms of CPU resources and may hurt overall performance very badly, so make sure you do not call it often (like on every request) in the "benchmark test mode" introduced above, or in production applications.
In reality, this brute-force "check leak" test mode has helped catch quite a lot of real memory leaks in OpenResty's test suites over the years. Most of those leaks slipped past the Valgrind test mode since they happened in GC-managed memory or in NGINX's memory pools.
Note: The NGINX no-pool patch mentioned in the previous section does not help here, since all the leaked memory blocks in the pools still get released before the process exits.
Nevertheless, this test mode has one big drawback. Unlike Valgrind, it cannot give any detailed information about the locations where leaks (may) happen. All it reports are data samples and other metrics that merely verify the existence of a leak (at least to some extent). We shall see in a later chapter how we can use "memory leak flame graphs" to overcome this limitation, even for leaks and big swings in GC-managed or pool-managed memory.
Mockeagain Mode
SystemTap Mode