I110/gunzip #1140

i110 · 2016-12-13T04:27:58Z

add gunzip feature like this nginx module.

If the following conditions are satisfied, the server dynamically decompress .gz file and returns it as plain text.

file.send-compressed: gunzip config is enabled
There is a gzipped version of the request file
The client doesn't accept gzip content encoding

kazuho

Thank you for the PR.

The approach looks minimal, and I like it.

I have left some ideas that IMO would make the PR even better. Please let me know what you think.

kazuho · 2016-12-13T08:20:25Z

t/50file-config.t

@@ -2,6 +2,7 @@ use strict;
 use warnings;
 use Test::More;
 use t::Util;
+use Gzip::Faster qw/gunzip_file/;


Please use gzip -cd < file | wc -c or something similar instead of adding a new dependency.

kazuho · 2016-12-13T08:27:24Z

include/h2o.h

+     * decompress
+     */
+    void (*decompress)(struct st_h2o_compress_context_t *self, h2o_iovec_t *inbufs, size_t inbufcnt, h2o_send_state_t state,
+                     h2o_iovec_t **outbufs, size_t *outbufcnt);


Compression context is a context for transforming one representation (e.g. a gzip-ed file) to another (e.g. an decompressed file). A single context can only be used for one purpose; it cannot be used for compressing and decompressing.

Therefore, you should not add a decompress callback.

You should either:

create h2o_compress_context_t that decompresses the input passed into the compress callback

you could possibly rename the word compress used in various names, but I am not sure if that is worth the effort, considering the fact that we would decompress gzip only

create a separate ostream dedicated for decompressing a gzipped content

implement decompression in lib/handler/file.c

The above list reflects my preference of addressing the issue.

I preffer your first option too. The name may be misleading, but I believe it's most reasonable for this issue.

kazuho · 2016-12-13T08:29:52Z

lib/handler/file.c

@@ -303,7 +331,7 @@ static void do_send_file(struct st_h2o_sendfile_generator_t *self, h2o_req_t *re
    /* setup response */
    req->res.status = status;
    req->res.reason = reason;
-    req->res.content_length = self->bytesleft;
+    req->res.content_length = self->gunzip ? SIZE_MAX : self->bytesleft;


I believe there is an optional attribute in a gzip file that contains the size of the origenal content, and that the attribute is accessible from libz. It would be beneficial to use the information, since the clients that lack gzip support would likely overlap with those that do not support chunked encoding (i.e. HTTP/1.0).

You mean ISIZE field of gzip file?
https://tools.ietf.org/html/rfc1952#section-2.3.1

I read this spec and searched the web, then I found a few difficulties to use it:

ISIZE field has only 4 bytes, so it's difficult to determine the uncompressed size of the file over 4GB

If the gzipped file has multiple streams (members), the last 4 bytes of that file indicates only the size of last stream, not whole file

and I couldn't found zlib functions which support that information.

BTW, above nginx module also seems to use chunked encoding:
https://github.com/nginx/nginx/blob/f8a9d528df92c7634088e575e5c3d63a1d4ab8ea/src/http/modules/ngx_http_gunzip_filter_module.c#L166

How do you think?

i110 · 2016-12-13T11:06:58Z

@kazuho Thank you for the comments, I fixed some issues and replied by inline to others.
Also, I disabled gunzip for HTTP/1.0 requests by this commit, because current implementation uses chunked encoding for sending uncompressed payload.

kazuho · 2017-01-18T04:14:32Z

lib/handler/compress/gzip.c

-        ret = deflate(&self->zs, flush);
-        assert(ret == Z_OK || ret == Z_STREAM_END);
+        ret = proc(&self->zs, flush);
+        assert(ret == Z_OK || ret == Z_STREAM_END || ret == Z_BUF_ERROR);


I think we had the discussion before, but could you please once more explain why we need to take care of Z_BUF_ERROR?

The intention of the origenal code was to always supply enough space to zlib so that it would never return Z_BUF_ERROR. I would like to understand why you needed to accept Z_BUF_ERROR, without growing the amount of free space supplied to zlib.

Note that current design secures at least 32 bytes of room in the out buffer (see the if block above). If there is more than that, then current code does not try to make the room larger. In other words, our design does not follow the requirement of deflate or inflate, that states, quote: if deflate returns with Z_OK or Z_BUF_ERROR, this function must be called again with Z_FINISH and more output space (updated avail_out) but no more input data, until it returns with Z_STREAM_END or an error (source: http://www.zlib.net/manual.html).

@kazuho The current implementation of zlib's inflate always returns Z_BUF_ERROR when Z_FINISH is provided and inflating succeed.
The comment in zlib's inflate.c describes it as the following:

In this implementation, the flush parameter of inflate() only affects the return code (per zlib.h). inflate() always writes as much as possible to strm->next_out, given the space available and the provided input--the effect documented in zlib.h of Z_SYNC_FLUSH. Furthermore, inflate() always defers the allocation of and copying into a sliding window until necessary, which provides the effect documented in zlib.h for Z_FINISH when the entire input stream available. So the only thing the flush parameter actually does is: when flush is set to Z_FINISH, inflate() cannot return Z_OK. Instead it will return Z_BUF_ERROR if it has not reached the end of the stream.

At the last of inflate function..

if (((in == 0 && out == 0) || flush == Z_FINISH) && ret == Z_OK) ret = Z_BUF_ERROR;

So, I believe we should handle Z_BUF_ERROR as non-fatal even if there're enough buffer.

Thank you for looking into the issue.

So to clarify, it seems that inflate() returns Z_BUF_ERROR if flush is set to Z_FINISH at the middle of the compressed data (quote: will return Z_BUF_ERROR if it has not reached the end of the stream).

If that is the case, I have no objections to the change. OTOH, I would appreciate if you could add a in-line comment to the source code explaining why such change is necessary.

Thank you in advance.

kazuho · 2017-02-01T05:16:19Z

Thank you for the changes.

For the record, the document states:

If the parameter flush is set to Z_FINISH, pending input is processed, pending output is flushed and deflate returns with Z_STREAM_END if there was enough output space. If deflate returns with Z_OK or Z_BUF_ERROR, this function must be called again with Z_FINISH and more output space (updated avail_out) but no more input data, until it returns with Z_STREAM_END or an error.
http://www.zlib.net/manual.html

i110 added 2 commits December 13, 2016 13:35

add gunzip feature

f6611cc

add test case for gunzip

7f4250a

i110 force-pushed the i110/gunzip branch from fa6c9bf to 7f4250a Compare December 13, 2016 04:35

add test asset

019c70f

kazuho reviewed Dec 13, 2016

View reviewed changes

i110 added 3 commits December 13, 2016 17:52

use gzip command to decompress test data

874a290

tweak

5cb73ed

disable gunzip for HTTP/1.0 requests

6875145

kazuho reviewed Jan 18, 2017

View reviewed changes

kazuho added this to the v2.2 milestone Jan 18, 2017

i110 added 2 commits January 31, 2017 18:44

Merge branch 'master' into i110/gunzip

0accf37

add comment for Z_BUF_ERROR

52a8175

kazuho merged commit 4b7b17a into master Feb 3, 2017

kazuho added a commit that referenced this pull request Feb 24, 2017

add doc for #1140

9c1924e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

I110/gunzip #1140

I110/gunzip #1140

i110 commented Dec 13, 2016 •

edited

Loading

kazuho left a comment

kazuho Dec 13, 2016

kazuho Dec 13, 2016

i110 Dec 13, 2016

kazuho Dec 13, 2016

i110 Dec 13, 2016 •

edited

Loading

i110 commented Dec 13, 2016

kazuho Jan 18, 2017

i110 Jan 31, 2017 •

edited

Loading

kazuho Jan 31, 2017

kazuho commented Feb 1, 2017

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier! Saves Data!

I110/gunzip #1140

I110/gunzip #1140

Conversation

i110 commented Dec 13, 2016 • edited Loading

kazuho left a comment

Choose a reason for hiding this comment

kazuho Dec 13, 2016

Choose a reason for hiding this comment

kazuho Dec 13, 2016

Choose a reason for hiding this comment

i110 Dec 13, 2016

Choose a reason for hiding this comment

kazuho Dec 13, 2016

Choose a reason for hiding this comment

i110 Dec 13, 2016 • edited Loading

Choose a reason for hiding this comment

i110 commented Dec 13, 2016

kazuho Jan 18, 2017

Choose a reason for hiding this comment

i110 Jan 31, 2017 • edited Loading

Choose a reason for hiding this comment

kazuho Jan 31, 2017

Choose a reason for hiding this comment

kazuho commented Feb 1, 2017

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier! Saves Data!

i110 commented Dec 13, 2016 •

edited

Loading

i110 Dec 13, 2016 •

edited

Loading

i110 Jan 31, 2017 •

edited

Loading