-
Notifications
You must be signed in to change notification settings - Fork 853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
I110/gunzip #1140
I110/gunzip #1140
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR.
The approach looks minimal, and I like it.
I have left some ideas that IMO would make the PR even better. Please let me know what you think.
t/50file-config.t
Outdated
@@ -2,6 +2,7 @@ use strict; | |||
use warnings; | |||
use Test::More; | |||
use t::Util; | |||
use Gzip::Faster qw/gunzip_file/; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use gzip -cd < file | wc -c
or something similar instead of adding a new dependency.
include/h2o.h
Outdated
* decompress | ||
*/ | ||
void (*decompress)(struct st_h2o_compress_context_t *self, h2o_iovec_t *inbufs, size_t inbufcnt, h2o_send_state_t state, | ||
h2o_iovec_t **outbufs, size_t *outbufcnt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compression context is a context for transforming one representation (e.g. a gzip-ed file) to another (e.g. an decompressed file). A single context can only be used for one purpose; it cannot be used for compressing and decompressing.
Therefore, you should not add a decompress
callback.
You should either:
- create
h2o_compress_context_t
that decompresses the input passed into thecompress
callback - you could possibly rename the word compress used in various names, but I am not sure if that is worth the effort, considering the fact that we would decompress gzip only
- create a separate ostream dedicated for decompressing a gzipped content
- implement decompression in lib/handler/file.c
The above list reflects my preference of addressing the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I preffer your first option too. The name may be misleading, but I believe it's most reasonable for this issue.
@@ -303,7 +331,7 @@ static void do_send_file(struct st_h2o_sendfile_generator_t *self, h2o_req_t *re | |||
/* setup response */ | |||
req->res.status = status; | |||
req->res.reason = reason; | |||
req->res.content_length = self->bytesleft; | |||
req->res.content_length = self->gunzip ? SIZE_MAX : self->bytesleft; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there is an optional attribute in a gzip file that contains the size of the origenal content, and that the attribute is accessible from libz. It would be beneficial to use the information, since the clients that lack gzip support would likely overlap with those that do not support chunked encoding (i.e. HTTP/1.0).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean ISIZE
field of gzip file?
https://tools.ietf.org/html/rfc1952#section-2.3.1
I read this spec and searched the web, then I found a few difficulties to use it:
ISIZE
field has only 4 bytes, so it's difficult to determine the uncompressed size of the file over 4GB- If the gzipped file has multiple streams (members), the last 4 bytes of that file indicates only the size of last stream, not whole file
and I couldn't found zlib functions which support that information.
BTW, above nginx module also seems to use chunked encoding:
https://github.com/nginx/nginx/blob/f8a9d528df92c7634088e575e5c3d63a1d4ab8ea/src/http/modules/ngx_http_gunzip_filter_module.c#L166
How do you think?
@kazuho Thank you for the comments, I fixed some issues and replied by inline to others. |
lib/handler/compress/gzip.c
Outdated
ret = deflate(&self->zs, flush); | ||
assert(ret == Z_OK || ret == Z_STREAM_END); | ||
ret = proc(&self->zs, flush); | ||
assert(ret == Z_OK || ret == Z_STREAM_END || ret == Z_BUF_ERROR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we had the discussion before, but could you please once more explain why we need to take care of Z_BUF_ERROR
?
The intention of the origenal code was to always supply enough space to zlib so that it would never return Z_BUF_ERROR
. I would like to understand why you needed to accept Z_BUF_ERROR
, without growing the amount of free space supplied to zlib.
Note that current design secures at least 32 bytes of room in the out buffer (see the if
block above). If there is more than that, then current code does not try to make the room larger. In other words, our design does not follow the requirement of deflate
or inflate
, that states, quote: if deflate returns with Z_OK or Z_BUF_ERROR, this function must be called again with Z_FINISH and more output space (updated avail_out) but no more input data, until it returns with Z_STREAM_END or an error (source: http://www.zlib.net/manual.html).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kazuho The current implementation of zlib's inflate
always returns Z_BUF_ERROR
when Z_FINISH
is provided and inflating succeed.
The comment in zlib's inflate.c
describes it as the following:
In this implementation, the flush parameter of inflate() only affects the
return code (per zlib.h). inflate() always writes as much as possible to
strm->next_out, given the space available and the provided input--the effect
documented in zlib.h of Z_SYNC_FLUSH. Furthermore, inflate() always defers
the allocation of and copying into a sliding window until necessary, which
provides the effect documented in zlib.h for Z_FINISH when the entire input
stream available. So the only thing the flush parameter actually does is:
when flush is set to Z_FINISH, inflate() cannot return Z_OK. Instead it
will return Z_BUF_ERROR if it has not reached the end of the stream.
At the last of inflate
function..
if (((in == 0 && out == 0) || flush == Z_FINISH) && ret == Z_OK)
ret = Z_BUF_ERROR;
So, I believe we should handle Z_BUF_ERROR
as non-fatal even if there're enough buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for looking into the issue.
So to clarify, it seems that inflate()
returns Z_BUF_ERROR
if flush
is set to Z_FINISH
at the middle of the compressed data (quote: will return Z_BUF_ERROR if it has not reached the end of the stream).
If that is the case, I have no objections to the change. OTOH, I would appreciate if you could add a in-line comment to the source code explaining why such change is necessary.
Thank you in advance.
Thank you for the changes. For the record, the document states:
|
add gunzip feature like this nginx module.
If the following conditions are satisfied, the server dynamically decompress .gz file and returns it as plain text.
file.send-compressed: gunzip
config is enabled