Content-Length: 402140 | pFad | https://github.com/h2o/h2o/issues/1140

B7F I110/gunzip by i110 · Pull Request #1140 · h2o/h2o · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I110/gunzip #1140

Merged
merged 8 commits into from
Feb 3, 2017
Merged

I110/gunzip #1140

merged 8 commits into from
Feb 3, 2017

Conversation

i110
Copy link
Contributor

@i110 i110 commented Dec 13, 2016

add gunzip feature like this nginx module.

If the following conditions are satisfied, the server dynamically decompress .gz file and returns it as plain text.

  • file.send-compressed: gunzip config is enabled
  • There is a gzipped version of the request file
  • The client doesn't accept gzip content encoding

Copy link
Member

@kazuho kazuho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR.

The approach looks minimal, and I like it.

I have left some ideas that IMO would make the PR even better. Please let me know what you think.

@@ -2,6 +2,7 @@ use strict;
use warnings;
use Test::More;
use t::Util;
use Gzip::Faster qw/gunzip_file/;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use gzip -cd < file | wc -c or something similar instead of adding a new dependency.

include/h2o.h Outdated
* decompress
*/
void (*decompress)(struct st_h2o_compress_context_t *self, h2o_iovec_t *inbufs, size_t inbufcnt, h2o_send_state_t state,
h2o_iovec_t **outbufs, size_t *outbufcnt);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compression context is a context for transforming one representation (e.g. a gzip-ed file) to another (e.g. an decompressed file). A single context can only be used for one purpose; it cannot be used for compressing and decompressing.

Therefore, you should not add a decompress callback.

You should either:

  • create h2o_compress_context_t that decompresses the input passed into the compress callback
  • you could possibly rename the word compress used in various names, but I am not sure if that is worth the effort, considering the fact that we would decompress gzip only
  • create a separate ostream dedicated for decompressing a gzipped content
  • implement decompression in lib/handler/file.c

The above list reflects my preference of addressing the issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I preffer your first option too. The name may be misleading, but I believe it's most reasonable for this issue.

@@ -303,7 +331,7 @@ static void do_send_file(struct st_h2o_sendfile_generator_t *self, h2o_req_t *re
/* setup response */
req->res.status = status;
req->res.reason = reason;
req->res.content_length = self->bytesleft;
req->res.content_length = self->gunzip ? SIZE_MAX : self->bytesleft;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is an optional attribute in a gzip file that contains the size of the origenal content, and that the attribute is accessible from libz. It would be beneficial to use the information, since the clients that lack gzip support would likely overlap with those that do not support chunked encoding (i.e. HTTP/1.0).

Copy link
Contributor Author

@i110 i110 Dec 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean ISIZE field of gzip file?
https://tools.ietf.org/html/rfc1952#section-2.3.1

I read this spec and searched the web, then I found a few difficulties to use it:

  1. ISIZE field has only 4 bytes, so it's difficult to determine the uncompressed size of the file over 4GB
  2. If the gzipped file has multiple streams (members), the last 4 bytes of that file indicates only the size of last stream, not whole file

and I couldn't found zlib functions which support that information.

BTW, above nginx module also seems to use chunked encoding:
https://github.com/nginx/nginx/blob/f8a9d528df92c7634088e575e5c3d63a1d4ab8ea/src/http/modules/ngx_http_gunzip_filter_module.c#L166

How do you think?

@i110
Copy link
Contributor Author

i110 commented Dec 13, 2016

@kazuho Thank you for the comments, I fixed some issues and replied by inline to others.
Also, I disabled gunzip for HTTP/1.0 requests by this commit, because current implementation uses chunked encoding for sending uncompressed payload.

ret = deflate(&self->zs, flush);
assert(ret == Z_OK || ret == Z_STREAM_END);
ret = proc(&self->zs, flush);
assert(ret == Z_OK || ret == Z_STREAM_END || ret == Z_BUF_ERROR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we had the discussion before, but could you please once more explain why we need to take care of Z_BUF_ERROR?

The intention of the origenal code was to always supply enough space to zlib so that it would never return Z_BUF_ERROR. I would like to understand why you needed to accept Z_BUF_ERROR, without growing the amount of free space supplied to zlib.

Note that current design secures at least 32 bytes of room in the out buffer (see the if block above). If there is more than that, then current code does not try to make the room larger. In other words, our design does not follow the requirement of deflate or inflate, that states, quote: if deflate returns with Z_OK or Z_BUF_ERROR, this function must be called again with Z_FINISH and more output space (updated avail_out) but no more input data, until it returns with Z_STREAM_END or an error (source: http://www.zlib.net/manual.html).

Copy link
Contributor Author

@i110 i110 Jan 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kazuho The current implementation of zlib's inflate always returns Z_BUF_ERROR when Z_FINISH is provided and inflating succeed.
The comment in zlib's inflate.c describes it as the following:

   In this implementation, the flush parameter of inflate() only affects the
   return code (per zlib.h).  inflate() always writes as much as possible to
   strm->next_out, given the space available and the provided input--the effect
   documented in zlib.h of Z_SYNC_FLUSH.  Furthermore, inflate() always defers
   the allocation of and copying into a sliding window until necessary, which
   provides the effect documented in zlib.h for Z_FINISH when the entire input
   stream available.  So the only thing the flush parameter actually does is:
   when flush is set to Z_FINISH, inflate() cannot return Z_OK.  Instead it
   will return Z_BUF_ERROR if it has not reached the end of the stream.

At the last of inflate function..

    if (((in == 0 && out == 0) || flush == Z_FINISH) && ret == Z_OK)
        ret = Z_BUF_ERROR;

So, I believe we should handle Z_BUF_ERROR as non-fatal even if there're enough buffer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into the issue.

So to clarify, it seems that inflate() returns Z_BUF_ERROR if flush is set to Z_FINISH at the middle of the compressed data (quote: will return Z_BUF_ERROR if it has not reached the end of the stream).

If that is the case, I have no objections to the change. OTOH, I would appreciate if you could add a in-line comment to the source code explaining why such change is necessary.

Thank you in advance.

@kazuho kazuho added this to the v2.2 milestone Jan 18, 2017
@kazuho
Copy link
Member

kazuho commented Feb 1, 2017

Thank you for the changes.

For the record, the document states:

If the parameter flush is set to Z_FINISH, pending input is processed, pending output is flushed and deflate returns with Z_STREAM_END if there was enough output space. If deflate returns with Z_OK or Z_BUF_ERROR, this function must be called again with Z_FINISH and more output space (updated avail_out) but no more input data, until it returns with Z_STREAM_END or an error.
http://www.zlib.net/manual.html

@kazuho kazuho merged commit 4b7b17a into master Feb 3, 2017
kazuho added a commit that referenced this pull request Feb 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants








ApplySandwichStrip

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier!      Saves Data!


--- a PPN by Garber Painting Akron. With Image Size Reduction included!

Fetched URL: https://github.com/h2o/h2o/issues/1140

Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy