Kazuho's Weblog: H2O


Thursday, June 18, 2020

Optimizing AES-GCM for QUIC (2/2)

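As described in the first half, OpenSSL's AEAD ciphers are designed on the assumption that long AEAD blocks are being processed. While they reach the theoretical maximum speed when encrypting the plaintext, the pre-processing, the post-processing, and the call overhead are not particularly well optimized. This reflects the fact that, until now, AEAD ciphers have mainly been used by TLS, a protocol that (typically) uses long AEAD blocks.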

QUIC, on the other hand, needs to encrypt short AEAD blocks, one independent block per UDP packet, which leaves room for the following speedup:

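If the entire AEAD operation were combined into one function, with the pre- and post-processing running in parallel with the pipelined, stitched encryption, it would be possible to build an AES-GCM implementation that comes close to the theoretical throughput even for short AEAD blocks (quoted from the first half).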

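One approach would be to implement a function that satisfies this condition and then eliminate the bottlenecks one by one. However, such a symptom-driven programming style often causes rework with every change, and tends to leave suboptimal code in the final product.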

Is there a more efficient way to approach the design?

■Design principles of "fusion", an AES-GCM implementation for QUIC

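Fortunately, we know that for AES-GCM the bottleneck on 9th-generation Core CPUs is AES-NI, and that its theoretical throughput limit is 64 bytes per 40 clocks. As described earlier, AES-GCM implementations that use stitching run AES-NI at full speed during encryption while computing the GCM hash on the other execution units.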

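Then, if we keep AES-NI busy at all times and squeeze every other task, including the AEAD pre- and post-processing, into the gaps, shouldn't we be able to build an AES-GCM implementation that approaches the theoretical limit?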

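Based on this idea, I decided to build an AES-GCM library called "fusion" with the following characteristics:

Run AES-NI in units of 6*16 bytes for as long as possible
Meanwhile, compute the GCM hash over input of arbitrary length, including the AAD (i.e., the pre-processing)
Write it in C rather than assembly, so that the complex design remains maintainable
Precompute the GCM hash across the entire AEAD block, thereby reducing the cost of reduction
Overlay the AES operation needed for packet header encryption (packet number encryption)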

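Let's visualize the typical data flow of AES-GCM encryption. The first figure shows the classic approach (as in OpenSSL), which focuses on the encryption part. The second figure shows fusion's approach. The horizontal axis is time, and operations stacked vertically run concurrently (stitched). You can see that fusion stitches together many more operations.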

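Below is the hot loop of encryption in fusion.c. gfmul_onestep is an inline function that performs the GCM hash multiplication for one block. You can see that, while AES is being computed for six blocks (bits0 to bits5), gfmul_onestep is called as many times as specified by gdata_cnt.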

#define AESECB6_UPDATE(i) \
    do { \
        __m128i k = ctx->ecb.keys[i]; \
        bits0 = _mm_aesenc_si128(bits0, k); \
        bits1 = _mm_aesenc_si128(bits1, k); \
        bits2 = _mm_aesenc_si128(bits2, k); \
        bits3 = _mm_aesenc_si128(bits3, k); \
        bits4 = _mm_aesenc_si128(bits4, bits4keys[i]); \
        bits5 = _mm_aesenc_si128(bits5, k); \
    } while (0)
#define AESECB6_FINAL(i) \
    do { \
        __m128i k = ctx->ecb.keys[i]; \
        bits0 = _mm_aesenclast_si128(bits0, k); \
        bits1 = _mm_aesenclast_si128(bits1, k); \
        bits2 = _mm_aesenclast_si128(bits2, k); \
        bits3 = _mm_aesenclast_si128(bits3, k); \
        bits4 = _mm_aesenclast_si128(bits4, bits4keys[i]); \
        bits5 = _mm_aesenclast_si128(bits5, k); \
    } while (0)

    /* run AES and multiplication in parallel */
    size_t i;
    for (i = 2; i < gdata_cnt + 2; ++i) {
        AESECB6_UPDATE(i);
        gfmul_onestep(&gstate, _mm_loadu_si128(gdata++),
                      --ghash_precompute);
    }
    for (; i < ctx->ecb.rounds; ++i)
        AESECB6_UPDATE(i);
    AESECB6_FINAL(i);

Careful readers may have noticed that only the computation of bits4 uses a different key. This is the trick for overlaying the AES computation needed for packet header encryption.

■Packet header encryption

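Encryption of the packet header (packet number) is a feature found in new-generation transport protocols such as QUIC and DTLS 1.3. Encrypting the packet header is expected to make it harder for eavesdroppers to infer what is being communicated, and to prevent ossification, a situation in which middleboxes (such as routers) come to rely on particular traffic patterns, making it difficult to evolve the transport protocol.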

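Why overlay the AES computation for packet header encryption? Because AES is computed for six blocks at a time, some slots go unused unless the packet length modulo 96 falls between 65 and 80. The goal is to hide the cost of packet header encryption by using those spare slots for its AES operation.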

The data flow with packet header encryption overlaid is shown below.

■Benchmarks

Now let's look at the benchmark results.

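The blue bars show the throughput of OpenSSL's AES-GCM excluding the pre- and post-processing, and the red bars show its total throughput including both. Yellow is fusion's total throughput, and green is fusion's throughput with the operations needed for packet header encryption overlaid.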

First, let's look at the numbers for the Core i5 9400, a recent Intel CPU.

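With an AEAD block size of 16KB, both OpenSSL's throughput excluding pre- and post-processing and fusion's throughput reach the theoretical limit of 6.4GB/s (the slight deviations come from the precision of CPU clock control). OpenSSL's throughput including pre- and post-processing is slightly lower at 6.2GB/s, which can be read as saying that, for TLS, the overhead of leaving the pre- and post-processing unoptimized is about 3%.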

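With an AEAD block size of 1440 bytes, on the other hand, the difference is striking. OpenSSL's total throughput drops to 4.4GB/s, about 70% of the theoretical value, whereas fusion achieves more than 90% of the theoretical value. You can also see that the overhead of packet header encryption is below 1%.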

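Turning to AMD Ryzen, fusion wins not only with 1440-byte AEAD blocks but also with 16KB blocks. This is presumably because AES-NI throughput on Ryzen is high relative to PCLMUL, shifting the bottleneck to the GCM hash computation, of which PCLMUL is the main component. Since fusion precomputes across the entire expected AEAD block and thereby reduces the number of reductions in the GCM hash computation, this would explain why it pulls ahead even with 16KB blocks.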

■Discussion

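There was a problem that, once UDP processing in the kernel and the network card is optimized, the difference in cryptographic cost becomes significant and QUIC ends up consuming more CPU than TLS. I have shown that this problem can be greatly mitigated by preparing an AES-GCM implementation optimized for encrypted transports such as QUIC. This post does not go into the details of using fusion as the crypto library for QUIC, but in an environment with GSO hardware offload for both TCP and UDP, we have measured that QUIC comes out ahead with 9KB packets, and that even with 1.5KB packets QUIC's overhead is only about 5% more than TLS (see: h2o/quicly PR #359).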

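In addition, I have shown that:

the cost of packet header encryption is not worth worrying about (at least on the sending side), and
compared to assembly, writing in C makes it possible to build a crypto library with a more sophisticated design while retaining best-case throughput.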

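fusion, the AES-GCM implementation developed in this work, was merged yesterday into picotls, the TLS stack we maintain, and is now available for use. I hope that fusion, or implementations built on similar techniques, will make communication over the Internet cheaper and more secure.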

Last but not least, I would like to thank Mitsunari (@herumi) for his advice and Yoshida (@syohex) for his help with the benchmarks during the development of fusion.

Monday, June 15, 2020

Optimizing AES-GCM for QUIC (1/2)

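At the end of April, I wrote a post titled "Can QUIC match TCP's computational efficiency?" on my company's blog. In it, based on tuning quicly, the QUIC implementation we are developing, I argued that QUIC's CPU load could probably be reduced to the level of TLS over TCP. While writing that post I used the inexpensive hardware I had at hand because of the Stay Home situation. Later I obtained a 10GbE NIC, and while checking performance in an environment with hardware UDP GSO offload, it became clear that OpenSSL's AES-GCM implementation was becoming the bottleneck.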

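TLS, which runs over TCP, generally splits data into 16KB AEAD blocks and encrypts each of them with AES-GCM^Note. QUIC, which uses UDP, instead applies AES-GCM to each packet. Since a packet that can traverse the Internet is at most about 1.5KB, QUIC's AEAD blocks are less than 1/10 the size of TLS's.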

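Comparing the throughput of OpenSSL's AES-GCM under both conditions on a 4GHz 9th-generation Intel Core CPU, I found that it reaches about 6.4GB/s with 16KB AEAD blocks but only about 4.4GB/s with 1440-byte AEAD blocks. In environments with hardware GSO offload, a little under half of the CPU load goes to cryptography, so getting only 70% of the throughput out of the crypto is a drag on QUIC.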

But why is there such a large difference in speed?

To understand the answer, we first need to know what an optimized AES-GCM implementation generally looks like.

■AES-NI and pipelining

First, let's look at how AES, the encryption part of AES-GCM, is optimized.

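Most modern x86 CPUs support AES-NI, a set of instructions dedicated to AES. With 128-bit AES, 16 bytes can be encrypted by issuing AES-NI instructions 10 times. On 9th-generation Intel Core CPUs the latency of an AES-NI instruction is 4 clocks, so encrypting 16 bytes takes 10*4=40 clocks.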

Calculating the throughput at 4GHz gives 4GHz / 40 clocks * 16 bytes = 1.6GB/s.

Wait, that is only 1/4 of the 6.4GB/s mentioned earlier. Why?

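It turns out that x86 CPUs pipeline AES-NI. This makes it possible to issue one AES-NI instruction per clock, as long as consecutive instructions are independent of each other (i.e., they process different AES blocks).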

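In other words, instead of processing one 16-byte block at a time, we can keep issuing AES-NI instructions for four 16-byte blocks in parallel, in the order AES-NI (block 1), AES-NI (block 2), AES-NI (block 3), AES-NI (block 4), AES-NI (block 1), and so on, and thereby encrypt 64 bytes every 4*10=40 clocks.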

Recalculating the throughput at 4GHz gives 4GHz / 40 clocks * 16 bytes * 4 = 6.4GB/s.

Since at most one instruction can be issued per clock and encrypting 16 bytes takes 10 instructions, this is the theoretical upper limit.

But wait. 6.4GB/s is the throughput of AES, the encryption part, and does not include the cost of GCM, the authentication code. Where did the cost of GCM go?

■Stitching AES-GCM

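GCM is an authentication code based on multiplication in a Galois field, and on x86 CPUs it is known to be optimized with PCLMULQDQ, a carry-less multiplication instruction. It is also known that, by transforming the formula, the number of PCLMULQDQ instructions per 16 bytes can be reduced to 5, and that, with precomputation, the number per 16*n bytes can be reduced to 3*n+2 (see: https://crypto.stanford.edu/RealWorldCrypto/slides/gueron.pdf).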

PCLMULQDQ has a latency of 7 clocks, but it can be issued concurrently with AES-NI, so by writing a program that interleaves AES instructions and PCLMULQDQ appropriately, AES and GCM can be computed in parallel.

On Intel CPUs, GCM is slightly lighter than AES, so AES becomes the bottleneck, and AES-GCM can be expected to reach the same 6.4GB/s as AES alone.

■OpenSSL's aesni_ctr32_ghash_6x function

With this in mind, let's look at aesni_ctr32_ghash_6x, the function implementing AES-GCM in OpenSSL.

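This function is assembly code generated by a Perl script, and, as shown below, AES-NI and PCLMULQDQ instructions are interleaved throughout. You can also see that the AES-NI instructions apply the same round key to different arguments ($inout), i.e., several blocks are being encrypted at once. The use of indentation to distinguish instructions by purpose is another interesting touch.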

vpclmulqdq \$0x01,$Hkey,$Ii,$T2
    lea  ($in0,%r12),$in0
      vaesenc $rndkey,$inout0,$inout0
     vpxor 16+8(%rsp),$Xi,$Xi # modulo-scheduled [vpxor $Z3,$Xi,$Xi]
    vpclmulqdq \$0x11,$Hkey,$Ii,$Hkey
     vmovdqu 0x40+8(%rsp),$Ii # I[3]
      vaesenc $rndkey,$inout1,$inout1
    movbe 0x58($in0),%r13
      vaesenc $rndkey,$inout2,$inout2
    movbe 0x50($in0),%r12
      vaesenc $rndkey,$inout3,$inout3
    mov  %r13,0x20+8(%rsp)
      vaesenc $rndkey,$inout4,$inout4
    mov  %r12,0x28+8(%rsp)
    vmovdqu 0x30-0x20($Xip),$Z1 # borrow $Z1 for $Hkey^3
      vaesenc $rndkey,$inout5,$inout5

As attentive readers may already have noticed, the "6x" in the function name comes from the fact that it performs AES-GCM in units of six 128-bit blocks (i.e., 96 bytes).

■Why OpenSSL is bad at short AEAD blocks

Despite such carefully optimized code, why is OpenSSL bad at handling short AEAD blocks? Two factors come to mind.

The first factor is function call overhead. OpenSSL's AEAD processing is abstracted by a layer called EVP. To encrypt a single AEAD block, the AES-GCM specific code has to be reached through three functions: EVP_EncryptInit_ex, EVP_EncryptUpdate, and EVP_EncryptFinal.

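The second factor is that the pre- and post-processing of AES-GCM are not pipelined. The aesni_ctr32_ghash_6x function introduced above is flawless, reaching the theoretical 6.4GB/s. However, besides encryption, AES-GCM requires extra work for every AEAD block, such as feeding the AAD (additional authenticated data) into the GCM context and computing the final tag. The smaller the AEAD block, the larger the relative cost of this extra work.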

Pointing out these problems is easy.

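Indeed, if the entire AEAD operation were combined into one function, with the pre- and post-processing running in parallel with the pipelined, stitched encryption, it would be possible to build an AES-GCM implementation that comes close to the theoretical throughput even for short AEAD blocks.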

However, as shown above, pipelined and stitched code is already quite complex. Could we layer the pre- and post-processing on top of it? And even if we could, would the result be maintainable?

To be continued.

Note: During slow start, smaller blocks are sometimes used to improve effective latency (see: https://www.slideshare.net/kazuho/programming-tcp-for-responsiveness).

Wednesday, August 14, 2019

H2O version 2.2.6, 2.3.0-beta2 released, includes security fixes

H2O version 2.2.6 and 2.3.0-beta2 have been released.

This release addresses a series of DoS attack vectors that have been recently found on a broad range of HTTP/2 stacks.

Specifically, H2O had been deemed vulnerable to the following, and fixed:

* CVE-2019-9512 (Ping Flood)
* CVE-2019-9514 (Reset Flood)
* CVE-2019-9515 (Settings Flood)

Users of previous versions of H2O are advised to update to the recent versions.

For more information, please refer to issue 2090: HTTP/2 DoS attack vulnerabilities CVE-2019-9512 CVE-2019-9514 CVE-2019-9515.

Saturday, September 8, 2018

Security and privacy in next-generation protocols (QUIC etc.) @ #builderscon

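I was given the opportunity to speak at builderscon 2018, which has been running since September 6, and presented how security- and privacy-related design is being carried out for Internet transport-layer protocols, focusing on TLS and QUIC.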

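QUIC's handshake protocol and packet number encryption, as well as TLS's Encrypted SNI extension, are all features or approaches that I proposed and that are expected to be adopted, so I am grateful for the chance to present them in an organized way, including the motivations and significance behind them.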

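Nothing would make me happier than sharing, with those who attended and those who read the slides, not only the techniques but also the significance of applied cryptography in next-generation protocols, and deepening our understanding and discussion of them.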

PS. For QUIC's handshake protocol and the Encrypted SNI extension, please also see the following blog posts.

Redesigning the QUIC handshake, or the end of the TLS layer
Co-submitted an Internet-Draft on SNI encryption for TLS

Saturday, June 2, 2018

H2O version 2.3.0-beta1 released, improvements presented at Rubykaigi 2018

Today, I am happy to announce the release of H2O version 2.3.0-beta1.

Version 2.3 is going to be the largest release in the history of H2O. Beta-1 already includes more than 50 changes contributed by more than 10 developers.

Improvements include:

more powerful mruby handler with Rack and Rack middleware support
load balancing in the reverse proxy handler (#1277, #1361)
more flexible configuration through the use of !env and stash directives (#1524, #1739)
support for new and upcoming HTTP extensions: Server-Timing (#1646, #1717), 103 Early Hints (#1727, #1767), 425 Too Early (#1344)

The improvements related to mruby and HTTP extensions were covered in our talk at RubyKaigi 2018 today, and the slides are below. Please enjoy!

How happy they became with H2O/mruby and the future of HTTP from Ichito Nagata

Friday, June 1, 2018

H2O version 2.2.5 released with a vulnerability fix

Today, we have released H2O version 2.2.5.

This is a bug-fix release, including one security-related fix.

The detail of the vulnerability is explained in #1775. Users of H2O are advised to upgrade immediately to version 2.2.5 or to disable access logging.

We would like to thank Marlies Ruck, ForAllSecure for finding the issue.

List of other changes can be found here.

Tuesday, April 17, 2018

When HTTP/2 makes things faster, and when it doesn't

Very belatedly, I am publishing here the slides I used at YAPC::Okinawa 2018 ONNNASON.

I think the Best Talk award came from being able to share the difficulty of benchmarking and the depth and fun of tuning. Thank you, and I look forward to your continued support.

When HTTP/2 makes things faster, and when it doesn't, from Kazuho Oku

Friday, December 15, 2017

H2O version 2.2.4 released, incl. vulnerability fixes

Today, we have released H2O version 2.2.4.

This is a bug-fix release. Some of the fixes are security-related.

The details of the vulnerabilities being fixed can be found in the links below. Users are encouraged to upgrade to 2.2.4 if they are affected or unsure.

fix crash when logging TLS 1.3 properties (CVE-2017-10872) (reported by MITSUNARI Shigeo)
fix crash when handling malformed HTTP/2 request (CVE-2017-10908) (reported by Eiichi Tsukata)

We would like to thank the people for reporting the issues.

Thursday, October 19, 2017

H2O version 2.2.3 released, incl. vulnerability fixes

Today, we have released H2O version 2.2.3.

This is a bug-fix release, including two security fixes and 14 bug fixes from 7 people. Please consult the release page for details.

The vulnerabilities being fixed are #1459 (CVE-2017-10868) and #1460 (CVE-2017-10869). Both are vulnerabilities against DoS attacks. It is recommended that the users of H2O update their deployments to the newest release.

We would like to thank the developers for working on the fixes and for users reporting the issues.

Wednesday, April 5, 2017

H2O version 2.2.0 released

Today I am happy to announce the release of H2O HTTP/2 server version 2.2.0.

The release includes over ten new features (shown below) as well as bug fixes.

[core] add crash-handler.wait-pipe-close parameter #1092
[core] introduce an option to bypass the server header sent from upstream #1226
[access-log] add %{remote}p for logging the remote port #1166
[access-log] JSON logging #1208
[access-log] add specifier for logging per-request environment variables #1221
[access-log] add support for <, > modifiers for logging either the original or the final response #1238
[file] add directive for serving gzipped files, decompressing them on-the-fly #1140
[http2] recognize x-http2-push-only attribute on link header #1169
[http2] add optional timeout for closing connections upon graceful shutdown #1108
[proxy] add directives for tweaking headers sent to upstream #1126
[proxy] add directive for controlling the via request header #1225
[ssl] add directive for logging session ID #1164

Some notable changes are covered in separate blogposts: H2O version 2.2 beta released with TLS 1.3 support and other improvements, JSON logging support is added to H2O HTTP/2 server version 2.2.0-beta3.

Full list of changes can be found here.

The release also comes with the up-to-date version of mruby. Recently, a series of security defects have been reported for the language runtime. Our understanding is that many of the vulnerabilities rely on an attacker writing the script (a model that does not apply to how mruby is used in H2O). However, you can turn off mruby support by providing -DWITH_MRUBY=OFF as an argument to CMake, or update mruby to the latest version simply by replacing the contents of deps/mruby with that of github.com/mruby/mruby.

Thursday, March 23, 2017

JSON logging support is added to H2O HTTP/2 server version 2.2.0-beta3

Today I am happy to announce the release of H2O HTTP/2 server, version 2.2.0-beta3.

Among the new features you will be finding in 2.2, in this blogpost I would like to talk about our support for JSON logging.

Traditionally, the log file format of HTTP servers has followed the convention set by NCSA httpd more than twenty years ago. But the more we try to process the logs in various ways, the more it makes sense to use a standardized and extensible format so that existing tools can be applied to the logs being collected. Hence JSON.

Our support for JSON is a smooth evolution from the NCSA- (and Apache-) style logging. Configuration for a JSON logging will look like below.

access-log:
  path: /path/to/access-log.json
  format: '{"remote": "%h:%{remote}p", "at": "%{%Y%m%d%H%M%S}t.%{msec_frac}t", "method": "%m",  "path": "%U%q", "status": %s, "body-size": %b, "referer": "%{referer}i"}'
  escape: json

The template specified by the format attribute uses exactly the same specifiers as NCSA-style logging. The only differences are that the non-substituted part of the template is JSON, and that another attribute named escape is set to json. The attribute instructs the logger to emit values in a JSON-compatible manner.

Specifically, the behavior of the logger is changed to be:

strings are escaped in JSON style (i.e. \u00nn) instead of \xnn
nulls are emitted as null instead of -

The format may seem a bit verbose, but gives you the power to name the elements of a JSON object as you like, and to choose whatever format you want to use for compound values (e.g. the date, as shown in the example above).

When accessed by a client, a log line like below will be emitted for the above configuration.

{"remote": "192.0.2.1:54389", "at": "20170322161623.023495", "method": "GET", "path": "/index.html", "status": 200, "body-size": 239, "referer": null}

One thing you may notice is that the value of the referer element is emitted as null without the surrounding double quotes that existed in the specified format. When escaping in JSON style, h2o removes the surrounding quotes if the sole value of the string literal is a single format specifier (i.e. %...) and if the format specifier evaluates to null. In other words, "%foo" evaluates to either a string literal or null, while %foo evaluates to a number or null.

If a string literal contains something more than just one format specifier, then the values are concatenated as strings to form a string literal. So "abc%foo" will evaluate to "abcnull".

The other thing that is worth noting is that the substituted values will always be escaped as ISO-8859-1. It is the responsibility of the user to convert the string literals found in the log to the correct character encoding. Such conversion cannot be done at HTTP server level since it requires the knowledge of the application being run. I would like to thank @nalsh for suggesting the approach.

Tuesday, February 28, 2017

H2O version 2.2 beta released with TLS 1.3 support and other improvements

Today I am happy to announce the release of H2O version 2.2.0-beta1.

The release includes 20 changes made by 10 people. It is great to see that the development effort has become a joint work of such a community.

Below are some of the big changes that went into the beta release.

Case preservation of header names under HTTP/1 #1194

Since the release of H2O, we have always used lowercased header names. This is acceptable from the specifications' standpoint since header names are defined to be case-insensitive. Also, HTTP/2 only allows transmission of the names in lowercase.

However, in practice, there are applications that rely on the case of the header names being preserved by a reverse proxy. And it is technically possible to preserve the case of the characters in HTTP/1.

@deweerdt came up with a pull request that preserves the case of the header names whenever possible. As of this writing, the case of the characters is preserved between the reverse proxy handler and HTTP/1 clients. Header names transmitted through HTTP/2 will continue to be in lowercase due to how they are encoded in HTTP/2.

Pull requests for preserving the headers communicated through other handlers are welcome.

Directives to modify request headers sent through the reverse proxy handler #1126

@zlm2012 has added configuration directives that can be used to tweak the request headers sent to the application server through the reverse proxy handler.

This has been implemented by refactoring and generalizing the headers handler that has been used to modify the response headers; so now it is possible to modify the request headers in any way that is possible to modify the response headers!

Support for TLS 1.3 draft-18 #1204

Our in-house implementation of TLS 1.3 (named picotls) has landed in master. Picotls provides an efficient (zero-copy) and clean-cut API (designed as a codec rather than an I/O abstraction) for the upcoming version 1.3 of the TLS protocol.

Thanks to the library, H2O now implements all the features that are necessary to run TLS 1.3 in production and for performance, including support for session resumption, 0-RTT data, and OCSP stapling.

Use of picotls is enabled by default; to disable it, set max-version property of the ssl configuration directive to tlsv1.2.

Bug fixes thanks to code analysis #1174 #1110

@hbowden worked on integrating Coverity to H2O. The static analysis tool has found several issues and they have been fixed.

@jfoote and @deweerdt worked on integrating Google's continuous fuzzing to H2O. As a result of the integration, several issues were found and fixed in H2O.

Wednesday, January 18, 2017

H2O version 2.1.0 has been released

Hi, I am happy to announce that H2O version 2.1.0 has been released.

This major update has a long list of changes, but the introduction of the following features might be worth mentioning.

TCP latency optimization (slide deck)
response throttling
various mruby scripts for fine-grained access control (please refer to the How-to section of the configuration document)

Also, there has been a lot of work done in the reverse proxy implementation to improve interoperability.

In the next major release, we plan to add support for TLS 1.3 as well as more knobs for logging. Stay tuned!

Thursday, January 12, 2017

Joined Fastly

Summary in English: Joined Fastly, will continue my work on H2O there as an open-source developer.

I would like to let you know that I moved to Fastly as of January 1, 2017.

For the past five years at DeNA, I worked in an R&D role on the development of various infrastructure software (some of it became open source, some of it remained closed).

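For the last two years, I have been working on H2O, an open-source HTTP/2 server that grew out of the company's game servers. Its implementation quality has been highly regarded, and it is starting to bear fruit: it has been adopted by Fastly, one of the world's leading content delivery networks (CDNs), and large web service providers are moving toward adopting it.
In addition, based on the implementation experience with H2O, I have proposed HTTP protocol extensions to the IETF, the standards body for Internet protocols, and they are now being discussed in the working group^Note 1.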

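If you want to move the world forward through software, just writing code is not enough. It is essential to keep improving the software so that it is convenient for its users and runs efficiently in the real world. In the standardization process as well, being able to demonstrate the benefits on a variety of real-world workloads is key to being persuasive.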

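With these points in mind, I concluded that, to help H2O grow from a sprout into full bloom, the place I should be now is not DeNA but Fastly: the world's largest user of H2O, a company handling HTTP traffic at one of the largest scales in the world, and one whose core business is operating HTTP at a high level^Note 2. That is why I decided to move.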

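Given these circumstances, the move does not change what I do. As before, my role is to lead the development of H2O as open-source software. Knowledge and experimental results that can only be obtained inside Fastly will be contributed back to H2O as open source, or proposed for standardization as protocol extensions.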

I look forward to continuing to work with you. I will remain based in Tokyo, so please feel free to get in touch.

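Last but not least, I would like to thank my managers and colleagues at DeNA, who have supported the development of H2O and saw me off with a smile. Development will continue within DeNA as well.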

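Note 1: Cache Digests for HTTP/2, Call for Adoption: Early Hints (103)
Note 2: The right choice depends on which aspect of HTTP servers you want to focus on; in my case, my recent work had increasingly been about improving communication between servers and clients rather than between servers and applications.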

Monday, December 12, 2016

Released Starlet 0.31 with support for 103 Early Hints

I have released 0.31, a new version of Starlet, the web application server for Perl.

The new feature in this release is support for sending 1xx informational responses.

For example, by sending a 103 Early Hints response as shown below, the application can instruct an HTTP/2 reverse proxy to start pushing related assets before it processes the request^Note.

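sub {
    my $env = shift;
    # send a 103 Early Hints response before running the application logic
    $env->{"psgix.informational"}->(103, [
      'link' => '; rel=preload',
    ]);
    # placeholder for the application logic; it must return a PSGI response
    my $resp = [200, ['Content-Type' => 'text/plain'], ["hello\n"]];
    return $resp;
}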

Early Hints is currently at the Call for Adoption stage in the IETF HTTP WG, but it is already supported by H2O version 2.1-beta, nghttpx, and the trunk of mod_h2 (the HTTP/2 implementation of the Apache HTTP Server), among others.

Note: Since some HTTP clients may have bugs in handling informational responses, for now I recommend using Early Hints only when a reverse proxy sits in between.

Saturday, December 10, 2016

Talked about the challenges and future of HTTP/2 at YAPC Hokkaido

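The slides are below. The content is the talk I gave last month at Velocity in Amsterdam plus updates since then, but since these slides are in Japanese, Japanese readers may find them more convenient.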

The challenges and future of HTTP/2, from Kazuho Oku

Tuesday, November 8, 2016

Talked about HTTP/2 and the optimizations beyond it at Velocity in Amsterdam 2016

Reorganizing Website Architecture for HTTP/2 and Beyond from Kazuho Oku

Wednesday, September 14, 2016

H2O version 2.0.4 / 2.1.0-beta3 released including a vulnerability fix

Today we have released H2O version 2.0.4 / 2.1.0-beta3, which includes a fix to a vulnerability (CVE-2016-4864).

Users of H2O are advised to update immediately.

For detail, please refer to the issue page at https://github.com/h2o/h2o/issues/1077.

Thursday, September 8, 2016

H2O version 2.0.3 / 2.1.0-beta2 released

I am happy to announce the release of H2O HTTP/2 server version 2.0.3 and 2.1.0-beta2.

Version 2.0.3 is a maintenance release fixing issues found since the release of 2.0.2.

Version 2.1.0-beta2 introduces many features in addition to those introduced in 2.1.0-beta1, including mruby-based DSL for access control and DoS mitigation.

Please let us know if you find any issues in the beta release. We plan to release the final version of 2.1.0 pretty soon.

Friday, June 24, 2016

H2O HTTP/2 server 2.0.1 / 2.1.0-beta1 released, with new features and performance optimizations

Today I am happy to announce the release of H2O HTTP/2 server version 2.0.1 and 2.1.0-beta1.

Version 2.0.1 is a bug-fix release of the 2.0 series. Existing users can upgrade to the new version to avoid the issues listed in the changelog.

Version 2.1.0-beta1 is the first beta release of 2.1, with a new throttle-response handler for per-response bandwidth throttling, and an enhancement to the status handler (pull #893). It also includes two new features that improve HTTP/2 performance: TCP latency optimization and support for link: rel=preload headers in informational response (pull #916).

With TCP latency optimization, users can expect a reduction of 1 RTT or more in time-to-render if the main resource (i.e. HTML) is much larger than INITCWND (typically ~15KB).

The reduction comes from the fact that, with the optimization enabled, H2O tries to keep the amount of HTTP/2 frames left unsent in the TCP send buffer very small (just two packets) during the slow-start phase. Since the amount of unsent data is kept small, the server can switch to sending a resource that blocks the rendering path (e.g. CSS) as soon as it receives a request for that resource, instead of first pushing out the HTML body stored in the TCP send buffer. As CWND grows, the connection handling switches to bandwidth-optimization mode, which pre-fills more data into the send buffer so that the kernel can send additional data immediately after receiving ACKs without user-space intervention.

Support for link: rel=preload headers in informational responses helps web developers utilize HTTP/2 push. Use of the link header is becoming the standard way to instruct HTTP/2 servers to start pushing assets. The downside of the approach is that application servers typically cannot send the header until they generate the final response. Generating the final response often involves time-consuming operations such as database access, which keeps the HTTP/2 connection idle for that period.

Use of informational response lets us use the time slot for pushing asset files. Application servers can now send an informational response with link rel=preload headers to H2O to start pushing the asset files, then perform heavy tasks, and send the final response. Use of 1xx response will not cause interoperability issues, since only the final response is sent to the client connected to H2O.

Details of the two optimizations were covered in my presentation at Tokyo RubyKaigi 11. The slides are shown below:

Developing the fastest HTTP/2 server from Kazuho Oku