A supposedly common setup, an HTTP/2-enabled reverse proxy layer in front of your application servers, can have an unexpected side effect that we recently stumbled upon.
Safari users describe the problem (regardless of the actual device: it affects iPhones, iPads and desktop macOS alike) as "the page load time is very long". And indeed, where the page loads in milliseconds in any other browser on any other device, the combination of Apple devices and Safari makes users wait 5 to 30 seconds before the page is rendered.
Unfortunately, the issue has a disastrous impact on the actual infrastructure: it is not just that the page loads slowly. It's much worse. The browser makes as many requests as it physically can until it is finally satisfied and renders the page. If you look at the browser's client-side console, you see a single request. But from the server's perspective, the browser is making hundreds of requests per second! During our tests, we logged Safari waiting about 30 seconds before it rendered the page, making approximately 15000 requests to the server.
A complete disaster: a handful of Safaris could possibly take down a smaller site, and even for us the extra load was noticeable.
We lost a few days tracking the issue, capturing the traffic with various HTTP debuggers and TCP packet analyzers on various devices and trying to narrow it down to a pattern.
It finally turned out to be the combination of HTTP/2, a reverse proxy layer and a specific configuration behind it.
It all starts with the common Connection: keep-alive header, which is used to keep the TCP connection open so that the browser doesn't open new connections when requesting consecutive resources.
The problem with this header is that, since keep-alive is the default behavior in HTTP/1.1, different servers treat it differently. In particular, IIS doesn't send it by default, while Apache sends it by default.
And guess what: since Apache sends it, you can run into the issue when Apache sits behind an HTTP/2-enabled proxy.
The scenario is as follows. The browser makes a request to your site and is upgraded to HTTP/2 by your front servers. Then the front server requests the actual server, and if that is Apache (or any other server that sends the header) speaking only HTTP/1.1, it returns the Connection header to the proxy server. And if you are unlucky (as we were), the front server just proxies the header to the client, together with the rest of the response.
So what, you'd ask. What's wrong with the header?
Well, the problem is that it is considered unnecessary for HTTP/2. The docs say:
The Connection header needs to be set to "keep-alive" for this header to have any meaning. Also, Connection and Keep-Alive are ignored in HTTP/2; connection management is handled by other mechanisms there.
The actual HTTP/2 RFC states otherwise:
This means that an intermediary transforming an HTTP/1.x message to HTTP/2 will need to remove any header fields nominated by the Connection header field, along with the Connection header field itself. Such intermediaries SHOULD also remove other connection-specific header fields, such as Keep-Alive, Proxy-Connection, Transfer-Encoding, and Upgrade, even if they are not nominated by the Connection header field.
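The rule in that RFC passage can be sketched in a few lines of Python. This is only an illustration of what a translating intermediary is expected to do, not the code of any particular proxy; the function name strip_connection_headers and the representation of headers as (name, value) pairs are assumptions made for the example.

```python
# Connection-specific fields listed in the RFC passage quoted above.
ALWAYS_STRIPPED = {
    "connection",
    "keep-alive",
    "proxy-connection",
    "transfer-encoding",
    "upgrade",
}


def strip_connection_headers(headers):
    """Return a copy of (name, value) pairs without connection-specific fields.

    Hypothetical helper: removes the Connection header, every header it
    nominates, and the other connection-specific fields named by the RFC.
    """
    nominated = set()
    for name, value in headers:
        if name.lower() == "connection":
            # "Connection: keep-alive, X-Internal" nominates both tokens.
            for token in value.split(","):
                token = token.strip().lower()
                if token:
                    nominated.add(token)

    blocked = ALWAYS_STRIPPED | nominated
    return [(name, value) for name, value in headers if name.lower() not in blocked]


backend_response = [
    ("Content-Type", "text/html"),
    ("Connection", "keep-alive, X-Internal"),
    ("Keep-Alive", "timeout=5, max=100"),
    ("X-Internal", "1"),
    ("Content-Length", "42"),
]
forwarded = strip_connection_headers(backend_response)
```

With the sample backend response above, only Content-Type and Content-Length would be forwarded to the HTTP/2 client.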
What some consider unnecessary and simply ignore, others (hello Apple) consider illegal. By analyzing TCP packets we found that Safari fails internally with Error: PROTOCOL_ERROR (1) and, guess what, repeats the request over and over.
The issue has finally been narrowed down and it turned out others have identified it too in a similar scenario.
The solution, then, is to strip the Connection header from the outgoing response so that the browser never gets it. This potentially slows down browsers that can't make use of HTTP/2 connection optimizations, but luckily most modern browsers have supported HTTP/2 for some time.
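To confirm that the fix holds, it may help to check the response headers the front server actually emits (captured with whatever HTTP debugger you already use) against the list of connection-specific fields from the RFC passage quoted earlier. The following is a minimal sketch of such a check; the function name forbidden_http2_headers is made up for this example and it only looks at header names.

```python
# Connection-specific fields named in the RFC passage quoted earlier.
FORBIDDEN_IN_HTTP2 = frozenset(
    {"connection", "keep-alive", "proxy-connection", "transfer-encoding", "upgrade"}
)


def forbidden_http2_headers(header_names):
    """Return the connection-specific header names found, lowercased and sorted.

    Hypothetical helper: an empty result means none of the listed fields
    leaked through to the HTTP/2 response.
    """
    present = {name.lower() for name in header_names}
    return sorted(present & FORBIDDEN_IN_HTTP2)


# Header names as they might appear in a captured response before the fix.
before_fix = ["Content-Type", "Connection", "Keep-Alive", "Content-Length"]
# Header names after the front server has been configured to strip them.
after_fix = ["Content-Type", "Content-Length"]

leaked_before = forbidden_http2_headers(before_fix)
leaked_after = forbidden_http2_headers(after_fix)
```

A non-empty result for a response served over HTTP/2 points at the header that a strict client may reject.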
A happy ending and, well - it looks like a bug on Apple's side that should be taken care of, to make Safari more compatible with other browsers that just ignore the header rather than fail so miserably on it.
The bug is in the front server. The RFC is very clear on it:
"An endpoint MUST NOT generate an HTTP/2 message containing connection-specific header fields; any message containing connection-specific header fields MUST be treated as malformed (Section 8.1.2.6)."