Nginx Gzip, High Concurrency and Memory

The upcoming 0.4 release of the nginx-push-stream-module will support the Nginx Gzip filter. Being able to gzip messages frees up bandwidth and decreases latency under high load. However, the default deflate settings Nginx uses are not ideal for the high concurrency and small messages typical of the push-stream module. By default, Nginx may preallocate a relatively large chunk of memory (264kb) for zlib for every request that supports gzip, and that adds up fast when there are thousands of concurrent connections.

Normally, Nginx intelligently sizes the memory for a gzip-able request using the Content-Length response header. However, the push-stream module, as the name implies, streams messages to the user: the content length is unknown at the beginning of the subscription. Without this hint, Nginx allocates the maximum amount of memory for the connection (the 264kb default.)
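
As a rough sketch of that sizing behavior (simplified from my reading of the gzip filter source; the function name is mine, so take the exact shape with a grain of salt):

```
/* Simplified sketch of how the gzip filter picks zlib parameters per
 * request, based on my reading of ngx_http_gzip_filter_module.c. */
void pick_deflate_params(long content_length, int *wbits, int *memlevel)
{
    /* wbits/memlevel arrive at the configured maximums
     * (defaults: wbits 15 = 32k window, memlevel 8 = 64k hash). */
    if (content_length > 0) {
        /* Halve the window (and hash) while the body still fits;
         * zlib's usable window is 262 bytes smaller than 2^wbits. */
        while (content_length < ((1 << (*wbits - 1)) - 262)) {
            (*wbits)--;
            (*memlevel)--;
        }

        if (*memlevel < 1) {
            *memlevel = 1;
        }
    }

    /* No Content-Length (a streamed response): nothing is halved and
     * the maximum allocation is made for the connection. */
}
```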

Thankfully, Nginx has two undocumented settings (gzip_window and gzip_hash) that it uses to calculate the maximum memory to preallocate per request, using a formula from zconf.h. These parameters are passed to deflateInit2 and are used by zlib to size the history buffer and the internal compression state. The history buffer affects the compression ratio, while the internal state affects both compression ratio and speed. To go in-depth on what these are used for, take a look at LZ77, LZ78, and Huffman coding.
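
Per zconf.h, deflate needs roughly (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes, and Nginx budgets about 8k on top for the gzip header and deflate state. A quick back-of-the-envelope check with the default zlib parameters shows where the 264kb figure above comes from:

```
#include <stdio.h>

int main(void)
{
    /* Defaults Nginx hands to deflateInit2: windowBits 15 (32k window),
     * memLevel 8.  zconf.h: (1 << (windowBits+2)) + (1 << (memLevel+9)). */
    int window_bits = 15;
    int mem_level   = 8;

    long zlib_bytes  = (1L << (window_bits + 2)) + (1L << (mem_level + 9));
    long total_bytes = 8192 + zlib_bytes;  /* + gzip header / deflate state */

    printf("zlib: %ld bytes, total preallocation: %ld bytes (~%ldkb)\n",
           zlib_bytes, total_bytes, total_bytes / 1024);
    /* -> zlib: 262144 bytes, total preallocation: 270336 bytes (~264kb) */
    return 0;
}
```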

There is no “one size fits all” value for these parameters (in the context of tuning them for the push-stream-module.) It’s a trade-off between compression ratio and the memory required for the number of concurrent users. To come up with settings that will work most of the time (for our system), I followed the algorithm Nginx uses based on the Content-Length header. I took a 60s sample of messages (3750 messages) and found an average length of 1120 bytes and an 85th percentile of 1756 bytes. So a gzip_window 2k; would work for approximately 85% of the messages.

The algorithm starts with gzip_window and halves it until content-length + 262 bytes is no larger than the window. So the default gzip_window 32k; has to be halved 4 times: 32768 >> 4 == 2048. Each time gzip_window is halved, so is gzip_hash (to a minimum of 512.) The default of gzip_hash 64k;¹, halved 4 times, becomes gzip_hash 4k;. With these settings, the maximum allocation per connection is 32kb (8192 + (gzip_window << 2) + (gzip_hash << 2)).
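
For reference, the tuned values end up looking something like this in the config (a sketch only; the location and subscriber directive are just an example, while the two gzip_* lines are the undocumented settings discussed above):

```
# Example subscriber location for the push-stream module.
location /sub {
    push_stream_subscriber;

    gzip        on;
    gzip_window 2k;   # covers ~85% of our sampled messages
    gzip_hash   4k;   # halved in step with the window (minimum 512)
}
```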

[Graph: memory usage and number of connected users before and after modifying gzip_window and gzip_hash. The machine has 48G of RAM but is only receiving ~40% of its normal number of users, for testing.]

I'm not too concerned with the compression ratio, since the machines I'm tweaking these settings on have 10G Ethernet cards. There is plenty of bandwidth to spare, so I'm aiming for memory efficiency (though I'm still seeing about a 40% reduction in bandwidth with these settings.) However, if I were trying to optimize the compression ratio, Nginx does a very good job of exposing zlib's internal metrics:

- $gzip_ratio is available for the overall compression efficiency of a request (a good average across all the messages in a session)
- With debug enabled and error_log set to the debug level, Nginx will log everything there is to know about the efficiency of deflate (good for per-message compression ratios; see the example below)
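
Something like the following surfaces both (a sketch; the log paths and format name are made up, and debug-level error_log output requires an Nginx built with --with-debug):

```
# http block: include the per-request compression ratio in the access log.
log_format gzipstats '$remote_addr "$request" $status $body_bytes_sent '
                     'gzip_ratio=$gzip_ratio';
access_log /var/log/nginx/push_stream_access.log gzipstats;

# Logs what the gzip filter does with deflate, buffer by buffer.
# Requires a binary compiled with --with-debug.
error_log /var/log/nginx/push_stream_error.log debug;
```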

  1. Nginx seems to have an off-by-one error in its calculation. The default memLevel of 8 is actually a hash_size of 32k.