Stripping Cookies in Varnish

Varnish is a caching reverse proxy. It is configured using a DSL called Varnish Configuration Language (VCL). We use Varnish at work to speed up our widgets (little snippets of HTML loaded via AJAX).

For the most part, these widgets don’t use sessions or cookies at all, but a handful of them need to read our authentication cookie to show some user-specific data. By default, Varnish will bypass the cache if the request has the Authorization or Cookie HTTP headers set. This ensures that a page specific to a user is not served to other users. A lot of requests come in with our authentication cookie set, but our non-user-specific widgets don’t do anything with this cookie. To make Varnish cache these non-user-specific widgets, we strip the Cookie HTTP request header on most requests. The handful of user-specific widgets that do use the authentication cookie (and cannot be cached) are whitelisted by URL. The VCL looks something like this:

sub vcl_recv {
  if (req.url !~ "^/widgets/user-specific-widget") {
    unset req.http.Cookie;
  }
}

Our vcl_recv doesn't return unconditionally, so Varnish falls through to the default implementation of vcl_recv, which runs after ours and does this:

# snippet of default implementation of vcl_recv
sub vcl_recv {
  if (req.http.Authorization || req.http.Cookie) {
    /* Not cacheable by default */
    return (pass);
  }
  return (lookup);
}

So for most widgets, the Cookie header will be unset, lookup will be returned, and the widget will be served from cache. For the user-specific widgets, the Cookie header will be preserved, pass will be returned, and Varnish will skip the cache and go to the backend for a fresh response.
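
One way to check which path a request took is to tag each response in vcl_deliver. This is a minimal sketch, assuming the same VCL dialect as the snippets above; the X-Cache header name is an arbitrary choice for illustration, not something from our real config.

```vcl
# Hypothetical debugging aid: report whether the response came from cache.
sub vcl_deliver {
  if (obj.hits > 0) {
    # The object was found in cache and has been served before
    set resp.http.X-Cache = "HIT";
  } else {
    # Fetched from the backend (cache miss, or a pass)
    set resp.http.X-Cache = "MISS";
  }
}
```

With this in place, a cacheable widget should report HIT from its second request onward, while a user-specific widget requested with a cookie should always report MISS, because passed requests never count as hits.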

Response Cookies

Varnish also checks the response headers coming back from the backend. If the Set-Cookie header is present, the response will not be cached. Again, this is to ensure that pages/cookies specific to a user are not cached and served to other users. The logic for this is in the default implementation of vcl_fetch. (vcl_fetch handles backend responses; vcl_recv handles incoming requests from users.)

# default implementation of vcl_fetch
sub vcl_fetch {
  if (beresp.ttl <= 0s ||
      beresp.http.Set-Cookie ||
      beresp.http.Vary == "*") {
    /*
     * Mark as "Hit-For-Pass" for the next 2 minutes
     */
    set beresp.ttl = 120 s;
    return (hit_for_pass);
  }
  return (deliver);
}

Our widgets don't set any cookies, so Set-Cookie headers are never sent; even the user-specific widgets only read cookies. On these user-specific widgets, we set the Cache-Control header so that the response is not cached (beresp.ttl evaluates to 0).
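
The setup described above relies on the backend's Cache-Control header to get a zero TTL. As an aside, the same effect could be forced in VCL instead. The following is only a sketch of that alternative, reusing the URL pattern from the earlier snippet; it is not part of the configuration described in this post.

```vcl
# Hypothetical alternative: force user-specific widgets to be uncacheable
# in VCL rather than trusting the backend's Cache-Control header.
sub vcl_fetch {
  if (req.url ~ "^/widgets/user-specific-widget") {
    # A zero TTL makes the default vcl_fetch (which runs after this block)
    # mark the object as hit-for-pass instead of caching it.
    set beresp.ttl = 0s;
  }
}
```

Because this block doesn't return, the default vcl_fetch still runs afterwards and sees beresp.ttl <= 0s.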

All of this worked fine for many months, but last week we rolled out a new authentication system. As part of the refactor, PHP sessions were enabled on all widgets, even the ones that weren't user-specific. PHP sessions set cookies, so Varnish stopped caching all widgets because it saw the Set-Cookie header on the responses. Our Varnish hit rate was previously ~70%, so the backend servers normally saw only ~30% of requests. With caching effectively switched off they saw all of them, which is more than three times their usual traffic. They started to buckle under the load.

The correct solution was to update the code to only enable sessions on the widgets that needed them, but as a short-term measure to keep the servers up, we temporarily configured Varnish to strip the Set-Cookie response header on most widgets. This was similar to what we were already doing with request cookies, but in reverse. This code was added to vcl_fetch:

sub vcl_fetch {
  if (req.url !~ "^/widgets/user-specific-widget") {
    unset beresp.http.Set-Cookie;
  }
}

(The code was taken from Varnish’s VCL examples wiki page.)

Having this code in place convinced Varnish that the non-user-specific widgets weren't setting the Set-Cookie response header (even though they were), so Varnish resumed caching as before. The next day, we fixed the application code so that non-user-specific widgets really didn't set the Set-Cookie response header, and the temporary VCL code was removed.
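
A variant of the temporary block that logs each strip could make it easier to confirm when the application fix has taken effect and the workaround is safe to remove. This is a minimal sketch, assuming the std vmod is available in your Varnish build; the log message text is made up for illustration.

```vcl
import std;

# Hypothetical variant of the temporary workaround: strip Set-Cookie on
# non-user-specific widgets and log each time it happens.
sub vcl_fetch {
  if (req.url !~ "^/widgets/user-specific-widget" &&
      beresp.http.Set-Cookie) {
    # Written to Varnish's shared memory log (viewable with varnishlog)
    std.log("Stripped Set-Cookie from " + req.url);
    unset beresp.http.Set-Cookie;
  }
}
```

Once the backend stops sending Set-Cookie on these widgets, the log entries should stop appearing, which is a reasonable signal that the block is no longer doing anything.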

apc_mmap: mmap failed: Invalid argument

I went to adjust the APC cache size on one of my Ubuntu 10.04 LTS (Lucid) servers today. This particular server runs a single app, and it only uses 12M of cache, so I wanted to change the cache size from the default 32M to 16M. I put apc.shm_size=16M in /etc/php5/conf.d/apc.ini and restarted Apache… and to my surprise, Apache didn’t come back up. I found this error in the error log:

[apc-error] apc_mmap: mmap failed: Invalid argument

A Google search for that message led me to this very thorough blog post.

TL;DR, for older versions of APC like the version that ships with Ubuntu 10.04 LTS (APC version 3.1.3p1), you cannot use the “M” suffix. The correct syntax is apc.shm_size=16.

According to this bug report, it looks like APC >= 3.1.4 expects the M/G suffix, so this shouldn’t be an issue with Ubuntu 12.04 LTS (Precise).

Options FollowSymLinks

I knew that you should disable overrides by specifying AllowOverride None on busy Apache installations — if you enable overrides, Apache has to search every parent directory for .htaccess files. I also knew that SymLinksIfOwnerMatch is expensive because Apache needs to check every path component for symlinks to ensure the owners match. But today I learned from this blog post that not using FollowSymLinks is just as expensive as using SymLinksIfOwnerMatch, because Apache has to check every path component for symlinks and deny access if it finds one.

There are other good tips on Apache’s Performance Tuning page (who would have thought?), and on Mattias’ blog.

Vim Anti-Patterns

I saw this blog post on Twitter today. Some of these I already avoid, like moving a single line at a time and pressing d then i instead of just c. Others I still do, like hitting Esc. I use Ctrl+C sometimes, but I don’t think I’ve fully broken the Esc habit. Looks like I should be using Ctrl+[ anyway. There are lots of gems that I didn’t even know about, like using two backticks (``) to go back to where you were, and using ? to search backwards. Definitely worth a read!